In the last post, I discussed the problem with explicitly transforming every point into a higher-dimensional space just to find a hyperplane that separates the points into their respective classes. This is computationally expensive, especially when there are many points. I mentioned that kernels could help us get around this problem, but I did not explain what they are or how they help. In this post, we will address that question.
Understand the concept of kernels
The main question I believe I have to address before I proceed any further is: “What are kernels?”
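Well, to give an informal definition, a kernel is a function that takes two points in their original space and returns the dot product those points would have after being mapped into another (usually higher-dimensional) space, without actually performing that mapping.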
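Written a little more formally, the informal definition above amounts to the identity below. The symbols are my own notation and an assumption for illustration: K is the kernel, φ is the mapping into the other space, and x and y are two points in the original space.

```latex
K(\mathbf{x}, \mathbf{y}) = \langle \varphi(\mathbf{x}), \varphi(\mathbf{y}) \rangle
```

The left-hand side is evaluated entirely in the original space, while the right-hand side is the dot product we would otherwise have to compute after transforming both points.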
Consider the following data points in Fig. 1 below:
Suppose we had to compute the dot product of these 2 points in a 9-dimensional space by first converting each of them into that space. We would have to carry out the computations shown in Fig. 2 below:
As we can see, this takes a lot of work: each point has to be expanded into 9 components before the dot product can even be taken, and the cost grows with the number of dimensions and the number of points.
However, with kernels, we can obtain the same result without first converting the data points into the 9-dimensional space. Fig. 3 demonstrates how we can use a kernel to do this:
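To make the comparison concrete, here is a minimal sketch in Python. The points, the 9-component mapping `phi`, and the degree-2 polynomial kernel `poly_kernel` are all assumptions chosen for illustration, and may differ from the values and kernel used in the figures.

```python
def phi(v):
    """Assumed explicit map from 3 to 9 components: every product v[i] * v[j]."""
    return [a * b for a in v for b in v]


def dot(u, w):
    """Plain dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, w))


def poly_kernel(u, w):
    """Assumed kernel: homogeneous polynomial of degree 2, (u . w) ** 2."""
    return dot(u, w) ** 2


# Hypothetical 3-dimensional points, not taken from the figures.
x = [1, 2, 3]
y = [4, 5, 6]

# Long route: transform both points into 9 components, then take the dot product.
explicit = dot(phi(x), phi(y))

# Kernel route: one dot product in the original space, then square it.
via_kernel = poly_kernel(x, y)

print(explicit, via_kernel)  # 1024 1024
```

Both routes give the same number, but the kernel route never builds the 9-component vectors. For this particular mapping, the long route needs 9 multiplications per point plus a 9-term dot product, while the kernel needs a 3-term dot product and one squaring.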
In the next post, we will talk about how to apply this knowledge of kernels when deriving the optimal hyperplane.