Support Vector Machines — Lecture series — Kernels: An introduction

In the last couple of lecture posts, we have been talking about how to determine hyperplanes that can optimally separate given datasets. In this and the next couple of posts, we will look at the concept of kernels: what they are and why they are so useful in machine learning.

Learning objective:

Understand the context in which kernels are applied in machine learning.

Main question:

Observe the image in Fig. 1 below:

Fig. 1

Suppose we have to figure out a hyperplane that correctly classifies the different points in the image. We would quickly come to the realisation that this is a very hard task. How are we going to come up with a plane that can discriminate between the points, given that the different classes of points seem to be clustered together?

Perhaps, to solve this problem, we have to look at it from another dimension, literally. When we map the points in Fig. 1 from their 2-dimensional representation to a 3-dimensional representation, we get the result shown in Fig. 2:

Fig. 2

Fig. 2 might still not look very impressive, since the points may still appear to be muddled together. However, if we rotate the axes and look at the plot from the angle shown in Fig. 3, we soon realise that the points are linearly separable in this higher dimension after all.

Fig. 3

This means that, to produce a hyperplane that correctly classifies classes of points arranged like those in Fig. 1, we have to transform every single point to a higher dimension.
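
To make this concrete, here is a minimal sketch in Python. The data behind Fig. 1 is not given, so everything here is an assumption for illustration: we generate a hypothetical dataset in which one class sits near the origin and the other forms a ring around it, and we lift each point with the hypothetical map (x, y) -> (x, y, x^2 + y^2). The function names `sample_ring` and `lift_to_3d` are made up for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_ring(n, r_min, r_max):
    """Sample n 2-D points whose distance from the origin lies in [r_min, r_max)."""
    radius = rng.uniform(r_min, r_max, size=n)
    angle = rng.uniform(0.0, 2.0 * np.pi, size=n)
    return np.column_stack((radius * np.cos(angle), radius * np.sin(angle)))

def lift_to_3d(points):
    """Assumed feature map for illustration: (x, y) -> (x, y, x^2 + y^2)."""
    z = np.sum(points ** 2, axis=1)
    return np.column_stack((points, z))

# Hypothetical classes: an inner blob and an outer ring.
# No straight line in 2-D separates them.
inner = sample_ring(100, 0.0, 1.0)
outer = sample_ring(100, 2.0, 3.0)

# Transform every single point to the higher dimension.
inner_3d = lift_to_3d(inner)
outer_3d = lift_to_3d(outer)

# In 3-D, the flat plane z = 2.5 separates the two classes.
threshold = 2.5
inner_below = bool(np.all(inner_3d[:, 2] < threshold))
outer_above = bool(np.all(outer_3d[:, 2] > threshold))
print(inner_below, outer_above)
```

Under these assumptions, the third coordinate of every inner point is at most 1 and that of every outer point is at least 4, so both checks hold. Note that the lift has to be computed for each point individually.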

Even though this seems like a good strategy initially, it may be impractical to implement at scale. Imagine we are trying to produce a hyperplane that correctly classifies millions or billions of points. It would be computationally expensive to convert every single one of those points to a higher dimension.
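
To get a rough feel for the cost, here is a small sketch under one assumption that the post itself does not make: that the higher-dimensional representation consists of all monomials of the original features up to some degree. The number of such monomials in n variables up to degree d is the binomial coefficient C(n + d, d). The helper name `explicit_feature_count` is hypothetical.

```python
from math import comb

def explicit_feature_count(n_features, degree):
    """Number of monomials of total degree <= degree in n_features variables."""
    return comb(n_features + degree, degree)

# Size of the explicit representation of ONE point, for a few input sizes.
for n in (2, 10, 100, 1000):
    print(n, explicit_feature_count(n, 2), explicit_feature_count(n, 3))
```

For 2 input features and degree 2 this gives only 6 values per point, but for 100 input features it is already 5151, and every one of the millions of points would need its own copy.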

This is where kernels come to the rescue :)

In the next post, we will learn more about how kernels are applied to the situation presented in this post.