Support Vector Machines — Lecture series — Kernels part 3 (The Kernel trick)

In this tutorial, we will learn how to apply a kernel directly to find the optimal hyperplane separating data points from different classes when those points are not linearly separable. Using kernels to determine the hyperplane in this way is usually referred to as the “kernel trick”.
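To make the idea concrete, here is a minimal sketch in plain Python. It assumes a degree-2 polynomial kernel K(x, z) = (x · z)^2 on 2-D inputs, chosen only for illustration, and the function names are hypothetical. It checks that the kernel gives the same value as first mapping both points into 3-D with the feature map φ(x) = (x1², √2·x1·x2, x2²) and then taking the dot product there. That shortcut, getting the high-dimensional inner product without ever building the high-dimensional vectors, is what the kernel trick exploits.

```python
import math

def dot(u, v):
    """Ordinary dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def phi(x):
    """Explicit degree-2 feature map for a 2-D point (x1, x2) -> 3-D."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)

def poly2_kernel(x, z):
    """Degree-2 homogeneous polynomial kernel: K(x, z) = (x . z)^2."""
    return dot(x, z) ** 2

x, z = (1.0, 2.0), (2.0, 1.0)
explicit = dot(phi(x), phi(z))   # map both points to 3-D, then take the dot product
via_kernel = poly2_kernel(x, z)  # stay in 2-D and apply the kernel instead
# Both values should be 16, up to floating-point rounding in the explicit version.
print(explicit, via_kernel)
```

For a feature map this small the saving is trivial, but for kernels whose implicit feature space is huge (or infinite-dimensional), evaluating K directly is the only practical option.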

Learning objective:

Understand how to apply a kernel directly to find the optimal hyperplane separating the classes.

Main question:

In the previous posts, we introduced the concept of a kernel and saw how useful it can be for finding an optimal hyperplane when the data points are not linearly separable.

The main question, then, is: how do we actually use the kernel in the mathematical formulation we derived for constructing the hyperplane?

Fig. 1 below shows the Wolfe dual problem, which we solve to find the optimal hyperplane:

Fig. 1
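For reference, the hard-margin Wolfe dual is usually written as follows. The notation is an assumption: m training examples x_i with labels y_i ∈ {−1, +1} and one Lagrange multiplier α_i per example. Fig. 1 may use slightly different symbols or the soft-margin variant, in which the constraint α_i ≥ 0 becomes 0 ≤ α_i ≤ C.

```latex
\max_{\alpha}\; \sum_{i=1}^{m} \alpha_i \;-\; \frac{1}{2} \sum_{i=1}^{m} \sum_{j=1}^{m} \alpha_i \alpha_j y_i y_j \, (\mathbf{x}_i \cdot \mathbf{x}_j)
\qquad \text{subject to} \qquad \alpha_i \ge 0 \;\; (i = 1, \dots, m), \qquad \sum_{i=1}^{m} \alpha_i y_i = 0
```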

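To apply a kernel to the Wolfe dual problem, we first choose a kernel function K and then replace every inner product x_i · x_j in the objective with K(x_i, x_j), passing x_i and x_j as its arguments. This substitution works because the training points appear in the dual only through their pairwise inner products, so the (possibly very high-dimensional) feature mapping never has to be computed explicitly. Fig. 2 below shows the resulting problem: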

Fig. 2
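After the substitution, the objective reads W(α) = Σ_i α_i − ½ Σ_i Σ_j α_i α_j y_i y_j K(x_i, x_j), with the same constraints as before, and a new point x is classified by the sign of f(x) = Σ_i α_i y_i K(x_i, x) + b. The sketch below evaluates both expressions in plain Python by precomputing the Gram matrix of pairwise kernel values. The Gaussian (RBF) kernel, gamma = 0.5, the XOR-style toy data, and all function names are illustrative assumptions, not part of this series. No optimizer is run: for this small symmetric problem the optimal multipliers can be worked out by hand, so the code plugs them in and shows that each point gets f(x_i) = y_i. On real data, α would come from a quadratic-programming solver such as SMO.

```python
import math

GAMMA = 0.5  # assumed RBF width parameter, chosen only for illustration

def rbf_kernel(x, z, gamma=GAMMA):
    """Gaussian (RBF) kernel: K(x, z) = exp(-gamma * ||x - z||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)

def gram_matrix(X, kernel):
    """K[i][j] = kernel(x_i, x_j): takes the place of every x_i . x_j in the dual."""
    return [[kernel(xi, xj) for xj in X] for xi in X]

def dual_objective(alpha, y, K):
    """Kernelized Wolfe dual objective: sum_i a_i - 1/2 sum_i sum_j a_i a_j y_i y_j K_ij."""
    m = len(alpha)
    quad = sum(alpha[i] * alpha[j] * y[i] * y[j] * K[i][j]
               for i in range(m) for j in range(m))
    return sum(alpha) - 0.5 * quad

def decision_function(x, X, y, alpha, b, kernel):
    """f(x) = sum_i a_i y_i K(x_i, x) + b; the predicted class is sign(f(x))."""
    return sum(a * yi * kernel(xi, x) for a, yi, xi in zip(alpha, y, X)) + b

# XOR-style toy data: the two classes are not linearly separable in 2-D.
X = [(1.0, 1.0), (-1.0, -1.0), (1.0, -1.0), (-1.0, 1.0)]
y = [1, 1, -1, -1]

# Hand-derived optimum for this symmetric toy problem: all four alphas are equal,
# b = 0, and requiring y_i * f(x_i) = 1 gives a * (1 - e)**2 = 1.
e = math.exp(-4 * GAMMA)  # kernel value between adjacent corners (squared distance 4)
a = 1.0 / (1.0 - e) ** 2
alpha = [a, a, a, a]
b = 0.0

K = gram_matrix(X, rbf_kernel)
print("dual objective:", dual_objective(alpha, y, K))
for xi, yi in zip(X, y):
    f = decision_function(xi, X, y, alpha, b, rbf_kernel)
    print(xi, "label", yi, "f(x) =", round(f, 6))
```

Precomputing the Gram matrix is a common design choice because the dual objective uses every pair K(x_i, x_j); a real solver would do the same (or cache rows of it) rather than re-evaluating the kernel inside the double sum.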

In the next post, we will look at the different types of kernel functions.