In the previous post, we had a look at the kernel trick and how it can be applied mathematically to enable us to find an optimum hyperplane that can separate non-linearly separable data points. In this post, we shall be exploring certain types of popular kernels.
Understand some popular types of kernels and how they work.
The main question we will answer in this post, as stated in the introduction, is: “What are the different kinds of kernels that can be applied to help find an optimum hyperplane for a given set of data points?”
1. The Linear Kernel: The linear kernel is the simplest kernel that can be applied to a group of data points and it is defined by:
$$K(x, x') = x \cdot x'$$
where x and x' are two vectors and x · x' is their dot product. Unlike the kernels below, this dot product is taken in the original input space, so no mapping to a higher-dimensional space is involved.
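As a minimal sketch of the definition above, the linear kernel can be written in a few lines of plain Python. The function name `linear_kernel` and the sample vectors are illustrative choices, not something taken from a particular library.

```python
def linear_kernel(x, y):
    """Linear kernel: the plain dot product of two equal-length vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    return sum(a * b for a, b in zip(x, y))


# Two example vectors (arbitrary values chosen for illustration).
x = [1.0, 2.0, 3.0]
x_prime = [4.0, 5.0, 6.0]

k = linear_kernel(x, x_prime)  # 1*4 + 2*5 + 3*6
print(k)
```

Because the kernel is just a dot product, an SVM that uses it can only produce a straight-line (hyperplane) boundary in the original space.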
2. The Polynomial Kernel: The polynomial kernel can be expressed in the following generic way:
$$K(x, x') = (x \cdot x' + c)^d$$
where c is a constant term and d is the degree of the polynomial kernel. Higher degrees give a more flexible decision boundary, which tends to overfit the training data and may not lead to an optimum hyperplane.
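To make the polynomial kernel more concrete, here is a small sketch in plain Python. It assumes the form (x · x' + c)^d and, for the special case d = 2, c = 0 with two-dimensional inputs, compares the kernel value against an explicit feature map. The names `polynomial_kernel` and `phi_degree2` are hypothetical helpers introduced only for this illustration.

```python
import math


def polynomial_kernel(x, y, c=1.0, d=2):
    """Polynomial kernel (x . y + c) ** d for equal-length vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    return (dot + c) ** d


def phi_degree2(v):
    """Explicit feature map matching d=2, c=0 for a 2-D input vector."""
    v1, v2 = v
    return [v1 * v1, math.sqrt(2.0) * v1 * v2, v2 * v2]


x = [1.0, 2.0]
x_prime = [3.0, 4.0]

# Kernel value computed directly in the 2-D input space.
k_implicit = polynomial_kernel(x, x_prime, c=0.0, d=2)

# Same quantity computed as a dot product in the 3-D feature space.
k_explicit = sum(a * b for a, b in zip(phi_degree2(x), phi_degree2(x_prime)))

print(k_implicit, k_explicit)
```

The two numbers agree up to floating-point rounding, which is the kernel trick in miniature: the kernel gives the feature-space dot product without ever building the feature vectors.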
3. The Gaussian or Radial Basis Function (RBF) Kernel: A radial basis function is one whose value depends only on the distance from the origin or from some other fixed point. The Gaussian or RBF kernel is usually used for more complicated data scenarios such as the one depicted in Fig. 1 below:
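The RBF kernel returns the dot product between the vectors representing the data points in an infinite-dimensional space. It can be expressed in the following generic way:
$$K(x, x') = \exp\left(-\gamma \, \lVert x - x' \rVert^2\right)$$
where γ (gamma) is a positive parameter and ‖x − x'‖ is the Euclidean distance between the two vectors.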
The performance of the RBF kernel is highly dependent on the gamma value. A small gamma gives each support vector a wide radius of influence, so the resulting decision boundary is smooth and behaves much like that of the linear kernel. A large gamma shrinks that radius, so the boundary is too heavily influenced by each individual support vector and tends to overfit. This is demonstrated in the images in Fig. 2 below:
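The effect of gamma can also be seen numerically. The sketch below assumes the common form exp(-gamma * ||x - x'||^2) for the RBF kernel and evaluates it for a nearby point and a distant point at three gamma values. The function name `rbf_kernel`, the sample points, and the gamma values are all illustrative assumptions.

```python
import math


def rbf_kernel(x, y, gamma=1.0):
    """RBF kernel exp(-gamma * squared Euclidean distance)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


x = [0.0, 0.0]
near = [0.5, 0.0]  # close to x
far = [3.0, 0.0]   # far from x

for gamma in (0.01, 1.0, 100.0):
    k_near = rbf_kernel(x, near, gamma)
    k_far = rbf_kernel(x, far, gamma)
    print(f"gamma={gamma}: near={k_near:.4g}, far={k_far:.4g}")
```

With the smallest gamma both points look almost equally similar to x (values close to 1), while with the largest gamma even the nearby point has a similarity close to 0, so each point effectively only "sees" itself.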
In the next post, we will look at the Sequential Minimal Optimization (SMO) algorithm.