Support Vector Machines — Lecture series — How to select a hyperplane that correctly classifies the data
In the previous post, we talked about the potential problems that may arise when working with values on the negative side of the hyperplane. In this post, we will talk about how to choose the correct hyperplane to classify your data.
Learning objective:
Understand how to choose the right hyperplane to classify a given set of data points.
Main question:
Consider the 2 images, Fig 1 and Fig 2, below:
Even though the hyperplanes in both images appear to correctly classify the data points, only one of them actually does. The main question is: how do you identify which hyperplane is the right one?
Well, let’s start off by looking at the format of the data points we are dealing with. The data points used to train our algorithm to formulate the hyperplane usually come in the form (x, y), where x is a vector of values and y is the ‘classification value’.
Note: Here the ‘classification value’ represents whether the point x belongs to the positive class (the right-hand side of the hyperplane, y = +1) or the negative class (the left-hand side, y = -1).
When evaluating the hyperplane on each data point, instead of just computing the following value:
h(x) = w.x + b
We include the classification value in the computation. Therefore, the hyperplane computation for each data point now becomes:
h(x) = y(w.x + b)
The reason we include the classification value in the hyperplane computation is this:
1. The result of the hyperplane computation would always be positive if the point has correctly been classified.
2. The result of the hyperplane computation would always be negative if the point has incorrectly been classified.
Let’s take this example to better understand this:
If the data point is ([2], -1), where x is [2] and y is -1, and it is fed into a hyperplane with w = [1] and b = 1, the result is:
h(([2], -1)) = -1 ([2].[1] + 1) = -1 (3) = -3
This point is misclassified: recall from our previous discussion that whenever w.x + b ≥ 0, h(x) should be +1, yet here the true label y is -1. Consistent with point 2 above, the computation returns a negative value.
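The worked example above can be reproduced in a few lines of Python. This is a minimal sketch; the function name functional_margin is chosen for clarity and is not from the post:

```python
def functional_margin(w, b, x, y):
    """Functional margin y(w.x + b) of one data point:
    positive when the point is correctly classified, negative when not."""
    dot = sum(wi * xi for wi, xi in zip(w, x))
    return y * (dot + b)

# The example above: x = [2], y = -1, hyperplane with w = [1], b = 1
margin = functional_margin(w=[1], b=1, x=[2], y=-1)
print(margin)  # -3: negative, so this point is misclassified
```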
Therefore, whenever we are given 2 different hyperplanes like the ones in Fig 1 and Fig 2 and we want to find out which one correctly classifies the data points:
1. For each hyperplane, compute h(x) = y(w.x + b) for every data point.
2. For each hyperplane, take the smallest of the resulting values.
3. Compare the two smallest values and select the hyperplane that produced the larger one as the correct hyperplane.
This computation of h(x), which includes the classification value, is referred to as the functional margin.
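The three-step selection rule above can be sketched in Python. The toy data set and the w, b values for the two candidate hyperplanes are made up purely for illustration:

```python
def functional_margin(w, b, x, y):
    """Functional margin y(w.x + b) of one data point."""
    dot = sum(wi * xi for wi, xi in zip(w, x))
    return y * (dot + b)

def smallest_margin(w, b, points):
    """Steps 1 and 2: compute the functional margin of every (x, y)
    data point and keep the smallest one for this hyperplane."""
    return min(functional_margin(w, b, x, y) for x, y in points)

# Toy 1-D data set (illustrative): (x, y) pairs with labels +1 / -1
points = [([3], 1), ([1], 1), ([-2], -1)]

# Two candidate hyperplanes (w and b values are made up for the sketch)
m1 = smallest_margin(w=[1], b=0, points=points)   # margins 3, 1, 2 -> smallest is 1
m2 = smallest_margin(w=[1], b=-2, points=points)  # margins 1, -1, 4 -> smallest is -1

# Step 3: the hyperplane with the larger smallest margin wins;
# a negative smallest margin means at least one point is misclassified
best = "first" if m1 > m2 else "second"
print(best)  # first: it classifies every point correctly
```

Note how the second hyperplane's smallest margin is negative, which immediately flags it as misclassifying at least one point.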