Support Vector Machines — Lecture series — Classifying data with a hyperplane
By now, you should be comfortable with the idea of a hyperplane and the equation used to represent one. In this post, we apply that knowledge to understand how a hyperplane can be used to classify points.
Learning objective:
Understand how hyperplanes are applied in the classification of points.
Main question:
Consider the graph in Fig. 1 below:
Fig. 1 shows linearly separable points on a graph. The question is: how are hyperplanes used to perform a binary classification of the points shown in that graph?
First off, we learnt from the previous post that the equation of a hyperplane is:
w.x + b = 0, where w and x are vectors.
If w = (0.4, 1.0) and b = -9, we get the hyperplane shown in Fig. 2 below:
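In two dimensions, this hyperplane is simply a line. As a quick check, we can expand the dot product with x = (x_1, x_2), where the component names x_1 and x_2 are notation introduced here for illustration:

```latex
w \cdot x + b = 0.4\,x_1 + 1.0\,x_2 - 9 = 0
\quad\Longleftrightarrow\quad
x_2 = -0.4\,x_1 + 9
```

So the hyperplane is a line with slope -0.4 that crosses the vertical axis at 9.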
For the hyperplane shown in Fig. 2, we can define a function h(x) that takes a point x and outputs either -1 or +1, indicating which side of the hyperplane x lies on.
For instance, if h(x) outputs -1 for a particular x, that point lies on the side of the “blue stars”; if h(x) outputs +1, it lies on the side of the “red triangles”.
Mathematically, h(x) can be written as the following piecewise function:
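One common way to write such a function is sketched below. It is consistent with the worked example in this post, but the treatment of points lying exactly on the hyperplane (assigned to +1 here) is an assumption made for illustration:

```latex
h(x) =
\begin{cases}
+1 & \text{if } w \cdot x + b \geq 0 \\
-1 & \text{if } w \cdot x + b < 0
\end{cases}
```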
Let’s take an example to make this clearer. Suppose the given point is x = (8, 7). Plugging this point into h(x), we compute:
w.x + b = (0.4 × 8) + (1.0 × 7) - 9 = 1.2
Since 1.2 is greater than 0, h(x) outputs +1, which means that x = (8, 7) is classified on the side of the “red triangles”.
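The same computation can be sketched in a few lines of Python. This is a minimal illustration rather than a definitive implementation: the names W, B and decision_value are hypothetical helpers introduced here, and sending points that lie exactly on the hyperplane to +1 is an assumed convention.

```python
# Weights and bias taken from the example in this post.
W = (0.4, 1.0)
B = -9.0


def decision_value(x, w=W, b=B):
    """Return w.x + b for a point x (the dot product plus the bias)."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def h(x, w=W, b=B):
    """Return +1 or -1 depending on which side of the hyperplane x lies.

    Assumption: points exactly on the hyperplane (w.x + b == 0) map to +1.
    """
    return 1 if decision_value(x, w, b) >= 0 else -1


point = (8, 7)
print(round(decision_value(point), 2))  # approximately 1.2
print(h(point))                         # +1, the "red triangles" side
```

Trying a point on the other side, such as (1, 1), gives w.x + b = 0.4 + 1 - 9 = -7.6, so h returns -1 and the point falls on the side of the “blue stars”.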
In the next post, we shall be talking about the perceptron.