Support Vector Machines — Lecture series — Choosing the optimum hyperplane

David Sasu
Apr 8, 2021


In the previous post, we saw that the main problem with the perceptron learning algorithm is that, even though it produces a hyperplane that separates a linearly separable dataset, it does not produce the same hyperplane each time it is run. This is a problem because some of those hyperplanes may be poor at classifying new data points. You can read the previous post for a fuller discussion. In this post, we answer the question of how to choose the optimal hyperplane among all the possible hyperplanes the perceptron learning algorithm can produce.

Learning objective:

Understand how to choose the optimal hyperplane among a set of possible hyperplanes generated by the perceptron learning algorithm.

Main question:

How do you choose the best hyperplane among all the different hyperplanes generated by the perceptron learning algorithm to separate a linearly separable dataset?

To answer this question, consider the images in Fig.1 and Fig. 2 below:

Fig. 1
Fig. 2

Suppose the images in Fig. 1 and Fig. 2 show the same set of linearly separable data points, separated by two different hyperplanes produced by the perceptron learning algorithm. How do we decide which hyperplane is better?

To find the best hyperplane, we follow this simple procedure:

  1. For each hyperplane, find out how far away each data point is from its corresponding hyperplane.

This can be done using the hyperplane formula: h(x) = w·x + b

The magnitude of h(x) tells us how far x is from the hyperplane. Note that to compare distances across different hyperplanes, we use the geometric distance |h(x)| / ||w||, since the raw value of h(x) depends on the scale of w.

2. For each hyperplane, find out which data point is closest to it.

3. Take note of the distance between the closest data point and the hyperplane for each hyperplane.

4. Compare the distances obtained in Step 3, and select the hyperplane with the largest distance.
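The steps above can be sketched in Python. This is a minimal illustration with made-up points and two hypothetical candidate hyperplanes, each given as a pair (w, b); note that we divide by ||w|| so that distances from different hyperplanes are comparable.

```python
import numpy as np

def min_distance(w, b, X):
    # Step 1: geometric distance from every point in X to the
    # hyperplane w.x + b = 0, i.e. |w.x + b| / ||w||.
    distances = np.abs(X @ w + b) / np.linalg.norm(w)
    # Steps 2-3: keep only the distance of the closest point.
    return distances.min()

def best_hyperplane(hyperplanes, X):
    # Step 4: pick the hyperplane whose closest point is farthest away.
    margins = [min_distance(w, b, X) for w, b in hyperplanes]
    return int(np.argmax(margins))

# Toy linearly separable points: two per class.
X = np.array([[1.0, 1.0], [2.0, 2.0], [4.0, 4.5], [5.0, 5.5]])

# Two candidate separating hyperplanes, as the perceptron might produce:
hyperplanes = [
    (np.array([1.0, -1.0]), 0.2),   # passes very close to the points
    (np.array([1.0, 1.0]), -6.75),  # roughly midway between the classes
]

print(best_hyperplane(hyperplanes, X))  # selects the second hyperplane: 1
```

The second hyperplane wins because its closest point is farther away, which is exactly the margin-maximizing intuition that support vector machines build on.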

The main idea here is to always choose the hyperplane that is farthest from all of the points. Such a hyperplane is more likely to correctly classify new data points, because it leaves the widest possible margin of safety on either side.
