Support Vector Machines — Lecture series — The perceptron

David Sasu
3 min read · Apr 2, 2021


In the previous post, we discussed how to use a hyperplane to classify a given linearly separable dataset. In this post, we will talk about the perceptron and the perceptron learning algorithm.

Learning objective:

Understanding what a perceptron is and how the perceptron learning algorithm works.

Main question:

In the previous post, we saw that the equation of the hyperplane can be written as h(x) = w · x + b, and that once we know the values of w and b, we can use the hyperplane to classify a given set of linearly separable data points.

Before I even ask the main question, I would like to point out that the equation of the hyperplane can also be written as h(x) = w · x instead of h(x) = w · x + b. Since w and x are vectors, we can prepend b to the vector w and prepend 1 to the vector x; the bias term is then absorbed into the dot product, yielding h(x) = w · x.

To make it clearer, let’s see it this way:

if h(x) = w · x + b, where w = (w0, w1, w2) and x = (x0, x1, x2),

then h(x) = w · x, where w = (b, w0, w1, w2) and x = (1, x0, x1, x2).
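To convince ourselves that the two forms agree, here is a minimal sketch in NumPy. The particular values of w, b, and x are made up for illustration:

```python
import numpy as np

# Hypothetical example values for the weights, bias, and input
w = np.array([2.0, -1.0, 0.5])
b = 3.0
x = np.array([1.0, 2.0, -2.0])

# Original form: h(x) = w . x + b
h_original = np.dot(w, x) + b

# Augmented form: prepend b to w and 1 to x, then h(x) = w . x
w_aug = np.concatenate(([b], w))
x_aug = np.concatenate(([1.0], x))
h_augmented = np.dot(w_aug, x_aug)

print(h_original, h_augmented)  # both expressions give the same value
```

The extra 1 at the start of x multiplies b in the dot product, which is exactly the term that w · x + b adds explicitly.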

Now that we have this out of the way, let’s get back to the main question. Suppose we are given a set of linearly separable data points, each of the form (x, y), where x is the data value and y is the class it belongs to. How can we find the best possible value of w in the hyperplane formula h(x) = w · x that enables us to correctly classify the data points?

We can rely on the perceptron to enable us to accomplish this task.

The perceptron is a learning algorithm invented by Frank Rosenblatt. Its goal is to find the best possible hyperplane for classifying a given set of linearly separable data points.

So how does the perceptron learning algorithm work?

It is quite a simple process, really. It proceeds as follows:

Step 1: Randomly pick a value of w to define a hyperplane, and use that hyperplane to classify the data points. We have to start somewhere, and since we do not yet know the best value of w, any starting value will do.

Step 2: Pick a data point that was misclassified by the hyperplane from Step 1, and form a new hyperplane (by picking a new value of w) that correctly classifies this example. This adjustment is called the update rule.

Step 3: Classify the rest of the data again with the new hyperplane formed in Step 2.

Step 4: Repeat steps 2 and 3 until there is no misclassified example.

If you follow the four steps outlined above, you will have successfully run the perceptron learning algorithm, classifying a set of linearly separable data points by finding the right value of w.
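The four steps above can be sketched in code. This is one common way to implement the perceptron, not the only one; the dataset and the choice of starting w are my own illustrative assumptions, and the update rule shown here (w ← w + y·x) is the standard one we will examine in the next post:

```python
import numpy as np

def perceptron(X, y, max_epochs=100):
    """Perceptron learning on augmented inputs.

    X: array of shape (n_samples, n_features); y: labels in {-1, +1}.
    Returns the augmented weight vector w (the bias is w[0]).
    Assumes the data is linearly separable; otherwise it stops
    after max_epochs passes without converging.
    """
    # Augment each x with a leading 1 so that h(x) = w . x includes the bias
    X_aug = np.hstack([np.ones((X.shape[0], 1)), X])
    # Step 1: start from an arbitrary w (all zeros here)
    w = np.zeros(X_aug.shape[1])
    for _ in range(max_epochs):
        misclassified = False
        for xi, yi in zip(X_aug, y):
            # A point is misclassified when the sign of w . x disagrees with y
            if yi * np.dot(w, xi) <= 0:
                # Step 2: update rule — nudge w toward classifying xi correctly
                w = w + yi * xi
                misclassified = True
        # Steps 3 and 4: reclassify everything; stop when nothing is misclassified
        if not misclassified:
            break
    return w

# Tiny linearly separable dataset (hypothetical values)
X = np.array([[2.0, 1.0], [1.0, 3.0], [-1.0, -2.0], [-2.0, -1.0]])
y = np.array([1, 1, -1, -1])
w = perceptron(X, y)
```

After training, classifying a new point x is just a matter of checking the sign of w · (1, x).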

In the next post, we will be taking a closer look at the update rule.
