Support Vector Machines — Lecture series — The SVMs optimisation problem part 2

David Sasu
3 min readApr 28, 2021

--

In the previous post, we looked at how to derive an optimisation problem statement that could help us to find the most optimum hyperplane when given a set of linearly separable data points. We also briefly mentioned that it was hard to solve the optimisation problem in that state. In this post, we shall be looking at how to reframe the optimisation problem in a way that is more easy to solve.

Learning objective:

Understand how to reframe an optimisation problem statement to enable us to find the most optimum hyperplane.

Main question:

We stated in the previous post that the optimisation problem statement in Fig 1 is difficult to solve.

Fig.1

So now the main question is, how then do we go about obtaining the values of w and b?

Well, let’s begin that process by first reconsidering the relationship between the geometric margin and the functional margin. We know from our previous posts that the geometric margin can be obtained by dividing the functional margin by the norm of w (if this statement does not make sense, please revisit the post on geometric margin). This relationship is depicted in Fig 2:

Fig. 2

With the equation expressed in Fig. 2, we can rewrite Fig. 1 as the following:

Fig. 3

We can simplify the constraint in Fig. 3 by multiplying both sides of the inequality by the norm of w to get:

Fig. 4

Now, since the geometric margin does not change regardless of the scale of the values of w and b, and since we are solving for the optimum geometric margin, we can scale the values of w and b to the point where the functional margin F is equal to 1 and this would not affect the optimum geometric margin in any way. This is demonstrated in Fig. 5 below:

Fig. 5

Using the equation about M in Fig. 2, we can rewrite the statement in Fig 5 as:

Fig. 6

And since F has been scaled to 1, the statement in Fig. 6 can subsequently be rewritten as:

Fig. 7

Now maximising 1 divided by the norm of w is the same as minimising the norm of w. Let’s think about this for a second, if I have the following expression 1/x and I wanted to choose the value of x that would maximise it, I would choose a value of x like 0.000001. Because that would result in a value like 100,000.

So therefore, we can rewrite the expression in Fig. 7 to get:

Fig. 8

I hope you were able to follow the derivations :)

In the next post, we will finish off this topic and then move on to the concept of Lagrange multipliers.

--

--

No responses yet