Support Vector Machines — Lecture series — The Wolfe dual problem
In the previous post, we saw how hard the SVM optimisation problem is to solve in the form in which it was first expressed, and we rewrote it in a way that is much easier to work with: we looked at it through the lens of a ‘min max’ problem instead of just a ‘min’ problem. In this post, we take that suggested ‘min max’ approach and evolve it further into yet another approach.
Learning objective:
Understand how to further evolve the ‘min max’ SVM optimisation approach into a ‘Wolfe dual’ approach, and know why the dual is better than the ‘min max’ approach.
Main question:
Consider the SVM optimisation problem expressed in Fig. 1 below:
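For readers who cannot see the figure: assuming the usual hard-margin notation, with training examples x_i, labels y_i in {−1, +1} and i = 1, …, m, the primal problem in Fig. 1 reads

\min_{w, b} \ \frac{1}{2}\|w\|^2 \quad \text{subject to} \quad y_i(w \cdot x_i + b) - 1 \ge 0, \quad i = 1, \dots, m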
Is there a way to express this optimisation problem so that we optimise over a single decision variable, with constraints that also involve only that one variable?
First recall that the Lagrangian function is of the form:
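In the notation above, with one Lagrange multiplier \alpha_i \ge 0 attached to each constraint, it should read

\mathcal{L}(w, b, \alpha) = \frac{1}{2}\|w\|^2 - \sum_{i=1}^{m} \alpha_i \left[ y_i (w \cdot x_i + b) - 1 \right]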
Since we want to rewrite the SVM optimisation problem so that we are optimising over a single decision variable, alpha, we can minimise the Lagrangian function with respect to w and b.
We can do this by taking the partial derivatives of the Lagrangian function with respect to w and b. If you are not familiar with the concept of partial derivatives, I suggest you watch this video to get yourself comfortable with the concept.
When we take the partial derivative of the Lagrangian function with respect to w, we get the following result:
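In text form, following the standard derivation, the result in Fig. 3 is

\nabla_w \mathcal{L} = w - \sum_{i=1}^{m} \alpha_i y_i x_i = 0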
Note that the partial derivative of the Lagrangian function is being equated to zero because we are solving for the minimum of the function.
By rearranging the terms in the equation in Fig. 3, we get the following result:
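That is, the expression for w shown in Fig. 4:

w = \sum_{i=1}^{m} \alpha_i y_i x_i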
When we substitute this value of w into the Lagrangian function, we get the expression shown in Fig. 5:
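Written out (again assuming the standard derivation), the substituted Lagrangian should look like

\mathcal{L}(\alpha, b) = \frac{1}{2} \left( \sum_{i=1}^{m} \alpha_i y_i x_i \right) \cdot \left( \sum_{j=1}^{m} \alpha_j y_j x_j \right) - \sum_{i=1}^{m} \alpha_i \left[ y_i \left( \left( \sum_{j=1}^{m} \alpha_j y_j x_j \right) \cdot x_i + b \right) - 1 \right]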
By further simplifying the expression shown in Fig. 5, we get the following:
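Expanding and collecting terms, the simplified expression in Fig. 6 should be

W(\alpha, b) = \sum_{i=1}^{m} \alpha_i - \frac{1}{2} \sum_{i=1}^{m} \sum_{j=1}^{m} \alpha_i \alpha_j y_i y_j (x_i \cdot x_j) - b \sum_{i=1}^{m} \alpha_i y_i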
But we are not yet done with our partial derivatives: we have taken care of w, and we must now take care of b. We therefore take the partial derivative of the resulting expression in Fig. 6 with respect to b.
Taking the partial derivative of the Lagrangian function with respect to b leads to the following result:
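Since b only appears in the last term of the expression in Fig. 6, this derivative is

\frac{\partial \mathcal{L}}{\partial b} = -\sum_{i=1}^{m} \alpha_i y_i = 0, \quad \text{i.e.} \quad \sum_{i=1}^{m} \alpha_i y_i = 0

and so the term involving b vanishes from the objective entirely, leaving a function of alpha alone.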
You will have observed that, by addressing the minimisation aspect of the ‘min max’ problem we started out with, we have transformed the SVM optimisation problem into the following problem instead:
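In text form, the problem in Fig. 8 should read

\max_{\alpha} \ \sum_{i=1}^{m} \alpha_i - \frac{1}{2} \sum_{i=1}^{m} \sum_{j=1}^{m} \alpha_i \alpha_j y_i y_j (x_i \cdot x_j) \quad \text{subject to} \quad \alpha_i \ge 0 \ \text{for all } i, \quad \sum_{i=1}^{m} \alpha_i y_i = 0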
The optimisation problem described in Fig. 8 is referred to as the Wolfe dual problem. Notice that it answers our main question: both the objective and the constraints now involve only one decision variable, alpha.
I hope this was quite simple and easy to follow :)
In the next post we will be talking about the Karush-Kuhn-Tucker conditions.