Support Vector Machines — Lecture series — The SVM Lagrangian problem part 2

In the previous post, I spoke about how to apply the concept of Lagrange multipliers to solve an SVM optimisation problem. In that post, I spoke about how hard it was to solve the derived Lagrange function for an example SVM optimisation problem. In this post, we will be talking about why it is hard to solve such a problem and we will be talking about how to go about solving such a problem.

Learning objective:

Understand how hard it is to solve a Lagrange function for an SVM optimisation problem and how to go about solving such a problem.

Main question:

Consider the image of the SVM optimisation problem in Fig. 1 and the Lagrange function derived for the optimisation problem in Fig. 2 below:

Fig. 1
Fig. 2

Why may the optimisation problem stated in Fig. 1 be hard to solve? and how do we go about addressing that problem?

Well, the answer to the first question lies in both the number of constraints and the number of decision variables that we would have to account for in the statement of the optimisation problem.

If the optimisation problem had both a small number of constraints and a small number of decision variables, it would not be so hard to solve because it would not cost us a lot of CPU time and memory space to find an optimum solution that satisfies all of the constraints and minimizes the function.

However, because we have an ‘m’ number of examples which may be extremely large in number, it makes the optimisation problem very difficult to solve because there is now a lot more computation to be done to account for the increased complexity.

To address this problem, we can rewrite the SVM optimisation problem in the following form:

Fig. 3

This form works because instead of solving for the constraints of an optimisation problem that contains 3 decision variables, we now just have to solve for the constraints of an optimisation problem that just has one decision variable.

To place the rewritten optimisation problem into perspective, let’s consider the following:

b = c-dx

If we had to solve for the minimum value of ‘b’, it would make sense to first find the value of ‘d’ that is the largest or the maximum value of ‘d’ because that is what would cause the largest difference and hence help us better find the minimum value for ‘b’. After finding the maximum value of ‘d’, we can now find the values of ‘c’ and ‘x’ that help us to solve for the minimum value of ‘b’.

In the next post, we would be talking about the Wolfe dual problem.