Support Vector Machines — Lecture series — An introduction to the Sequential Minimal Optimization(SMO) Algorithm

In the previous posts, we were introduced to the SVM optimization problem, which is demonstrated in Fig. 1 below:

Fig. 1

This optimization problem can be solved by using a convex optimization package such as CVXOPT. However, solving this problem with such a package becomes problematic when we are dealing with large datasets. This is because the performance of a convex optimization on a large dataset involves numerous matrix multiplications which takes a lot of time to solve. It is this situation the warranted the creation of the SMO algorithm. This algorithm helps to solve the SVM optimization problem more quickly.

Learning objective:

The objective of this post is to give a general overview of how the SMO algorithm works.

Main question:

What is the main idea behind how the SMO algorithm is able to solve convex optimisation problems more quickly than commonplace convex optimization packages?

To answer this question, let’s revisit what we are trying to achieve in the SVM optimization problem. In the algorithm, we are trying to find the values of alpha that lead to a minimization of the value of the objective function.

The sequential minimization algorithm revolves around the idea of simplicity. Instead of trying to tackle the whole SVM optimization problem, the SMO tries to break it down into a sequence of simpler problems. For instance, instead of trying to find out all the alpha values that satisfy the constraints and minimise the objective function in one go, the SMO algorithm would rather try to first settle on just two alpha values and then try to find out the value of these alpha values that minimise the objective function. After it has succeeded in doing that, it moves on to two different alpha values and performs the same task. The SMO algorithm would keep on repeating this strategy until all the alpha values have been exhausted.

This incremental approach enables the SMO algorithm to figure out how best to minimise the objective function without the burden of numerous computation.

In the next post, we shall look at how to implement the SMO algorithm.