Support Vector Machines — Lecture series — The problem of scale invariance
In the previous post, we discussed how to select a hyperplane that correctly classifies the data points. In this post, we look at the problem that scaling causes for hyperplanes and how to solve it.
Understand the problem that scaling can cause and how to address this issue.
Consider the image in Fig. 1 below:
Suppose the hyperplane in Fig. 1 has w = (2, 1) and b = 5. If we multiply both w and b by 10, we say that we have scaled them by 10. The interesting thing about scaling a hyperplane is that it does not change its direction or the point at which it intersects the vertical axis. In fact, a scaled hyperplane looks exactly the same as its unscaled version, because w.x + b = 0 and 10(w.x + b) = 0 are satisfied by exactly the same points. So what is the issue with scaled hyperplanes?
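To make this concrete, here is a minimal sketch that rewrites the hyperplane w.x + b = 0 in two dimensions as a line x2 = slope * x1 + intercept, and compares the original and scaled versions. The helper name slope_and_intercept is made up for this illustration, and the sketch assumes the second component of w is not zero.

```python
def slope_and_intercept(w, b):
    """Rewrite w[0]*x1 + w[1]*x2 + b = 0 as x2 = slope*x1 + intercept.

    Hypothetical helper for illustration; assumes w[1] != 0.
    """
    return -w[0] / w[1], -b / w[1]


w, b = (2.0, 1.0), 5.0                 # values from the example above
w_scaled = tuple(10 * wi for wi in w)  # (20.0, 10.0)
b_scaled = 10 * b                      # 50.0

original = slope_and_intercept(w, b)
scaled = slope_and_intercept(w_scaled, b_scaled)

print(original)  # (-2.0, -5.0)
print(scaled)    # (-2.0, -5.0)
```

Both calls give the same slope and the same intercept on the vertical axis, which is why the two hyperplanes cannot be told apart when drawn.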
Well, the issue with scaled hyperplanes comes from their functional margins. In the previous post, we learned that the functional margin of a hyperplane can be computed with the following formula:
h(x) = y(w.x + b)
The problem here is that when we scale w and b, the functional margin is scaled by the same factor, even though nothing has changed about the direction or position of the hyperplane. Scaling w and b by 10, for example, makes every functional margin 10 times larger.
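A small sketch of this effect, using the values of w and b from the example above. The data point x = (1, 1) and its label y = +1 are assumptions chosen only for illustration; they do not come from Fig. 1.

```python
def functional_margin(w, b, x, y):
    """Functional margin h(x) = y * (w.x + b) for a single labelled point."""
    dot = sum(wi * xi for wi, xi in zip(w, x))
    return y * (dot + b)


w, b = (2.0, 1.0), 5.0
x, y = (1.0, 1.0), 1                   # hypothetical point and label

margin = functional_margin(w, b, x, y)

# Scale w and b by 10: the hyperplane itself stays where it is.
w_scaled = tuple(10 * wi for wi in w)
b_scaled = 10 * b
scaled_margin = functional_margin(w_scaled, b_scaled, x, y)

print(margin)         # 8.0
print(scaled_margin)  # 80.0
```

The point has not moved and neither has the hyperplane, yet the functional margin went from 8 to 80.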
To solve this problem, we can divide w and b by the norm of w. Any positive scaling factor then cancels out, which makes the functional margin scale invariant. This is shown below:
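As a rough numerical check of this normalization, the sketch below divides w and b by the Euclidean norm of w before computing the functional margin. The function name normalized_margin and the data point x = (1, 1) with label y = +1 are assumptions made for this illustration.

```python
import math


def normalized_margin(w, b, x, y):
    """Functional margin computed after dividing w and b by the norm of w."""
    norm = math.sqrt(sum(wi * wi for wi in w))   # Euclidean norm of w
    w_unit = [wi / norm for wi in w]
    b_unit = b / norm
    dot = sum(wi * xi for wi, xi in zip(w_unit, x))
    return y * (dot + b_unit)


w, b = (2.0, 1.0), 5.0
x, y = (1.0, 1.0), 1                   # hypothetical point and label

unit_margin = normalized_margin(w, b, x, y)

# Scaling w and b by 10 also scales the norm of w by 10, so the factor cancels.
w_scaled = tuple(10 * wi for wi in w)
b_scaled = 10 * b
unit_margin_scaled = normalized_margin(w_scaled, b_scaled, x, y)

print(math.isclose(unit_margin, unit_margin_scaled))  # True
```

Both values are equal to 8 / sqrt(5), up to floating-point rounding. Note that the cancellation only holds for positive scaling factors: multiplying w and b by a negative number flips the sign of the margin.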