In the previous post, I spoke about the two main steps involved in implementing the sequential minimal optimization (SMO) algorithm: first selecting the Lagrange multipliers to optimize, and then optimizing the chosen multipliers analytically. In this post, I will focus on the process of selecting which Lagrange multipliers to optimize. To follow this post properly, it would be helpful to revisit the posts about the KKT conditions and Lagrange multipliers.
Understand the process involved in choosing which Lagrange multipliers to optimize.
In the previous post, I gave a general overview of what the Sequential Minimal Optimization algorithm is and how it works. In the subsequent posts, we will break the concepts behind the SMO algorithm into manageable chunks that you can easily assimilate, understand and enjoy. In this post, we will state the two components of the SMO algorithm.
Know the two components of the SMO algorithm.
We have a general sense of how the SMO algorithm works, but what are the two main components of the algorithm that power how it accomplishes…
In the previous posts, we were introduced to the SVM optimization problem, which is demonstrated in Fig. 1 below:
This optimization problem can be solved with a convex optimization package such as CVXOPT. However, solving it with such a package becomes problematic on large datasets, because convex optimization at that scale involves numerous matrix operations, which take a long time to compute. It is this situation that warranted the creation of the SMO algorithm, which solves the SVM optimization problem more quickly.
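For reference, since the figure may not render here, the optimization problem being described is typically the SVM dual problem, which in its common soft-margin form (possibly differing slightly in notation from the figure) is:

$$\max_{\alpha}\; \sum_{i=1}^{n}\alpha_i \;-\; \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}\alpha_i\,\alpha_j\,y_i\,y_j\,\langle x_i, x_j\rangle \quad \text{subject to } 0 \le \alpha_i \le C,\;\; \sum_{i=1}^{n}\alpha_i y_i = 0.$$

The pairwise terms $\langle x_i, x_j\rangle$ form an $n \times n$ matrix, which is why solving this problem directly becomes expensive as the dataset grows.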
In the previous post, we had a look at the kernel trick and how it can be applied mathematically to enable us to find an optimum hyperplane that can separate non-linearly separable data points. In this post, we shall be exploring certain types of popular kernels.
Understand some popular types of kernels and how they work.
The main question we will answer in this post, as already stated in the introduction, is: “What are the different kinds of kernels that can be applied to help find an optimum hyperplane given a group of data points?”
In this tutorial, we will learn how to directly apply the kernel to find the optimum hyperplane between data points that belong to different classes and are not linearly separable. Applying the concept of kernels to determine the hyperplane in such a scenario is usually referred to as the “kernel trick”.
Understand how we can directly apply the kernel to find the optimum hyperplane between data points.
In the previous posts, we learnt the concept of the kernel and we understood how instrumental it can be…
In the last post, I discussed the problem of transforming every point from one dimension to another just to find the right hyperplane that can separate the points into their respective classes. This is a computationally expensive task, especially when there are many points. I mentioned in the last post that kernels could help us get around this problem, but I did not specifically explain what they were or how they can help us. In this post, we will address this question.
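To see the idea in miniature, consider a degree-2 polynomial kernel on 2-D points: the kernel computes, directly in the original space, the same value that explicitly mapping each point into a 3-D feature space would give. This is a minimal sketch (the function names are mine, not from the post):

```python
import math

def phi(x):
    # Explicit degree-2 feature map for a 2-D point:
    # (x1, x2) -> (x1^2, sqrt(2)*x1*x2, x2^2)
    return [x[0] ** 2, math.sqrt(2) * x[0] * x[1], x[1] ** 2]

def poly_kernel(x, z):
    # Degree-2 polynomial kernel: K(x, z) = (x . z)^2,
    # computed without ever leaving the original 2-D space
    return (x[0] * z[0] + x[1] * z[1]) ** 2

x, z = [1.0, 2.0], [3.0, 4.0]

# Dot product after explicitly mapping both points to 3-D
explicit = sum(a * b for a, b in zip(phi(x), phi(z)))

# Same quantity via the kernel, with no explicit transformation
implicit = poly_kernel(x, z)  # (1*3 + 2*4)^2 = 121
```

The two values agree, but the kernel version never materialises the higher-dimensional vectors, which is exactly the saving that matters when there are many points or the target space is very high-dimensional.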
Understand the concept…
In the last couple of lecture posts, we have been talking about how to determine hyperplanes that can optimally separate given datasets. In the next couple of posts, we will look at the concept of kernels: what they are and why they are so useful in machine learning.
Understand the context in which kernels are applied in machine learning.
Observe the image in Fig. 1 below:
Suppose we had to figure out the hyperplane that can correctly classify the different points in the image. We would quickly come to the realisation that this is a…
In the last post, we looked at the idea of “complementary slackness” and how it relates the variables of the primal problem to the constraints of the dual problem, and vice versa. In this post, we will look at why the concept of complementary slackness holds true.
Understand why the concept of ‘complementary slackness’ is indeed true.
How can we be sure that the concept of ‘complementary slackness’ holds up every single time?
To answer this question, we would first have to recall the idea of the duality theorem, which…
In this lecture, we will learn about an open-source deep learning framework called PyTorch. PyTorch can be viewed as a library that provides packages for manipulating tensors.
A tensor is basically a mathematical object used to hold multidimensional data. Tensors can be represented as n-dimensional arrays of scalars: a tensor of order 0 is a scalar, a tensor of order 1 is a vector, a tensor of order 2 is a matrix, and so on.
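As a quick illustration of these orders, here is a sketch using NumPy arrays (PyTorch's `torch.tensor` behaves analogously, with `.dim()` in place of `.ndim`); the variable names are mine:

```python
import numpy as np

scalar = np.array(5.0)              # order 0: a single number
vector = np.array([1.0, 2.0, 3.0])  # order 1: a 1-D array of scalars
matrix = np.array([[1.0, 2.0],
                   [3.0, 4.0]])     # order 2: a 2-D array (rows and columns)

# The "order" of each tensor is the number of dimensions (axes) it has
orders = (scalar.ndim, vector.ndim, matrix.ndim)  # (0, 1, 2)
```

Higher-order tensors follow the same pattern: an order-3 tensor is, for instance, a stack of matrices, which is how a batch of greyscale images is commonly stored.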
In this tutorial, we will be using the PyTorch framework to look at…
In the previous post, we spoke about what it means to ‘train’ a feed-forward neural network, and briefly touched on the different tasks performed during training. In this post, we will focus solely on the loss function, specifically the cross-entropy loss function.
What is a loss function and what role does it play in training a neural network?
As briefly stated in the previous post, the main purpose of the loss function is to indicate how close the neural network's predicted output is to…
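To make that idea concrete, here is a from-scratch sketch of the cross-entropy loss for a single example (plain Python rather than PyTorch's built-in implementation; the function name and values are illustrative):

```python
import math

def cross_entropy(target, predicted):
    # Cross-entropy between a one-hot target and predicted class probabilities:
    # loss = -sum_i target_i * log(predicted_i)
    # The `t > 0` guard skips the zero entries of the one-hot vector.
    return -sum(t * math.log(p) for t, p in zip(target, predicted) if t > 0)

# True class is index 1; the network assigns it probability 0.8.
loss = cross_entropy([0, 1, 0], [0.1, 0.8, 0.1])  # equals -log(0.8), about 0.223
```

A perfect prediction (probability 1 on the true class) would give a loss of 0; the further the predicted probability for the true class falls below 1, the larger the loss grows.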