Natural Language Processing — Neural Networks and Neural Language Models Lecture series — Training a Feed Forward Neural Network part 1

In the previous post, we looked at an overview of a feed-forward neural network and at its graphical representation. In this post, I will introduce you to the basic ideas involved in training a feed-forward neural network.

What does it mean to ‘train’ a feed-forward neural network?

A feed-forward neural network is a supervised machine learning algorithm. This means that it learns to predict an output value y from its corresponding input value x by fitting a mathematical model to existing examples of input values paired with their correct output values.

In the previous post, we stated that a feed-forward neural network can be represented mathematically by the following expression:

y = softmax(U.(activation function(W.x + b)))
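To make the expression concrete, here is a minimal sketch of it in plain Python. The choice of tanh as the activation function, the layer sizes, and all the parameter values are assumptions made purely for illustration; the helper names (softmax, matvec, forward) are hypothetical too.

```python
import math

def softmax(z):
    # Subtract the max before exponentiating for numerical stability.
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def matvec(M, v):
    # Multiply matrix M (a list of rows) by vector v.
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]

def forward(x, W, b, U):
    # Hidden layer: tanh stands in for the unspecified activation function.
    h = [math.tanh(z_i + b_i) for z_i, b_i in zip(matvec(W, x), b)]
    # Output layer: softmax turns the scores U.h into probabilities.
    return softmax(matvec(U, h))

# Hypothetical toy parameters: 2 inputs, 3 hidden units, 2 output classes.
x = [1.0, -2.0]
W = [[0.5, -0.3], [0.1, 0.8], [-0.7, 0.2]]
b = [0.0, 0.1, -0.1]
U = [[1.0, -1.0, 0.5], [-0.5, 0.3, 0.8]]

y_hat = forward(x, W, b, U)
print(y_hat)  # two probabilities that sum to 1
```

Because of the softmax, the output is a list of probabilities, one per class, whatever values W, b and U happen to hold.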

To ‘train’ a neural network simply means to figure out the right values of its parameters (the weights W and biases b of each layer i, along with the output weights U) so that it predicts accurate values of y when given input values of x.

What would be needed to train a feed-forward neural network?

To train a neural network, we first need a loss function. The purpose of the loss function is to compute the distance between the output values of y predicted by the neural network and the actual correct output values of y for the same input values of x.

The loss function is important because it gives us a fair idea of the performance of the neural network and it informs us of whether the neural network is getting better or worse over time.

The smaller the value generated by the loss function, the better. This is because small loss function output values inform us that there is little difference between the output values predicted by the neural network and the actual correct output values.
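As a small illustration of "smaller is better", the sketch below uses the cross-entropy loss, which is commonly paired with a softmax output. The choice of cross-entropy here is my assumption for the example, and the two predictions are made-up numbers.

```python
import math

def cross_entropy(y_hat, correct_class):
    # Negative log of the probability assigned to the correct class.
    return -math.log(y_hat[correct_class])

# Two made-up predictions for an example whose correct class is 0.
confident_and_right = [0.9, 0.1]
mostly_wrong = [0.2, 0.8]

loss_good = cross_entropy(confident_and_right, 0)
loss_bad = cross_entropy(mostly_wrong, 0)
print(loss_good, loss_bad)  # the first value is the smaller one
```

The prediction that puts most of its probability on the correct class gets the lower loss, which is exactly the behaviour we want from a loss function.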

After settling on a loss function, the next thing we need is a method for choosing values of W and b that reduce the value generated by the loss function in each subsequent iteration of the training process.

The method that we will use to reduce the values generated by the loss function is the gradient descent algorithm. Gradient descent first computes the partial derivative of the loss with respect to each weight and bias in the neural network (together, these partial derivatives form the gradient). It then adjusts each weight and bias by a small step in the direction opposite to the gradient, which is the direction in which the loss decreases fastest.
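Here is a minimal sketch of gradient descent on a toy problem with a single parameter w. The loss function (w - 3)^2, the starting point, the learning rate and the number of steps are all assumptions chosen to keep the example small; a real network has many parameters, but each update has the same form.

```python
def loss(w):
    # Toy loss with its minimum at w = 3.
    return (w - 3.0) ** 2

def grad(w):
    # Derivative of the toy loss with respect to w.
    return 2.0 * (w - 3.0)

w = 0.0              # assumed starting value
learning_rate = 0.1  # assumed step size
losses = []

for step in range(50):
    losses.append(loss(w))
    # Step against the gradient, so the loss goes down.
    w = w - learning_rate * grad(w)

print(w)  # close to 3, the value that minimises the loss
```

Each update moves w a little closer to the minimum, so the recorded losses shrink from one iteration to the next.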

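The only issue now is that a feed-forward neural network can have a very large number of weights and biases spread across several layers, which makes it difficult to compute the partial derivative of the loss with respect to each of them directly. To address this issue, the backpropagation algorithm is employed. It applies the chain rule layer by layer, starting from the output and working backwards, so that all of the partial derivatives are obtained efficiently.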
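To give a flavour of what backpropagation computes, here is a rough sketch for the one-hidden-layer network from the expression above. It assumes a tanh activation and a cross-entropy loss, neither of which is fixed by this post, and the parameter values and the function name loss_and_grads are hypothetical. The last few lines compare one backpropagated derivative against a finite-difference estimate as a sanity check.

```python
import math

def softmax(z):
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def loss_and_grads(x, target, W, b, U):
    # Forward pass, keeping the intermediate values for the backward pass.
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + bi)
         for row, bi in zip(W, b)]
    y_hat = softmax([sum(u * hi for u, hi in zip(row, h)) for row in U])
    loss = -math.log(y_hat[target])

    # Backward pass: apply the chain rule from the output back to the input.
    # For softmax with cross-entropy, dLoss/dscores = y_hat - one_hot(target).
    d_out = [p - (1.0 if k == target else 0.0) for k, p in enumerate(y_hat)]
    dU = [[d * hi for hi in h] for d in d_out]
    d_h = [sum(U[k][j] * d_out[k] for k in range(len(U)))
           for j in range(len(h))]
    d_z = [dh * (1.0 - hi * hi) for dh, hi in zip(d_h, h)]  # tanh' = 1 - tanh^2
    dW = [[dz * xi for xi in x] for dz in d_z]
    db = d_z
    return loss, dW, db, dU

# Hypothetical toy parameters: 2 inputs, 3 hidden units, 2 output classes.
x = [1.0, -2.0]
W = [[0.5, -0.3], [0.1, 0.8], [-0.7, 0.2]]
b = [0.0, 0.1, -0.1]
U = [[1.0, -1.0, 0.5], [-0.5, 0.3, 0.8]]
target = 0

loss, dW, db, dU = loss_and_grads(x, target, W, b, U)

# Sanity check: compare one backpropagated gradient with a finite difference.
eps = 1e-6
W_plus = [row[:] for row in W]
W_plus[0][0] += eps
W_minus = [row[:] for row in W]
W_minus[0][0] -= eps
numeric = (loss_and_grads(x, target, W_plus, b, U)[0]
           - loss_and_grads(x, target, W_minus, b, U)[0]) / (2 * eps)
print(dW[0][0], numeric)  # the two values should agree closely
```

The finite-difference estimate needs two extra forward passes for every single parameter, whereas the backward pass produces all of the derivatives in one sweep. That is why backpropagation scales to networks with many weights and biases.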

In the subsequent posts, we will look in full detail at each of the elements needed to train a feed-forward neural network. The next post will focus on the loss function.