Natural Language Processing — Neural Networks and Neural Language Models Lecture series — Feed-Forward Neural Networks
In this post, we will learn about feed-forward neural networks and how they are represented mathematically.
Feed-Forward Neural Network:
A feed-forward neural network is a multi-layer network of neural units in which the outputs from the units in each layer are passed to the units in the next higher layer. These networks do not have any cycles within them. That is, no output is passed back to the same layer or to a lower layer.
To gain a refreshed understanding of what neural units are and how they work, you can read about them here.
Graphical representation of a Feed-Forward Neural Network:
Fig. 1 below shows a graphical representation of a feed-forward neural network with n0 input values and n2 output values:
In Fig. 1, the values of x (x1, x2, … , xn0) represent the input values of the network, while the values of y (y1, y2, … , yn2) represent its output values. The last input value, ‘xn0’, has a subscript of ‘n0’ because the input values reside on the first layer of the network, which is identified as Layer 0 and holds n0 values. Likewise, the last output value, ‘yn2’, has a subscript of ‘n2’ because the output values reside on the last layer of the network, which is identified as Layer 2 and holds n2 values.
The first layer of the network is referred to as the input layer, the second layer of the network is referred to as the hidden layer and the last layer of the network is referred to as the output layer.
The ‘W’ in the network represents a matrix containing the weights applied to the input values. The ‘U’ likewise represents a matrix containing the weights applied to the output values of the hidden layer. The ‘b’ represents a vector of bias terms, one per hidden unit, which is added to the weighted input values (W.x).
Let’s assume that the feed-forward neural network depicted in Fig. 1 performs multinomial classification. In that case, the output layer gives a probability distribution across the output nodes.
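To make the roles of W, U and b more concrete, here is a minimal NumPy sketch of their shapes. The layer sizes (n0 = 4, n1 = 3, n2 = 2), the name n1 for the hidden layer size and the random values are assumptions chosen purely for illustration; they are not taken from Fig. 1.

```python
import numpy as np

# Hypothetical layer sizes, chosen only for illustration.
n0, n1, n2 = 4, 3, 2  # units in the input, hidden and output layers

rng = np.random.default_rng(0)

W = rng.normal(size=(n1, n0))  # weights from the input layer to the hidden layer
b = np.zeros(n1)               # one bias term per hidden unit
U = rng.normal(size=(n2, n1))  # weights from the hidden layer to the output layer

x = rng.normal(size=n0)        # a single input vector

hidden_input = W @ x + b       # weighted inputs plus bias, one value per hidden unit
print(hidden_input.shape)      # (3,)
print((U @ hidden_input).shape)  # (2,)
```

With these shapes, W maps an n0-dimensional input to the hidden layer, and U maps the hidden layer to the n2 output nodes. The activation function is left out here so that only the shapes are in focus.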
Mathematical representation of a Feed-Forward Neural Network:
Assuming that the feed-forward neural network graphically represented in Fig. 1 is performing multinomial classification, it can be mathematically represented with these 3 expressions:
h = activation function(W.x + b)
z = U.h
y = softmax(z)
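The three expressions above can be sketched as a forward pass in NumPy. This is only an illustration under stated assumptions: the post does not name a particular activation function, so tanh is used here as a stand-in, and the layer sizes and random parameters are hypothetical.

```python
import numpy as np

def softmax(z):
    """Normalise a vector of real values into a probability distribution."""
    exp_z = np.exp(z)
    return exp_z / exp_z.sum()

def forward(x, W, b, U, activation=np.tanh):
    """Forward pass of a network with one hidden layer.

    tanh is an assumed default; any activation function could be passed in.
    """
    h = activation(W @ x + b)  # h = activation function(W.x + b)
    z = U @ h                  # z = U.h
    y = softmax(z)             # y = softmax(z)
    return y

# Hypothetical sizes: 4 inputs, 3 hidden units, 2 output nodes.
rng = np.random.default_rng(0)
W = rng.normal(size=(3, 4))
b = np.zeros(3)
U = rng.normal(size=(2, 3))
x = rng.normal(size=4)

y = forward(x, W, b, U)
print(y)        # one probability per output node
print(y.sum())  # the probabilities sum to 1 (up to floating-point error)
```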
Since we are assuming that the feed-forward neural network under consideration performs multinomial classification, softmax is a natural choice for normalising z, the vector of real values produced by the matrix multiplication between U and h. The normalisation transforms this vector of real values into a vector that represents a probability distribution: every entry lies between 0 and 1, and the entries sum to 1.
Fig. 2 shows the softmax expression:
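As a small worked example of this normalisation, the sketch below applies softmax to a hypothetical score vector z = [1, 2, 3]. Subtracting the maximum before exponentiating is a common numerical-stability step that is not part of the post itself; it does not change the result.

```python
import numpy as np

def stable_softmax(z):
    """Softmax with the maximum subtracted first to avoid overflow in exp."""
    shifted = z - np.max(z)
    exp_z = np.exp(shifted)
    return exp_z / exp_z.sum()

z = np.array([1.0, 2.0, 3.0])  # hypothetical real-valued scores
y = stable_softmax(z)

print(y)        # approximately 0.090, 0.245, 0.665
print(y.sum())  # sums to 1 (up to floating-point error)
```

The largest score receives the largest probability, and the ordering of the scores is preserved.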
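Written out, the standard softmax definition for a vector z with d components is given below. The symbol d is notation assumed here; for the network in Fig. 1 it would correspond to the number of output nodes.

```latex
\operatorname{softmax}(z_i) = \frac{\exp(z_i)}{\sum_{j=1}^{d} \exp(z_j)}, \qquad 1 \le i \le d
```

Because every exponential is positive and the denominator is the sum of all the numerators, each output lies between 0 and 1 and the outputs sum to 1.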
In the next post, we shall be introduced to the concept of training Neural Networks.