Neural Networks consist of the following components:

- An input layer, x
- An arbitrary number of hidden layers
- An output layer, ŷ
- A set of weights between each layer, W
- A choice of activation function for each hidden layer, σ. In this tutorial, we'll use a Sigmoid activation function.
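As a rough sketch, these components might be laid out in Python with NumPy as follows; the class name NeuralNetwork, the hidden-layer size of 4, and the random weight initialization are illustrative choices, not requirements:

```python
import numpy as np

def sigmoid(z):
    # Sigmoid activation: squashes any real value into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

class NeuralNetwork:
    def __init__(self, x, y):
        self.input = x                    # input layer, x (one training sample per row)
        self.y = y                        # desired output, y
        # Weights between the input and hidden layer (W1), and between the
        # hidden and output layer (W2); 4 hidden units is an arbitrary choice.
        self.weights1 = np.random.rand(self.input.shape[1], 4)
        self.weights2 = np.random.rand(4, 1)
        self.output = np.zeros(self.y.shape)   # predicted output, ŷ
```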
The output ŷ of a simple 2-layer Neural Network is:
ŷ = sigmoid(W2 · sigmoid(W1 · x))
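Continuing the sketch above, this equation maps directly onto a feedforward method. One layout assumption here: inputs are stored as rows, so the products are written x · W1 rather than W1 · x; the computation is the same.

```python
    def feedforward(self):
        # Hidden-layer activations: sigmoid(x · W1)
        self.layer1 = sigmoid(np.dot(self.input, self.weights1))
        # Network output: ŷ = sigmoid(layer1 · W2)
        self.output = sigmoid(np.dot(self.layer1, self.weights2))
```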
You might notice that in the equation above, the weights W are the only variables that affect the output ŷ. Naturally, the right values for the weights determine the strength of the predictions. The process of fine-tuning the weights from the input data is known as training the Neural Network.
Each iteration of the training process consists of the following steps:
1. Calculating the predicted output ŷ, known as feedforward
2. Updating the weights and biases, known as backpropagation
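Put together, a minimal training loop simply alternates these two steps; the iteration count of 1500 is an arbitrary choice here, and backprop is sketched at the end of this section:

```python
# Assuming X and y are NumPy arrays of training inputs and targets
nn = NeuralNetwork(X, y)
for i in range(1500):
    nn.feedforward()   # step 1: compute the predicted output ŷ
    nn.backprop()      # step 2: update the weights via backpropagation
```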
For every feedforward pass performed, we calculate the cumulative error as:
E = (1/2) · Σ (y − ŷ)²
This will be our loss function. Our goal in training is to find the best set of weights that minimizes the loss function.
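In code, this loss is a one-liner (using the np alias imported earlier; y and ŷ are assumed to be arrays of the same shape):

```python
def sum_of_squares_loss(y, y_hat):
    # E = (1/2) · Σ (y − ŷ)²
    return 0.5 * np.sum((y - y_hat) ** 2)
```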
In order to know the appropriate amount to adjust the weights by, we need to know the derivative of the loss function with respect to the weights.
If we have the derivative, we can simply update the weights by increasing or decreasing them accordingly (refer to the diagram above). This is known as gradient descent. For the derivative, we need the chain rule:
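As a sketch of that chain-rule computation for the output-layer weights W2, introduce z = layer1 · W2, the weighted input to the final Sigmoid (the symbol z is used here only to make the intermediate step visible):

∂E/∂W2 = ∂E/∂ŷ · ∂ŷ/∂z · ∂z/∂W2 = −(y − ŷ) · σ′(z) · layer1

where layer1 = σ(W1 · x) is the hidden-layer activation and σ′(z) = σ(z) · (1 − σ(z)) is the derivative of the Sigmoid. Applying the chain rule once more, back through W2 and the hidden Sigmoid, gives the corresponding derivative for W1. Gradient descent then moves each weight a small step against its derivative, W ← W − η · ∂E/∂W, for some learning rate η (nothing above fixes its value; a step size of 1 is assumed in the code sketch below purely for brevity).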
Using this derivative, we can update the weights.
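Continuing the class sketch from earlier, applying those derivatives to both weight matrices gives one possible backprop method; the full gradient step (learning rate of 1) is an assumption made for brevity, not something fixed by the text above:

```python
    def backprop(self):
        # ∂E/∂z2 = −(y − ŷ) · σ′(z2), with σ′ written via the sigmoid's output:
        # σ′(z) = σ(z) · (1 − σ(z))
        d_output = -(self.y - self.output) * self.output * (1.0 - self.output)
        # ∂E/∂W2 = layer1ᵀ · ∂E/∂z2
        d_weights2 = np.dot(self.layer1.T, d_output)
        # Propagate the error back through W2 and the hidden sigmoid,
        # then form ∂E/∂W1 the same way
        d_hidden = np.dot(d_output, self.weights2.T) * self.layer1 * (1.0 - self.layer1)
        d_weights1 = np.dot(self.input.T, d_hidden)
        # Gradient descent: step each weight matrix against its derivative
        self.weights1 -= d_weights1
        self.weights2 -= d_weights2
```

With feedforward and backprop in place, the training loop shown earlier should drive the loss E down over successive iterations.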