The backpropagation algorithm is the heart of deep learning! It is the core reason we can train advanced models like LLMs.
In a previous video, we saw how the computational graph built as part of a deep learning model lets us compute the derivatives of the network outputs with respect to the network inputs. Now we are going to use that same graph to get the network to learn from data with the backpropagation algorithm. Let's get into it!
Watch the video for the full content!
The goal of a neural network is to generate an estimate of the target we are trying to predict. We use a loss function to compare the target to its estimate, and the optimization problem consists of minimizing that loss function.
Typical loss functions are the log-loss and the mean squared error loss:
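For a target $y$ and its estimate $\hat{y}$, these take their usual forms (written here for a batch of $N$ examples):

$$\mathcal{L}_{\text{log}} = -\frac{1}{N}\sum_{i=1}^{N}\big[\, y_i \log \hat{y}_i + (1 - y_i)\log(1 - \hat{y}_i) \,\big]$$

$$\mathcal{L}_{\text{MSE}} = \frac{1}{N}\sum_{i=1}^{N} (y_i - \hat{y}_i)^2$$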
To minimize the loss function, we can take its gradient with respect to the network parameters and find the zeros of the resulting function.
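In other words, denoting the parameters by $\theta$ and the loss by $\mathcal{L}$, we would solve:

$$\nabla_\theta \mathcal{L}(\theta) = 0$$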
But solving this equation analytically can be very hard, so instead we use iterative optimization techniques like the gradient descent algorithm: we update the parameters by following the gradient in the direction where the loss function decreases, until we reach a local minimum of the loss function.
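Writing $\eta$ for the learning rate, one gradient descent step updates the parameters as:

$$\theta_{t+1} = \theta_t - \eta \, \nabla_\theta \mathcal{L}(\theta_t)$$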
To apply the gradient descent algorithm, we are going to use the computational graph. We first evaluate the graph in the forward pass.
Then, in the backward pass, we back-propagate the gradients of the loss function to each computational block and node in the graph by using the chain rule.
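For example, for a weight $w$ that produces a pre-activation $z$, which produces the estimate $\hat{y}$, which enters the loss $\mathcal{L}$, the chain rule multiplies the local derivatives along that path:

$$\frac{\partial \mathcal{L}}{\partial w} = \frac{\partial \mathcal{L}}{\partial \hat{y}} \cdot \frac{\partial \hat{y}}{\partial z} \cdot \frac{\partial z}{\partial w}$$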
Now that we have the gradient of the loss function with respect to all the parameters in the graph, we can apply one step of the gradient descent algorithm.
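Here is a minimal sketch of that whole loop on a single-neuron network with a squared-error loss: forward pass, backward pass via the chain rule, and one gradient descent step. The input, target, and learning rate values are illustrative assumptions.

```python
# Minimal sketch: one neuron y_hat = w*x + b (identity activation),
# squared-error loss, gradients obtained by the chain rule,
# then one gradient descent step. Toy values chosen for illustration.

w, b = 0.5, 0.0          # parameters
x, y = 2.0, 3.0          # one training example (input, target)
lr = 0.1                 # learning rate

# Forward pass: evaluate the computational graph node by node.
z = w * x + b            # linear node
y_hat = z                # identity activation for simplicity
loss = (y_hat - y) ** 2  # squared-error loss node

# Backward pass: propagate gradients with the chain rule.
dloss_dyhat = 2 * (y_hat - y)    # dL/dy_hat
dyhat_dz = 1.0                   # derivative of the identity activation
dz_dw, dz_db = x, 1.0            # local derivatives of the linear node
dloss_dw = dloss_dyhat * dyhat_dz * dz_dw
dloss_db = dloss_dyhat * dyhat_dz * dz_db

# One gradient descent step.
w -= lr * dloss_dw
b -= lr * dloss_db
print(f"loss={loss:.3f}, new w={w:.3f}, new b={b:.3f}")
```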
The backpropagation algorithm is currently the most commonly used and effective method for training neural networks.
The main principles of backpropagation are:
1. **Forward Propagation**: Data flows from the input layer through the hidden layers to the output layer.
2. **Error Calculation and Propagation**: The error between the model's output and the actual result is calculated and then propagated backward. This means the error is transmitted from the output layer back through the hidden layers to the input layer.
3. **Iteration**: During backpropagation, the model's parameters are continuously adjusted based on the error. This process iterates through the first two steps until the model meets the training termination criteria.
The two most critical steps are: (1) propagating the error backward, and (2) continuously adjusting the model's parameters based on the error.
Together, these steps form the optimization procedure, with gradient descent being the most commonly used method.
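Putting these principles together, here is a minimal training-loop sketch in PyTorch; the toy regression data, network size, and hyperparameters are illustrative assumptions, not a reference implementation.

```python
import torch
import torch.nn as nn

# Toy regression data (illustrative): learn y = 2x + 1 from noisy samples.
x = torch.randn(256, 1)
y = 2 * x + 1 + 0.1 * torch.randn(256, 1)

model = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1))
loss_fn = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.05)

for epoch in range(200):          # 3. iterate until a stopping criterion
    y_hat = model(x)              # 1. forward propagation through the graph
    loss = loss_fn(y_hat, y)      #    error between the output and the target

    optimizer.zero_grad()         # clear gradients from the previous step
    loss.backward()               # 2. propagate the error backward (chain rule)
    optimizer.step()              #    adjust the parameters with gradient descent

print(f"final loss: {loss.item():.4f}")
```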