Optimization Algorithms in Deep Learning
Optimization is at the core of training any neural network: we seek parameters $\theta$ that minimize the loss function $L(\theta)$.
Gradient Descent
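The standard parameter update step, for parameters $\theta$, learning rate $\eta$, and loss $L$:
$$\theta_{t+1} = \theta_t - \eta \nabla_\theta L(\theta_t)$$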
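As a minimal sketch of this update rule, the snippet below applies it to a one-dimensional toy problem. The function names `gradient_descent` and `example_grad`, the toy loss $(\theta - 3)^2$, and the learning rate of 0.1 are all assumptions chosen for illustration; they do not come from the text above.

```python
def gradient_descent(grad, theta0, lr=0.1, steps=100):
    """Repeatedly apply theta <- theta - lr * grad(theta)."""
    theta = theta0
    for _ in range(steps):
        theta = theta - lr * grad(theta)
    return theta


# Hypothetical toy loss: L(theta) = (theta - 3)^2, whose gradient is 2 * (theta - 3).
def example_grad(theta):
    return 2.0 * (theta - 3.0)


theta_min = gradient_descent(example_grad, theta0=0.0)
print(theta_min)  # approaches 3.0, the minimizer of the toy loss
```

On this toy loss each step shrinks the distance to the minimizer by a constant factor, so a larger learning rate converges faster until it becomes large enough to overshoot and diverge.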
Adam Optimizer
Adam (Adaptive Moment Estimation) combines the advantages of AdaGrad, which handles sparse gradients well, and RMSProp, which handles non-stationary objectives well.
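Let $g_t = \nabla_\theta L(\theta_t)$ be the gradient at step $t = 1, 2, \dots$. Let $m_t$ be the first moment (mean) and $v_t$ the second moment (uncentered variance) of the gradient, both initialized to zero ($m_0 = v_0 = 0$) and tracked as exponential moving averages with decay rates $\beta_1$ and $\beta_2$ (commonly 0.9 and 0.999):
$$m_t = \beta_1 m_{t-1} + (1 - \beta_1) g_t$$
$$v_t = \beta_2 v_{t-1} + (1 - \beta_2) g_t^2$$
Because both averages start at zero, they are biased towards zero during the first steps. To correct the bias towards zero:
$$\hat{m}_t = \frac{m_t}{1 - \beta_1^t}, \qquad \hat{v}_t = \frac{v_t}{1 - \beta_2^t}$$
Finally, update the parameters, where $\eta$ is the learning rate and $\epsilon$ is a small constant for numerical stability:
$$\theta_{t+1} = \theta_t - \frac{\eta}{\sqrt{\hat{v}_t} + \epsilon} \hat{m}_t$$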
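import torch

# Placeholder model, loss, and data (assumptions, for illustration only)
model = torch.nn.Linear(10, 1)
criterion = torch.nn.MSELoss()
inputs, targets = torch.randn(32, 10), torch.randn(32, 1)
# Standard optimization block in PyTorch
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
optimizer.zero_grad()               # clear gradients left over from the previous step
outputs = model(inputs)             # forward pass
loss = criterion(outputs, targets)  # compute the loss
loss.backward()                     # backpropagate to populate .grad on each parameter
optimizer.step()                    # apply one Adam update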
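To make the moment estimates and the bias correction concrete, here is a from-scratch scalar sketch of one way to write the Adam update in plain Python. It is an illustration under stated assumptions, not PyTorch's implementation: the names `adam_minimize` and `example_grad`, the toy loss $(\theta - 3)^2$, and the step count are hypothetical choices.

```python
import math


def adam_minimize(grad, theta0, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8, steps=1000):
    """Scalar Adam: moving averages of g and g^2, bias-corrected, then a scaled step."""
    theta, m, v = theta0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(theta)
        m = beta1 * m + (1.0 - beta1) * g      # first moment estimate
        v = beta2 * v + (1.0 - beta2) * g * g  # second moment estimate
        m_hat = m / (1.0 - beta1 ** t)         # bias-corrected first moment
        v_hat = v / (1.0 - beta2 ** t)         # bias-corrected second moment
        theta = theta - lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta


# Hypothetical toy loss: L(theta) = (theta - 3)^2, whose gradient is 2 * (theta - 3).
def example_grad(theta):
    return 2.0 * (theta - 3.0)


first_step = adam_minimize(example_grad, theta0=0.0, lr=0.1, steps=1)
theta_adam = adam_minimize(example_grad, theta0=0.0, lr=0.1, steps=2000)
print(first_step, theta_adam)
```

With bias correction, the very first step has magnitude close to `lr` regardless of the gradient's scale, because the corrected first moment equals the gradient and the corrected second moment equals its square. Without the correction, that first step would be several times larger, since the uncorrected `v` is shrunk towards zero more strongly than `m`.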