# P/N: DOC-20260322-717
PUBLISHED REV 1.0

Optimization Algorithms in Deep Learning

An overview of optimization techniques from standard SGD to Adam, and their mathematical formulations.

Date
2026-03-22
Category
AI
Tags
Machine Learning Mathematics Python

Optimization is the core of training any neural network. We aim to minimize the loss function $J(\theta)$ with respect to the model parameters $\theta$.

Gradient Descent

The stochastic gradient descent (SGD) update for a single training example $(x^{(i)}, y^{(i)})$, with learning rate $\eta$:

$$\theta = \theta - \eta \nabla_\theta J(\theta; x^{(i)}, y^{(i)})$$
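The update rule can be sketched in a few lines of plain Python. This is a minimal illustration, not a production optimizer: the one-parameter model `theta * x`, the squared-error loss, the toy data generated from `y = 3x`, and the helper names `sgd_step` and `grad_squared_error` are all assumptions made for the example.

```python
import random

def sgd_step(theta, grad, lr):
    """One SGD update: theta <- theta - lr * grad."""
    return theta - lr * grad

def grad_squared_error(theta, x, y):
    """Gradient of J = 0.5 * (theta * x - y)**2 with respect to theta."""
    return (theta * x - y) * x

random.seed(0)

# Hypothetical toy data from y = 3x, so the minimizer is theta = 3.
data = [(x, 3.0 * x) for x in (0.5, 1.0, 1.5, 2.0)]

theta = 0.0
for epoch in range(200):
    random.shuffle(data)          # visit the examples in random order
    for x, y in data:             # one update per training example
        theta = sgd_step(theta, grad_squared_error(theta, x, y), lr=0.1)

print(theta)
```

Each inner iteration uses the gradient of a single example, which is what the superscript $(i)$ in the formula denotes.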

Adam Optimizer

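Adam (Adaptive Moment Estimation) combines the advantages of AdaGrad, which handles sparse gradients well, and RMSProp, which handles non-stationary objectives well, by keeping a separate adaptive step size for each parameter.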

Let $g_t$ be the gradient of the loss at step $t$, $m_t$ the running estimate of its first moment (mean), and $v_t$ the running estimate of its second moment (uncentered variance):

$$m_t = \beta_1 m_{t-1} + (1 - \beta_1) g_t$$

$$v_t = \beta_2 v_{t-1} + (1 - \beta_2) g_t^2$$
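Unrolling the first recursion makes the weighting of past gradients explicit. The derivation below assumes the usual zero initialization $m_0 = 0$, which the text above does not state; the same steps apply to $v_t$ with $\beta_2$ and $g_i^2$.

```latex
m_t = (1 - \beta_1) \sum_{i=1}^{t} \beta_1^{\,t-i} g_i,
\qquad
(1 - \beta_1) \sum_{i=1}^{t} \beta_1^{\,t-i} = 1 - \beta_1^{t}
```

The weights form a geometric series that sums to $1 - \beta_1^t$ rather than 1, so for small $t$ the estimate $m_t$ is scaled down relative to a true weighted average of the gradients.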

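Because $m_0$ and $v_0$ are initialized to zero, both estimates are biased towards zero during the early steps. To correct this bias: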

$$\hat{m}_t = \frac{m_t}{1 - \beta_1^t}$$

$$\hat{v}_t = \frac{v_t}{1 - \beta_2^t}$$
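A small numeric check can make the effect of the correction concrete. The sketch below assumes a constant gradient `g = 2.0` and `beta1 = 0.9`; both values are hypothetical and chosen only for illustration.

```python
beta1 = 0.9
g = 2.0   # hypothetical constant gradient

m = 0.0   # first-moment estimate, initialized to zero
history = []
for t in range(1, 6):
    m = beta1 * m + (1 - beta1) * g      # raw moving average
    m_hat = m / (1 - beta1 ** t)         # bias-corrected estimate
    history.append((t, m, m_hat))

for t, m_raw, m_corr in history:
    print(t, round(m_raw, 4), round(m_corr, 4))
```

With a constant gradient the raw average starts far below `g` and only approaches it slowly, while the corrected value equals `g` (up to floating-point rounding) from the first step.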

Finally, update the parameters, where $\epsilon$ is a small constant that prevents division by zero:

$$\theta_{t+1} = \theta_t - \frac{\eta}{\sqrt{\hat{v}_t} + \epsilon} \hat{m}_t$$
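Put together, the moment updates, bias correction, and parameter update can be sketched in plain Python. This is an illustrative version written for readability, not the PyTorch implementation: the function names `adam_step` and `grad_J`, the toy quadratic objective, and the learning rate of 0.05 are assumptions for the example. The default `beta1`, `beta2`, and `eps` values follow commonly used settings.

```python
import math

def adam_step(theta, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update on lists of parameters; t is the 1-based step count."""
    # Moving averages of the gradient and the squared gradient.
    m = [beta1 * mi + (1 - beta1) * gi for mi, gi in zip(m, grad)]
    v = [beta2 * vi + (1 - beta2) * gi * gi for vi, gi in zip(v, grad)]
    # Bias-corrected estimates.
    m_hat = [mi / (1 - beta1 ** t) for mi in m]
    v_hat = [vi / (1 - beta2 ** t) for vi in v]
    # Per-parameter scaled update.
    theta = [th - lr * mh / (math.sqrt(vh) + eps)
             for th, mh, vh in zip(theta, m_hat, v_hat)]
    return theta, m, v

def J(theta):
    """Hypothetical toy objective with minimum at (1, -2)."""
    a, b = theta
    return (a - 1.0) ** 2 + 10.0 * (b + 2.0) ** 2

def grad_J(theta):
    """Analytic gradient of J."""
    a, b = theta
    return [2.0 * (a - 1.0), 20.0 * (b + 2.0)]

theta = [0.0, 0.0]
m = [0.0, 0.0]
v = [0.0, 0.0]
initial_loss = J(theta)
for t in range(1, 2001):
    theta, m, v = adam_step(theta, grad_J(theta), m, v, t, lr=0.05)

print(theta, J(theta))
```

One property worth noticing: on the first step $\hat{m}_1 = g_1$ and $\hat{v}_1 = g_1^2$, so every parameter moves by roughly $\eta$ regardless of the gradient's magnitude. Here the two coordinates have gradients of very different sizes, yet both take a first step of about 0.05.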

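In PyTorch, a single Adam training step looks like this (the model, loss, and data are placeholders):

import torch
from torch import nn

# Placeholder model, loss, and data so the block runs on its own (illustrative assumptions)
model = nn.Linear(10, 1)
criterion = nn.MSELoss()
inputs, targets = torch.randn(32, 10), torch.randn(32, 1)

# Standard optimization block in PyTorch
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

optimizer.zero_grad()                 # clear gradients left over from the previous step
outputs = model(inputs)               # forward pass
loss = criterion(outputs, targets)    # evaluate the loss J(theta)
loss.backward()                       # backpropagate to compute gradients
optimizer.step()                      # apply the Adam update to the parameters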

© 2026 Lapinex Technology Hub. Pure static web architecture.
