# 3.3 Model Training with Stochastic Gradient Descent (Part 1-4)

## What we covered in this video lecture

This lecture introduced the training algorithm behind logistic regression: stochastic gradient descent. This is the same training algorithm we use for training deep neural networks.

Stochastic gradient descent is based on calculus: we compute the loss function’s derivatives (or gradients) with respect to the model weights. Why? The loss measures “how wrong” the predictions are. And the gradient tells us how we have to change the weights to minimize (improve) the loss.
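To make this concrete, here is a minimal sketch of stochastic gradient descent for logistic regression, written in plain Python. The toy dataset, learning rate, and number of epochs are made up for illustration; the gradient formulas `(p - y) * x_i` and `(p - y)` follow from the chain rule applied to the binary cross-entropy loss.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def bce_loss(p, y):
    # binary cross-entropy for a single example
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def sgd_step(w, b, x, y, lr=0.1):
    # forward pass: predicted probability for one training example
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    # gradients of the loss w.r.t. weights and bias (via the chain rule):
    #   dL/dw_i = (p - y) * x_i,   dL/db = (p - y)
    err = p - y
    w = [wi - lr * err * xi for wi, xi in zip(w, x)]
    b = b - lr * err
    return w, b

# "stochastic": update after each individual example (toy data, made up)
data = [([1.0, 2.0], 1), ([2.0, -1.0], 0), ([0.5, 1.5], 1)]
w, b = [0.0, 0.0], 0.0
for _ in range(100):
    for x, y in data:
        w, b = sgd_step(w, b, x, y)
```

After training, the average loss on the toy data drops well below its initial value of ln 2 ≈ 0.693, which is the loss of an untrained model that predicts 0.5 for everything.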

The loss is correlated with the accuracy, but sadly, we cannot optimize the accuracy directly using stochastic gradient descent. That's because accuracy is not a smooth function of the model weights: it is piecewise constant, so its gradient is zero almost everywhere and gives the optimizer no direction to move in.
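A tiny illustration of why accuracy cannot be optimized with gradients (the one-feature model and toy data below are made up): nudging the weight by a small amount leaves every thresholded prediction, and hence the accuracy, exactly unchanged.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# toy dataset of (feature, label) pairs for a one-weight logistic model
data = [(1.0, 1), (-2.0, 0), (0.5, 1), (-0.5, 0)]

def accuracy(w):
    # threshold the predicted probability at 0.5 to get a class label
    preds = [1 if sigmoid(w * x) >= 0.5 else 0 for x, _ in data]
    return sum(p == y for p, (_, y) in zip(preds, data)) / len(data)

# accuracy is a step function: a tiny weight change does not move any
# prediction across the 0.5 threshold, so accuracy stays the same
print(accuracy(1.0))         # → 1.0
print(accuracy(1.0 + 1e-6))  # → 1.0 (identical)
```

The smooth loss, by contrast, changes (slightly) with every weight change, which is exactly what gradient descent needs.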

Computing the loss gradients relies on the chain rule from calculus, and if you are not familiar with it, it may look daunting at first. But do not worry. We will introduce PyTorch functions that handle the differentiation (that is, the calculation of the gradients) automatically for us. This is known as automatic differentiation or autograd.
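As a preview of what autograd does (the specific tensor values here are made up for illustration), PyTorch can compute the same gradient we would derive by hand with the chain rule, `dL/dw = (p - y) * x`, simply by recording the forward computation and calling `.backward()`:

```python
import torch

# one training example for logistic regression (values made up)
x = torch.tensor([1.0, 2.0])
y = torch.tensor(1.0)
w = torch.tensor([0.1, -0.2], requires_grad=True)
b = torch.tensor(0.0, requires_grad=True)

# forward pass: linear layer -> sigmoid -> binary cross-entropy loss
z = torch.dot(w, x) + b
p = torch.sigmoid(z)
loss = torch.nn.functional.binary_cross_entropy(p, y)

# autograd applies the chain rule through the recorded computation graph
loss.backward()

# w.grad now matches the manual chain-rule result (p - y) * x
print(w.grad)
```

Because `requires_grad=True` was set on `w` and `b`, PyTorch tracks every operation on them and can replay the chain rule backward automatically, no manual derivatives needed.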

The following lecture introduces PyTorch functionality that calculates the gradients automatically for us. However, if you are new to calculus or need a refresher and you want to learn more (not required for this course), I have written a concise calculus primer that you might find helpful: Calculus and Differentiation Primer.

Moreover, if you are interested in an alternative introduction to stochastic gradient descent, you may find my article Single-Layer Neural Networks and Gradient Descent helpful.
