Deep Learning Fundamentals

Unit 6.2 – Learning Rates and Learning Rate Schedulers

Slides

Part 1: Finding a Good Learning Rate
Part 2: Finding a Good Learning Rate
Part 4: Annealing the Learning Rate with a Scheduler

References

Tuner documentation for learning rate finding
configure_optimizers dictionary documentation
StepLR documentation
ReduceLROnPlateau documentation
CosineAnnealingLR documentation
CosineAnnealingWarmRestarts documentation

Code

Parts 1, 2 & 4: 6.2-learning-rates/

What we covered in this video lecture

In this lecture, we introduced three kinds of learning rate schedulers: step schedulers, on-plateau schedulers, and cosine decay schedulers. What they have in common is that they decay the learning rate over time to achieve better annealing, making the loss less jittery towards the end of training.
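To see how the three scheduler families differ, here is a minimal sketch in plain PyTorch that records the learning rate at the start of each epoch. It assumes a dummy single-parameter model, a starting learning rate of 0.1, and hyperparameters (step_size=5, factor=0.5, patience=2, T_max=20) chosen only for illustration; the validation losses fed to ReduceLROnPlateau are made-up values that mimic a loss that stops improving.

```python
import torch
from torch.optim.lr_scheduler import CosineAnnealingLR, ReduceLROnPlateau, StepLR


def lr_trajectory(make_scheduler, num_epochs=20, val_losses=None):
    """Return the learning rate at the start of each epoch for one scheduler."""
    param = torch.nn.Parameter(torch.zeros(1))  # dummy stand-in for a model
    optimizer = torch.optim.SGD([param], lr=0.1)
    scheduler = make_scheduler(optimizer)
    lrs = []
    for epoch in range(num_epochs):
        lrs.append(optimizer.param_groups[0]["lr"])
        optimizer.step()  # one epoch of real training would happen here
        if isinstance(scheduler, ReduceLROnPlateau):
            scheduler.step(val_losses[epoch])  # on-plateau needs the monitored metric
        else:
            scheduler.step()  # step and cosine schedulers only count calls
    return lrs


# Made-up validation losses: they improve for 4 epochs, then plateau.
val_losses = [1.0, 0.8, 0.6, 0.5] + [0.5] * 16

step_lrs = lr_trajectory(lambda opt: StepLR(opt, step_size=5, gamma=0.5))
plateau_lrs = lr_trajectory(
    lambda opt: ReduceLROnPlateau(opt, mode="min", factor=0.5, patience=2),
    val_losses=val_losses,
)
cosine_lrs = lr_trajectory(lambda opt: CosineAnnealingLR(opt, T_max=20, eta_min=0.0))

for name, lrs in [("step", step_lrs), ("plateau", plateau_lrs), ("cosine", cosine_lrs)]:
    print(name, [round(lr, 4) for lr in lrs])
```

The step scheduler halves the rate on a fixed calendar, the on-plateau scheduler only reacts once the monitored loss has stalled for more than `patience` epochs, and the cosine scheduler decays smoothly toward `eta_min`.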

In practice, I recommend starting without a learning rate scheduler and then adding one, making sure that the predictive performance improves over the scheduler-free run. If the predictive performance becomes worse than without a scheduler, that’s usually an indicator that the scheduler’s hyperparameters need to be adjusted.
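One way to follow this advice is to make the scheduler a single switch, so the baseline and the scheduled run differ only in that setting. The sketch below is a hypothetical helper written as a standalone function taking the model as an argument, so it runs without Lightning; inside a LightningModule the same body would live in `configure_optimizers` and use `self.parameters()`. The dictionary layout follows the `configure_optimizers` dictionary format linked in the references, and `"val_loss"` is an assumed name for a metric logged with `self.log`.

```python
import torch


def configure_optimizers(model, use_scheduler=False):
    """Hypothetical helper mirroring a LightningModule.configure_optimizers body."""
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
    if not use_scheduler:
        # Baseline run: constant learning rate, no scheduler.
        return optimizer

    # Second run: identical optimizer settings plus an on-plateau scheduler.
    scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
        optimizer, mode="min", factor=0.1, patience=3
    )
    return {
        "optimizer": optimizer,
        "lr_scheduler": {
            "scheduler": scheduler,
            "monitor": "val_loss",  # assumed metric name logged via self.log
            "interval": "epoch",    # call scheduler.step() once per epoch
            "frequency": 1,
        },
    }


model = torch.nn.Linear(4, 1)
baseline = configure_optimizers(model)
with_scheduler = configure_optimizers(model, use_scheduler=True)
```

Comparing the validation metrics of the two runs then shows whether the scheduler, with its current hyperparameters, actually helps.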

Additional resources if you want to learn more

If you are interested in further analyses of learning rate scheduling, you might like the classic paper Super-Convergence: Very Fast Training of Neural Networks Using Large Learning Rates. It describes a phenomenon called “super-convergence,” in which neural networks can be trained much faster than with standard methods while also generalizing better. Super-convergence is achieved by training with a single learning rate cycle and a large maximum learning rate; the large learning rate regularizes the training, so other forms of regularization need to be reduced. The authors also propose a simplified method for estimating the optimal learning rate. Their experiments demonstrate the effectiveness of super-convergence on several datasets and architectures, especially when the amount of labeled training data is limited.
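PyTorch ships a one-cycle policy based on this paper as `torch.optim.lr_scheduler.OneCycleLR`. The sketch below assumes a dummy linear model, `max_lr=0.1`, and 100 total training steps (values picked for illustration, not taken from the paper). With the default settings, the schedule starts at `max_lr / 25`, warms up over roughly the first 30% of the steps, and then anneals far below the starting value.

```python
import torch

model = torch.nn.Linear(10, 1)  # dummy model
# OneCycleLR also cycles momentum by default, so use an optimizer with momentum.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

total_steps = 100  # in practice: number of epochs * minibatches per epoch
scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer, max_lr=0.1, total_steps=total_steps
)

lrs = []
for _ in range(total_steps):
    lrs.append(scheduler.get_last_lr()[0])
    optimizer.step()   # forward pass and loss.backward() would precede this
    scheduler.step()   # OneCycleLR is stepped after every minibatch, not every epoch

peak_step = lrs.index(max(lrs))
print(f"start={lrs[0]:.4g}, peak={max(lrs):.4g} at step {peak_step}, end={lrs[-1]:.3g}")
```

Because `OneCycleLR` needs the total number of steps up front, calling `scheduler.step()` more than `total_steps` times raises an error.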

Quiz: 6.2 Learning Rates and Learning Rate Schedulers - Part 1

If the learning rate is too small, the loss will

not improve

Correct. Usually, the loss will stay more or less constant if the learning rate is too small.

fluctuate

Incorrect. Usually, the loss will stay more or less constant if the learning rate is too small.
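A toy experiment makes this answer concrete. The sketch below is an illustrative example, not course code: it runs plain gradient descent on f(w) = w² starting from w = 5 with three learning rates, one too small, one reasonable, and one too large.

```python
def gradient_descent_losses(lr, steps=20, w0=5.0):
    """Minimize f(w) = w**2 with plain gradient descent and return the loss history."""
    w = w0
    losses = []
    for _ in range(steps):
        losses.append(w ** 2)
        w -= lr * 2 * w  # the gradient of w**2 is 2 * w
    return losses


too_small = gradient_descent_losses(lr=1e-4)
reasonable = gradient_descent_losses(lr=0.1)
too_large = gradient_descent_losses(lr=1.1)

for name, losses in [("too small", too_small), ("reasonable", reasonable), ("too large", too_large)]:
    print(f"{name:>10}: loss {losses[0]:.2f} -> {losses[-1]:.4g}")
```

With lr=1e-4 the loss barely moves from its starting value of 25, with lr=0.1 it shrinks steadily toward zero, and with lr=1.1 each update overshoots the minimum so that w jumps back and forth across it with growing magnitude and the loss blows up.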

Quiz: 6.2 Learning Rates and Learning Rate Schedulers - Part 2

Suppose we want to create a model checkpoint based on the training set loss. What would the correct ModelCheckpoint code look like in this case?

ModelCheckpoint(save_top_k=1, mode="max", monitor="train_acc", save_last=True)

Incorrect. The code shown above monitors the accuracy, not the loss.

ModelCheckpoint(save_top_k=1, mode="min", monitor="train_acc", save_last=True)

Incorrect. The code shown above monitors the accuracy, not the loss.

ModelCheckpoint(save_top_k=1, mode="max", monitor="train_loss", save_last=True)

Incorrect. This code would keep the checkpoint with the highest training set loss.

ModelCheckpoint(save_top_k=1, mode="min", monitor="train_loss", save_last=True)

Correct. This code keeps the checkpoint with the lowest training set loss.
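The role of `mode` can be modeled in a few lines of plain Python. The helper `select_best_epoch` below is hypothetical and the loss values are made up; it is not how Lightning's `ModelCheckpoint` is implemented (the real callback also writes checkpoint files and, with `save_last=True`, additionally saves a copy of the most recent checkpoint), but it shows which epoch a `save_top_k=1` checkpoint would keep for each mode.

```python
def select_best_epoch(metric_history, mode="min"):
    """Return the index of the epoch a save_top_k=1 checkpoint would keep.

    mode="min" keeps the lowest monitored value (appropriate for losses);
    mode="max" keeps the highest value (appropriate for accuracies).
    """
    if mode not in ("min", "max"):
        raise ValueError("mode must be 'min' or 'max'")
    pick = min if mode == "min" else max
    return pick(range(len(metric_history)), key=lambda i: metric_history[i])


train_loss = [0.90, 0.55, 0.41, 0.47, 0.38, 0.40]  # made-up values for illustration
print(select_best_epoch(train_loss, mode="min"))  # → 4 (lowest loss)
print(select_best_epoch(train_loss, mode="max"))  # → 0 (highest loss, the wrong choice)
```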

Quiz: 6.2 Learning Rates and Learning Rate Schedulers - Part 3

The automatic learning rate finding function will always find the optimal learning rate.

True

Incorrect. Automatic learning rate finders use a heuristic to get a good ballpark estimate, but they are not guaranteed to return the optimal learning rate.

False

Correct. Automatic learning rate finders use a heuristic to get a good ballpark estimate, but they are not guaranteed to return the optimal learning rate.
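To see why the result is only a ballpark estimate, here is a toy version of the range-test idea behind such finders, written in plain Python. It is a simplified illustration, not Lightning's tuner: the learning rate grows exponentially over a single run on f(w) = (w − 3)², and the hypothetical `suggest_lr` helper picks the learning rate at which the loss falls most steeply, which is one common heuristic.

```python
def lr_range_test(num_steps=40, min_lr=1e-4, max_lr=1.0, w0=0.0, target=3.0):
    """Toy range test: one gradient step per learning rate, LR growing exponentially."""
    ratio = (max_lr / min_lr) ** (1 / (num_steps - 1))
    lrs, losses = [], []
    w = w0
    for i in range(num_steps):
        lr = min_lr * ratio ** i
        w -= lr * 2 * (w - target)  # gradient step on (w - target)**2
        lrs.append(lr)
        losses.append((w - target) ** 2)
    return lrs, losses


def suggest_lr(lrs, losses):
    """Heuristic: return the LR whose update produced the largest loss decrease."""
    # losses[i] - losses[i - 1] is the change caused by the update that used lrs[i].
    drops = {i: losses[i] - losses[i - 1] for i in range(1, len(losses))}
    steepest = min(drops, key=drops.get)
    return lrs[steepest]


lrs, losses = lr_range_test()
suggestion = suggest_lr(lrs, losses)
print(f"suggested learning rate: {suggestion:.4g}")
```

On this quadratic, a learning rate of 0.5 would reach the minimum in a single update, yet the suggestion lands well below that: a usable starting point, not the optimum.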

Quiz: 6.2 Learning Rates and Learning Rate Schedulers - Part 4

Setting step_size=5 and gamma=0.3 in the step scheduler will

Decrease the learning rate every 5 minibatches by a factor of 3

Incorrect. By default, step_size refers to epochs (although this can be reconfigured).

Decrease the learning rate every 5 epochs by a factor of 0.3

Correct. We multiply the learning rate by 0.3 every 5 epochs.

Increase the learning rate every 5 minibatches by a factor of 3

Incorrect. By default, step_size refers to epochs (although this can be reconfigured).

Increase the learning rate every 5 epochs by a factor of 0.3

Incorrect. We decrease the learning rate, not increase it.
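The arithmetic behind this answer follows StepLR's rule: after n calls to `scheduler.step()`, the learning rate is `base_lr * gamma ** (n // step_size)`. The plain-Python sketch below (with an assumed `base_lr=0.1`) evaluates that rule once per epoch and also shows what the same settings would do if the scheduler were stepped after every minibatch, using a hypothetical 100 minibatches per epoch. In Lightning, that choice corresponds to the `"interval"` entry of the scheduler configuration (`"epoch"` by default, `"step"` for per-minibatch updates).

```python
def step_decay_lr(base_lr, step_size, gamma, num_steps):
    """Learning rate after num_steps calls to scheduler.step() under step decay."""
    return base_lr * gamma ** (num_steps // step_size)


base_lr = 0.1  # assumed starting learning rate

# Scheduler stepped once per epoch (the default): decays at epochs 5, 10, ...
per_epoch = [step_decay_lr(base_lr, step_size=5, gamma=0.3, num_steps=e) for e in range(15)]
for epoch in (0, 4, 5, 9, 10, 14):
    print(f"epoch {epoch:2d}: lr = {per_epoch[epoch]:.4g}")

# Scheduler stepped once per minibatch instead (hypothetical 100 minibatches per epoch):
# the same settings now decay every 5 minibatches, i.e., 20 times in the first epoch.
minibatches_per_epoch = 100
lr_after_one_epoch = step_decay_lr(base_lr, 5, 0.3, minibatches_per_epoch)
print(f"per-minibatch stepping, lr after one epoch: {lr_after_one_epoch:.3g}")
```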

Quiz: 6.2 Learning Rates and Learning Rate Schedulers - Part 5

CosineAnnealingLR T_max argument …

Restarts the learning rate after the specified number of steps.

Correct. Once T_max is reached, the learning rate has decayed to its lowest point and will be reset.

Will stop decaying the learning rate after the specified number of steps.

Correct. Once T_max is reached, the learning rate has decayed to its lowest point and will be reset.

Will not decrease the learning rate to be smaller than the T_max value.

Incorrect. T_max refers to the step number and not a specific learning rate value.

Will not increase the learning rate to be larger than the T_max value.

Incorrect. T_max refers to the step number and not a specific learning rate value.
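The PyTorch documentation for CosineAnnealingLR and CosineAnnealingWarmRestarts gives the cosine schedule as η_t = η_min + ½(η_max − η_min)(1 + cos(π · T_cur / T_max)). The plain-Python sketch below evaluates that formula with assumed values (η_max = 0.1, η_min = 0, a period of 10 steps). The first function covers one decay from η_max down to η_min at T_max; the second resets T_cur to zero every t_0 steps, as the warm-restart variant does when its period stays fixed (T_mult=1).

```python
import math


def cosine_annealing(t, t_max, eta_max, eta_min=0.0):
    """Cosine decay from eta_max (t = 0) to eta_min (t = t_max)."""
    return eta_min + 0.5 * (eta_max - eta_min) * (1 + math.cos(math.pi * t / t_max))


def cosine_warm_restarts(t, t_0, eta_max, eta_min=0.0):
    """Cosine decay that jumps back to eta_max every t_0 steps (fixed period)."""
    t_cur = t % t_0  # steps since the most recent restart
    return cosine_annealing(t_cur, t_0, eta_max, eta_min)


single_cycle = [cosine_annealing(t, t_max=10, eta_max=0.1) for t in range(11)]
with_restarts = [cosine_warm_restarts(t, t_0=10, eta_max=0.1) for t in range(25)]

print("single cycle: ", [round(lr, 4) for lr in single_cycle])
print("with restarts:", [round(lr, 4) for lr in with_restarts])
```

Note that T_max appears only inside the cosine as a step count; it never acts as a bound on the learning rate value itself.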

Follow along in a Lightning Studio: DL Fundamentals 6: DL Tips & Tricks (Sebastian)