9.0 Overview – Techniques for Speeding Up Model Training

This unit covers techniques for accelerating deep learning training: mixed-precision training, multi-GPU training, model compilation, and choosing a batch size.

Mixed-precision training

We cover mixed-precision training, a method that uses both 16-bit and 32-bit floating-point types to reduce memory usage and increase training speed, particularly on modern GPUs that have specialized hardware for 16-bit calculations.
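The trade-off between the two floating-point widths can be illustrated without a GPU. The sketch below is not mixed-precision training itself; it only uses Python's standard `struct` module to round values to 16-bit and 32-bit floats, and the weight and update values are hypothetical. It shows the halved storage cost of 16-bit values and one common reason some quantities are kept in 32-bit: a small update can vanish entirely when added in 16-bit.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_fp32(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


# Storage cost per value: 16-bit floats need half the bytes of 32-bit floats.
fp16_bytes = struct.calcsize("<e")
fp32_bytes = struct.calcsize("<f")

# Hypothetical weight and a small update (for example, learning rate * gradient).
weight = 1.0
update = 1e-4

# Apply the update with the result rounded to each precision.
fp16_result = to_fp16(to_fp16(weight) + to_fp16(update))
fp32_result = to_fp32(to_fp32(weight) + to_fp32(update))

print(f"bytes per value: fp16={fp16_bytes}, fp32={fp32_bytes}")
print(f"fp16 weight after update: {fp16_result}")
print(f"fp32 weight after update: {fp32_result}")
```

Near 1.0, adjacent 16-bit values are roughly 0.001 apart, so the update of 0.0001 is rounded away in 16-bit but survives in 32-bit.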

Multi-GPU training

We also cover strategies for multi-GPU training, including data parallelism and model parallelism. Data parallelism distributes different mini-batches of data across multiple GPUs, whereas model parallelism splits a single model across several GPUs.
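The core idea of data parallelism can be sketched in plain Python, with list slices standing in for GPUs. This is a simplified illustration under stated assumptions, not how any particular library implements it: the one-parameter model `y_hat = w * x`, the function names, and the data are all hypothetical, and the shards are assumed to be of equal size. Each simulated worker computes a gradient on its own shard of the batch, and the gradients are then averaged, which reproduces the full-batch gradient.

```python
def grad_mse(w, xs, ys):
    """Gradient of the mean squared error of y_hat = w * x with respect to w."""
    n = len(xs)
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / n


def data_parallel_grad(w, xs, ys, num_workers):
    """Simulate data parallelism: one gradient per shard, then average."""
    if len(xs) % num_workers != 0:
        raise ValueError("this sketch assumes equally sized shards")
    shard_size = len(xs) // num_workers
    grads = []
    for rank in range(num_workers):
        lo, hi = rank * shard_size, (rank + 1) * shard_size
        # Each worker only sees its own slice of the batch.
        grads.append(grad_mse(w, xs[lo:hi], ys[lo:hi]))
    # Averaging plays the role of the gradient synchronization step.
    return sum(grads) / num_workers


# Hypothetical data generated from the target relationship y = 2 * x.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [2.0 * x for x in xs]
w = 0.5

full_grad = grad_mse(w, xs, ys)
parallel_grad = data_parallel_grad(w, xs, ys, num_workers=4)
print(full_grad, parallel_grad)
```

Because every worker ends up with the same averaged gradient, all copies of the model apply the same update and stay in sync. Model parallelism, by contrast, would place different parts of the model on different devices and is not shown here.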

Other performance tips

Furthermore, we will discuss how to compile models using the new torch.compile feature in PyTorch 2.0, which can result in optimized models that give us additional speed-ups for both training and inference.

We also discuss the relationship between batch size and training throughput. Larger batch sizes can increase computational efficiency and thus speed up training, but they come with potential drawbacks: the risk of running out of memory, and potentially worse model performance due to fewer updates per epoch and a larger generalization gap.
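The "fewer updates per epoch" effect is simple arithmetic. The sketch below assumes a hypothetical dataset of 50,000 training examples and a data loader that keeps the last, possibly smaller, batch; both the dataset size and the function name are illustrative.

```python
import math


def updates_per_epoch(num_examples: int, batch_size: int) -> int:
    """Number of optimizer steps in one pass over the data (last partial batch kept)."""
    return math.ceil(num_examples / batch_size)


num_examples = 50_000  # hypothetical training set size
batch_sizes = (32, 128, 512, 2048)

steps = {bs: updates_per_epoch(num_examples, bs) for bs in batch_sizes}
for bs, n_steps in steps.items():
    print(f"batch size {bs:>4}: {n_steps:>4} updates per epoch")
```

Going from a batch size of 32 to 2048 cuts the number of weight updates per epoch by a factor of roughly 60, so each epoch may run faster while the model takes far fewer optimization steps.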

