9.0 Overview – Techniques for Speeding Up Model Training
This unit covers various techniques to accelerate deep learning training.
We cover mixed-precision training, a method that uses both 16-bit and 32-bit floating-point types to reduce memory usage and speed up training, particularly on modern GPUs with specialized hardware for 16-bit arithmetic.
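As a minimal sketch of the idea, the snippet below wraps the forward pass in an autocast context so that eligible operations run in lower precision while the weights stay in float32. The toy model and data are made up for illustration; here we use bfloat16 on CPU so the example runs anywhere, whereas on a CUDA GPU you would typically use float16 together with a gradient scaler (`torch.amp.GradScaler`) to avoid underflow.

```python
import torch

# Hypothetical toy model and data, just for illustration.
model = torch.nn.Linear(4, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
x, y = torch.randn(8, 4), torch.randn(8, 2)

# Ops inside autocast run in lower precision where it is safe;
# the model's weights remain in 32-bit floating point.
with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    loss = torch.nn.functional.mse_loss(model(x), y)

# Backward pass and optimizer step happen outside the autocast context.
loss.backward()
optimizer.step()
optimizer.zero_grad()
```

On CUDA devices, the same pattern applies with `device_type="cuda"` and `dtype=torch.float16`.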
We also delve into strategies for multi-GPU training, including data parallelism and model parallelism: the former distributes different mini-batches of data across multiple GPUs, while the latter splits a single model across several GPUs.
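To make the data-parallel idea concrete, here is a single-process sketch (no actual GPUs involved) of what frameworks like `torch.nn.parallel.DistributedDataParallel` do under the hood: each replica computes gradients on its own shard of the mini-batch, and the shard gradients are averaged, which is numerically equivalent to one gradient step over the full batch. The toy model and tensors are assumptions for illustration.

```python
import torch

# Hypothetical toy model and mini-batch.
model = torch.nn.Linear(4, 2)
batch, targets = torch.randn(8, 4), torch.randn(8, 2)

# Split the mini-batch into two equal "per-GPU" shards and collect
# each shard's gradients, as each replica would in data parallelism.
shard_grads = []
for xs, ys in zip(batch.chunk(2), targets.chunk(2)):
    model.zero_grad()
    torch.nn.functional.mse_loss(model(xs), ys).backward()
    shard_grads.append([p.grad.clone() for p in model.parameters()])

# Average the shard gradients (the "all-reduce" step).
avg_grads = [torch.stack(gs).mean(dim=0) for gs in zip(*shard_grads)]

# Reference: gradients from a single pass over the full batch.
model.zero_grad()
torch.nn.functional.mse_loss(model(batch), targets).backward()
full_grads = [p.grad for p in model.parameters()]

# The averaged shard gradients match the full-batch gradients.
match = all(torch.allclose(a, f, atol=1e-5)
            for a, f in zip(avg_grads, full_grads))
print(match)
```

In real multi-GPU training, the shards live on different devices and the averaging is done with a collective communication operation rather than a Python loop.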
Other performance tips
Furthermore, we will discuss how to compile models using the new torch.compile feature in PyTorch 2.0, which can produce optimized models that yield additional speed-ups for both training and inference.
Also, we'll discuss the relationship between batch size and training throughput. Larger batch sizes can increase computational efficiency and thus speed up training, but they come with potential drawbacks: the risk of running out of memory, and possibly worse model performance due to fewer updates per epoch and a larger generalization gap.
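The "fewer updates per epoch" effect is simple arithmetic, sketched below with a made-up dataset size of 50,000 examples: multiplying the batch size by 8 divides the number of optimizer steps per epoch by roughly 8.

```python
# Larger batches mean fewer optimizer updates per epoch.
dataset_size = 50_000  # hypothetical dataset size

for batch_size in (32, 256):
    updates_per_epoch = dataset_size // batch_size
    print(f"batch size {batch_size:>3}: {updates_per_epoch} updates per epoch")
```

With a batch size of 32 this gives 1562 updates per epoch, versus only 195 at a batch size of 256, so the learning rate (and often the number of epochs) typically needs to be adjusted when scaling up the batch size.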