9.0 Overview – Techniques for Speeding Up Model Training

This unit covers various techniques to accelerate deep learning training.

Mixed-precision training

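We cover mixed-precision training, a method that combines 16-bit and 32-bit floating-point types: most computations run in 16-bit precision, while numerically sensitive steps such as the weight updates stay in 32-bit. This reduces memory usage and increases training speed, particularly on modern GPUs that have specialized hardware for 16-bit calculations.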
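To make the 16-bit versus 32-bit trade-off concrete, here is a minimal, stdlib-only sketch. It is not PyTorch's implementation. It relies on Python's `struct` module, whose `"e"` and `"f"` format codes store IEEE 754 half- and single-precision values. The sketch illustrates three points commonly cited for mixed precision. First, float16 halves storage. Second, small weight updates can vanish in float16, which is one reason frameworks typically keep a 32-bit copy of the weights. Third, tiny gradients can underflow to zero unless the loss is scaled first. The helper names and the fixed scale factor of 1024 are illustrative assumptions. In PyTorch, `torch.autocast` selects per-operation precision, and `GradScaler` performs loss scaling with a dynamically adjusted scale factor.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to IEEE 754 half precision (16-bit) and back."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_fp32(x: float) -> float:
    """Round a Python float to IEEE 754 single precision (32-bit) and back."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def apply_update(weight: float, step: float, rounding) -> float:
    """Compute weight - step with inputs and result stored in one precision."""
    return rounding(rounding(weight) - rounding(step))


def scaled_grad_roundtrip(grad: float, scale: float) -> float:
    """Static loss scaling (illustrative): scale the gradient before storing it
    in float16, then unscale it in float32 before it is used for the update."""
    grad_fp16 = to_fp16(grad * scale)  # scaled value stays representable in fp16
    return to_fp32(grad_fp16 / scale)  # unscale in 32-bit precision


# Storage cost per value: 2 bytes for float16 vs 4 bytes for float32.
BYTES_FP16 = struct.calcsize("<e")
BYTES_FP32 = struct.calcsize("<f")

if __name__ == "__main__":
    print(BYTES_FP16, BYTES_FP32)               # 2 4
    print(apply_update(1.0, 1e-4, to_fp16))     # 1.0: the small update is lost
    print(apply_update(1.0, 1e-4, to_fp32))     # slightly below 1.0
    print(to_fp16(1e-8))                        # 0.0: the gradient underflows
    print(scaled_grad_roundtrip(1e-8, 1024.0))  # close to 1e-8 again
```

Near 1.0, the spacing between adjacent float16 values is about 0.001. An update of 1e-4 is therefore rounded away entirely, while float32 (spacing about 1.2e-7) keeps it.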

Multi-GPU training

We also cover strategies for multi-GPU training, including data parallelism and model parallelism. In data parallelism, each GPU holds a full copy of the model and processes a different portion of the data. In model parallelism, a single model is split across several GPUs, which helps when the model is too large to fit on one device.
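The following stdlib-only sketch simulates both strategies on a single CPU, with plain Python lists standing in for GPUs. It illustrates the idea only and does not reflect how PyTorch's `DistributedDataParallel` or any model-parallel library is implemented. For data parallelism, each simulated device computes the gradient of a one-parameter linear model on its own equal-sized shard. Averaging those local gradients reproduces the full-batch gradient, which is what a gradient all-reduce achieves across real GPUs. For model parallelism, a small stack of scalar layers is split into contiguous stages, one per device. All function names, the toy model, and the data are hypothetical.

```python
from typing import List, Tuple

Batch = List[Tuple[float, float]]  # (x, y) training pairs
Layer = Tuple[float, float]        # (weight, bias) of a scalar affine layer


def mse_grad(w: float, batch: Batch) -> float:
    """Gradient of mean((w * x - y) ** 2) with respect to w on one batch."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)


def data_parallel_grad(w: float, batch: Batch, num_devices: int) -> float:
    """Data parallelism: every simulated device holds a full copy of w, computes
    a local gradient on its own equal-sized shard, and the local gradients are
    averaged (the job an all-reduce does across real GPUs)."""
    if len(batch) % num_devices != 0:
        raise ValueError("this sketch assumes equal-sized shards")
    size = len(batch) // num_devices
    local = [mse_grad(w, batch[d * size:(d + 1) * size]) for d in range(num_devices)]
    return sum(local) / num_devices


def partition_layers(layers: List[Layer], num_devices: int) -> List[List[Layer]]:
    """Model parallelism: split the layers into contiguous, near-equal stages,
    one per simulated device, so no single device stores the whole model."""
    base, extra = divmod(len(layers), num_devices)
    stages, start = [], 0
    for d in range(num_devices):
        end = start + base + (1 if d < extra else 0)
        stages.append(layers[start:end])
        start = end
    return stages


def run_layers(x: float, layers: List[Layer]) -> float:
    """Apply each affine layer followed by a ReLU."""
    for weight, bias in layers:
        x = max(0.0, weight * x + bias)
    return x


def model_parallel_forward(x: float, stages: List[List[Layer]]) -> float:
    """Run the stages in order; on real hardware the activation would be
    copied to the next stage's GPU between iterations."""
    for stage in stages:
        x = run_layers(x, stage)
    return x


if __name__ == "__main__":
    batch = [(float(i), 3.0 * i + 1.0) for i in range(8)]
    print(mse_grad(0.5, batch), data_parallel_grad(0.5, batch, num_devices=4))
    layers = [(0.5, 0.1), (2.0, -0.3), (1.5, 0.2), (-0.5, 1.0)]
    stages = partition_layers(layers, num_devices=3)
    print([len(s) for s in stages])  # [2, 1, 1]
    print(run_layers(2.0, layers) == model_parallel_forward(2.0, stages))  # True
```

Averaging local gradients matches the full-batch gradient only when the shards are equal in size and the loss is a mean. For this reason, the sketch rejects uneven splits.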

Other performance tips

Furthermore, we will discuss how to compile models with torch.compile, a feature introduced in PyTorch 2.0 that can produce optimized models with additional speed-ups for both training and inference.

We'll also discuss the relationship between batch size and training throughput. Larger batch sizes can improve computational efficiency and thus speed up training. They also have potential drawbacks: a higher risk of running out of GPU memory, and potentially worse model performance due to fewer parameter updates per epoch and a larger generalization gap.
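The batch-size trade-offs above can be illustrated with a few lines of arithmetic. The sketch below assumes that the final partial batch is kept, as with PyTorch's `DataLoader` default `drop_last=False`. It uses a toy cost model in which every training step pays a fixed overhead plus a per-example cost, and memory grows linearly with batch size. The dataset size (50,000 examples), the timings (10 ms overhead, 0.1 ms per example), and the memory figures are hypothetical values chosen for illustration. Real throughput and memory usage have to be measured on the actual model and GPU.

```python
import math


def steps_per_epoch(num_examples: int, batch_size: int) -> int:
    """Optimizer updates per epoch (the last, smaller batch still counts)."""
    return math.ceil(num_examples / batch_size)


def toy_throughput(batch_size: int, overhead_s: float, per_example_s: float) -> float:
    """Examples per second under a toy cost model: each step pays a fixed
    overhead (kernel launches, optimizer step, ...) plus a per-example cost."""
    return batch_size / (overhead_s + per_example_s * batch_size)


def max_batch_size(memory_mb: int, fixed_mb: int, per_example_mb: int) -> int:
    """Largest batch that fits if memory grows linearly with batch size."""
    return (memory_mb - fixed_mb) // per_example_mb


if __name__ == "__main__":
    for bs in (32, 256):
        # Larger batches: fewer updates per epoch, higher toy throughput.
        print(bs, steps_per_epoch(50_000, bs), round(toy_throughput(bs, 0.01, 0.0001)))
    print(max_batch_size(16_000, 4_000, 50))  # 240
```

Under this model, throughput rises with batch size because the fixed per-step overhead is amortized over more examples. It approaches, but never exceeds, `1 / per_example_s`. Meanwhile, the number of updates per epoch shrinks, and any batch above `max_batch_size` would run out of memory.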
