Deep Learning Fundamentals

Unit 9.2 Multi-GPU Training Strategies

Slides

Part 1: Introduction to Multi-GPU Training
Part 2: Choosing a Multi-GPU Strategy

References

Sequence Parallelism: Long Sequence Training from [a] System[s] Perspective, https://arxiv.org/abs/2105.13120
ZeRO: Memory Optimizations Toward Training Trillion Parameter Models, https://arxiv.org/abs/1910.02054

What we covered in this video lecture

In this lecture, we explored training machine learning models on multiple GPUs (graphics processing units), along with the benefits this offers and the strategies available for large-scale machine learning tasks. A key point is that GPUs have far more cores than CPUs. This makes them well suited to parallel computations, such as the large matrix multiplications at the heart of neural network training, and therefore ideal for training machine learning models.

We then looked at the main categories of parallelism for harnessing multiple GPUs: data parallelism, model parallelism, pipeline parallelism, tensor parallelism, and sequence parallelism.

For example, in data parallelism, each GPU holds a full copy of the model and processes a different subset of the training data; the resulting gradients are then aggregated (averaged) across GPUs before the model update. Model parallelism instead splits the model itself across GPUs, so that each GPU computes part of the forward and backward passes. Tensor parallelism, a more recent approach, splits individual tensors of the model (for example, a layer's weight matrix) across multiple GPUs to handle extremely large models that don't fit into the memory of a single GPU.
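To make the difference between data parallelism and tensor parallelism more concrete, here is a minimal sketch that simulates both in pure Python, with no real GPUs involved. The "devices" are just list slices, and the toy model (a one-parameter linear model with a mean-squared-error loss), the helper names (`split_evenly`, `data_parallel_step`, `column_parallel_linear`), and the assumption that batch sizes and matrix widths divide evenly by the number of devices are all illustrative choices, not PyTorch or Lightning APIs. In the data-parallel part, each device computes a gradient on its shard of the batch and the gradients are averaged; in the tensor-parallel part, each device stores only a column block of a weight matrix and computes a slice of the layer's output.

```python
# Pure-Python simulation of two multi-GPU strategies; the "devices" are list slices.
# Assumptions (illustrative only): a toy model y_hat = w * x with MSE loss,
# and batch sizes / matrix widths that divide evenly by num_devices.


def split_evenly(items, num_devices):
    """Split a list into num_devices contiguous chunks of equal size."""
    size = len(items) // num_devices
    return [items[d * size:(d + 1) * size] for d in range(num_devices)]


# --- Data parallelism -------------------------------------------------------

def mse_grad(w, xs, ys):
    """Gradient of mean((w * x - y) ** 2) with respect to w."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def data_parallel_step(w, xs, ys, num_devices, lr):
    # Every device holds a full copy of w but sees only its shard of the batch.
    x_shards = split_evenly(xs, num_devices)
    y_shards = split_evenly(ys, num_devices)
    local_grads = [mse_grad(w, xs_d, ys_d) for xs_d, ys_d in zip(x_shards, y_shards)]
    # Average the local gradients (the role an all-reduce plays on real GPUs).
    avg_grad = sum(local_grads) / num_devices
    # All replicas apply the same update, so their copies of w stay identical.
    return w - lr * avg_grad, avg_grad


# --- Tensor parallelism -----------------------------------------------------

def matmul(a, b):
    """Matrix product of a (n x k) and b (k x m), stored as nested lists."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def column_parallel_linear(x, w, num_devices):
    # Split the columns of w (k x m) so each device stores only a k x (m / num_devices) block.
    width = len(w[0]) // num_devices
    w_blocks = [[row[d * width:(d + 1) * width] for row in w] for d in range(num_devices)]
    # Each device computes its slice of the output features independently.
    partial_outputs = [matmul(x, w_d) for w_d in w_blocks]
    # Concatenate the slices row by row (an all-gather would do this across GPUs).
    return [sum((p[i] for p in partial_outputs), []) for i in range(len(x))]


if __name__ == "__main__":
    xs, ys = [1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0]
    w_new, grad = data_parallel_step(0.0, xs, ys, num_devices=2, lr=0.01)
    print(grad, mse_grad(0.0, xs, ys))  # both -30.0: same as the full-batch gradient

    x = [[1, 2], [3, 4]]
    w = [[1, 0, 2, 1], [0, 1, 1, 3]]
    print(column_parallel_linear(x, w, num_devices=2) == matmul(x, w))  # True
```

With equally sized shards, the averaged per-device gradient equals the full-batch gradient, which is why every model replica can apply the same update and stay in sync. On real hardware, the averaging step is an all-reduce across GPUs (as in PyTorch's `DistributedDataParallel`), and the concatenation in the tensor-parallel case would typically be an all-gather; the sketch only mirrors the arithmetic, not the communication.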

Used alone or in combination, these techniques make better use of computational resources, speed up training, and make it possible to handle larger models and datasets, which is why multi-GPU training is a key part of modern machine learning infrastructure.

Additional resources if you want to learn more

There are several guides on the PyTorch Lightning documentation website that I highly recommend reading for more advanced use cases:

GPU Training (Basic)
GPU Training (Intermediate)
GPU Training (Advanced)
GPU Training FAQ

Quiz: 9.2 Multi-GPU Training Strategies (Part 1)

Which multi-GPU strategy is recommended when using “accelerator=mps” on Apple devices instead of “accelerator=gpu”?

strategy="ddp"
strategy=None
strategy="ddp_spawn"
strategy="ddp_notebook"

Answer: strategy=None. This was a trick question: Apple Silicon computers currently don’t have multiple GPUs, so there is no strategy to select for multi-GPU training when using Apple devices at the moment.