GPU training (FAQ)
How should I adjust the learning rate when using multiple devices?
When using distributed training, make sure to modify your learning rate according to your effective batch size.
Let’s say you have a batch size of 7 in your dataloader.
class LitModel(LightningModule):
    def train_dataloader(self):
        return DataLoader(..., batch_size=7)
In DDP, DDP_SPAWN, DeepSpeed, DDP_SHARDED, or Horovod, your effective batch size will be 7 * devices * num_nodes.
# effective batch size = 7 * 8
Trainer(accelerator="gpu", devices=8, strategy="ddp")
Trainer(accelerator="gpu", devices=8, strategy="ddp_spawn")
Trainer(accelerator="gpu", devices=8, strategy="ddp_sharded")
Trainer(accelerator="gpu", devices=8, strategy="horovod")
# effective batch size = 7 * 8 * 10
Trainer(accelerator="gpu", devices=8, num_nodes=10, strategy="ddp")
Trainer(accelerator="gpu", devices=8, num_nodes=10, strategy="ddp_spawn")
Trainer(accelerator="gpu", devices=8, num_nodes=10, strategy="ddp_sharded")
Trainer(accelerator="gpu", devices=8, num_nodes=10, strategy="horovod")
Note
Huge batch sizes can actually hurt convergence unless the learning rate is adjusted accordingly. Check out: Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour
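This page does not prescribe a specific rule, but the paper above motivates the common linear scaling heuristic: grow the learning rate by the same factor as the effective batch size. A minimal sketch under that assumption (base_lr, base_batch_size, and the SGD optimizer are illustrative choices, not values from this page):

import torch
from pytorch_lightning import LightningModule

base_lr = 0.1          # learning rate tuned for a single device (illustrative)
base_batch_size = 7    # per-dataloader batch size used during tuning
devices, num_nodes = 8, 10

# effective batch size under the DDP-style strategies above: 7 * 8 * 10 = 560
effective_batch_size = base_batch_size * devices * num_nodes

# linear scaling rule: scale the learning rate by the same factor as the batch size
scaled_lr = base_lr * effective_batch_size / base_batch_size  # 0.1 * 80 = 8.0


class LitModel(LightningModule):
    def configure_optimizers(self):
        # configure_optimizers is the standard Lightning hook for building the optimizer
        return torch.optim.SGD(self.parameters(), lr=scaled_lr)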
In DP, which does not support multi-node, the effective batch size will be just 7, regardless of how many devices are being used. The reason is that the full batch gets split across the devices, so each sample is still processed exactly once per step.
# effective batch size = 7; seven GPUs see a batch size of 1 and the last GPU gets no samples
Trainer(accelerator="gpu", devices=8, strategy="dp")
# effective batch size = 7; the first GPU sees a batch size of 4, the other a batch size of 3
Trainer(accelerator="gpu", devices=2, strategy="dp")
How do I use multiple GPUs on Jupyter or Colab notebooks?
To use multiple GPUs in notebooks, use the DP strategy.
Trainer(accelerator="gpu", devices=4, strategy="dp")
If you want to use other strategies, please launch your training from the command line.
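As a rough sketch of the notebook workflow, assuming the LitModel defined earlier on this page:

from pytorch_lightning import Trainer

model = LitModel()
# DP runs inside the current interactive process, so it works in a notebook cell
trainer = Trainer(accelerator="gpu", devices=4, strategy="dp")
trainer.fit(model)

For strategies such as ddp, put the same code in a script (for example a hypothetical train.py) and run it with python from a terminal instead of executing it in a notebook cell.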
Note
Learn how to access a cloud machine with multiple GPUs in this guide.