Reproduce one-GPU score/loss using DDP - Discrepancy

Hi,
The DDP documentation recommends multiplying the batch size by the number of GPUs, but in PyTorch Lightning 2.1.3 with DDP the batch size already seems to be scaled by the number of GPUs automatically. Is this a bug or intended behaviour? If it is intentional, please point me to the latest documentation. My goal is to have a baseline: if I train a model on one GPU or on several GPUs with DDP, I should get roughly similar results. Please also advise how to handle the batch size and learning rate when switching from one GPU to DDP.
Please help

Hey @paxandfidem
That’s correct: when you use DDP, the batch size you set in the dataloader is always local to the GPU of one process. The total global batch size is N * batch_size, so in that sense it scales automatically.
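
To make that concrete, here is a minimal sketch (the ToyModel, the dataset, and the GPU count are illustrative, not taken from your setup): the batch_size you pass to the DataLoader is what each DDP process sees per step, and Lightning only adds a DistributedSampler so the processes read different samples.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset
import lightning as L


class ToyModel(L.LightningModule):
    """Tiny illustrative model, just to make the example runnable."""

    def __init__(self):
        super().__init__()
        self.layer = nn.Linear(32, 2)

    def training_step(self, batch, batch_idx):
        x, y = batch
        loss = nn.functional.cross_entropy(self.layer(x), y)
        self.log("train_loss", loss)
        return loss

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=0.1)


dataset = TensorDataset(torch.randn(1024, 32), torch.randint(0, 2, (1024,)))

per_gpu_batch_size = 64  # local to each DDP process
train_loader = DataLoader(dataset, batch_size=per_gpu_batch_size, shuffle=True)

# With 4 GPUs (illustrative), the effective global batch size is 4 * 64 = 256.
# Lightning does not rewrite batch_size; it runs one copy of the loader per
# process and swaps in a DistributedSampler so each process gets different data.
trainer = L.Trainer(accelerator="gpu", devices=4, strategy="ddp", max_epochs=1)
trainer.fit(ToyModel(), train_loader)
```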

My goal is to have a baseline: if I train a model on one GPU or on several GPUs with DDP, I should get roughly similar results.

If you want to do this, run two experiments. First, choose a global batch size B. Then:

  1. Train on a single GPU with batch size B.
  2. Train on N GPUs with batch size B/N.

(Make sure that B is divisible by N, of course.) This way, the global batch size of both experiments is the same, and you should get approximately the same loss values. The only reason it won't be exact is that samples get allocated to batches in a different order. A rough sketch of the comparison follows below.
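Reusing the ToyModel and dataset from the sketch above, with B = 256 and N = 4 as illustrative numbers, the two runs would look like this:

```python
B, N = 256, 4  # illustrative global batch size and GPU count

# Experiment 1: single GPU, batch size B.
trainer_1gpu = L.Trainer(accelerator="gpu", devices=1, max_epochs=1)
trainer_1gpu.fit(ToyModel(), DataLoader(dataset, batch_size=B, shuffle=True))

# Experiment 2: N GPUs with DDP, per-GPU batch size B // N.
trainer_ddp = L.Trainer(accelerator="gpu", devices=N, strategy="ddp", max_epochs=1)
trainer_ddp.fit(ToyModel(), DataLoader(dataset, batch_size=B // N, shuffle=True))

# Both runs see 256 samples per optimizer step, so the loss curves should
# roughly match; remaining differences come from how samples are shuffled into
# batches (and, under DDP, split across processes).
```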

I can improve the text in the docs if you can point me to it.
