The PyTorch Lightning documentation mentions that Lightning automatically handles multi-node training. However, when I run the same script on a single-node 2-GPU machine and on a multi-node 4-GPU cluster, training on the single machine is about 2x faster than on the multi-node cluster. Specifically, the number of steps per epoch differs between the two environments: for the same dataset, the single-node 2-GPU machine runs 6616 training steps per epoch, while the multi-node 4-GPU cluster runs 13232.
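
For reference, here is a minimal sketch of the two setups being compared. My actual script isn't shown here, so the model, dataset, batch size, and the exact `Trainer` arguments (`devices`, `num_nodes`, `strategy`) below are placeholders, not my real configuration:

```python
import torch
import pytorch_lightning as pl
from torch.utils.data import DataLoader, TensorDataset

# Placeholder LightningModule standing in for the real model.
class LitModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(32, 1)

    def training_step(self, batch, batch_idx):
        x, y = batch
        return torch.nn.functional.mse_loss(self.layer(x), y)

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters())

# Placeholder dataset; the real one is the same in both environments.
train_loader = DataLoader(
    TensorDataset(torch.randn(1000, 32), torch.randn(1000, 1)),
    batch_size=8,
)

# Environment A: single node, 2 GPUs.
trainer = pl.Trainer(accelerator="gpu", devices=2, num_nodes=1, strategy="ddp")

# Environment B: 2 nodes x 2 GPUs = 4 GPUs total.
# trainer = pl.Trainer(accelerator="gpu", devices=2, num_nodes=2, strategy="ddp")

trainer.fit(LitModel(), train_loader)
```

In both cases I launch the same script and let Lightning's DDP strategy shard the data across processes; the only intended difference is the number of nodes/GPUs.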