I am at a loss here. I want to maximize the number of optimizer steps my model takes within a fixed time budget, so instead of increasing the batch size to max out the VRAM, I want to run multiple processes for this single model and aggregate the results as if it were being trained on multiple devices, even though everything runs on a single GPU. I have more than enough VRAM to run 3-4 processes for this specific model, potentially giving me a ~3x speedup in steps per unit time. Like I said, a larger batch size is out of the question, as it only reduces the number of steps completed in a given time frame.
How can I do this? I saw something about using DDP with the gloo backend and listing the same device twice, as in Emulating multiple devices with a single GPU · Lightning-AI/pytorch-lightning · Discussion #8630 · GitHub, but Lightning complains when I add the same device twice. I really want to avoid writing anything from scratch to aggregate the gradients between processes, so out-of-the-box solutions are preferred.
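For reference, here is a minimal sketch in plain PyTorch of the kind of setup I have in mind: several spawned ranks all targeting `cuda:0`, with DDP over gloo averaging the gradients. The model, batch, port, and step count are placeholders, and I haven't verified this is the right way to do it with Lightning:

```python
import os
import torch
import torch.distributed as dist
import torch.multiprocessing as mp
from torch.nn.parallel import DistributedDataParallel as DDP


def worker(rank, world_size):
    os.environ["MASTER_ADDR"] = "127.0.0.1"
    os.environ["MASTER_PORT"] = "29500"  # placeholder port
    # gloo (unlike nccl) tolerates multiple ranks on the same GPU
    dist.init_process_group("gloo", rank=rank, world_size=world_size)

    device = torch.device("cuda:0")  # every rank shares the one GPU
    model = torch.nn.Linear(32, 1).to(device)  # placeholder model
    # no device_ids passed, since the device is shared across ranks
    ddp_model = DDP(model)

    opt = torch.optim.SGD(ddp_model.parameters(), lr=1e-3)
    for _ in range(10):  # placeholder training loop
        x = torch.randn(8, 32, device=device)  # placeholder batch
        loss = ddp_model(x).sum()
        opt.zero_grad()
        loss.backward()  # gloo all-reduces grads across the ranks
        opt.step()

    dist.destroy_process_group()


if __name__ == "__main__":
    mp.spawn(worker, args=(3,), nprocs=3)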