If I am training a ResNet-50 model on ImageNet with DDP, which hardware specs do I need to know in order to determine whether I have exhausted the full compute power of my platform? In other words, suppose I use 8 RTX 3090 GPUs on a single node with DDP and achieve a training speed of 7 minutes/epoch. What should I check before adding more GPUs in order to further shorten the training time? Things I have in mind are the number of CPU workers per process, GPU bandwidth, etc. Is there anything else I should add to this list?
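For context, this timing sketch is roughly what I had in mind for the CPU-worker question: split each iteration into time spent waiting on the DataLoader versus time spent in the forward/backward/step on the GPU. The helper name and the loop details are just illustrative, not my actual training code.

```python
import time
import torch

def measure_epoch(model, loader, optimizer, criterion, device):
    """Roughly split each iteration into DataLoader wait time vs. GPU compute time."""
    data_time, compute_time = 0.0, 0.0
    end = time.perf_counter()
    for images, targets in loader:
        t0 = time.perf_counter()
        data_time += t0 - end  # time spent waiting on CPU workers / disk / augmentation

        images = images.to(device, non_blocking=True)
        targets = targets.to(device, non_blocking=True)

        optimizer.zero_grad(set_to_none=True)
        loss = criterion(model(images), targets)
        loss.backward()          # DDP overlaps gradient all-reduce with backward
        optimizer.step()

        torch.cuda.synchronize(device)  # make the GPU work visible to the wall clock
        end = time.perf_counter()
        compute_time += end - t0
    return data_time, compute_time
```

My (possibly naive) reading: if `data_time` is a large fraction of the epoch, more `num_workers` or faster storage should matter more than more GPUs; if `compute_time` dominates and `nvidia-smi` shows the GPUs near 100% utilization, then scaling out (and the PCIe/NVLink bandwidth available for the all-reduce) is the next thing to look at. Is that the right way to reason about it?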