If I am training a ResNet-50 model on ImageNet with DDP, which hardware specs do I need to know in order to determine whether I have exhausted the full compute power of my platform? In other words, suppose I use 8 RTX 3090 GPUs on a single node with DDP and achieve a training speed of 7 minutes/epoch. What should I check before adding more GPUs in order to further shorten the training time? Things I have in mind are the number of CPU workers per process, GPU bandwidth, etc. Is there anything else I should add to this list?
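For context, this timing sketch is roughly what I had in mind for the CPU-worker question: split each iteration into time spent waiting on the DataLoader versus time spent in the forward/backward/step on the GPU. The helper name and the loop details are just illustrative, not my actual training code.

```python
import time
import torch

def measure_epoch(model, loader, optimizer, criterion, device):
    """Roughly split each iteration into DataLoader wait time vs. GPU compute time."""
    data_time, compute_time = 0.0, 0.0
    end = time.perf_counter()
    for images, targets in loader:
        t0 = time.perf_counter()
        data_time += t0 - end  # time spent waiting on CPU workers / disk / augmentation

        images = images.to(device, non_blocking=True)
        targets = targets.to(device, non_blocking=True)

        optimizer.zero_grad(set_to_none=True)
        loss = criterion(model(images), targets)
        loss.backward()          # DDP overlaps gradient all-reduce with backward
        optimizer.step()

        torch.cuda.synchronize(device)  # make the GPU work visible to the wall clock
        end = time.perf_counter()
        compute_time += end - t0
    return data_time, compute_time
```

My (possibly naive) reading: if `data_time` is a large fraction of the epoch, more `num_workers` or faster storage should matter more than more GPUs; if `compute_time` dominates and `nvidia-smi` shows the GPUs near 100% utilization, then scaling out (and the PCIe/NVLink bandwidth available for the all-reduce) is the next thing to look at. Is that the right way to reason about it?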