I’m not seeing any speedup when I increase the number of GPUs from 1 to 8 and switch Lightning’s distributed backend to DDP; sometimes training even gets slower. Any ideas why this might happen in general? In my DataModules/dataloaders I have num_workers set to 32 and pin_memory set to True.
Is there anything I can do in Lightning to diagnose or fix this? (I’m aware of the profiler, but I’m not sure how to make it useful here.)
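One check I could run outside the profiler is to time how fast batches come out of a dataloader on its own, with no model involved, and to count how many loader workers the whole job ends up starting. Below is a minimal sketch of that idea in plain Python. The helper names `time_batches` and `total_loader_workers` are hypothetical, not part of Lightning, and the sketch assumes that under DDP each GPU gets its own process that builds its own dataloader with the configured num_workers.

```python
import time
from itertools import islice
from typing import Iterable, Tuple


def time_batches(loader: Iterable, max_batches: int = 100) -> Tuple[int, float]:
    """Pull up to `max_batches` items from `loader`; return (count, elapsed seconds).

    `loader` can be any iterable, for example a torch DataLoader. No model
    runs here, so the result reflects data loading alone.
    """
    start = time.perf_counter()
    count = 0
    for _ in islice(loader, max_batches):
        count += 1
    elapsed = time.perf_counter() - start
    return count, elapsed


def total_loader_workers(num_gpus: int, num_workers: int) -> int:
    """Total dataloader worker processes, assuming one DDP process per GPU
    and one dataloader (with `num_workers` workers) per process."""
    return num_gpus * num_workers


if __name__ == "__main__":
    # Stand-in for a real dataloader: a list of dummy "batches".
    fake_loader = [[0] * 8 for _ in range(1000)]
    count, elapsed = time_batches(fake_loader, max_batches=100)
    print(f"{count} batches in {elapsed:.6f}s")

    # With the settings from this post: 8 GPUs, num_workers=32.
    print("total loader workers:", total_loader_workers(num_gpus=8, num_workers=32))
```

If the batches-per-second figure from the loader alone is close to what a single GPU already consumes, data loading is a likely bottleneck. Under the stated assumption, 8 GPUs with num_workers=32 means 256 worker processes competing for the same CPU cores and disk, which may be worth comparing against the machine's core count.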