Hi,
Why do the Fabric DDP (and FSDP) strategies set num_replicas for data loaders to num_nodes * num_processes instead of simply world_size (as the DeepSpeed strategy does)?
I am running on a cluster where I do not get the same number of GPUs on all nodes for a specific job. This causes fabric.setup_dataloaders to fail.
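
To make the mismatch concrete, here is a minimal sketch of the arithmetic in plain Python (no Fabric calls). The allocation of 4 GPUs on one node and 2 on the other is a made-up example, and the sketch assumes that num_processes is the local GPU count on each node and that global ranks are assigned contiguously node by node. Both are my assumptions, not something I have confirmed in the Fabric source.

```python
# Hypothetical allocation: node 0 has 4 GPUs, node 1 has 2 GPUs.
gpus_per_node = [4, 2]

num_nodes = len(gpus_per_node)
world_size = sum(gpus_per_node)  # actual number of processes in the job


def replicas_from_product(node_index):
    """num_nodes * num_processes, assuming num_processes is the local GPU count."""
    return num_nodes * gpus_per_node[node_index]


def global_ranks(node_index):
    """Global ranks on a node, assuming contiguous assignment node by node."""
    start = sum(gpus_per_node[:node_index])
    return list(range(start, start + gpus_per_node[node_index]))


for node in range(num_nodes):
    product = replicas_from_product(node)
    # A sampler rank must satisfy 0 <= rank < num_replicas.
    out_of_range = [r for r in global_ranks(node) if r >= product]
    print(
        f"node {node}: num_nodes * num_processes = {product}, "
        f"world_size = {world_size}, ranks = {global_ranks(node)}, "
        f"out of range = {out_of_range}"
    )
```

Under these assumptions the product differs from world_size on every node, and on the smaller node the global ranks fall outside the range the product allows, whereas using world_size would give the same, valid value everywhere.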