I’m curious how others correctly use pl.seed_everything() in DDP mode. By seeding every process identically, the random transforms are replayed the exact same way on each rank, so the samples that make up a single global batch end up with the same augmentations. That is a pretty large reduction in randomness. The alternative is to not seed at all, or to seed per RANK, but then you can run into some pretty bad bugs when doing things like randomly splitting the dataset, and it makes reproducibility harder. Changing the seed over and over again during training is not a solution either, as that is also bad for randomness.
I’m not sure of the correct way to handle this and would love to hear others’ thoughts.
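To make the trade-off concrete, here is a rough sketch of the two options I am weighing. The base seed value, the LOCAL_RANK lookup, and the rank offset are just placeholders for illustration, not something I am claiming is the right recipe:

```python
import os
import random

import numpy as np
import torch
import pytorch_lightning as pl

BASE_SEED = 42  # placeholder value


def seed_identically_everywhere() -> None:
    # Option A: every rank gets the exact same seed, so random operations that
    # must agree across processes (e.g. a random train/val split) stay in sync,
    # but augmentations are also replayed identically on every GPU.
    pl.seed_everything(BASE_SEED)


def seed_per_rank() -> None:
    # Option B: offset the seed by the process rank, so augmentations differ
    # per GPU, but anything that is supposed to be identical across ranks can
    # silently diverge between processes.
    rank = int(os.environ.get("LOCAL_RANK", "0"))  # assumes the launcher sets LOCAL_RANK
    seed = BASE_SEED + rank
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
```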
I have the same question! I don’t think we should use the same seed across all machines either, since that will indeed make the data augmentations apply the same random transformations. But I do use a fixed seed for each machine, determined by its rank. Also, I think that under the hood, DDP copies the weights from the master process to all non-master processes upon instantiation, so I’m not sure why the official introductory video on multi-GPU says “models will be different”.
Can someone shed some light on this? Am I missing something?
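For reference, this is roughly how I derive the per-rank seed inside the dataloader workers. The helper name and the way I combine the rank and worker id are my own choices, not an official recipe:

```python
import random

import numpy as np
import torch.distributed as dist
from torch.utils.data import get_worker_info


def rank_aware_worker_init(worker_id: int) -> None:
    # Derive a distinct seed for each (process rank, dataloader worker) pair so
    # augmentations differ everywhere, while DDP still keeps the model weights
    # in sync by broadcasting them from rank 0 when it is constructed.
    rank = dist.get_rank() if dist.is_available() and dist.is_initialized() else 0
    info = get_worker_info()
    base = info.seed if info is not None else 0  # PyTorch's per-worker base seed
    seed = (base + 1000 * rank + worker_id) % (2**32)
    random.seed(seed)
    np.random.seed(seed)


# Hypothetical usage on each process (dataset is whatever Dataset you already have):
# loader = DataLoader(dataset, batch_size=32, num_workers=4,
#                     worker_init_fn=rank_aware_worker_init)
```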
Hello guys, I’ve come here looking for an answer to the same question.
Did you finally get a solution? Do I need to set the seed globally when training multi-GPU (single node)? And, as you said, I would prefer not to, in order to avoid the same seed across the transform workers.
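For what it’s worth, the only thing I have found so far is the workers=True flag of seed_everything, which, if I read the docs right, re-seeds each dataloader worker from the base seed, the worker id, and the global rank, so augmentations should not repeat across GPUs. A minimal sketch of what I mean (the seed value and the Trainer arguments are placeholders for a recent Lightning version):

```python
import pytorch_lightning as pl
from pytorch_lightning import Trainer

# Same base seed on every process; with workers=True, Lightning derives a
# distinct seed for each dataloader worker from (base seed, worker id, rank).
pl.seed_everything(42, workers=True)

trainer = Trainer(accelerator="gpu", devices=2, strategy="ddp")
# trainer.fit(model, datamodule=dm)  # model / dm are whatever you already have
```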