So in the case of DDP one should keep the learning rate at lr (the value tuned for the per-GPU batch size), but in the case of DP it should be set to lr * N, since backward is done on a single GPU, right?
And is the same true for TPU training (8 cores) as for DDP, since it's basically DDP?
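For concreteness, here is a minimal sketch of the convention I'm asking about (the model, base_lr, and N are hypothetical placeholders; base_lr is assumed to be tuned for the per-GPU batch size on a single device):

```python
import torch

model = torch.nn.Linear(10, 1)  # toy model, just for illustration

base_lr = 1e-3  # assumed: lr tuned for the per-GPU batch size on 1 GPU
N = 8           # number of GPUs (or TPU cores)

# DDP (and, if it really behaves like DDP, 8-core TPU training):
# each process optimizes its own per-GPU batch and gradients are
# averaged across processes, so the per-GPU lr is kept as-is.
optimizer_ddp = torch.optim.SGD(model.parameters(), lr=base_lr)

# DP: one process holds the full batch (N x per-GPU batch) and runs
# backward on a single device, so the lr is scaled linearly by N.
optimizer_dp = torch.optim.SGD(model.parameters(), lr=base_lr * N)
```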