@sm000 this is the default behavior in torch.nn.parallel, which Lightning wraps. I believe the intent is that one can increase or decrease the number of GPUs without having to worry about changing hyperparameters (since the learning rate should ideally be scaled along with the batch size).
A possible feature could be some sort of `effective_learning_rate` or `effective_batch_size`.
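
A rough sketch of what computing those derived quantities could look like; the function names are hypothetical (not an existing Lightning API), and the learning-rate adjustment shown is just one common heuristic (the linear scaling rule), under the assumption that each GPU receives its own full batch:

```python
def effective_batch_size(per_gpu_batch_size: int, num_gpus: int) -> int:
    # In data-parallel training each process consumes its own batch,
    # so one optimizer step aggregates gradients over all of them.
    return per_gpu_batch_size * num_gpus

def scaled_lr(base_lr: float, base_batch_size: int, new_batch_size: int) -> float:
    # Linear scaling rule heuristic: lr grows in proportion to batch size.
    return base_lr * new_batch_size / base_batch_size

# e.g. going from 1 GPU to 4 GPUs with a per-GPU batch size of 32:
bs = effective_batch_size(per_gpu_batch_size=32, num_gpus=4)  # 128
lr = scaled_lr(base_lr=0.1, base_batch_size=32, new_batch_size=bs)
```

Surfacing values like these would let users see at a glance how their configured hyperparameters translate once the GPU count changes.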