Effective learning rate and batch size with Lightning in DDP

@sm000 this is the default behavior of torch.nn.parallel.DistributedDataParallel, which Lightning wraps: each process keeps its own per-GPU batch size, so the effective batch size grows with the number of GPUs. I believe this default exists so that you can increase or decrease the number of GPUs without having to worry about changing hyperparameters (since the learning rate should ideally be adjusted whenever the batch size changes).
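
As a minimal sketch of that behavior (the batch size, device count, and Trainer arguments below are example values for a recent Lightning version, not from this thread):

```python
import pytorch_lightning as pl

# Example values, not from the original thread.
per_device_batch_size = 32   # batch size each DDP process draws from its DataLoader
num_devices = 4

# Under DDP every process steps on its own batch and gradients are averaged,
# so the total number of samples behind one optimizer step is:
effective_batch_size = per_device_batch_size * num_devices  # 128 in this example
print(f"effective batch size: {effective_batch_size}")

# The learning rate you configure in your LightningModule's optimizer is used
# as-is in every process; nothing rescales it when you add or remove GPUs.
trainer = pl.Trainer(accelerator="gpu", devices=num_devices, strategy="ddp")
```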

A possible feature could be to expose some sort of effective_learning_rate or effective_batch_size.
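
A rough sketch of what such a helper might compute (the name and signature are hypothetical, nothing like this exists in Lightning today):

```python
# Hypothetical helper; this name and signature are not part of Lightning.
def effective_batch_size(per_device_batch_size: int, num_devices: int,
                         accumulate_grad_batches: int = 1) -> int:
    """Samples contributing to one optimizer step across all DDP processes."""
    return per_device_batch_size * num_devices * accumulate_grad_batches


# e.g. effective_batch_size(32, num_devices=4) == 128
```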
