@sm000 this is the default behavior in torch.nn.parallel, which Lightning wraps. I believe the intent is that one can increase or decrease the number of GPUs without having to worry about changing hyperparameters (since the learning rate should ideally be scaled along with the batch size).
A possible feature could be some sort of `effective_learning_rate` or `effective_batch_size`.
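
A rough sketch of what computing those derived quantities could look like; the function names are hypothetical (not an existing Lightning API), and the learning-rate adjustment shown is just one common heuristic (the linear scaling rule), under the assumption that each GPU receives its own full batch:

```python
def effective_batch_size(per_gpu_batch_size: int, num_gpus: int) -> int:
    # In data-parallel training each process consumes its own batch,
    # so one optimizer step aggregates gradients over all of them.
    return per_gpu_batch_size * num_gpus

def scaled_lr(base_lr: float, base_batch_size: int, new_batch_size: int) -> float:
    # Linear scaling rule heuristic: lr grows in proportion to batch size.
    return base_lr * new_batch_size / base_batch_size

# e.g. going from 1 GPU to 4 GPUs with a per-GPU batch size of 32:
bs = effective_batch_size(per_gpu_batch_size=32, num_gpus=4)  # 128
lr = scaled_lr(base_lr=0.1, base_batch_size=32, new_batch_size=bs)
```

Surfacing values like these would let users see at a glance how their configured hyperparameters translate once the GPU count changes.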