Effective learning rate and batch size with Lightning in DDP

A good rule of thumb for relating batch size and learning rate comes from “Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour” (https://arxiv.org/pdf/1706.02677.pdf):

“As we will show in comprehensive experiments, we found that the following learning rate scaling rule is surprisingly effective for a broad range of minibatch sizes:

Linear Scaling Rule: When the minibatch size is multiplied by k, multiply the learning rate by k.”
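
To make the rule concrete for DDP, here is a minimal sketch in plain Python. It assumes the batch size passed to each DataLoader is per process, so the effective batch size is the per-process batch size times the number of devices times the number of nodes. The helper names and the baseline values (learning rate 0.1 at batch size 256, 8 devices on 4 nodes) are illustrative assumptions, not part of any Lightning API.

```python
def effective_batch_size(per_device_batch_size: int, devices: int, num_nodes: int = 1) -> int:
    """Total samples per optimizer step, assuming each DDP process
    loads its own batch of `per_device_batch_size` samples."""
    return per_device_batch_size * devices * num_nodes


def scaled_learning_rate(base_lr: float, base_batch_size: int, new_batch_size: int) -> float:
    """Linear Scaling Rule: if the batch size grows by a factor k,
    multiply the learning rate by the same factor k."""
    k = new_batch_size / base_batch_size
    return base_lr * k


# Hypothetical baseline: lr tuned at 0.1 for a batch size of 256.
base_lr = 0.1
base_batch_size = 256

# Hypothetical DDP run: 32 samples per process, 8 devices per node, 4 nodes.
eff_batch = effective_batch_size(32, devices=8, num_nodes=4)
lr = scaled_learning_rate(base_lr, base_batch_size, eff_batch)

print(f"effective batch size: {eff_batch}")
print(f"scaled learning rate: {lr:.4f}")
```

If you also use gradient accumulation, the effective batch size grows by that factor as well, so you would multiply it in before applying the rule.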
