Is gradient clipping done before or after gradients accumulation?

Hey @zlenyk

The Trainer does gradient clipping right before the optimizer step, which means it happens after the gradients have been accumulated: the clip is applied once, to the gradient summed over all batches in the accumulation window. Here is the relevant code: lightning/precision_plugin.py at 1d1f6009630d01f5347a7234dad97f6c75f93af0 · Lightning-AI/lightning · GitHub

This code gets called when the training loop calls precision_plugin.optimizer_step(), which only happens on the last batch of each accumulation window.
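To make the ordering concrete, here is a plain-PyTorch sketch (no Lightning) of what the automatic loop effectively does with something like `Trainer(accumulate_grad_batches=4, gradient_clip_val=1.0)`: every backward pass first, one clip on the accumulated `.grad`, then the step. The toy model, data, and the function name `accumulate_then_clip` are hypothetical, and the real Trainer also handles precision, hooks, and distributed sync on top of this.

```python
import torch
from torch import nn


def accumulate_then_clip(model, optimizer, micro_batches, max_norm):
    """Accumulate gradients over all micro-batches, then clip once and step.

    Returns the total gradient norm before and after clipping.
    """
    n = len(micro_batches)
    optimizer.zero_grad()
    for x, y in micro_batches:
        # Divide by n so the summed .grad equals the gradient of the mean loss.
        loss = nn.functional.mse_loss(model(x), y) / n
        loss.backward()  # .grad keeps accumulating (summing) across calls

    # Clip the *accumulated* gradient, right before the optimizer step.
    norm_before = nn.utils.clip_grad_norm_(model.parameters(), max_norm)
    norm_after = torch.norm(torch.stack([p.grad.norm() for p in model.parameters()]))
    optimizer.step()
    optimizer.zero_grad()
    return norm_before, norm_after


if __name__ == "__main__":
    torch.manual_seed(0)
    model = nn.Linear(4, 1)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    # Hypothetical data: 4 micro-batches of 8 samples each.
    batches = [(torch.randn(8, 4), 10 * torch.randn(8, 1)) for _ in range(4)]
    before, after = accumulate_then_clip(model, optimizer, batches, max_norm=1.0)
    print(f"norm before clip: {before.item():.3f}, after clip: {after.item():.3f}")
```

Only one clip happens per optimizer step here, so an individual micro-batch with a large gradient can still dominate the sum before the clip is applied.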

You can get the other behavior as well (clipping each batch's gradients individually during accumulation) by enabling manual optimization and performing the accumulation yourself: Manual Optimization (PyTorch Lightning 2.1.0dev documentation)
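A minimal plain-PyTorch sketch of the per-batch alternative, under the same hypothetical toy setup. One subtlety: calling a clip after each backward on the shared `.grad` would clip the running sum, not each batch, so the sketch keeps its own accumulation buffers and zeroes `.grad` before every micro-batch. It averages the clipped gradients, which is one design choice among several. Inside a LightningModule with `self.automatic_optimization = False`, you would typically swap `loss.backward()` for `self.manual_backward(loss)` and the clip for `self.clip_gradients(opt, gradient_clip_val=..., gradient_clip_algorithm="norm")`; the function name `clip_each_micro_batch` is hypothetical.

```python
import torch
from torch import nn


def clip_each_micro_batch(model, optimizer, micro_batches, max_norm):
    """Clip every micro-batch gradient individually, average them, then step.

    Returns the norm of the averaged gradient that the optimizer uses.
    """
    params = [p for p in model.parameters() if p.requires_grad]
    # Separate buffers hold the running average of clipped gradients, because
    # accumulating in .grad would mix the micro-batches before clipping.
    accum = [torch.zeros_like(p) for p in params]
    n = len(micro_batches)
    for x, y in micro_batches:
        optimizer.zero_grad()  # isolate this micro-batch's gradient
        loss = nn.functional.mse_loss(model(x), y)
        loss.backward()
        nn.utils.clip_grad_norm_(params, max_norm)  # clip this micro-batch only
        for buf, p in zip(accum, params):
            buf.add_(p.grad, alpha=1.0 / n)

    # Hand the averaged, per-batch-clipped gradient to the optimizer.
    for buf, p in zip(accum, params):
        p.grad = buf
    norm = torch.norm(torch.stack([p.grad.norm() for p in params]))
    optimizer.step()
    optimizer.zero_grad()
    return norm
```

Because each clipped gradient has norm at most `max_norm`, their average does too (triangle inequality), but the result generally differs from clipping the accumulated gradient once.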

Alternatively, for even more control, there is Lightning Fabric, where you write the training loop entirely yourself.
