Hello, Lightning community!
I am training with two (optimizer, scheduler) pairs, as shown in the snippet below:
def configure_optimizers(self):
    [...]
    return (
        {"optimizer": optimizer_1,
         "lr_scheduler": {"scheduler": scheduler_1, "interval": "step", "name": "scheduler_1"},
         "frequency": 1},
        {"optimizer": optimizer_2,
         "lr_scheduler": {"scheduler": scheduler_2, "interval": "step", "name": "scheduler_2"},
         "frequency": 1},
    )
With "frequency": 1
on both optimizers, the trainer calls optimizer_1
in step i
while calling optimizer_2 in step (i+1)
.
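For context, with a single optimizer I would simply turn on accumulation through the standard Trainer argument below; what I cannot figure out is how that flag interacts with the frequency-based alternation above:

```python
import pytorch_lightning as pl

# Standard gradient accumulation for a single optimizer: gradients from two
# consecutive batches are summed before optimizer.step() is called.
trainer = pl.Trainer(accumulate_grad_batches=2)
```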
Therefore, is there a way to combine gradient accumulation with this optimization setup, so that optimizer_1 steps with the gradient accumulated over steps (i-1) and i, while optimizer_2 steps with the gradient accumulated over steps i and (i+1)?
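For concreteness, here is a rough sketch of the behaviour I am after, written with manual optimization. It assumes the two optimizers cover disjoint parameter groups, and compute_loss is just a placeholder for my actual loss computation:

```python
import pytorch_lightning as pl


class DualOptimizerModule(pl.LightningModule):
    def __init__(self):
        super().__init__()
        # Hand-rolled accumulation/stepping instead of frequency-based switching.
        self.automatic_optimization = False

    def training_step(self, batch, batch_idx):
        opt_1, opt_2 = self.optimizers()
        sch_1, sch_2 = self.lr_schedulers()

        loss = self.compute_loss(batch)  # placeholder for the real loss
        self.manual_backward(loss)       # gradients keep accumulating until zero_grad()

        if batch_idx % 2 == 0:
            # optimizer_1 steps with the gradients accumulated since its last
            # zero_grad(), i.e. batches (i-1) and i.
            opt_1.step()
            opt_1.zero_grad()
            sch_1.step()
        else:
            # optimizer_2 sees the same two-batch window shifted by one batch,
            # i.e. batches i and (i+1).
            opt_2.step()
            opt_2.zero_grad()
            sch_2.step()
```

This is roughly the overlap I described above; I am mainly wondering whether it can be expressed with automatic optimization instead.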