Hello, Lightning community!
I am training with two (optimizer, scheduler) pairs, as shown in the snippet below:
def configure_optimizers(self):
    [...]
    return (
        {"optimizer": optimizer_1,
         "lr_scheduler": {"scheduler": scheduler_1, "interval": "step", "name": "scheduler_1"},
         "frequency": 1},
        {"optimizer": optimizer_2,
         "lr_scheduler": {"scheduler": scheduler_2, "interval": "step", "name": "scheduler_2"},
         "frequency": 1},
    )
With "frequency": 1
on both optimizers, the trainer calls optimizer_1
in step i
while calling optimizer_2 in step (i+1)
.
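For context, with a single optimizer I would simply turn on accumulation through the standard Trainer argument below; what I cannot figure out is how that flag interacts with the frequency-based alternation above:

```python
import pytorch_lightning as pl

# Standard gradient accumulation for a single optimizer: gradients from two
# consecutive batches are summed before optimizer.step() is called.
trainer = pl.Trainer(accumulate_grad_batches=2)
```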
Therefore, is there a way to combine gradient accumulation with this optimization setup, so that optimizer_1 steps with the gradient accumulated over steps (i-1) and i, while optimizer_2 steps with the gradient accumulated over steps i and (i+1)?
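For concreteness, here is a rough sketch of the behaviour I am after, written with manual optimization. It assumes the two optimizers cover disjoint parameter groups, and compute_loss is just a placeholder for my actual loss computation:

```python
import pytorch_lightning as pl


class DualOptimizerModule(pl.LightningModule):
    def __init__(self):
        super().__init__()
        # Hand-rolled accumulation/stepping instead of frequency-based switching.
        self.automatic_optimization = False

    def training_step(self, batch, batch_idx):
        opt_1, opt_2 = self.optimizers()
        sch_1, sch_2 = self.lr_schedulers()

        loss = self.compute_loss(batch)  # placeholder for the real loss
        self.manual_backward(loss)       # gradients keep accumulating until zero_grad()

        if batch_idx % 2 == 0:
            # optimizer_1 steps with the gradients accumulated since its last
            # zero_grad(), i.e. batches (i-1) and i.
            opt_1.step()
            opt_1.zero_grad()
            sch_1.step()
        else:
            # optimizer_2 sees the same two-batch window shifted by one batch,
            # i.e. batches i and (i+1).
            opt_2.step()
            opt_2.zero_grad()
            sch_2.step()
```

This is roughly the overlap I described above; I am mainly wondering whether it can be expressed with automatic optimization instead.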