When the interval is “step”, the lr scheduler updates the learning rate based on batch_idx. Shouldn’t this be based on global_step instead? When gradient accumulation is used, the optimizer only steps at certain batch_idx values, so it seems the lr_scheduler would update the learning rate at the wrong times.
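
For illustration, here is a minimal sketch (not the library’s actual code, and `accumulate_grad_batches` / the counts are hypothetical) of how stepping on batch_idx diverges from stepping on global_step when gradient accumulation is enabled:

```python
# Sketch: with gradient accumulation, global_step only advances once per
# optimizer step, while batch_idx advances every batch. Stepping the lr
# scheduler on batch_idx therefore advances the schedule too fast.

accumulate_grad_batches = 4   # hypothetical setting
num_batches = 12

global_step = 0
scheduler_steps_by_batch_idx = 0
scheduler_steps_by_global_step = 0

for batch_idx in range(num_batches):
    # Stepping on every batch_idx would call the scheduler on every batch.
    scheduler_steps_by_batch_idx += 1

    # The optimizer (and global_step) only advance when accumulation completes.
    if (batch_idx + 1) % accumulate_grad_batches == 0:
        global_step += 1
        scheduler_steps_by_global_step += 1  # one scheduler step per optimizer step

print(scheduler_steps_by_batch_idx)    # 12 -> schedule advances 4x too often
print(scheduler_steps_by_global_step)  # 3  -> matches the number of optimizer steps
```

With batch_idx-based stepping the schedule would decay (or warm up) accumulate_grad_batches times faster than intended, which is the mismatch described above.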