Hi, thanks for your response.
I’m assigning `self.steps_per_epoch` by passing `len(data.train_dataloader())`, where `data` is a data module. I’ve checked this, and it always gives the correct length, such that `epochs * steps_per_epoch = total_steps` without the extra two steps in the error.
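For concreteness, this is roughly how I’m wiring it up (`MyDataModule`, `LitModel`, and `EPOCHS` are placeholders for my actual names):

```python
EPOCHS = 10

# Placeholder data module; setup("fit") builds the datasets so that
# train_dataloader() can be called before the Trainer runs.
data = MyDataModule(batch_size=64)
data.setup("fit")

steps_per_epoch = len(data.train_dataloader())
model = LitModel(epochs=EPOCHS, steps_per_epoch=steps_per_epoch)
```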
Also, I’ve tried what you suggested with `trainer = Trainer(num_sanity_val_steps=0)`, but alas, no change. It seems that somewhere in the training loop an extra two steps are being called, as in the problem from the PyTorch forums thread; the OP there was training twice, but I’m not.
There’s a suggestion in that thread:

> For debugging purposes you could add a counter and print its accumulated value for each `scheduler.step()` call.
How would I do this using Lightning?
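The only idea I’ve come up with so far is to count inside the scheduler itself, by subclassing it so that `step()` prints a running total. A rough, untested sketch based on my setup with `OneCycleLR`:

```python
import torch
from torch.optim.lr_scheduler import OneCycleLR


class CountingOneCycleLR(OneCycleLR):
    """OneCycleLR that prints a running count of step() calls."""

    def __init__(self, *args, **kwargs):
        # Initialise the counter *before* super().__init__(), because the
        # base scheduler invokes self.step() once during construction --
        # that initial call might itself account for one of the extra steps.
        self.debug_step_count = 0
        super().__init__(*args, **kwargs)

    def step(self, *args, **kwargs):
        self.debug_step_count += 1
        print(f"scheduler.step() call #{self.debug_step_count}")
        super().step(*args, **kwargs)
```

And then in `configure_optimizers` (again, the optimizer and `max_lr` here are just what I happen to use):

```python
def configure_optimizers(self):
    optimizer = torch.optim.AdamW(self.parameters(), lr=1e-3)
    scheduler = CountingOneCycleLR(
        optimizer,
        max_lr=1e-3,
        total_steps=self.epochs * self.steps_per_epoch,
    )
    # "interval": "step" so Lightning steps the scheduler every batch
    return {
        "optimizer": optimizer,
        "lr_scheduler": {"scheduler": scheduler, "interval": "step"},
    }
```

Would counting like this be reliable inside Lightning’s loop, or is there a proper hook for it?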