Hi, thanks for your response.
I’m assigning `self.steps_per_epoch` by passing `len(data.train_dataloader())`, where `data` is a data module. I’ve checked this, and it always gives the correct length, such that `epochs * steps_per_epoch = total_steps` without the extra two steps in the error.
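For concreteness, this is roughly how I’m wiring it up (`MyDataModule`, `LitModel`, and `EPOCHS` are placeholders for my actual names):

```python
EPOCHS = 10

# Placeholder data module; setup("fit") builds the datasets so that
# train_dataloader() can be called before the Trainer runs.
data = MyDataModule(batch_size=64)
data.setup("fit")

steps_per_epoch = len(data.train_dataloader())
model = LitModel(epochs=EPOCHS, steps_per_epoch=steps_per_epoch)
```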
Also, I’ve tried what you suggested with `trainer = Trainer(num_sanity_val_steps=0)`, but alas, no change. It seems that somewhere in the training loop an extra two steps are being called, as in the problem from the PyTorch forums thread; the OP there was training twice, but I’m not.
There’s a suggestion in that thread:

> For debugging purposes you could add a counter and print its accumulated value for each `scheduler.step()` call.
How would I do this using Lightning?
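The only idea I’ve come up with so far is to count inside the scheduler itself, by subclassing it so that `step()` prints a running total. A rough, untested sketch based on my setup with `OneCycleLR`:

```python
import torch
from torch.optim.lr_scheduler import OneCycleLR


class CountingOneCycleLR(OneCycleLR):
    """OneCycleLR that prints a running count of step() calls."""

    def __init__(self, *args, **kwargs):
        # Initialise the counter *before* super().__init__(), because the
        # base scheduler invokes self.step() once during construction --
        # that initial call might itself account for one of the extra steps.
        self.debug_step_count = 0
        super().__init__(*args, **kwargs)

    def step(self, *args, **kwargs):
        self.debug_step_count += 1
        print(f"scheduler.step() call #{self.debug_step_count}")
        super().step(*args, **kwargs)
```

And then in `configure_optimizers` (again, the optimizer and `max_lr` here are just what I happen to use):

```python
def configure_optimizers(self):
    optimizer = torch.optim.AdamW(self.parameters(), lr=1e-3)
    scheduler = CountingOneCycleLR(
        optimizer,
        max_lr=1e-3,
        total_steps=self.epochs * self.steps_per_epoch,
    )
    # "interval": "step" so Lightning steps the scheduler every batch
    return {
        "optimizer": optimizer,
        "lr_scheduler": {"scheduler": scheduler, "interval": "step"},
    }
```

Would counting like this be reliable inside Lightning’s loop, or is there a proper hook for it?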