My trainer looks like this:
import pytorch_lightning as pl

trainer = pl.Trainer(gpus=gpus, max_steps=25000, precision=16)
trainer.fit(model, train_dl)
I want to save a model checkpoint every 5000 steps (each new checkpoint can overwrite the previous one). Is it possible to do that?
According to the documentation, a checkpoint can be saved with the ModelCheckpoint callback after a specific number of epochs, but I didn't see anything there about saving after a specific number of steps. I am not passing any validation data, so I do not want to save based on validation loss either.
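To make the behavior I'm after concrete, here is a minimal sketch of the save schedule in plain Python, with no Lightning involved. The helpers should_save, save_checkpoint, and run are hypothetical names made up for illustration, and last.ckpt is just a placeholder filename. This is not how Lightning does it internally, only the schedule I would like to end up with.

```python
import os
import pickle
import tempfile

SAVE_EVERY = 5000   # save interval in steps (from the question)
MAX_STEPS = 25000   # same value as max_steps in the Trainer above


def should_save(step, every=SAVE_EVERY):
    """Return True on steps 5000, 10000, ... (never on step 0)."""
    return step > 0 and step % every == 0


def save_checkpoint(state, path):
    """Write the state to a temp file, then replace the old checkpoint."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "wb") as f:
        pickle.dump(state, f)
    os.replace(tmp_path, path)  # overwrites any previous checkpoint


def run(max_steps=MAX_STEPS, every=SAVE_EVERY, ckpt_dir=None):
    """Simulate a training loop and record the steps where a save happens."""
    ckpt_dir = ckpt_dir or tempfile.mkdtemp()
    path = os.path.join(ckpt_dir, "last.ckpt")  # placeholder filename
    saved_at = []
    for step in range(1, max_steps + 1):
        # (a real training step would run here)
        if should_save(step, every):
            save_checkpoint({"global_step": step}, path)
            saved_at.append(step)
    return saved_at, path
```

With the numbers above, a save would happen five times during the run, and only the most recent checkpoint file would remain on disk.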
Is there any way to do this?
Thanks.