Save checkpoints after specific number of steps instead of epochs

sujay_khandekar · September 26, 2020, 11:00pm

My trainer looks like this:

import pytorch_lightning as pl

trainer = pl.Trainer(gpus=gpus, max_steps=25000, precision=16)
trainer.fit(model, train_dl)

I want to save a model checkpoint every 5000 steps (it is fine if they overwrite each other). Is that possible?
According to the documentation, a checkpoint can be saved with the ModelCheckpoint callback after a specific number of epochs, but I didn't see anything there about saving after a specific number of steps. I am not passing any validation data, so I do not want to save based on validation loss either.
Is there any way to do this?
Thanks.

goku · September 27, 2020, 8:37am

You can try this: "Save checkpoint and validate every n steps" (Issue #2534, Lightning-AI/lightning on GitHub)
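
For readers who cannot open the issue, the general idea is a small custom callback that checks the trainer's step counter at the end of each training batch. The following is only a minimal sketch of that idea, not the exact code from the issue: the class name CheckpointEveryNSteps, its arguments, and the file name latest.ckpt are illustrative assumptions, and the exact hook signature and the moment at which global_step is incremented differ between Lightning versions.

```python
import os

try:
    from pytorch_lightning import Callback
except ImportError:
    # Fallback so the class can be read and exercised without Lightning installed.
    Callback = object


class CheckpointEveryNSteps(Callback):
    """Hypothetical helper: save one (overwritten) checkpoint every N steps."""

    def __init__(self, save_step_frequency, dirpath=".", filename="latest.ckpt"):
        self.save_step_frequency = save_step_frequency
        # A fixed path means each save overwrites the previous checkpoint.
        self.ckpt_path = os.path.join(dirpath, filename)
        self._last_saved_step = None

    def on_train_batch_end(self, trainer, pl_module, *args, **kwargs):
        # *args/**kwargs absorb the extra hook arguments, which vary by version.
        step = trainer.global_step
        if step == 0 or step % self.save_step_frequency != 0:
            return
        # With gradient accumulation several batches share one global_step,
        # so skip steps that were already saved.
        if step == self._last_saved_step:
            return
        trainer.save_checkpoint(self.ckpt_path)
        self._last_saved_step = step


# Assumed usage with the trainer from the question:
# trainer = pl.Trainer(gpus=gpus, max_steps=25000, precision=16,
#                      callbacks=[CheckpointEveryNSteps(5000)])
# trainer.fit(model, train_dl)
```

Later Lightning releases also added an every_n_train_steps argument to ModelCheckpoint, which covers the same need without a custom callback if your installed version has it.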

sujay_khandekar · September 28, 2020, 3:52am

Thanks, that worked for me.
