Saving checkpoints and logging models

shazalvi · April 26, 2023, 8:39am

Hi everyone,

I have a very newbie question, as I am fairly new to developing with Lightning. I am finding it difficult to understand the difference between saving checkpoints and logging models.

I understand from the documentation (here) that a checkpoint is used to save the state of the model at a given point in time. But what is log_model doing, then?

Any help would be appreciated. Thanks.

awaelchli · April 28, 2023, 6:15pm

“Logging models” is a feature of the loggers, and what it means depends a bit on the third-party library. For example, in the TensorBoard logger it means that the logger traces the graph and displays the network so it can be inspected.

Saving a checkpoint is something entirely different. There, the goal is to save as much of the training state as possible into a file, so that one can resume training later on, or load the state at the end of training to evaluate or run inference. A checkpoint typically includes the model parameters, hyperparameters, optimizer state and other data.
