I have a very newbie question, as I am fairly new to developing with Lightning. I am finding it difficult to understand the difference between saving checkpoints and logging models.
I understand from the documentation (here) that checkpointing is used to save the state of the model at a given point in time. But what is log_model doing, then?
Any help would be appreciated. Thanks.
“Logging models” is a feature of the loggers, and what it means depends a bit on the third-party library. For example, in the TensorBoard logger this means that it traces the graph and displays the network so it can be inspected.
Saving a checkpoint is something entirely different. There, the goal is to save as much of the training state as possible into a file so that one can resume training later on, or load the state at the end of training to evaluate or run inference. A checkpoint typically includes the model parameters, hyperparameters, optimizer state and other data.
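To make the checkpoint side more concrete, here is a minimal sketch in plain PyTorch of the kind of state a checkpoint bundles together. It is not Lightning's actual checkpoint format: the dictionary keys, the toy model and the `toy.ckpt` file name are assumptions chosen for illustration only.

```python
import os
import tempfile

import torch
from torch import nn

# Hypothetical toy settings and model, used only for illustration.
hparams = {"lr": 0.1, "hidden": 4}
model = nn.Linear(hparams["hidden"], 1)
optimizer = torch.optim.SGD(model.parameters(), lr=hparams["lr"], momentum=0.9)

# Take one training step so the optimizer holds state (momentum buffers).
loss = model(torch.randn(8, hparams["hidden"])).pow(2).mean()
loss.backward()
optimizer.step()

# Bundle everything needed to resume training into one dictionary.
# The key names here are illustrative, not a fixed schema.
checkpoint = {
    "epoch": 1,
    "hyper_parameters": hparams,
    "state_dict": model.state_dict(),
    "optimizer_state": optimizer.state_dict(),
}
path = os.path.join(tempfile.mkdtemp(), "toy.ckpt")
torch.save(checkpoint, path)

# Later: rebuild the objects and restore their state from the file.
restored = torch.load(path)
new_model = nn.Linear(restored["hyper_parameters"]["hidden"], 1)
new_model.load_state_dict(restored["state_dict"])
new_optimizer = torch.optim.SGD(
    new_model.parameters(), lr=restored["hyper_parameters"]["lr"], momentum=0.9
)
new_optimizer.load_state_dict(restored["optimizer_state"])
```

In Lightning you normally do not write this by hand, since the Trainer and its checkpoint callback take care of it, but the idea is the same: the file holds enough state to continue training or to run inference later, which is a different purpose from whatever a given logger does when asked to log the model.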