I am using LightningCLI to handle argument parsing from the CLI, and each time I start training the model, a folder called version_x (where x is the number of the current run/experiment) is created under the lightning_logs directory.
Now I'd like to checkpoint the model under the current version_x directory. Is there any way to achieve this from the config.yaml file?
Currently, I have this in my config.yaml:
```yaml
trainer:
  callbacks:
    - class_path: lightning.pytorch.callbacks.ModelCheckpoint
      init_args:
        dirpath: lightning_logs/version_{version}/checkpoints
```
However, this does not save the checkpoints under the current version_x folder but instead creates a new folder literally named version_{version}.
Any suggestions?
@angelonazzaro Not 100% sure, but I think what you want can be achieved by specifying the version number explicitly in the logger. For example, without the CLI it would look something like this:
```python
logger = TensorBoardLogger(..., version=5)
trainer = Trainer(logger=logger, ...)
```
Then the checkpoint callback will save its checkpoints into the log folder for that version. You should be able to derive the corresponding config.yaml from this; it will look something like this (not tested):
```yaml
trainer:
  logger:
    - class_path: lightning.pytorch.loggers.TensorBoardLogger
      init_args:
        version: 5
```
Let me know if that helps.
I found out that if you leave the dirpath argument as null when instantiating the ModelCheckpoint callback, the checkpoints will automatically be saved under the trainer's default_root_dir, which does what I was trying to accomplish.
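For reference, a minimal config sketch of that setup (not tested; the monitor and save_top_k values here are just illustrative placeholders, adjust them to your own metric):

```yaml
trainer:
  callbacks:
    - class_path: lightning.pytorch.callbacks.ModelCheckpoint
      init_args:
        # dirpath left as null, so the callback falls back to the
        # current run's log directory under lightning_logs/version_x
        dirpath: null
        # illustrative values only
        monitor: val_loss
        save_top_k: 1
```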
I spent hours trying to do that, and all it took was a 5-second read of the documentation haha.
Thanks for your help!