I am using this PyTorch Lightning ModelCheckpoint callback to save the best model during training:
checkpoint_callback = pl.callbacks.ModelCheckpoint(
    dirpath='/content/lightning_logs',
    filename='{epoch}-{val_loss:.3f}-{val_f1:.3f}',
    monitor='val_f1',
    mode='max',
    every_n_train_steps=1,
    save_top_k=1,
)
At the end of training, the checkpoint file is named "epoch=44-val_loss=0.684-val_f1=0.818.ckpt", and TensorBoard also reports 0.818 as the best val_f1 score.
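(As I understand it, Lightning expands the filename template by turning each placeholder into a name=value pair and filling it from the logged metrics. A minimal stdlib sketch of that expansion, my approximation rather than Lightning's actual implementation:)

```python
import re

# Hypothetical approximation: rewrite each '{name' placeholder to 'name={name'
# so str.format() produces 'name=value' pairs, as in the checkpoint filename.
template = "{epoch}-{val_loss:.3f}-{val_f1:.3f}"
metrics = {"epoch": 44, "val_loss": 0.684, "val_f1": 0.818}
filled = re.sub(r"\{(\w+)", r"\1={\1", template).format(**metrics)
print(filled)  # -> epoch=44-val_loss=0.684-val_f1=0.818
```

So the 0.818 in the filename is the monitored val_f1 value that the callback saw when it saved this checkpoint.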
However, I cannot reproduce that score. When I load the checkpoint into a model for testing, I get lower results. For example, running:
trainer.validate(ckpt_path='/content/lightning_logs/epoch=44-val_loss=0.684-val_f1=0.818.ckpt')
gives a validation f1 score of 0.802, not 0.818.
What could the culprit be?