I am using strategy=ddp
and saving checkpoints with the ModelCheckpoint callback,
which I pass to the Trainer through its callbacks argument.
A saved checkpoint looks like, for example, epoch=204-val_cer=0.0459-val_loss=1.1404.ckpt,
since I configured the filename to include the validation CER and validation loss. However, when I load the checkpoint and evaluate it on the validation set, the validation CER comes out different (with the same batch size and number of GPUs as during training).
The TensorBoard logs agree with the validation CER shown in the checkpoint filename.
The CER is computed with the torchmetrics.CharErrorRate() implementation.
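For context on why the two numbers could drift: as far as I understand, CharErrorRate is a ratio metric, so torchmetrics accumulates the total edit distance and the total number of reference characters across batches (and DDP ranks) and divides only once at compute() time. Taking the mean of per-batch CER floats instead gives a different value. A plain-Python sketch of that distinction (the helper names here are mine, not torchmetrics APIs):

```python
def edit_distance(pred: str, target: str) -> int:
    """Levenshtein distance via a rolling one-row DP table."""
    dp = list(range(len(target) + 1))
    for i, p in enumerate(pred, 1):
        prev, dp[0] = dp[0], i
        for j, t in enumerate(target, 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,         # deletion
                        dp[j - 1] + 1,     # insertion
                        prev + (p != t))   # substitution (or match)
            prev = cur
    return dp[-1]

def corpus_cer(preds, targets):
    """Pool errors and reference lengths first, divide once (ratio-metric style)."""
    errors = sum(edit_distance(p, t) for p, t in zip(preds, targets))
    chars = sum(len(t) for t in targets)
    return errors / chars

preds = ["ab", "abcdefghij"]
targets = ["xb", "abcdefghij"]
# Mean of per-sample CERs: (0.5 + 0.0) / 2 = 0.25
per_sample = [edit_distance(p, t) / len(t) for p, t in zip(preds, targets)]
print(sum(per_sample) / len(per_sample))
# Pooled (corpus-level) CER: 1 error / 12 chars ≈ 0.0833
print(corpus_cer(preds, targets))
```

So if the reload-time evaluation pools the metric differently than the logged value did (e.g. different batch boundaries with a float average), the two CERs will not match even on identical data.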
So, what am I doing wrong?
Here is the code for the checkpoint callback:
checkpoint_callback_val_cer = ModelCheckpoint(
    save_top_k=1,
    monitor="val_cer",
    mode="min",  # CER should be minimized; "min" is also the default
    filename='{epoch}-{val_cer:.4f}-{val_loss:.4f}',
)
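For reference, the relevant wiring looks roughly like this (a simplified configuration sketch, not my actual code: the module name, the `_decode` helper, and `devices=2` are placeholders):

```python
import lightning.pytorch as pl  # or `pytorch_lightning`, depending on version
import torchmetrics

class LitModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.val_cer = torchmetrics.CharErrorRate()

    def validation_step(self, batch, batch_idx):
        preds, targets = self._decode(batch)  # placeholder for the decoding logic
        self.val_cer(preds, targets)
        # Logging the Metric object (not a float) lets Lightning accumulate and
        # sync it correctly across DDP ranks before ModelCheckpoint reads it.
        self.log("val_cer", self.val_cer, on_step=False, on_epoch=True)

trainer = pl.Trainer(
    strategy="ddp",
    devices=2,  # placeholder; I use the same count at train and eval time
    callbacks=[checkpoint_callback_val_cer],
)
```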