Calculating epoch_level metrics for checkpointing

gogothorr · September 1, 2020, 9:30am

Hello,

I am trying to calculate a metric on the entire validation set. The metric cannot be computed on individual batches and cannot be approximated from per-batch results. Ideally, I would like to write checkpoints depending on this epoch-level metric.

So far, I have found two options:

A) Add a Callback that runs a function with a pass over the validation set to calculate the metric. I know that it is possible to log the metric to loggers from there (e.g. TensorBoard). But how would I make it available to the checkpoint callback? Is it possible to write to the EvalResult object from other callbacks?

B) Use on_validation_epoch_end(). The same question as in A) applies.

What is the best way to implement this?

Many thanks for your help!

dilip · September 1, 2020, 4:19pm

Would computing the metric in on_validation_epoch_end and then returning it in that method's output dictionary work? If you also set the checkpoint callback's monitor to the key you use for that metric, I think it should.
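To illustrate the monitor-key idea, here is a minimal, framework-free sketch. `BestMetricTracker`, `should_save`, and the key `val_epoch_metric` are hypothetical names made up for this example; this is not the library's checkpoint callback, only a stand-in for the "look up the metric by key and save when it improves" logic.

```python
from typing import Dict, Optional


class BestMetricTracker:
    """Hypothetical stand-in for the monitor logic of a checkpoint callback."""

    def __init__(self, monitor: str, mode: str = "min") -> None:
        if mode not in ("min", "max"):
            raise ValueError("mode must be 'min' or 'max'")
        self.monitor = monitor  # key expected in the epoch-end output dict
        self.mode = mode
        self.best: Optional[float] = None  # no epoch seen yet

    def should_save(self, epoch_metrics: Dict[str, float]) -> bool:
        # Look the metric up by the same key the epoch-end hook returned.
        current = epoch_metrics[self.monitor]
        improved = (
            self.best is None
            or (self.mode == "min" and current < self.best)
            or (self.mode == "max" and current > self.best)
        )
        if improved:
            self.best = current
        return improved
```

The point is only that the key in the returned dictionary and the monitor name have to match exactly; a mismatch here would raise a KeyError.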

goku · September 1, 2020, 6:02pm

Check the 2nd example here. You can access all predictions in epoch_end and compute the metric there.
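A rough sketch of that pattern in plain Python, without any framework: each step returns its predictions and targets, and the epoch-end function computes the metric once over everything. The function names echo the hooks mentioned in this thread but are ordinary functions here; `predict` is a hypothetical toy model, and median absolute error is just an assumed example of a metric that cannot be averaged from per-batch values.

```python
from statistics import median
from typing import Dict, List, Sequence, Tuple

Batch = Tuple[Sequence[float], Sequence[float]]  # (inputs, targets)


def predict(inputs: Sequence[float]) -> List[float]:
    # Hypothetical toy model: doubles every input.
    return [2.0 * x for x in inputs]


def validation_step(batch: Batch) -> Dict[str, List[float]]:
    # Return raw predictions and targets instead of a per-batch metric.
    inputs, targets = batch
    return {"preds": predict(inputs), "targets": list(targets)}


def validation_epoch_end(outputs: List[Dict[str, List[float]]]) -> Dict[str, float]:
    # Gather every prediction and target from all batches of the epoch.
    preds = [p for out in outputs for p in out["preds"]]
    targets = [t for out in outputs for t in out["targets"]]
    # Compute the metric once over the whole validation set.
    abs_errors = [abs(p - t) for p, t in zip(preds, targets)]
    return {"val_median_abs_err": median(abs_errors)}


batches = [([1.0, 2.0], [2.0, 5.0]), ([3.0], [4.0])]
outputs = [validation_step(b) for b in batches]
epoch_metrics = validation_epoch_end(outputs)
```

With these toy batches the absolute errors are 0, 1 and 2, so the epoch-level median is 1.0, whereas averaging the two per-batch medians (0.5 and 2.0) would give 1.25.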
