ReduceLROnPlateau conditioned on metric

Hi all,
Thanks a lot for this great tool!

I ran into the error below, and I don't understand the list of available metrics: why are these the only keys?

pytorch_lightning.utilities.exceptions.MisconfigurationException: ReduceLROnPlateau conditioned on metric val_dice which is not available. Available metrics are: val_early_stop_on,val_checkpoint_on,checkpoint_on. Condition can be set using `monitor` key in lr scheduler dict

And this is my scheduler dict:

        lr_dict = {
                'scheduler': ReduceLROnPlateau(optimizer=optimizer, mode='max', factor=0.5,
                                               patience=10, min_lr=1e-6),
                # might need to change here
                'monitor': 'val_dice',  # Default: val_loss
                'reduce_on_plateau': False,  # For ReduceLROnPlateau scheduler, default
                # 'interval': 'step',
                'interval': 'epoch',
                # need to change here
                # 'frequency': 300
                'frequency': 1
            }

And this is my pl.EvalResult:

        result = pl.EvalResult(early_stop_on=dice, checkpoint_on=dice)
        result.log('val_loss', loss_cuda, on_step=False, on_epoch=True, logger=True, prog_bar=False,
                   reduce_fx=torch.mean, sync_dist=True)
        result.log('val_dice', dice, on_step=False, on_epoch=True, logger=True, prog_bar=True,
                   reduce_fx=torch.mean, sync_dist=True)
        result.log('val_IoU', iou, on_step=False, on_epoch=True, logger=True, prog_bar=False,
                   reduce_fx=torch.mean, sync_dist=True)
        result.log('val_sensitivity', sensitivity, on_step=False, on_epoch=True, logger=True, prog_bar=False,
                   reduce_fx=torch.mean, sync_dist=True)
        result.log('val_specificity', specificity, on_step=False, on_epoch=True, logger=True, prog_bar=False,
                   reduce_fx=torch.mean, sync_dist=True)
        return result

And these are my callbacks for the Trainer:

    checkpoint_callback = ModelCheckpoint(
        filepath=checkpoint_file,
        save_top_k=3,
        verbose=True,
        # monitor='val_dice',
        mode='max',
        prefix=''
    )

    early_stop_callback = EarlyStopping(
        # monitor='val_loss',
        min_delta=0.00,
        patience=300,
        strict=True,
        verbose=False,
        mode='max'
    )

Am I doing something wrong, or misunderstanding something?

Thanks a lot!

Update: if I set 'monitor': 'val_checkpoint_on' in the scheduler dict (as shown below), I no longer run into this error. What is happening when I set this? Is the scheduler now monitoring the checkpoint_on value I set?
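
For reference, this is the adjusted scheduler dict (a condensed version of the one above; only the monitor key changed):

    lr_dict = {
        'scheduler': ReduceLROnPlateau(optimizer=optimizer, mode='max', factor=0.5,
                                       patience=10, min_lr=1e-6),
        'monitor': 'val_checkpoint_on',  # was 'val_dice'
        'interval': 'epoch',
        'frequency': 1
    }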

Since you’re using the Result class, whatever tensor you pass to checkpoint_on (and early_stop_on) gets automatically registered under the key val_checkpoint_on (and val_early_stop_on). So yes: with monitor='val_checkpoint_on', the scheduler is effectively conditioned on the dice value you passed to checkpoint_on.
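
Roughly, the wiring ends up like this (a minimal sketch, assuming the EvalResult API from that Lightning version; the Adam optimizer and learning rate are placeholders, not taken from your code):

    import torch
    from torch.optim.lr_scheduler import ReduceLROnPlateau

    # inside the LightningModule: the dice tensor passed to
    # EvalResult(checkpoint_on=dice) is what gets exposed as 'val_checkpoint_on'
    def configure_optimizers(self):
        optimizer = torch.optim.Adam(self.parameters(), lr=1e-3)  # placeholder optimizer/lr
        scheduler = ReduceLROnPlateau(optimizer, mode='max', factor=0.5,
                                      patience=10, min_lr=1e-6)
        lr_dict = {
            'scheduler': scheduler,
            'monitor': 'val_checkpoint_on',  # the auto-generated key, not 'val_dice'
            'interval': 'epoch',
            'frequency': 1
        }
        return [optimizer], [lr_dict]

mode='max' still makes sense here, since a higher dice score is better.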
