However, I am getting the following warning and my weights are not being saved:
RuntimeWarning: Can save best model only with val_acc available, skipping.
Looking into the on_validation_end method of the ModelCheckpoint class (code), it seems I have to save ‘val_acc’ into the callback metrics of the EvalResult object, but I am not sure how to do that, or whether it is the right approach.
The way to go would be not to change the monitor argument in your callback but, as @ydcjeff suggested, to use checkpoint_on in your validation_step/validation_epoch_end. Your trainer config would then look like this:
from pytorch_lightning import Trainer
from pytorch_lightning.callbacks import ModelCheckpoint

checkpoint_callback = ModelCheckpoint(
    save_top_k=1,    # keep only the best checkpoint
    verbose=True,
    mode='max',      # higher accuracy is better
)

trainer = Trainer(checkpoint_callback=checkpoint_callback)
and your validation phase either like this:
import pytorch_lightning as pl

def validation_step(self, batch, batch_idx):
    acc = self.calculate_acc(batch)
    # for early stopping you could also use early_stop_on here
    result = pl.EvalResult(checkpoint_on=acc)
    result.log('val_acc', acc)
    return result
without any validation_epoch_end (by default the result will average your values for checkpointing, see here for details), or you could also do it like this when you really want to sum it:
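Something along these lines should work; this is only a rough sketch, assuming the pre-1.0 Result API where the values logged in validation_step are accessible as attributes on the aggregated result passed to validation_epoch_end (calculate_acc is the same placeholder method as above):

import pytorch_lightning as pl

def validation_step(self, batch, batch_idx):
    acc = self.calculate_acc(batch)
    result = pl.EvalResult()
    result.log('val_acc', acc)
    return result

def validation_epoch_end(self, validation_step_outputs):
    # sum the per-step accuracies instead of relying on the default averaging
    summed_acc = validation_step_outputs.val_acc.sum()
    # checkpoint on the summed value; mode='max' in the callback still applies
    result = pl.EvalResult(checkpoint_on=summed_acc)
    result.log('val_acc_sum', summed_acc)
    return result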
Actually, I had tested this method of passing a tensor to checkpoint_on, but the model was not being saved. Looking more into it, I figured out that the tensor I’d been passing had a value of zero, and it turns out that passing a zero-valued tensor does not do anything (code).
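To illustrate why a zero value can behave like “no value” (this is just the underlying Python/PyTorch truthiness behaviour, not necessarily the library’s exact check): a zero-valued scalar tensor is falsy, so a guard written as a plain truth test skips it as if nothing had been passed:

import torch

checkpoint_on = torch.tensor(0.0)
print(bool(checkpoint_on))        # False -- a plain `if checkpoint_on:` guard would skip it
print(checkpoint_on is not None)  # True  -- an explicit None check would not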
Anyway, now I know that this is the correct way to do it. Thanks!