I am trying to train a model indefinitely with max_steps=-1. I want to run validation every 1_000 steps and save the best model. How can I do this?
I am trying with:
```python
def validation_step(self, batch, batch_idx):
    loss = self.common_step(batch, batch_idx)
    self.log("validation_loss", loss)
```
and
```python
checkpoint_callback = ModelCheckpoint(
    monitor='validation_loss',
    dirpath='checkpoints',
    filename='model-{epoch:02d}-{step}-{validation_loss:.2f}',
    every_n_train_steps=1_000,
    save_top_k=1,
    verbose=True
)
```
```python
trainer = Trainer(accelerator="auto",
                  default_root_dir="checkpoints",
                  # accumulate_grad_batches=accumulate_grad_batches,
                  max_steps=-1,
                  callbacks=[checkpoint_callback])
```
But I am getting “HINT: Did you call `log('validation_loss', value)` in the `LightningModule`?”, and `validation_loss` only starts appearing after the first epoch.
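For context, a sketch of one way this is often handled, assuming PyTorch Lightning: `every_n_train_steps` only controls when `ModelCheckpoint` fires, not when validation runs, so at step 1_000 no `validation_loss` has been logged yet (validation defaults to once per epoch), which would explain the HINT. Passing `val_check_interval=1_000` (and `check_val_every_n_epoch=None` for step-based, possibly infinite training) to the `Trainer` makes the validation loop itself run every 1_000 training batches, so the monitored metric exists when the checkpoint callback looks for it:

```python
from pytorch_lightning import Trainer
from pytorch_lightning.callbacks import ModelCheckpoint

checkpoint_callback = ModelCheckpoint(
    monitor="validation_loss",
    dirpath="checkpoints",
    filename="model-{epoch:02d}-{step}-{validation_loss:.2f}",
    save_top_k=1,   # keep only the best checkpoint by validation_loss
    verbose=True,
)

trainer = Trainer(
    accelerator="auto",
    default_root_dir="checkpoints",
    max_steps=-1,                  # train indefinitely
    val_check_interval=1_000,      # run validation every 1_000 training batches
    check_val_every_n_epoch=None,  # validate on step count, not epoch boundaries
    callbacks=[checkpoint_callback],
)
```

This is only a sketch of the intended configuration, not a tested fix for the setup above.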