Trainer.validate/test with ckpt_path does not resume global_step

lightning version: 2.0.0
When I resume a trainer from an existing checkpoint for validation/testing, the trainer's global_step is not set to the value stored in the checkpoint. The commands I'm using:

trainer.validate(system, datamodule=dm, ckpt_path=cfg.resume)
# or
trainer.test(system, datamodule=dm, ckpt_path=cfg.resume)

Both commands above give trainer.global_step == 0. This seems to be a new problem in Lightning 2.0 (it works fine in previous versions), and I couldn't find anything related in the migration guide. What has changed? How can I properly restore the global_step during validation/testing?

It could depend on the Callback / Logger you are using to produce cfg.resume. Could you share that too?

Thanks for the reply! cfg.resume points to the path of a checkpoint file automatically saved by the ModelCheckpoint callback, with basically the default settings.

I am having trouble with ModelCheckpoint too. One suggestion: manually load the checkpoint and check whether “global_step” is among its keys. If it is not, check that you haven't set the flag “save_weights_only” to True.
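A minimal sketch of that manual check, assuming the file was written with torch.save (as Lightning checkpoints are) and that you trust its contents. The helper names load_checkpoint and check_global_step are hypothetical, not part of Lightning:

```python
from typing import Any, Dict


def load_checkpoint(ckpt_path: str) -> Dict[str, Any]:
    """Load a .ckpt file onto the CPU (requires PyTorch)."""
    import torch  # lazy import so check_global_step works without torch

    # weights_only=False because Lightning checkpoints may contain pickled
    # non-tensor objects (e.g. hyperparameters). Only use it on trusted files.
    # Note: the weights_only argument needs torch >= 1.13.
    return torch.load(ckpt_path, map_location="cpu", weights_only=False)


def check_global_step(ckpt: Dict[str, Any]) -> str:
    """Report whether a loaded checkpoint dict stores a global step."""
    if "global_step" in ckpt:
        return f"global_step = {ckpt['global_step']}"
    # Key missing: list what is there to help spot a weights-only checkpoint.
    return "no 'global_step' key; keys are: " + ", ".join(sorted(ckpt))
```

Usage, with the path from the question: print(check_global_step(load_checkpoint(cfg.resume))).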

Hope it helps somehow!
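For the original question (getting the checkpoint's global step inside validate/test), one possible workaround is sketched below under stated assumptions: that the checkpoint written by ModelCheckpoint still carries a “global_step” entry, and that LightningModule.on_load_checkpoint receives the loaded dict when ckpt_path is passed to trainer.validate/trainer.test (we believe this holds in 2.0 but have not verified it for every release). The mixin name RestoredGlobalStepMixin and the attribute restored_global_step are hypothetical:

```python
from typing import Any, Dict


class RestoredGlobalStepMixin:
    """Hypothetical mixin that remembers the global step stored in a checkpoint.

    Assumed use: class System(RestoredGlobalStepMixin, LightningModule), so this
    on_load_checkpoint runs first and then chains to LightningModule's hook.
    """

    restored_global_step: int = 0

    def on_load_checkpoint(self, checkpoint: Dict[str, Any]) -> None:
        # Read the step saved in the checkpoint dict; default to 0 if absent.
        self.restored_global_step = int(checkpoint.get("global_step", 0))
        # Chain to the next class in the MRO (e.g. LightningModule) if it
        # defines the hook; plain `object` does not, so skip in that case.
        parent_hook = getattr(super(), "on_load_checkpoint", None)
        if parent_hook is not None:
            parent_hook(checkpoint)
```

Inside validation_step/test_step you would then read self.restored_global_step instead of self.trainer.global_step. This does not change the trainer's own counter; it only exposes the saved value.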