lightning version: 2.0.0
When I load an existing checkpoint into a trainer for validation/testing, the trainer's global_step is not set to the value stored in the checkpoint. The commands I’m using:
trainer.validate(system, datamodule=dm, ckpt_path=cfg.resume)
trainer.test(system, datamodule=dm, ckpt_path=cfg.resume)
Both of the above commands leave trainer.global_step=0. This seems to be a new problem in Lightning 2.0 (it worked fine in previous versions), and I couldn’t find anything related in the migration guide. What has changed? How can I properly restore global_step for validation/testing?
I am having trouble with ModelCheckpoint too. One suggestion: load the checkpoint manually and check whether “global_step” is among its keys. If it is not, check that you haven’t set the “save_weights_only” flag to True.
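Here is a minimal sketch of that manual check, assuming the checkpoint file is an ordinary dictionary once loaded with torch.load. The helper names load_checkpoint and missing_keys are made up for illustration, and the default key list is just what I would look for first, not an official list.

```python
from typing import Any, Dict, Iterable, List, Mapping


def load_checkpoint(path: str) -> Dict[str, Any]:
    """Load a checkpoint file onto the CPU (requires PyTorch)."""
    # Imported lazily so that missing_keys() below can be used without torch.
    import torch

    # Assumption: depending on your PyTorch version you may have to pass
    # weights_only=False to load a full Lightning checkpoint.
    return torch.load(path, map_location="cpu")


def missing_keys(
    checkpoint: Mapping[str, Any],
    expected: Iterable[str] = ("global_step", "epoch"),
) -> List[str]:
    """Return the expected top-level keys that are absent from the checkpoint."""
    return [key for key in expected if key not in checkpoint]


# Hypothetical usage, with cfg.resume taken from the question above:
#   ckpt = load_checkpoint(cfg.resume)
#   print(sorted(ckpt.keys()))
#   print(missing_keys(ckpt))
#   print(ckpt.get("global_step"))
```

If “global_step” is in the file but trainer.global_step is still 0 after validate/test, then the value is being saved correctly and the issue is on the restore side rather than in ModelCheckpoint.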