Cannot load hyperparameters properly from a checkpoint

Hey, I’m very new to PyTorch Lightning, but have you checked this part of the docs? It seems to go into some detail about checkpointing.
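
In case it helps, here’s a minimal sketch of how `save_hyperparameters()` and `load_from_checkpoint` are meant to fit together — the model name, its `lr`/`hidden_dim` hparams, and the checkpoint path are just placeholders for illustration:

```python
import pytorch_lightning as pl
import torch
from torch import nn


class LitModel(pl.LightningModule):
    def __init__(self, lr: float = 1e-3, hidden_dim: int = 128):
        super().__init__()
        # Stores lr/hidden_dim in the checkpoint under "hyper_parameters",
        # which is what load_from_checkpoint uses to rebuild the model.
        self.save_hyperparameters()
        self.layer = nn.Linear(hidden_dim, 1)

    def training_step(self, batch, batch_idx):
        x, y = batch
        return nn.functional.mse_loss(self.layer(x), y)

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=self.hparams.lr)


# Later: the saved hyperparameters are restored from the checkpoint automatically.
model = LitModel.load_from_checkpoint("path/to/checkpoint.ckpt")
print(model.hparams)
```

If `save_hyperparameters()` was never called in `__init__`, the checkpoint won’t contain the hparams and you’d have to pass them to `load_from_checkpoint` by hand, which might explain what you’re seeing.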

However, what I don’t understand is why you need to load the model separately from the checkpoint. Doesn’t passing the resume_from_checkpoint flag to the Trainer load all the states (e.g., step, epoch, optimizer state, model state)? It should, according to the docs. I would manually query all the optimizer states before/after loading and see what is not being restored correctly, and maybe raise an issue if a particular state turns out not to be saved/restored. Something like the sketch below could be a starting point.
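
For that before/after comparison, you can open the checkpoint directly — it’s just a dict. The path here is a placeholder, and the listed keys are what a Lightning checkpoint typically contains:

```python
import torch

# A Lightning checkpoint is a plain dict, so the saved states can be
# inspected directly to see what should be restored on resume.
ckpt = torch.load("path/to/checkpoint.ckpt", map_location="cpu")
print(list(ckpt.keys()))
# Typical keys: 'epoch', 'global_step', 'state_dict',
# 'optimizer_states', 'lr_schedulers', 'hyper_parameters'

print(ckpt["epoch"], ckpt["global_step"])
print(ckpt["optimizer_states"][0]["param_groups"])

# After resuming, compare the restored optimizer against the values above:
# trainer = pl.Trainer(resume_from_checkpoint="path/to/checkpoint.ckpt")
# trainer.fit(model, datamodule)
# print(trainer.optimizers[0].state_dict()["param_groups"])
```

If a key is missing from the dict or its contents differ after resuming, that would narrow down which state isn’t being restored.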
