Checkpoint Loading Issue: Unexpected Key Mismatch in PyTorch Lightning with Ray


I am currently working with Ray and PyTorch Lightning to train a language model, and I am running into a strange issue when attempting to load a checkpoint after training. There is a mismatch between the keys in the state_dict, which produces the following error after executing model = MyLightningModule.load_from_checkpoint("/path/to/checkpoint.ckpt"):

RuntimeError: Error(s) in loading state_dict for MyLightningModule:
Missing key(s) in state_dict: "model.llm.embeddings.word_embeddings.weight",
Unexpected key(s) in state_dict: "ings.word_embeddings.weight",

It seems that during the checkpoint save process, the prefix model.llm.embedd is being stripped from the key, so only the trailing ings.word_embeddings.weight is preserved. Why does this mismatch occur?
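While investigating, I used a small diagnostic helper as a temporary workaround. It is only a sketch under the assumption that each truncated key is a suffix of exactly one expected key (as in the error above); the function name repair_state_dict is my own, not part of PyTorch or Lightning. Plain dicts stand in for tensors so the example runs on its own:

```python
def repair_state_dict(state_dict, expected_keys):
    """Map each truncated key back to the unique expected key it is a suffix of.

    Keys that already match, or that cannot be matched unambiguously,
    are passed through unchanged.
    """
    repaired = {}
    for key, tensor in state_dict.items():
        if key in expected_keys:
            repaired[key] = tensor
            continue
        # Find expected keys that end with the truncated key.
        matches = [k for k in expected_keys if k.endswith(key)]
        if len(matches) == 1:
            repaired[matches[0]] = tensor
        else:
            repaired[key] = tensor  # ambiguous or no match: keep as-is
    return repaired


# Example mirroring the error message, with strings standing in for tensors:
ckpt_state = {"ings.word_embeddings.weight": "tensor"}
expected = {"model.llm.embeddings.word_embeddings.weight"}
fixed = repair_state_dict(ckpt_state, expected)
```

In the real setting, the checkpoint's state_dict can be read with torch.load(...)["state_dict"], repaired against set(model.state_dict().keys()), and then applied with model.load_state_dict(fixed) — but that is a stopgap, not a fix for whatever is corrupting the keys at save time.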

I would greatly appreciate any insights or guidance you could provide. Thank you very much for your help!

I have identified that the issue is caused by the Ray package, and I will reach out to the Ray team for a solution.