How do I continue training the model?

I have trained a model with PyTorch Lightning, and now I want to continue training it. However, the training task and the hyperparameters have changed, so when I start training the new task, an error is raised about the mismatched hyperparameters.
How can I train on the new task by loading only the weights of the trained model?

@yczhangnaxin If you changed hyperparameters that influence the model architecture, then you might not be able to load the trained model, because PyTorch would not be able to map the parameters in the checkpoint to matching ones in the model (shape mismatch, name mismatch, etc.). You would have to edit the checkpoint by hand (if that's even meaningful). For example:

import torch

# Load the raw checkpoint (a dict that contains the state_dict plus metadata)
checkpoint = torch.load("path/to/checkpoint/file.ckpt")
# For example, drop a parameter that no longer exists in the new model
checkpoint["state_dict"].pop("encoder.x.y.z")
# Load the remaining weights; strict=False tolerates the dropped key
model.load_state_dict(checkpoint["state_dict"], strict=False)

You have to decide how you want to adapt your trained model to your new task. For example, if your new task replaces the last layer (classification on a new set of labels), then you should follow the above recipe and load every layer except the last one, as sketched below.
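A minimal sketch of that recipe. The "classifier." key prefix here is an assumption for illustration; inspect state_dict.keys() to find the actual parameter names in your checkpoint:

import torch

# Keep every weight except those of the old classification head
checkpoint = torch.load("path/to/checkpoint/file.ckpt")
state_dict = {
    k: v
    for k, v in checkpoint["state_dict"].items()
    if not k.startswith("classifier.")  # hypothetical head prefix
}
# strict=False lets the new head keep its fresh initialization
model.load_state_dict(state_dict, strict=False)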

@awaelchli thank you for the reply. I used T5 to train the two tasks separately, and here are the models trained for each task:
Task A:

T5Finetuner(
  (T5ForConditionalGeneration): ModifiedT5ForConditionalGeneration(
    (shared)
    (encoder)
    (decoder)
    (lm_head)
  )
)

Task B:

T5Finetuner(
  (T5ForConditionalGeneration): RankT5ForConditionalGeneration(
    (shared)
    (encoder)
    (decoder)
    (lm_head)
  )
)

The two trained models differ only in class name and weights; the structure is identical. I want the model trained on Task A to continue training directly on Task B, rather than being initialized from the T5 pre-trained weights. Does PyTorch Lightning offer a way to do this? :thinking: :thinking:
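Following the recipe above, this might look like the sketch below. Note that since both variants store the submodule under the same attribute name (T5ForConditionalGeneration), the state_dict keys should already line up; the class name itself does not appear in the keys. The T5Finetuner constructor arguments are assumptions for illustration:

import torch

# Build the Task B model with its own (new) hyperparameters;
# the constructor signature here is hypothetical.
model_b = T5Finetuner(task="rank", learning_rate=1e-4)

# Load only the weights from the Task A checkpoint, bypassing the
# saved hyperparameters that caused the mismatch error.
checkpoint = torch.load("path/to/task_a.ckpt")
model_b.load_state_dict(checkpoint["state_dict"], strict=False)

# Then train model_b on Task B with a fresh Trainer as usual.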