Where should I load the model checkpoint when using configure_model?

When I load the model checkpoint in configure_model, the following error occurs.
It looks as though an empty model is being created, so where should I load the checkpoint?

    size mismatch for gpt_neox.layers.6.attention.dense.bias: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([0]).
    size mismatch for gpt_neox.layers.6.mlp.dense_h_to_4h.weight: copying a param with shape torch.Size([16384, 4096]) from checkpoint, the shape in current model is torch.Size([0]).
    size mismatch for gpt_neox.layers.6.mlp.dense_h_to_4h.bias: copying a param with shape torch.Size([16384]) from checkpoint, the shape in current model is torch.Size([0]).
    size mismatch for gpt_neox.layers.6.mlp.dense_4h_to_h.weight: copying a param with shape torch.Size([4096, 16384]) from checkpoint, the shape in current model is torch.Size([0]).
    size mismatch for gpt_neox.layers.6.mlp.dense_4h_to_h.bias: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([0]).
    size mismatch for gpt_neox.layers.7.input_layernorm.weight: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([0]).
    size mismatch for gpt_neox.layers.7.input_layernorm.bias: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([0]).
    size mismatch for gpt_neox.layers.7.post_attention_layernorm.weight: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([0]).
    size mismatch for gpt_neox.layers.7.post_attent
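For context, this is roughly what I'm doing in configure_model (a simplified sketch; the model id and checkpoint path below are placeholders, not my exact code):

```python
import torch
import lightning as L
from transformers import AutoConfig, GPTNeoXForCausalLM


class LitGPTNeoX(L.LightningModule):
    def __init__(self, checkpoint_path: str):
        super().__init__()
        self.checkpoint_path = checkpoint_path
        self.model = None

    def configure_model(self):
        # Guard against the hook being called more than once.
        if self.model is not None:
            return
        # Build the architecture here so DeepSpeed Stage 3 can shard the
        # parameters as they are created.
        config = AutoConfig.from_pretrained("EleutherAI/gpt-neox-20b")  # placeholder model id
        self.model = GPTNeoXForCausalLM(config)
        # This is the step that fails: the full state dict is copied into
        # parameters that are already sharded locally and report
        # shape torch.Size([0]) on each rank.
        state_dict = torch.load(self.checkpoint_path, map_location="cpu")
        self.model.load_state_dict(state_dict)
```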

I'm using the DeepSpeed Stage 3 strategy.
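The Trainer is set up roughly like this (also a sketch; device count and precision are placeholders):

```python
import lightning as L

trainer = L.Trainer(
    accelerator="gpu",
    devices=4,                     # placeholder device count
    strategy="deepspeed_stage_3",  # ZeRO Stage 3
    precision="bf16-mixed",        # placeholder precision
)
```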

And no replies :sob: Did you figure it out? :slight_smile: