When I load the model checkpoint in configure_model, the following error occurs.
It seems an empty model is being created; where should I load the model checkpoint instead?
size mismatch for gpt_neox.layers.6.attention.dense.bias: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([0]).
size mismatch for gpt_neox.layers.6.mlp.dense_h_to_4h.weight: copying a param with shape torch.Size([16384, 4096]) from checkpoint, the shape in current model is torch.Size([0]).
size mismatch for gpt_neox.layers.6.mlp.dense_h_to_4h.bias: copying a param with shape torch.Size([16384]) from checkpoint, the shape in current model is torch.Size([0]).
size mismatch for gpt_neox.layers.6.mlp.dense_4h_to_h.weight: copying a param with shape torch.Size([4096, 16384]) from checkpoint, the shape in current model is torch.Size([0]).
size mismatch for gpt_neox.layers.6.mlp.dense_4h_to_h.bias: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([0]).
size mismatch for gpt_neox.layers.7.input_layernorm.weight: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([0]).
size mismatch for gpt_neox.layers.7.input_layernorm.bias: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([0]).
size mismatch for gpt_neox.layers.7.post_attention_layernorm.weight: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([0]).
size mismatch for gpt_neox.layers.7.post_attent
I am using the DeepSpeed Stage 3 strategy.
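For context, the torch.Size([0]) shapes in the log are characteristic of ZeRO Stage 3: each rank holds only an empty local view of parameters that have been sharded away, so a plain load_state_dict() against the sharded module fails. A minimal plain-PyTorch sketch of that failure mode (no DeepSpeed or Lightning involved; the tiny Linear layer and tensor sizes are made up for illustration, not taken from the real GPTNeoX model):

```python
import torch
import torch.nn as nn

# Tiny stand-in for the real model.
model = nn.Linear(4, 4, bias=False)

# Simulate what ZeRO Stage 3 does to a sharded parameter: the local
# view is empty, so the parameter reports shape torch.Size([0]).
model.weight = nn.Parameter(torch.empty(0))

# Loading a full checkpoint tensor into the empty parameter raises the
# same "size mismatch ... torch.Size([0])" error as in the log above.
checkpoint = {"weight": torch.randn(4, 4)}
err_msg = ""
try:
    model.load_state_dict(checkpoint)
except RuntimeError as e:
    err_msg = str(e)
print(err_msg)
```

This suggests the raw state_dict load is racing with Stage 3 sharding; under ZeRO-3, checkpoint weights generally need to be loaded through the strategy-aware path (or while the parameters are gathered) rather than with a bare load_state_dict().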