







Utilities that can be used with DeepSpeed.

lightning.pytorch.utilities.deepspeed.convert_zero_checkpoint_to_fp32_state_dict(checkpoint_dir, output_file, tag=None)

Convert a ZeRO stage 2 or 3 checkpoint into a single consolidated fp32 state_dict file that can be loaded with torch.load(file) + load_state_dict() and used for training without DeepSpeed. DeepSpeed's conversion script, on which this function is based, is copied into the top-level checkpoint directory, so the conversion can easily be done at any point in the future. Once extracted, the weights do not require DeepSpeed and can be used in any application. In addition, this version keeps the Lightning state inside the state dict, so the resulting file can be loaded with LightningModule.load_from_checkpoint('...').
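To illustrate what keeping the Lightning state means, here is a simplified, hypothetical sketch using plain dictionaries: DeepSpeed-specific entries are dropped from the saved client state, and the consolidated fp32 weights are stored under a state_dict key, which is where Lightning checkpoints keep model weights. The helper name merge_lightning_state and the listed DeepSpeed keys are assumptions for illustration, not Lightning's actual implementation; plain lists stand in for tensors.

```python
from typing import Any, Dict, Iterable

# Assumed examples of DeepSpeed bookkeeping entries (illustrative, not exhaustive)
DEEPSPEED_KEYS = ("module", "optimizer", "lr_scheduler", "global_steps", "dp_world_size")


def merge_lightning_state(
    client_state: Dict[str, Any],
    fp32_weights: Dict[str, Any],
    drop_keys: Iterable[str] = DEEPSPEED_KEYS,
) -> Dict[str, Any]:
    """Hypothetical sketch (not Lightning code).

    Keeps non-DeepSpeed entries of ``client_state`` and stores the
    consolidated weights under ``state_dict``. The input is not mutated.
    """
    drop = set(drop_keys)
    merged = {k: v for k, v in client_state.items() if k not in drop}
    merged["state_dict"] = dict(fp32_weights)
    return merged


if __name__ == "__main__":
    client_state = {
        "module": None,                   # DeepSpeed entry, dropped
        "optimizer": None,                # DeepSpeed entry, dropped
        "epoch": 0,                       # Lightning entry, kept
        "hyper_parameters": {"lr": 0.1},  # Lightning entry, kept
    }
    weights = {"layer.weight": [[0.5]], "layer.bias": [0.0]}
    merged = merge_lightning_state(client_state, weights)
    print(sorted(merged))  # ['epoch', 'hyper_parameters', 'state_dict']
```

Returning a new dictionary rather than editing client_state in place keeps the sketch free of side effects, which makes it easy to inspect both versions.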

Parameters

  • checkpoint_dir (Union[str, Path]) – path to the desired checkpoint folder (the one that contains the tag folder, such as global_step14).

  • output_file (Union[str, Path]) – path to the PyTorch fp32 state_dict output file (e.g. path/pytorch_model.bin).

  • tag (Optional[str]) – checkpoint tag used as a unique identifier for the checkpoint. If not provided, the tag is read from the file named latest in the checkpoint folder (e.g. global_step14).
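The tag fallback described above can be sketched with the standard library alone. This is a minimal, hypothetical helper (resolve_tag_dir is not part of Lightning or DeepSpeed), assuming the layout given in the parameter list: a latest text file inside checkpoint_dir whose content names a tag folder such as global_step14.

```python
import tempfile
from pathlib import Path
from typing import Optional, Union


def resolve_tag_dir(checkpoint_dir: Union[str, Path], tag: Optional[str] = None) -> Path:
    """Hypothetical helper (not part of Lightning or DeepSpeed).

    Returns the tag folder inside ``checkpoint_dir``. When ``tag`` is None,
    the tag name is read from the ``latest`` file, as described for the
    ``tag`` parameter.
    """
    checkpoint_dir = Path(checkpoint_dir)
    if tag is None:
        latest_path = checkpoint_dir / "latest"
        if not latest_path.is_file():
            raise ValueError(f"Unable to find 'latest' file at {latest_path}")
        # The file is assumed to contain only the tag name, e.g. "global_step14"
        tag = latest_path.read_text().strip()
    tag_dir = checkpoint_dir / tag
    if not tag_dir.is_dir():
        raise FileNotFoundError(f"Tag folder {tag_dir} does not exist")
    return tag_dir


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Fake checkpoint directory mimicking the layout described above
        ckpt = Path(tmp) / "epoch=0-step=0.ckpt"
        (ckpt / "global_step14").mkdir(parents=True)
        (ckpt / "latest").write_text("global_step14\n")
        print(resolve_tag_dir(ckpt).name)                   # global_step14 (read from 'latest')
        print(resolve_tag_dir(ckpt, "global_step14").name)  # global_step14 (explicit tag)
```

Passing an explicit tag skips the latest file entirely, which is useful when a directory holds several tag folders and an older one is wanted.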


>>> from lightning.pytorch.utilities.deepspeed import (
...     convert_zero_checkpoint_to_fp32_state_dict
... )
>>> # Lightning's DeepSpeed strategy saves a directory instead of a single file
>>> save_path = "lightning_logs/version_0/checkpoints/epoch=0-step=0.ckpt/" 
>>> output_path = "" 
>>> convert_zero_checkpoint_to_fp32_state_dict(save_path, output_path) 
Saving fp32 state dict to
Return type

