Does PyTorch Lightning support Torch Elastic with FSDP?

I found in the docs (GPU training (Intermediate) — PyTorch Lightning 2.1.3 documentation) that Torch Elastic is supported when using DDP. I am wondering whether that's also supported for FSDP or DeepSpeed. And if so, is there any documentation for it? Thanks!

Yes, certainly.

Docs: GPU training (Intermediate) — PyTorch Lightning 2.1.3 documentation

From the perspective of launching the processes, it doesn't matter whether you use DDP or FSDP. Note that Lightning has its own launcher, so for single-node training, using torchelastic makes no difference. It's mainly for multi-node training on a cluster, where it becomes necessary (see the sketch after the link below):
https://lightning.ai/docs/pytorch/stable/clouds/cluster_intermediate_2.html
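For illustration, here is a minimal sketch of a script launched via torchrun (the torchelastic launcher). The script name `train.py`, the toy model, the data, and the node/GPU counts are all placeholders, not from the docs above; Lightning detects the torchelastic environment variables automatically, so the same script also runs single-node under Lightning's own launcher:

```python
# train.py -- a minimal sketch; model, data, and sizes are illustrative.
#
# Example multi-node launch (run on each node; host/port are placeholders):
#   torchrun --nnodes=2 --nproc_per_node=8 \
#       --rdzv_backend=c10d --rdzv_endpoint=<host>:<port> train.py
import lightning as L
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset


class LitModel(L.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = nn.Linear(32, 2)

    def training_step(self, batch, batch_idx):
        x, y = batch
        return nn.functional.cross_entropy(self.layer(x), y)

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-3)


if __name__ == "__main__":
    # Random toy data, just so the script is self-contained.
    data = DataLoader(
        TensorDataset(torch.randn(256, 32), torch.randint(0, 2, (256,))),
        batch_size=16,
    )
    # devices/num_nodes must match what the launcher starts
    # (here: 2 nodes x 8 GPUs, matching the torchrun flags above).
    trainer = L.Trainer(strategy="fsdp", accelerator="gpu", devices=8, num_nodes=2)
    trainer.fit(LitModel(), data)
```

The only FSDP-specific part is `strategy="fsdp"`; swapping it for `strategy="ddp"` would leave the launch command unchanged.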
