Hi, everyone.
The current PL docs shows only examples to launch training with deepspeed strategy on a single node. Is there any example to launch multiple nodes deepspeed training please?
Hi
Launching experiments on multi-node isn’t different when using deepspeed vs. another strategy. You should fist check what kind of cluster you are on (SLURM, self-managed, etc.) and then choose the appropriate guide from here.
If you don’t want to set up a cluster, you can also try running in the cloud. There is a multi-node example here on this page Lightning AI with deepspeed as well.