How to use DDP in a LightningModule on the Apple M1?

Hello,

I am trying to run a CNN model on my MacBook laptop, which has an Apple M1 chip. From what I know, PyTorch Lightning already supports the Apple M1 for multi-GPU training, but I am unable to find a detailed tutorial on how to use it, so I tried the following based on the documentation I could find.

  1. I create the trainer with the “mps” accelerator and devices=1. From the documents I read, I understand that I should use devices=1 and Lightning will use multiple GPUs automatically.
trainer = pl.Trainer(
    accelerator='mps',
    devices=1,
    strategy="ddp",
    callbacks=[checkpoint_callback, lr_monitor],
    logger=tb_logger,
)
  2. I create a class that inherits from LightningModule.
class Moco_v2(LightningModule):

In this class, I call the following two functions to get the rank of the current worker and the total number of workers.

torch.distributed.get_rank()
torch.distributed.get_world_size()

But I got the following error:

raise ValueError(
ValueError: Default process group has not been initialized, please make sure to call init_process_group.

So I need to call init_process_group somewhere. I guess that I could call it in the __init__ of this class, something like this:

torch.distributed.init_process_group("gloo", world_size=total_workers, rank=current_rank)

But I am not sure what values should be passed for the parameters world_size and rank. For world_size, can I just use the total number of GPU cores on the macOS M1? For rank, apparently I cannot call torch.distributed.get_rank() before the group is initialized. Actually, this question is not specific to the Apple M1; I also want to know how to call init_process_group for accelerator='gpu'.

Is there a tutorial with examples showing how to use DDP in a LightningModule on the Apple M1?

Thanks.

@zhguo1

Is there a tutorial with examples showing how to use DDP in a LightningModule on the Apple M1?

The M1 chip contains one GPU. Therefore, you can’t use multiple GPUs. In fact, Lightning will raise an error if you attempt it:

ValueError: You set `strategy=ddp` but strategies from the DDP family are not supported on the MPS accelerator. Either explicitly set `accelerator='cpu'` or change the strategy.

You don’t need to use torch.distributed at all in your LightningModule. Just remove that part; the only thing you need to change is accelerator="mps".
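For illustration, a minimal sketch of the corrected Trainer setup (callbacks and logger omitted for brevity):

import pytorch_lightning as pl

# No strategy argument and no torch.distributed calls are needed on Apple silicon:
# Lightning trains on the single MPS device.
trainer = pl.Trainer(accelerator="mps", devices=1)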

@awaelchli Thank you for your reply.

My MacBook has an M1 Pro, which has 14 GPU cores. Do you mean that Lightning does not support parallelism across the multiple GPU cores in the Apple M1? Does this mean that I will not see any performance improvement on the Apple M1 compared with running Lightning on a Linux machine with a single CPU and no GPU?

Actually, I did not see the error you mentioned when I use strategy=ddp and accelerator='mps'. The Lightning version I am using is 2.2.0.

ValueError: You set `strategy=ddp` but strategies from the DDP family are not supported on the MPS accelerator. Either explicitly set `accelerator='cpu'` or change the strategy.

Lightning does not implement the M1 support itself; this is done in PyTorch. Parallelization over the GPU is handled by the MPS backend in PyTorch. devices in the Trainer is not how many cores there are, but how many GPUs. Compare: a single NVIDIA GPU has thousands of CUDA cores.

Your M1 MacBook has one GPU. It doesn’t matter how many cores it has, you set devices=1 and there is no DDP.

Thank you. A couple of follow up questions.

Parallelization over the GPU is handled by the MPS backend in PyTorch.

How do I use the MPS parallelization provided by PyTorch? Could you point me to some documents or examples, if any exist?

If I switch to a machine with an NVIDIA GPU, I guess that I still need to call init_process_group to set up DDP?

torch.distributed.init_process_group("gloo", world_size=total_workers, rank=current_rank)

But how do I pass the parameter rank, since I cannot call torch.distributed.get_rank()? Is the requirement to call init_process_group new in recent versions of Lightning? I looked at some public code (for example, this one, which I was able to run on a machine with multiple processes on a CPU) that uses Lightning 1.6, and that code just calls torch.distributed.get_rank() without calling init_process_group first.

How do I use the MPS parallelization provided by PyTorch? Could you point me to some documents or examples, if any exist?

It’s already done by PyTorch / the MPS backend. You don’t need to do anything. There is almost no documentation out there on “how it works”, so you won’t find much.
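For what it’s worth, using MPS from plain PyTorch is just a matter of device placement; here is a minimal sketch (assuming a PyTorch build with MPS support), with everything else handled inside the backend:

import torch

# Use the MPS device if this PyTorch build supports it, otherwise fall back to CPU.
device = torch.device("mps") if torch.backends.mps.is_available() else torch.device("cpu")

x = torch.randn(32, 128, device=device)
layer = torch.nn.Linear(128, 10).to(device)
y = layer(x)  # executes on the M1 GPU via the MPS backend (or on the CPU fallback)
print(y.device)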

If I switch to a machine with an NVIDIA GPU, I guess that I still need to call init_process_group to set up DDP?

No, absolutely not. The whole point of using Lightning is that you don’t need to do this. Follow the intro here to learn about Lightning’s benefits. In step 7 there you’ll see how to use multiple GPUs, and you will find that it is trivial.
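As a sketch of how little changes on a CUDA machine (the device count here is hypothetical; callbacks and logger omitted):

import pytorch_lightning as pl

# Lightning launches the processes and calls init_process_group internally;
# no manual torch.distributed setup is needed.
trainer = pl.Trainer(
    accelerator="gpu",
    devices=4,        # hypothetical: a machine with 4 NVIDIA GPUs
    strategy="ddp",
)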

I am wondering how to call the functions torch.distributed.get_rank() and torch.distributed.get_world_size() in the LightningModule. Similar to what is done in the function _batch_shuffle_ddp in this code, I want to re-shuffle the batch. The error I am getting says that I need to call init_process_group first. Thanks!

self.trainer.global_rank and self.trainer.world_size are available in the LightningModule hooks.
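A minimal sketch of reading them from a hook (reusing the Moco_v2 class name from the thread; the MoCo shuffling logic itself is omitted):

from pytorch_lightning import LightningModule

class Moco_v2(LightningModule):
    def on_train_start(self):
        # Both values come from the attached Trainer; no torch.distributed calls needed.
        rank = self.trainer.global_rank       # index of the current process (0 on a single device)
        world_size = self.trainer.world_size  # total number of processes (1 on a single device)
        print(f"rank {rank} of {world_size}")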

Thank you very much. Do you have any advice on where I should call the function init_process_group? I tried to call it in the __init__() of my class Moco_v2, which inherits from LightningModule, but I got the following error:

    File "/Users/zhguo1/miniconda3/envs/pytorch_ml/lib/python3.12/site-packages/pytorch_lightning/core/module.py", line 207, in trainer
    raise RuntimeError(f"{self.__class__.__qualname__} is not attached to a `Trainer`.")

I have the following code:

# Moco_v2 inherits LightningModule
model = Moco_v2(....)
...
trainer = pl.Trainer(
    accelerator='mps',
    devices=1,
    strategy='ddp',
    callbacks=[checkpoint_callback, lr_monitor],
    logger=tb_logger,
)
trainer.fit(
    model,
    datamodule=datamodule,
    ckpt_path=params['ckpt_path'],
)

It looks like the trainer is attached to the model only when trainer.fit(model, ...) is called, so the trainer is not available in the __init__() of the LightningModule?

Could anyone answer my question about the trainer not being attached to the LightningModule? I appreciate it.
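For reference, a minimal sketch of that timing: the trainer becomes available once the model is attached inside trainer.fit(), so hooks such as setup() can read it, while __init__() cannot (Moco_v2 reused from the thread):

from pytorch_lightning import LightningModule

class Moco_v2(LightningModule):
    def __init__(self):
        super().__init__()
        # self.trainer is not available here: the model is only attached to the
        # Trainer when trainer.fit(model, ...) is called.

    def setup(self, stage):
        # setup() runs inside trainer.fit(), after attachment, so the trainer and
        # its global_rank / world_size can be accessed here.
        print(self.trainer.global_rank, self.trainer.world_size)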