Hello,
I am trying to run a CNN model on my MacBook, which has an Apple M1 chip. From what I know, PyTorch Lightning already supports the Apple M1 for multi-GPU training, but I am unable to find a detailed tutorial on how to use it, so I tried the following based on the documentation I could find.
- I create the trainer with the "mps" accelerator and devices=1. From the documents I read, my understanding is that I should use devices=1 and Lightning will use the multiple GPU cores automatically.
trainer = pl.Trainer(
    accelerator='mps',
    devices=1,
    strategy="ddp",
    callbacks=[checkpoint_callback, lr_monitor],
    logger=tb_logger,
)
- I created a class that inherits from LightningModule.
class Moco_v2(LightningModule):
In this class, I call the following two functions to get the rank of the current worker and the total number of workers:
torch.distributed.get_rank()
torch.distributed.get_world_size()
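For context, here is a minimal sketch of where these calls sit (the real __init__ takes more arguments; this is stripped down):

import torch.distributed
from pytorch_lightning import LightningModule

class Moco_v2(LightningModule):
    def __init__(self):
        super().__init__()
        # Both calls fail here, because at this point no default
        # process group has been initialized yet:
        current_rank = torch.distributed.get_rank()
        total_workers = torch.distributed.get_world_size()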
But I got the following error:
raise ValueError(
ValueError: Default process group has not been initialized, please make sure to call init_process_group.
So I need to call init_process_group somewhere. I guess I could just call it in the __init__ of this class, something like this:
torch.distributed.init_process_group("gloo", world_size=total_workers, rank=current_rank)
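From what I read, the default "env://" rendezvous also needs MASTER_ADDR and MASTER_PORT to be set, so presumably the full call would look roughly like this (just a sketch for a single machine; I am not sure it is correct):

import os
import torch.distributed

# Assumption: everything runs on one machine, so the rendezvous
# address can just point at localhost:
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")

torch.distributed.init_process_group(
    "gloo",
    world_size=total_workers,  # total number of processes?
    rank=current_rank,         # index of this process, 0..world_size-1?
)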
But I am not sure what values should be passed to the parameters world_size and rank. For world_size, can I just use the total number of GPU cores on the M1? For rank, apparently I cannot call torch.distributed.get_rank() before the process group exists. Actually, this question is not specific to the Apple M1; I also want to know how to call init_process_group for accelerator='gpu'.
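While searching, I also noticed that a LightningModule seems to expose this information through the attached Trainer, e.g. self.global_rank and self.trainer.world_size. Would reading those be the right approach instead of calling torch.distributed directly? Roughly like this (untested sketch):

from pytorch_lightning import LightningModule

class Moco_v2(LightningModule):
    def setup(self, stage):
        # By the time setup() runs, the Trainer is attached, so these
        # should work without touching torch.distributed directly:
        current_rank = self.global_rank
        total_workers = self.trainer.world_size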
Is there a tutorial showing some examples of how to use DDP in a LightningModule on an Apple M1?
Thanks.