DistributedSampler and LightningDataModule

When creating data loaders for DDP training in a LightningDataModule, is it OK for me to set the DistributedSampler explicitly when instantiating the dataloader?

Something like the following -

import pytorch_lightning as pl
from torch.utils.data import DataLoader, DistributedSampler

class MyData(pl.LightningDataModule):
    def train_dataloader(self):
        # self.trainset is assumed to have been created in setup("fit")
        return DataLoader(
            self.trainset,
            batch_size=self.hparams.batch_size,
            sampler=DistributedSampler(self.trainset, shuffle=True),
        )

In the Multi-GPU docs the recommendation is not to use DistributedSampler explicitly. In my normal workflow I implement LightningDataModule.train_dataloader() to provide the trainer with my dataloader, so it makes sense to me to set the DistributedSampler explicitly when instantiating the data loader. However, this contradicts the advice given in the docs, hence my question.

Thanks in advance.

Hey @avilay,

When you run DDP through the Lightning Trainer, it adds a DistributedSampler to your dataloaders internally, so you don't need to add one yourself. The docs suggest this to minimize code changes in case you migrate from DDP to single-device training: keeping an explicit DistributedSampler in train_dataloader would then require code changes. If you still want to keep your own sampler, set Trainer(replace_sampler_ddp=False) to tell the Trainer not to add a distributed sampler when running in DDP mode.
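For reference, here is a minimal sketch of both approaches. The trainset attribute and batch_size hyperparameter are taken from the question; the Trainer arguments (accelerator/devices/strategy) are illustrative, not from the thread.

import pytorch_lightning as pl
from torch.utils.data import DataLoader

class MyData(pl.LightningDataModule):
    def train_dataloader(self):
        # Recommended: return a plain shuffled DataLoader; under DDP the
        # Trainer wraps it with a DistributedSampler automatically.
        return DataLoader(self.trainset, batch_size=self.hparams.batch_size, shuffle=True)

# If you prefer to keep the explicit DistributedSampler from the question,
# tell the Trainer not to inject its own sampler:
trainer = pl.Trainer(
    accelerator="gpu",
    devices=2,
    strategy="ddp",
    replace_sampler_ddp=False,
)

Note that in Lightning 2.x this flag was renamed to use_distributed_sampler.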

Also, we have moved the discussions to GitHub Discussions. You might want to post there instead for a quicker response; the forums will be marked read-only after some time.

Thank you