LightningDataModule.prepare_data() & LightningDataModule.setup() outside of Trainer

Am I right in assuming that the LightningDataModule methods prepare_data() and setup() should not create torch.Tensors (or, more generally, perform any action that would place data on a device)?

For example, if I wanted to apply a transform that includes a torch.as_tensor() call, then that transform should be applied in the train/val/test_dataloader() methods, not in prepare_data() or setup(), correct?
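
To make that concrete, here is roughly what I mean (MyDataModule and the .npy file names are just placeholders for my actual setup):

import numpy as np
import torch
from torch.utils.data import DataLoader, TensorDataset
import pytorch_lightning as pl

class MyDataModule(pl.LightningDataModule):
    def setup(self, stage=None):
        # keep everything as plain numpy arrays here -- no torch.Tensors yet
        self.train_x = np.load("train_x.npy")
        self.train_y = np.load("train_y.npy")

    def train_dataloader(self):
        # the torch.as_tensor() calls only happen here
        x = torch.as_tensor(self.train_x, dtype=torch.float32)
        y = torch.as_tensor(self.train_y, dtype=torch.long)
        return DataLoader(TensorDataset(x, y), batch_size=32)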

I ask this because my understanding is that the code shouldn’t know anything about the engineering/hardware unless it is run by a Trainer. Yet the LightningDataModule doc page suggests that, “when information about the dataset is needed to build the model”, prepare_data() and setup() can be called outside the Trainer:

dm = MNISTDataModule()
dm.prepare_data()
dm.setup('fit')

model = Model(num_classes=dm.num_classes, width=dm.width, vocab=dm.vocab)
trainer.fit(model, dm)

dm.setup('test')
trainer.test(datamodule=dm)

Finally, if this is right, is there a rule of thumb to make sure I don’t accidentally call a method that uses engineering/hardware knowledge behind the scenes?

P.S. The docs state
“prepare_data is called from a single GPU. Do not use it to assign state (self.x = y)”
and
“setup is called from every GPU. Setting state here is okay”,
but those two comments weren’t enough for me to feel confident about what is and isn’t allowed.
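
For what it’s worth, my current reading of those two comments looks something like this (an MNIST-style sketch along the lines of the docs example; the exact fields are just illustrative):

import pytorch_lightning as pl
from torchvision import transforms
from torchvision.datasets import MNIST

class MNISTDataModule(pl.LightningDataModule):
    def prepare_data(self):
        # runs on a single process: only download / write to disk, never self.x = y
        MNIST("./data", train=True, download=True)
        MNIST("./data", train=False, download=True)

    def setup(self, stage=None):
        # runs on every process: assigning state is fine
        self.train_set = MNIST("./data", train=True, transform=transforms.ToTensor())
        self.test_set = MNIST("./data", train=False, transform=transforms.ToTensor())
        self.num_classes = 10

Is that roughly the intended split, or am I still missing something?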