Is it safe to manually call `datamodule.setup()`?

I’m using cocoapi for metric calculation, and it needs the dataset to build a COCO object from it.

I’ve written the dataset initialization logic inside the DataModule’s setup() method, which I’m guessing the trainer calls internally so that it runs only once.

The cocoapi returns a CocoEvaluator, which I’m using in the main LightningModule. To handle this dependency chain, I need to call the DataModule’s setup() method so that the dataset has been constructed. I feel this defeats the purpose of having a dedicated DataModule and LightningModule if they share a common dependency.

Let me know if there’s a better way of doing this. Can we provide the DataModule as a parameter to the LightningModule?

So you have something defined in datamodule.setup() and want to access it in the LightningModule? I believe they are connected via the trainer. You can use self.trainer.datamodule in the LightningModule.

Or you can just do:

datamodule = SomeLightningDataModule()
model = SomeLightningModule()
model.datamodule = datamodule

and access it using self.datamodule in your LightningModule.

Then how can I ensure that the LightningModule’s setup method is called after the DataModule’s?

DataModule’s setup is called before LightningModule’s setup.
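
To illustrate why that ordering helps, here is a minimal sketch of building the evaluator inside the module's own setup hook. It uses plain stand-in classes so it runs without Lightning installed: StubDataModule, StubTrainer, StubModule, and evaluator_ids are all hypothetical names, and StubTrainer only mimics the hook order described above, not the real Trainer.

```python
class StubDataModule:
    """Hypothetical stand-in for a LightningDataModule."""

    def __init__(self):
        self.val_dataset = None

    def setup(self, stage=None):
        # Placeholder for the real dataset construction.
        self.val_dataset = [{"image_id": 1}, {"image_id": 2}]


class StubModule:
    """Hypothetical stand-in for a LightningModule."""

    def __init__(self):
        self.trainer = None
        self.evaluator_ids = None

    def setup(self, stage=None):
        # The dataset already exists here because the datamodule's setup ran first.
        dataset = self.trainer.datamodule.val_dataset
        # Placeholder for building the COCO evaluator from the dataset.
        self.evaluator_ids = [sample["image_id"] for sample in dataset]


class StubTrainer:
    """Hypothetical stand-in that only mimics the hook order, not the real Trainer."""

    def __init__(self, datamodule):
        self.datamodule = datamodule

    def fit(self, model):
        model.trainer = self
        self.datamodule.setup("fit")  # datamodule first
        model.setup("fit")            # then the module


module = StubModule()
StubTrainer(StubDataModule()).fit(module)
```

In real code the same idea would live in your LightningModule's setup(self, stage) and read from self.trainer.datamodule, so no manual setup() call is needed just for the evaluator.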

Fwiw the docs literally suggest doing this (i.e. calling it manually) if necessary:

https://pytorch-lightning.readthedocs.io/en/latest/data/datamodule.html#using-a-datamodule

If you need information from the dataset to build your model, then run prepare_data and setup manually (Lightning ensures the method runs on the correct devices).

Although I’ve found that the result is that the trainer will just call setup() again.
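
If the repeated call is a concern, one option is to make setup() idempotent so a second call does nothing expensive. A minimal sketch, assuming a plain class as a stand-in (in real code it would subclass LightningDataModule); CocoDataModule, _build_dataset, and setup_calls are hypothetical names used only for illustration:

```python
class CocoDataModule:
    """Stand-in for a LightningDataModule subclass; a plain class so the sketch runs alone."""

    def __init__(self):
        self.train_dataset = None
        self.setup_calls = 0  # only here to show how often setup() was invoked

    def setup(self, stage=None):
        self.setup_calls += 1
        # Guard: skip the expensive construction if it already happened.
        if self.train_dataset is not None:
            return
        self.train_dataset = self._build_dataset()

    def _build_dataset(self):
        # Placeholder for the real dataset construction.
        return ["img_0", "img_1", "img_2"]


dm = CocoDataModule()
dm.setup("fit")           # manual call, e.g. before building the evaluator
first = dm.train_dataset
dm.setup("fit")           # simulates the trainer calling setup() again
assert dm.train_dataset is first  # the dataset was not rebuilt
```

This guard ignores the stage argument; if your setup() builds different datasets per stage, you would need one check per dataset.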