What is the exact use case for teardown in the LightningDataModule?

isnot2bad · April 20, 2023, 7:24am

The docs say that “teardown can be used to clean up the state. It is also called from every process across all the nodes.” and that “setup is called from every process across all the nodes. Setting state here is recommended.”

I wrote an example based on the one in the docs:

def setup(self, stage):
    assert isinstance(stage, str) and stage in ('fit', 'validate', 'test')  # , 'predict'

    if stage == 'fit':
        self.traset = MyDataset(stage='train')
        self.valset = MyDataset(stage='val')
    elif stage == 'validate':
        self.valset = MyDataset(stage='val')
    else:  # 'test'
        self.tstset = MyDataset(stage='tst')

Does that mean my teardown() should be as follows?

def teardown(self, stage):
    assert isinstance(stage, str) and stage in ('fit', 'validate', 'test')  # , 'predict'

    if stage == 'fit':
        del self.traset
        del self.valset
    elif stage == 'validate':
        del self.valset
    else:  # 'test'
        del self.tstset

But what is the practical significance of doing so? Is it necessary? Without teardown(), won’t the objects referenced by self.traset, self.valset, and self.tstset be destroyed together with the MyDataModule() object?

awaelchli · April 21, 2023, 10:25am

In your simple case, you are right: it adds no value, since you can rely on garbage collection and there is nothing to clean up manually. Implementing teardown is optional, so you don’t need to worry about it if you don’t see a use for it in your project.
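
As a quick sanity check of the garbage-collection point, here is a small plain-Python sketch. It assumes CPython, and FakeDataset and FakeDataModule are made-up stand-ins, not Lightning classes. It only shows that an attribute set in setup is released once nothing references the datamodule anymore:

```python
import gc
import weakref


class FakeDataset:
    """Hypothetical stand-in for MyDataset."""


class FakeDataModule:
    """Hypothetical stand-in for a datamodule without teardown()."""

    def setup(self, stage):
        if stage == "fit":
            self.traset = FakeDataset()


dm = FakeDataModule()
dm.setup("fit")

# A weak reference lets us observe the dataset without keeping it alive.
dataset_ref = weakref.ref(dm.traset)
dataset_alive_before = dataset_ref() is not None

# Drop the only reference to the datamodule; its attributes go with it.
del dm
gc.collect()
dataset_alive_after = dataset_ref() is not None
```

So the dataset lives exactly as long as the datamodule does, which is fine unless you need the memory back earlier.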

However, there are valid use cases. Here is a simple one for illustration:

Imagine you have training and test sets, and you want to load them into memory. While training, you only want the training set in memory, and while testing, you only want the test set in memory (perhaps because both do not fit at once). In that case, you need to free the training data in teardown after fitting, to make room before testing begins.

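# assumes `trainer` and `model` already exist
datamodule = MyDataModule()
trainer.fit(model, datamodule)  # after fit, it calls .teardown("fit")
trainer.test(model, datamodule)  # calls .setup("test"), then .teardown("test") at the end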
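
To make that scenario more concrete, here is a rough plain-Python sketch of what the datamodule could look like. It does not import Lightning, so the hooks are called by hand in the order the trainer would call them. InMemoryDataModule and _load are hypothetical names for illustration; in a real project the class would subclass LightningDataModule and _load would read your actual data:

```python
class InMemoryDataModule:
    """Sketch only: mimics the setup/teardown hooks of a datamodule."""

    def __init__(self):
        self.train_data = None
        self.test_data = None

    def _load(self, split):
        # Hypothetical stand-in for an expensive load into RAM.
        return [f"{split}-{i}" for i in range(3)]

    def setup(self, stage):
        # Load only what the current stage needs.
        if stage == "fit":
            self.train_data = self._load("train")
        elif stage == "test":
            self.test_data = self._load("test")

    def teardown(self, stage):
        # Drop the reference so the data can be garbage-collected
        # before the next stage loads its own split.
        if stage == "fit":
            self.train_data = None
        elif stage == "test":
            self.test_data = None


dm = InMemoryDataModule()

# What trainer.fit(...) would trigger on the datamodule:
dm.setup("fit")
train_loaded_during_fit = dm.train_data is not None
dm.teardown("fit")

# What trainer.test(...) would trigger next:
dm.setup("test")
train_held_during_test = dm.train_data is not None
test_loaded_during_test = dm.test_data is not None
dm.teardown("test")
```

The point is that the datamodule object itself stays alive between fit and test, so garbage collection alone would not release the training data; teardown is where you let go of it.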

isnot2bad · April 21, 2023, 10:51am

Thank you for your answer.
I understand now.