I use many different DataLoaders for my validation so that I can measure my model's performance per data type. The problem is that the number of DataLoaders is huge: almost 500.
When performing the validation step, my RAM (256 GB!) gets overloaded and I don't really know why. It seems that data is loaded simultaneously from all the dataloaders... Is there a way I can load them successively?
My datamodule code below:
```python
import os

import pytorch_lightning as pl
from torch.utils.data import DataLoader


class TraxamDataModule(pl.LightningDataModule):
    def __init__(self, path: str, batch_size: int = 64, num_worker: int = 16, max_N=None):
        super().__init__()
        # And more...

    def setup(self, stage: str = None):
        # Some stuff
        datasets = []
        # One dataset per subfolder of `valid` (one per data type)
        for i, folder in enumerate(os.listdir(os.path.join(self.path, 'valid'))):
            datasets.append(SignalDataset(os.path.join(self.path, 'valid', folder)))
        self.valids = datasets
        # Other stuff

    def val_dataloader(self):
        # Returns ~500 loaders, one per validation dataset
        return [
            DataLoader(dtst, batch_size=self.batch_size, num_workers=1,
                       shuffle=False, drop_last=True)
            for dtst in self.valids
        ]
```
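To make "successively" concrete, here is a torch-free sketch of the behaviour I am after (`FakeDataset`, `lazy_loaders`, etc. are hypothetical stand-ins I made up for `SignalDataset` / `DataLoader`, not Lightning APIs): building the loaders in a list keeps all 500 datasets alive at once, while yielding them from a generator keeps only one alive at a time.

```python
class FakeDataset:
    """Stand-in for SignalDataset: counts how many instances are in memory."""
    live = 0  # class-level counter of currently loaded datasets

    def __init__(self, folder):
        self.folder = folder
        self.data = list(range(1000))  # pretend this is the loaded signal data
        FakeDataset.live += 1

    def __del__(self):
        FakeDataset.live -= 1


def eager_loaders(folders):
    # Like my current val_dataloader(): every dataset exists at once.
    return [FakeDataset(f) for f in folders]


def lazy_loaders(folders):
    # Generator: each dataset is created only when it is needed.
    for f in folders:
        yield FakeDataset(f)


def peak_live_eager(folders):
    loaders = eager_loaders(folders)
    peak = FakeDataset.live          # all datasets already materialised
    for ds in loaders:
        _ = sum(ds.data)             # "validate" on this loader
    return peak


def peak_live_lazy(folders):
    peak = 0
    for ds in lazy_loaders(folders):
        _ = sum(ds.data)             # "validate" on this loader
        peak = max(peak, FakeDataset.live)
        del ds                       # release before the next one is created
    return peak


folders = [f"type_{i}" for i in range(500)]
print(peak_live_eager(folders))  # → 500
print(peak_live_lazy(folders))   # → 1
```

This is only an illustration of the memory pattern; the question is whether Lightning can drive `val_dataloader` in this one-at-a-time fashion.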