I have a dataset of images, and in regular PyTorch I would define a Dataset whose __getitem__ loads each image only when it is requested during iteration. How can I mimic this behavior using a DataModule? The examples I've seen that use the setup function seem to load all the data there, rather than loading samples one by one as the data is iterated over. That would be a problem for me: my training dataset is large, and I can't load all of those images into memory when setup is called.
I haven't seen an example that uses a DataModule to load images/samples on the fly.
Here is my old dataset:
import torch
from PIL import Image
from torchvision import transforms


class ImageDataset(torch.utils.data.Dataset):
    def __init__(self, x_train, y_train=None):
        # x_train and y_train are pandas DataFrames
        self.data = x_train
        self.label = y_train
        self.transform = transforms.Compose([
            transforms.CenterCrop(128),
            transforms.ToTensor(),
            transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                 std=[0.229, 0.224, 0.225]),
        ])

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        """Load one image, transform it, and return it along with the target (if present)."""
        image = Image.open(self.data.iloc[idx]["file_name"]).convert("RGB")
        image = self.transform(image)
        image_id = self.data.iloc[idx]["image_id"]
        if self.label is not None:
            label = self.label.iloc[idx]
            sample = {"image_id": image_id, "image": image, "label": label}
        else:
            sample = {"image_id": image_id, "image": image}
        return sample
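
Is something like the sketch below the right way to keep that lazy loading inside a LightningDataModule? My understanding is that setup would only read the DataFrames of file names/labels and construct the Dataset objects, while the images themselves are still opened one at a time in __getitem__ whenever the DataLoader requests a sample. (The CSV paths, column names, and batch size here are just placeholders I made up.)

import pandas as pd
import pytorch_lightning as pl
from torch.utils.data import DataLoader


class ImageDataModule(pl.LightningDataModule):
    def __init__(self, train_csv, val_csv, batch_size=32, num_workers=4):
        super().__init__()
        self.train_csv = train_csv
        self.val_csv = val_csv
        self.batch_size = batch_size
        self.num_workers = num_workers

    def setup(self, stage=None):
        # Only the metadata (file names, image ids, labels) is read here;
        # the images themselves are still opened lazily in ImageDataset.__getitem__.
        train_df = pd.read_csv(self.train_csv)
        val_df = pd.read_csv(self.val_csv)
        self.train_dataset = ImageDataset(train_df.drop(columns=["label"]), train_df[["label"]])
        self.val_dataset = ImageDataset(val_df.drop(columns=["label"]), val_df[["label"]])

    def train_dataloader(self):
        return DataLoader(self.train_dataset, batch_size=self.batch_size,
                          shuffle=True, num_workers=self.num_workers)

    def val_dataloader(self):
        return DataLoader(self.val_dataset, batch_size=self.batch_size,
                          shuffle=False, num_workers=self.num_workers)

If that's the intended pattern, only the DataFrames would ever sit in memory, and the decoded images would only exist per batch inside the DataLoader workers. Or is there a recommended way to do this that I'm missing?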