I have a dataset of images, and in regular PyTorch I would define a Dataset whose __getitem__ loads each image only when it is requested during iteration. How can I mimic this behavior using a DataModule? The examples I've seen that use the setup function seem to load all the data there, rather than loading samples one by one as the data is iterated over. That would be a problem for me: my training dataset is large, and I can't load all of those images into memory when setup is called.
I haven't seen an example that uses a DataModule to load images/samples on the fly.
Here is my old dataset:
import torch
from PIL import Image
from torchvision import transforms


class ImageDataset(torch.utils.data.Dataset):
    def __init__(self, x_train, y_train=None):
        # x_train and y_train are pandas DataFrames
        self.data = x_train
        self.label = y_train
        self.transform = transforms.Compose([
            transforms.CenterCrop(128),
            transforms.ToTensor(),
            transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                 std=[0.229, 0.224, 0.225]),
        ])

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        """Load one image, transform it, and return it along with the target (if present)."""
        image = Image.open(self.data.iloc[idx]["file_name"]).convert("RGB")
        image = self.transform(image)
        image_id = self.data.iloc[idx]["image_id"]
        if self.label is not None:
            label = self.label.iloc[idx]
            sample = {"image_id": image_id, "image": image, "label": label}
        else:
            sample = {"image_id": image_id, "image": image}
        return sample
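
Is something like the sketch below the right way to keep that lazy loading inside a LightningDataModule? My understanding is that setup would only read the DataFrames of file names/labels and construct the Dataset objects, while the images themselves are still opened one at a time in __getitem__ whenever the DataLoader requests a sample. (The CSV paths, column names, and batch size here are just placeholders I made up.)

import pandas as pd
import pytorch_lightning as pl
from torch.utils.data import DataLoader


class ImageDataModule(pl.LightningDataModule):
    def __init__(self, train_csv, val_csv, batch_size=32, num_workers=4):
        super().__init__()
        self.train_csv = train_csv
        self.val_csv = val_csv
        self.batch_size = batch_size
        self.num_workers = num_workers

    def setup(self, stage=None):
        # Only the metadata (file names, image ids, labels) is read here;
        # the images themselves are still opened lazily in ImageDataset.__getitem__.
        train_df = pd.read_csv(self.train_csv)
        val_df = pd.read_csv(self.val_csv)
        self.train_dataset = ImageDataset(train_df.drop(columns=["label"]), train_df[["label"]])
        self.val_dataset = ImageDataset(val_df.drop(columns=["label"]), val_df[["label"]])

    def train_dataloader(self):
        return DataLoader(self.train_dataset, batch_size=self.batch_size,
                          shuffle=True, num_workers=self.num_workers)

    def val_dataloader(self):
        return DataLoader(self.val_dataset, batch_size=self.batch_size,
                          shuffle=False, num_workers=self.num_workers)

If that's the intended pattern, only the DataFrames would ever sit in memory, and the decoded images would only exist per batch inside the DataLoader workers. Or is there a recommended way to do this that I'm missing?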