5.5 Organizing Your Data Loaders with Data Modules

Slides

Part 1: Data Modules As An Optional Organizational Layer

References

Official LightningDataModule documentation

Code

Part 2: Organizing Your Data Loaders with Data Modules

What we covered in this video lecture

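In this lecture, we introduced the LightningDataModule as an additional organizational layer for our DataLoaders. It bundles the data preparation steps and the DataLoaders in a single class that we can pass to the Trainer, which adds extra convenience. However, data modules are optional: it is still possible to pass the DataLoaders to the Trainer separately, as before.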

Regarding multi-GPU computing (Unit 9), data modules have the advantage that they separate data handling into the prepare_data and setup methods. The prepare_data method is called only within a single process on CPU, which is useful because downloading and saving data from multiple processes at once (as in distributed settings) can result in corrupted data. The setup method, in contrast, is called on every GPU, so we use it for data operations that each process needs to perform, for example, partitioning the dataset into train/val/test splits and applying data transforms (we will talk more about data transforms and augmentation in Unit 7).

Additional resources if you want to learn more

If you are interested in additional details about the LightningDataModule, you can browse through the more technical LightningDataModule documentation. However, at this stage, you have learned all you need to use LightningDataModules, so please feel free to skip the documentation for now.

