5.5 Organizing Your Data Loaders with Data Modules
- Official LightningDataModule documentation
What we covered in this video lecture
In this lecture, we introduced the LightningDataModule as an additional organizational layer for our DataLoaders, which adds extra convenience when using the Trainer. However, it is still possible to use the DataLoaders separately, as before.
Regarding multi-GPU computing (Unit 9), data modules have the advantage that we can use the prepare_data and setup methods separately. The prepare_data method is called only within a single process on the CPU. This is useful because downloading and saving data with multiple processes (distributed settings) can result in corrupted data. Then, we can use setup for data operations we might want to perform on every GPU, for example, partitioning the dataset into train/val/test splits and applying data transforms (we will talk more about data transforms and augmentation in Unit 7).
Additional resources if you want to learn more
If you are interested in additional details about the LightningDataModule, you can browse through the more technical LightningDataModule documentation. However, at this stage, you have learned all you need to use LightningDataModules, so please feel free to skip the documentation for now.