4.4 Defining Efficient Data Loaders (Part 1-4)

Slides

References

Code

Part 3: Data download & preparation, 4.4-dataloaders-part3-download-and-prep.ipynb
Part 4: Run dataloaders, 4.4-dataloaders-part4-define-and-run.py

What we covered in this video lecture

Now that we are beginning to work with larger neural networks and datasets, it's time to optimize our data-loading pipeline. In this lecture, we discussed PyTorch's Dataset and DataLoader classes. A Dataset defines how individual examples are fetched, and a DataLoader groups those examples into batches and can fetch the training batches in the background using multiple worker processes. These background processes help avoid a data-loading bottleneck: ideally, the next batch is already prepared by the time the model finishes its backward pass, so the model does not sit idle waiting for data.
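To make the Dataset and DataLoader roles concrete, here is a minimal sketch. It is not the code from the lecture notebooks: `ToyDataset`, `build_loader`, and the randomly generated features and labels are hypothetical stand-ins, chosen only so the example runs without downloading any data.

```python
import torch
from torch.utils.data import Dataset, DataLoader


class ToyDataset(Dataset):
    """Hypothetical in-memory dataset (an assumption for illustration)."""

    def __init__(self, num_examples=100, num_features=4):
        # Seeded generator so the synthetic data is reproducible.
        generator = torch.Generator().manual_seed(123)
        self.features = torch.randn(
            num_examples, num_features, generator=generator
        )
        # Synthetic binary labels derived from the features.
        self.labels = (self.features.sum(dim=1) > 0).long()

    def __getitem__(self, index):
        # Return a single (features, label) pair; the DataLoader
        # collates these pairs into batches.
        return self.features[index], self.labels[index]

    def __len__(self):
        return self.labels.shape[0]


def build_loader(dataset, batch_size=32, num_workers=0, shuffle=True):
    # num_workers=0 loads data in the main process. A value such as 2
    # starts that many background worker processes that prepare
    # upcoming batches while the model is busy with the current one.
    return DataLoader(
        dataset,
        batch_size=batch_size,
        shuffle=shuffle,
        num_workers=num_workers,
        drop_last=False,
    )


dataset = ToyDataset()
train_loader = build_loader(dataset, batch_size=32, num_workers=0)

# Iterate once over the loader, as a training loop would per epoch.
batch_shapes = [tuple(features.shape) for features, labels in train_loader]
print(batch_shapes)
```

The sketch uses `num_workers=0` so it runs anywhere. To get background loading, pass a positive value such as `num_workers=2`. On platforms that start worker processes with "spawn" (for example Windows and macOS), the code that creates and iterates over the loader should then sit under an `if __name__ == "__main__":` guard. A suitable number of workers depends on the machine and the cost of loading each example, so it is worth timing a few settings.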

Additional resources if you want to learn more

Using the Dataset and DataLoader classes is the most convenient way to load data in PyTorch. However, there is also a new DataPipe class that the PyTorch team is developing. You can find out more about these optional DataPipes in the official TorchData repository. I also wrote a tutorial about DataPipes, which you might find helpful.

