Cache/store transformed input to speed up data loading

JXuann · November 26, 2022, 6:49pm

Hey there,

The inputs to my dataloader are audio files (.wav), which go through an STFT and other transformations during preprocessing. The resulting spectrograms are then fed into the network.

Instead of computing the STFT and applying the same transformations at every epoch, I would like to cache the transformed inputs either in RAM or on disk. What would be an efficient way of doing this in PyTorch and with Lightning?

I have seen people cache the transformed inputs as NumPy arrays, but I don't know whether that is slower than saving them as tensors. Others use TFRecord with a custom PyTorch dataloader. (I would like to use Lightning's dataloader if possible.)

Many thanks in advance!
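
For illustration, here is a minimal sketch of the on-disk variant of the idea in the question: compute each spectrogram once, save it with `torch.save`, and read it back with `torch.load` on later epochs. Everything named here is an assumption made for the example, not something from the post: the class name `CachedSpectrogramDataset`, the `n_fft` value, the `<index>.pt` cache file layout, and the use of in-memory waveforms as a stand-in for reading the .wav files.

```python
import os

import torch
from torch.utils.data import Dataset


class CachedSpectrogramDataset(Dataset):
    """Hypothetical dataset that caches STFT magnitudes on disk.

    `waveforms` is a list of 1-D tensors standing in for decoded .wav files.
    """

    def __init__(self, waveforms, cache_dir, n_fft=64):
        self.waveforms = waveforms
        self.cache_dir = cache_dir
        self.n_fft = n_fft
        os.makedirs(cache_dir, exist_ok=True)

    def __len__(self):
        return len(self.waveforms)

    def _transform(self, waveform):
        # Placeholder for the real preprocessing chain (STFT + other steps).
        window = torch.hann_window(self.n_fft)
        spec = torch.stft(
            waveform, n_fft=self.n_fft, window=window, return_complex=True
        )
        return spec.abs()

    def __getitem__(self, idx):
        path = os.path.join(self.cache_dir, f"{idx}.pt")
        if os.path.exists(path):
            # Cache hit: skip the STFT entirely.
            return torch.load(path)
        spec = self._transform(self.waveforms[idx])
        # Write to a temporary file first, then rename, so an interrupted
        # run never leaves a half-written cache entry behind.
        tmp_path = path + ".tmp"
        torch.save(spec, tmp_path)
        os.replace(tmp_path, path)
        return spec
```

Since this is an ordinary map-style `Dataset`, it can be wrapped in a regular `torch.utils.data.DataLoader`, which is what Lightning consumes. The first epoch pays the full preprocessing cost and later epochs only read from disk. Whether that beats recomputing depends on disk speed and spectrogram size, so it is worth timing both. Note that the cache has to be cleared by hand whenever the transformation parameters change.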
