I am trying to run parallel training on multiple GPUs with a large dataset.
My understanding is that the dataset should be loaded into RAM once, and then batches are sent to the GPUs.
However, for some reason the data is loaded into RAM separately for each GPU involved in the training.
That is, when training on 6 GPUs, 6 DataLoaders run simultaneously and each one holds its own copy of the data in RAM.
As a result, RAM is completely exhausted even before training starts, because the dataset is duplicated there 6 times!
First, this makes it impossible to load a large dataset for training at all.
Second, the loading takes a very long time.
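To see how much memory each process actually uses, I can print the resident set size of every rank right before training starts (just a quick check sketch; it assumes psutil is installed):

import os
import psutil

# Quick check: every DDP process prints its own resident memory,
# so I can see whether each of the 6 processes holds a full copy of the dataset.
rss_gb = psutil.Process(os.getpid()).memory_info().rss / 1e9
print(f"rank {os.environ.get('LOCAL_RANK', '0')}: {rss_gb:.1f} GB resident")

My startup code looks like this: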
import numpy as np
import torch
import pytorch_lightning as pl
from pytorch_lightning.strategies import DDPStrategy
from torch.utils.data import DataLoader, TensorDataset

if __name__ == '__main__':
    Pytorch_lightning_MNIST_my = Pytorch_Lightning_my()  # my LightningModule, defined elsewhere in the script
    trainer = pl.Trainer(accelerator='gpu', devices=6, max_epochs=EPOCHS,
                         strategy=DDPStrategy(find_unused_parameters=False))
    # load the training and validation CSVs into a single float32 array
    DS = np.vstack((np.loadtxt(learningDatasetFile, skiprows=1, delimiter=",", dtype=np.float32, max_rows=125000), np.loadtxt(validateDatasetFile, skiprows=1, delimiter=",", dtype=np.float32, max_rows=25000)))
    train_tensorX = torch.from_numpy(DS[:, :-1]).to("cuda:0")  # features, standardised on GPU 0
    medianANDstdArr = makeStandart(train_tensorX, numOfPeriodsPerFeature)
    train_tensorY = torch.from_numpy(DS[:, -1])
    train_dataset = TensorDataset(train_tensorX.to("cpu"), train_tensorY)
    trainer.fit(Pytorch_lightning_MNIST_my,
                DataLoader(train_dataset, shuffle=True, batch_size=10000, num_workers=num_of_threads))
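If I understand the ddp strategy correctly, Lightning re-launches the same script once per extra device, and every process then runs the whole __main__ block, including the np.loadtxt calls. A simplified sketch of my mental model (placeholder code, not Lightning's actual launcher):

import os
import subprocess
import sys

# Simplified mental model of strategy "ddp" (not Lightning's real launcher code):
# the original process starts one extra copy of the same script per additional GPU...
if os.environ.get("LOCAL_RANK") is None:
    for rank in range(1, 6):
        child_env = dict(os.environ, LOCAL_RANK=str(rank))
        subprocess.Popen([sys.executable] + sys.argv, env=child_env)
# ...and every copy executes the whole training script from the top,
# so the dataset is read into RAM once per process.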
Is it supposed to be like this?