I would like to train a LightningModule model on a machine with multiple (3) GPUs.
In my program I create the train/val DataLoaders, specifying the generator as follows:
train_loader = DataLoader(dataset, batch_size, generator = torch.Generator(device = 'cuda'))
otherwise the random number generator yields items on the CPU, which legitimately raises a TypeError in my setup.
By doing so, however, I get the error:
TypeError: cannot pickle 'torch._C.Generator' object
which I think comes from the fact that the generator object instantiated in the DataLoader constructor cannot be serialized and pickled, since it lives on the GPU.
If I use only one GPU (passing gpus = 1 to the Trainer constructor and setting the environment variable CUDA_VISIBLE_DEVICES="0"), no error occurs.
As dataset I use a custom class inheriting from torch.utils.data.Dataset. It is a map-style dataset containing only numerical structured data; the overridden __getitem__ method returns a sequence of 2D images.
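For reference, the dataset is shaped roughly like the sketch below. This is a pure-Python stand-in (no torch): the class name, the sizes, and the nested lists standing in for image tensors are all made up for illustration, only the map-style structure (`__len__` plus `__getitem__` returning a sequence of 2D images) matches my actual code:

```python
class SequenceDataset:
    """Hypothetical map-style dataset: each item is a sequence
    of 2D 'images' (nested lists as a stand-in for tensors)."""
    def __init__(self, n_samples=4, seq_len=3, height=2, width=2):
        self.n_samples = n_samples
        self.seq_len = seq_len
        self.height = height
        self.width = width

    def __len__(self):
        return self.n_samples

    def __getitem__(self, idx):
        # One item = seq_len images, each of shape (height, width).
        return [[[float(idx)] * self.width for _ in range(self.height)]
                for _ in range(self.seq_len)]

ds = SequenceDataset()
item = ds[1]
print(len(ds), len(item), len(item[0]))
```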
Apparently none of the solutions I found by googling the error matches this problem. Thanks in advance for any hint.