Training fails: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu

I had been training my model successfully for the past month, but when using PyTorch Lightning 1.1.6 I got this error:

“Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!”

Any advice on what happened?
Thank you

Usually this happens when you create a Tensor inside a module without using .type_as(x) or setting its device to self.device (LightningModules know which device they are on). The new Tensor then stays on the CPU while every other tensor used during training is on the GPU.

Check this out:
https://pytorch-lightning.readthedocs.io/en/stable/multi_gpu.html#init-tensors-using-type-as-and-register-buffer
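Here is a minimal sketch in plain PyTorch of the two fixes the linked page is about, type_as and register_buffer. The module name Scaler and its "scale" buffer are hypothetical and only serve as an illustration; in a real LightningModule you could equally pass device=self.device when creating the tensor.

```python
import torch
from torch import nn


class Scaler(nn.Module):
    """Hypothetical toy module showing device-safe tensor creation."""

    def __init__(self) -> None:
        super().__init__()
        # A registered buffer moves together with the module on .to()/.cuda(),
        # so it ends up on the same device as the model's parameters.
        self.register_buffer("scale", torch.tensor(2.0))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # BAD:  torch.zeros(x.shape) alone is always created on the CPU.
        # GOOD: .type_as(x) casts the new tensor to x's dtype and device.
        offset = torch.zeros(x.shape).type_as(x)
        return x * self.scale + offset


model = Scaler()
x = torch.ones(3)
out = model(x)
print(out.device == x.device)  # True: no CPU/GPU mismatch
```

On a GPU, calling model.cuda() and passing a CUDA input would keep both the buffer and the new tensor on cuda:0. Creating the tensor with torch.zeros(x.shape) and no .type_as(x) would leave it on the CPU and trigger the error above.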
