Create tensor on device for custom dataclass

Hello everyone,
I am still quite new to PyTorch. I read that you should create a torch tensor directly on the device to avoid transfers (mentioned in the docs).
I have a custom Dataset class that handles my data and looks like this:

import numpy as np
import torch
import torchvision
from torch.utils.data import Dataset


class MyDataset(Dataset):
    def __init__(self, labels: np.ndarray, data_path):
        self.targets = torch.from_numpy(labels)
        # one read-only memmap per file; the data stays on disk until it is indexed
        self.list_of_arrays = [
            np.memmap(memmap_path, dtype='float32', mode='r', shape=(30316, 1, 160, 392))
            for memmap_path in data_path
        ]
        self.transform = torchvision.transforms.GaussianBlur(kernel_size=(51, 51), sigma=(0.1, 2))

    def __getitem__(self, index):
        # gather the index-th slice from every memmap and stack them into one tensor
        x = torch.from_numpy(np.stack([item[index] for item in self.list_of_arrays]))
        y = self.targets[index]

        return self.transform(x), y

    def __len__(self):
        return len(self.targets)

To give a short explanation: my data is distributed over several memmaps, so whenever an item is requested I have to collect the different pieces from those files.
In __init__ I create a list that stores all the numpy memmaps.
This Dataset class is used by a LightningDataModule to get all the necessary data.

I therefore have several questions:

  1. Should I create the targets/labels on the device, or leave them as they are?
  2. In __getitem__ I first load my data and then apply the transformation. Can I create the data directly on the GPU at this point?
  3. Does Lightning actually stop me from messing up the device on which things are created? I am worried about something happening like “create tensor on GPU → gets moved to CPU for some calculation → gets moved to GPU again”. I have no idea whether all calculations stay on the target device (in this case the GPU) once tensors are created there, or whether, e.g., the transforms might cause trouble.
  4. Is this actually the right way to apply the transformation to my data?

All advice is appreciated.

I read that you should create a torch tensor on the device itself, to avoid transfer

That’s a best practice, not a written rule 🙂 But yes, it is often beneficial, for example in the forward of a model, where execution speed is crucial.

  1. and 2. In most cases it doesn’t matter, since you will most likely run with DataLoader(num_workers>0): the data loading then runs in separate worker processes, overlapping data transfers with computation. Generally that’s the pattern that works best for common use cases (see the first sketch after this list).

  3. Lightning won’t move your data back to the CPU. It takes whatever the user provides and, if it is not on the GPU yet, moves it there; if it is already on the GPU, that move is a no-op (see the second sketch below).

  4. Yes, this is a common pattern. Both paradigms exist: A) apply the transforms per sample in the dataset; B) apply the transforms to the entire batch on the fly. Which one to use depends on the use case, the transforms at hand, efficiency, and which way is more convenient (the third sketch below shows variant B).
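
To make 1. and 2. concrete, here is a minimal sketch of the loading setup, assuming a CUDA GPU; labels and data_path are the arguments from above and the batch size is a placeholder:

from torch.utils.data import DataLoader

loader = DataLoader(
    MyDataset(labels, data_path),  # the dataset defined above
    batch_size=32,                 # placeholder batch size
    num_workers=4,                 # workers load and blur on the CPU, overlapping GPU compute
    pin_memory=True,               # page-locked memory speeds up the later CPU-to-GPU copy
)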
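
For 3., a small sketch of the no-op behaviour (this needs a machine with a CUDA GPU): calling .to() on a tensor that is already on the target device returns the very same tensor, without any copy:

import torch

x = torch.ones(2, 2, device="cuda")
y = x.to("cuda")   # already on the target device, nothing to move
print(y is x)      # True: the "move" was a no-op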
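
And a sketch of variant B) from 4., assuming PyTorch Lightning: the on_after_batch_transfer hook receives each batch after it has been moved to the target device, so the blur runs on the GPU once per batch instead of once per sample in the workers (MyModel is a hypothetical module; only the relevant hook is shown):

import torchvision
import pytorch_lightning as pl

class MyModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        # same transform as in the dataset above, now applied batch-wise
        self.transform = torchvision.transforms.GaussianBlur(kernel_size=(51, 51), sigma=(0.1, 2))

    def on_after_batch_transfer(self, batch, dataloader_idx):
        # the batch is already on the GPU here
        x, y = batch
        return self.transform(x), y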

Thank you for the detailed answer.
It is in most cases not obvious to me how much certain aspects affect performance or how they interact with each other, so I am grateful for all the advice I can get.
