Device-agnostic modelling

Hey,
I'm having trouble getting my model to run on a GPU using the Lightning AI cloud. I can switch to a GPU environment, but the model doesn't use the GPU: the GPU is available and the device itself is set to GPU, but it seems the tensors are not being put on it. I found that I should move all tensors to the device using .to(...), with ... being another tensor, or by registering a buffer (Hardware agnostic training (preparation) — PyTorch Lightning 2.5.0.post0 documentation).

However, I use a lot of helper classes, for example in my DataModule for the training, validation, and test sets (see below). I think I need a few examples of how to modify these classes (I think the main model class will be fine). So, two questions:

  1. Would someone like to help me out by giving a few examples based on my code (below)? Once I have a few concrete examples, I think I'll be fine on my own.
  2. Can I somehow check that everything is device-agnostic without actually switching? Or can I check which tensors are on which hardware without running the model? (The sketch right after this list shows the kind of thing I was imagining.)
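For question 2, the best I came up with myself is something like the following, which only lists the devices of the parameters and buffers registered on the model, so it would not catch tensors that my helper classes create at runtime:

def report_devices(module):
    # print the device of every parameter and buffer registered on the module
    for name, p in module.named_parameters():
        print(f"param  {name}: {p.device}")
    for name, b in module.named_buffers():
        print(f"buffer {name}: {b.device}")

# report_devices(model)  # where `model` is my LightningModule instance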

Thanks a lot


Some code from the DataModule:

class F2ADataModule(pl.LightningDataModule):

    # whatever comes in between here   

    def _dataloader(self, indices):
        ds = SeriesDataset(self.data_ctrl, timesteps=self.timesteps*(1 + self.env_size * 2), indices=indices)
        return DataLoader(ds, batch_size=self.batch_size, drop_last=True, shuffle=False, collate_fn=F2ADataLoader.collation(self.timesteps, self.env_size))

    def train_dataloader(self):
        return self._dataloader(self.train_indices)

    def val_dataloader(self):
        return self._dataloader(self.val_indices)
        
    def test_dataloader(self):
        return self._dataloader(self.test_indices)

The collate function itself, with some stuff removed:

    @staticmethod
    def collation(timesteps=13, env_size=2):
        def collate_fn(batch):
            batch_tensor = torch.stack(batch) 

            normalizer = QuantileNormalizer(env_size, timesteps)
            norm_data = [
                normalizer(sample) for sample in batch_tensor if len(sample) >= 5
            ]
            
            batched_data = {}
            for key in norm_data[0].keys():
                if key.startswith("_N"): 
                    batched_data[key] = [sample[key] for sample in norm_data]
                else:
                    batched_data[key] = torch.stack([sample[key] for sample in norm_data])
            return batched_data
        return collate_fn

Now, I need to put the tensors (batch_tensor and the different items in batched_data) in this helper function onto the right device, right? The same goes for the QuantileNormalizer, where I have things like this (a sketch of what I had in mind follows at the end of the post):

class QuantileNormalizer:
    
    def __init__(self, env_size, timesteps):
        self.target_idx = ...
        self.env_size = env_size

    def sample_values(self, data):
        return np.concatenate([PertSampler(...).sample(...) for x in data]) # in the PertSampler, I may refrain from using the GPU, or maybe not? This is just sampling, so no real tensors in there

    def __call__(self, data):

        target = data[self.target_idx]
        vals = self.sample_values(target)
        kde = KernelDensityCalculator().call(vals)
        
        # calculate normalized data per sample

        norm_data_chunks = torch.chunk(norm_data, self.env_size*2 + 1)

        target_chunk = norm_data_chunks[self.env_size]  
        env_before_chunks = torch.stack(norm_data_chunks[:self.env_size])
        env_after_chunks = torch.stack(norm_data_chunks[self.env_size+1:])
        
        
        return dict(rawdata = data, 
                    data = norm_data,
                    target = target_chunk,
                    env_before = env_before_chunks,
                    env_after = env_after_chunks,
                    _Ndata = {"quantiles": dict(zip(quantiles, kde_quantiles))} | data_pars
                    )
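Just to make my question more concrete: what I had in mind was passing the device around explicitly, roughly like the sketch below. The device argument and the stripped-down collate factory are made up, only to show the idea; I'm not sure whether this is the right place to do it, or whether I should leave everything on the CPU in the collate function and let Lightning move the batch afterwards.

import torch

def collation(timesteps=13, env_size=2, device=None):
    # hypothetical variant of my collate factory with an explicit device argument;
    # with device=None the tensors simply stay on the CPU
    def collate_fn(batch):
        batch_tensor = torch.stack(batch)
        if device is not None:
            batch_tensor = batch_tensor.to(device)
        return batch_tensor
    return collate_fn

# toy usage
make_batch = collation(device=torch.device("cuda" if torch.cuda.is_available() else "cpu"))
print(make_batch([torch.zeros(3), torch.ones(3)]).device)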