Custom training - RuntimeError due to unused parameters

I’m trying to use Donut, a transformer model with a PyTorch Lightning implementation, and I want to pre-train it on my desktop on a language it hasn’t been trained on yet. Unfortunately, the version of the stack pinned in the original repo doesn’t support my GPU, so I had to port the code from PyTorch Lightning 1.6 to 2.0. I followed the upgrade guide, but I’m still running into issues.

Upon the first run, I got the following error:

RuntimeError: It looks like your LightningModule has parameters that were not used in 
producing the loss returned by training_step. If this is intentional, you must enable 
the detection of unused parameters in DDP, either by setting the string value 
`strategy='ddp_find_unused_parameters_true'` or by setting the flag in the strategy with 
`strategy=DDPStrategy(find_unused_parameters=True)`.

Since I haven’t really used Lightning before, I’m unsure what this means. I managed to get training to run by setting the strategy to the string value above, but I don’t know whether I did something wrong while porting or whether this is by design.
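
For reference, this is roughly how I’m constructing the Trainer now (a minimal sketch; the accelerator, device count, and epoch values are placeholders, and only the strategy argument is what I actually changed):

import pytorch_lightning as pl
from pytorch_lightning.strategies import DDPStrategy

trainer = pl.Trainer(
    accelerator="gpu",
    devices=2,        # placeholder: my actual device count
    max_epochs=30,    # placeholder
    # equivalent to strategy="ddp_find_unused_parameters_true"
    strategy=DDPStrategy(find_unused_parameters=True),
)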

I’ve checked the Lightning documentation, but there’s very limited information on this. Setting find_unused_parameters=True comes with a performance impact, so I’d like to know whether I’m doing something wrong or whether it’s actually needed here.

The training step is defined as follows:

def training_step(self, batch, batch_idx):
    # Collect the components of each sub-batch: images, decoder inputs
    # (last token dropped), and labels (first token dropped), i.e. the
    # usual shift for teacher forcing.
    image_tensors, decoder_input_ids, decoder_labels = [], [], []
    for batch_data in batch:
        image_tensors.append(batch_data[0])
        decoder_input_ids.append(batch_data[1][:, :-1])
        decoder_labels.append(batch_data[2][:, 1:])
    # Concatenate the sub-batches along the batch dimension.
    image_tensors = torch.cat(image_tensors)
    decoder_input_ids = torch.cat(decoder_input_ids)
    decoder_labels = torch.cat(decoder_labels)
    # DonutModel returns the loss as the first element of its output.
    loss = self.model(image_tensors, decoder_input_ids, decoder_labels)[0]
    self.log_dict({"train_loss": loss}, sync_dist=True)
    return loss

Here loss is calculated from self.model, which is an instance of DonutModel (line 369). Another odd thing: the loss doesn’t actually seem to decrease during training, as shown in TensorBoard.

I’m unsure what’s wrong, as most of this is new to me.
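
In case it helps narrow this down, here’s a minimal debugging sketch I’m planning to add to my LightningModule: on_after_backward is a standard Lightning hook, and any trainable parameter whose .grad is still None after the backward pass never contributed to the loss, which should be exactly what DDP is flagging (this assumes find_unused_parameters=True so that training actually runs):

def on_after_backward(self):
    # Any trainable parameter whose gradient is still None after backward
    # did not contribute to the loss returned by training_step.
    for name, param in self.named_parameters():
        if param.requires_grad and param.grad is None:
            print(f"unused parameter: {name}")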

I’ll gladly share more code, since I’m not sure where the parameters are even being checked for this error message. I’d be thankful for any help.