Hello, I am currently working on a multi-task model with pytorch-lightning 2.0.9, where the whole model is written as a LightningModule class (see code below).
I got the following error with strategy="ddp":

```
root INFO - RuntimeError: It looks like your LightningModule has parameters that were not used in producing the loss returned by training_step. If this is intentional, you must enable the detection of unused parameters in DDP, either by setting the string value `strategy='ddp_find_unused_parameters_true'` or by setting the flag in the strategy with `strategy=DDPStrategy(find_unused_parameters=True)`.
```
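For reference, this is my understanding of how the two options from the error message would be passed to the Trainer (a minimal sketch, not my actual training script; the accelerator/devices values are just placeholders):

```python
import pytorch_lightning as pl
from pytorch_lightning.strategies import DDPStrategy

# option 1: the string alias suggested in the error message
trainer = pl.Trainer(accelerator="gpu", devices=2, strategy="ddp_find_unused_parameters_true")

# option 2: configure the DDP strategy object explicitly
trainer = pl.Trainer(
    accelerator="gpu",
    devices=2,
    strategy=DDPStrategy(find_unused_parameters=True),
)
```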
Here are the details of how I construct my multi-task model:
I override training_step in the LightningModule by calling two submodels, model_one and model_two (plain PyTorch nn.Module classes):
- class model_one(torch.nn.Module)
- class model_two(torch.nn.Module)
Each task is updated separately with its own data (and its own loss function and labels). The two tasks share some common layers (a shared BERT model) but have different heads. During training, each batch contains data from only one of the tasks (model_one or model_two), and the corresponding loss is computed. The backward pass therefore updates either the head of model_one plus the shared BERT model, or the head of model_two plus the shared layers.
```python
from pytorch_lightning import LightningModule

class MultiTaskModel(LightningModule):
    def __init__(self, model1: model_one, model2: model_two):
        super().__init__()
        self.model1 = model1
        self.model2 = model2
        self.tasks = [self.model1, self.model2]

    def training_step(self, batch, batch_id):
        # the first element of the batch identifies which task this batch belongs to
        task_id = batch[0][0]
        task_module = self.tasks[task_id]
        output = task_module.training_step(batch, batch_id)
        return output
```
The task modules are as follows (model_one and model_two are similar, but with a different number of classes):
```python
import torch
from torch import nn
from torch.nn import BCEWithLogitsLoss

class model_one(torch.nn.Module):
    def __init__(self, bert_model):
        super().__init__()
        self.bert = bert_model
        self.loss_fun = BCEWithLogitsLoss(reduction='sum')
        self.num_class = 10

    def training_step(self, batch, batch_id):
        task_ids, instance_ids, attention_mask, labels = batch
        bert_emb = self.bert(instance_ids, attention_mask, output_hidden_states=True).last_hidden_state
        logits = nn.Linear(768, self.num_class)(bert_emb)
        probs = nn.Sigmoid()(logits)
        loss = self.loss_fun(probs, labels)
        return {'loss': loss}
```
My questions:

- All the parameters in model_one and model_two are trainable, so why do I get the above error with the DDP strategy? Also, training with strategy='ddp_find_unused_parameters_true' becomes slower (a workaround I have been considering is sketched after these questions).
- There seems to be a batch-size limitation when using pytorch-lightning 2.0.9: a single batch is only about 300 KB (the total training data is 5.7 GB in LMDB format) and my GPU (an Nvidia L4 with 24 GB of memory) should have plenty of headroom, yet it still throws a GPU memory error with the DDP strategy:

  ```
  Tried to allocate 84.35 GiB (GPU 5; 21.96 GiB total capacity; 12.16 GiB already allocated; 9.48 GiB free; 12.18 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
  ```
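For the first question, one workaround I have been considering (only a sketch under my own assumptions, I have not verified that it is the right approach) is to add a zero-weighted term over all parameters in training_step, so that every parameter receives a (zero) gradient on every step and DDP should no longer report unused parameters:

```python
class MultiTaskModel(LightningModule):
    # ... __init__ as in the class above ...

    def training_step(self, batch, batch_id):
        task_id = batch[0][0]
        output = self.tasks[task_id].training_step(batch, batch_id)
        # zero-weighted term over every parameter: all parameters participate in
        # each step's graph and get a zero gradient, while the loss value itself
        # is unchanged
        output['loss'] = output['loss'] + 0.0 * sum(p.sum() for p in self.parameters())
        return output
```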
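For the second question, following the allocator hint at the end of the error message, I can try setting max_split_size_mb through the PYTORCH_CUDA_ALLOC_CONF environment variable before training starts (the value below is only an example, not a validated setting):

```python
import os

# must be set before the first CUDA allocation; 128 MB is an arbitrary example value
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

import torch  # the allocator reads the variable when CUDA is first initialized
```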
Here is my nvidia-smi information and my PyTorch versions:

- Nvidia driver version: 535.104.05
- NVCC: Cuda compilation tools, release 12.2, V12.2.140
- torch 2.0.1+cu118
- torchaudio 2.0.2+cu118
- torchvision 0.15.2+cu118