Getting "element 0" error while fine-tuning an LLM

This is the error I am getting:
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn.
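
From what I understand, this error comes from calling loss.backward() on a tensor that autograd never tracked. A toy example (not my actual code) that raises the same error:

import torch

layer = torch.nn.Linear(4, 1)   # parameters require grad
x = torch.randn(2, 4)

with torch.no_grad():           # gradient tracking disabled here
    loss = layer(x).sum()       # loss has requires_grad=False and no grad_fn

loss.backward()                 # RuntimeError: element 0 of tensors does not require grad ...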

I am trying to fine-tune the T5 model here.
The Model:

class LModel(pl.LightningModule):
    def __init__(self):
        super(LModel, self).__init__()
        self.model = MODEL
  
    def forward(self, input_ids, attention_mask, labels=None, decoder_attention_mask=None):
        outputs = self.model(input_ids=input_ids,
                        attention_mask=attention_mask,
                        labels=labels,
                        decoder_attention_mask=decoder_attention_mask)
        return outputs.loss, outputs.logits
  
    def training_step(self, batch, batch_idx):
        input_ids = batch["input_ids"]
        attention_mask = batch["attention_mask"]
        labels = batch["summary_ids"]
        decoder_attention_mask = batch["summary_mask"]

        loss, output = self(input_ids, attention_mask, labels, decoder_attention_mask)
        return loss

    def validation_step(self, batch, batch_idx):
        input_ids = batch["input_ids"]
        attention_mask = batch["attention_mask"]
        labels = batch["summary_ids"]
        decoder_attention_mask = batch["summary_mask"]

        loss, output = self(input_ids, attention_mask, labels, decoder_attention_mask)
        return loss
 
    def test_step(self, batch, batch_idx):
        input_ids = batch["input_ids"]
        attention_mask = batch["attention_mask"]
        loss, output = self(input_ids=input_ids, 
                      attention_mask=attention_mask)
        return loss
    
    def configure_optimizers(self):
        optimizer = AdamW(self.model.parameters(), lr=0.0001)
        scheduler = get_linear_schedule_with_warmup(
                optimizer, num_warmup_steps=0,
                num_training_steps=EPOCHS*len(df))
        return {'optimizer': optimizer, 'lr_scheduler': scheduler}

And the trainer looks like this:

device  = 'cuda' if torch.cuda.is_available() else "cpu"
trainer = pl.Trainer(
    max_epochs=EPOCHS,
    accelerator=device
)


trainer.fit(model, module)

Can anyone let me know why I am getting this "element 0" error and how to fix it?

Hi @brucethecapedcrusade

Does your self.model have parameters that require gradients?

What does the following print?

print(sum(p.numel() for p in self.model.parameters() if p.requires_grad))
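
If that is non-zero, it could also help to check whether the loss returned from training_step is actually attached to the graph, e.g. right before returning it:

print(loss.requires_grad, loss.grad_fn)

If the loss has no grad_fn, backward() will fail with exactly the error you are seeing.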

Hi @awaelchli,
This is what I get when I run that piece of code:
print(sum(p.numel() for p in model.parameters() if p.requires_grad))
222903552

I tried printing the grad_fn:

def forward(self, input_ids, attention_mask, labels=None, decoder_attention_mask=None):
    outputs = self.model(input_ids=input_ids,
                         attention_mask=attention_mask,
                         labels=labels,
                         decoder_attention_mask=decoder_attention_mask)
    print("outputs.logits.grad_fn:", outputs.logits.grad_fn)
    print("total params requiring grad:", sum(p.numel() for p in self.model.parameters() if p.requires_grad))
    print("model.training:", self.model.training)
    return outputs.loss, outputs.logits

I got this output:

outputs.logits.grad_fn: None
total params requiring grad: 222903552
model.training: True
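
So the parameters require grad and the model is in training mode, but the logits still have no grad_fn. That makes me think autograd is being disabled globally somewhere when forward runs (for example a torch.no_grad() / torch.inference_mode() block, or a torch.set_grad_enabled(False) somewhere in the script). As a quick sanity check (just a sketch), I can add this at the top of forward:

print("grad enabled:", torch.is_grad_enabled())           # False under no_grad / set_grad_enabled(False)
print("inference mode:", torch.is_inference_mode_enabled())  # True under torch.inference_mode()

If either of those shows that tracking is off, that would explain why the loss has no grad_fn.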