Code structuring for text classification with HF bert-uncased

Hi,

I want to know if I’m doing this correctly.
43-kaggle colab v6 - qa.ipynb - Colaboratory (google.com)

Q1 - In a pl.LightningModule class, I’ve defined the forward function so that the model accepts input_ids, attention_mask and labels.

from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

Q2 - I want to train my model and then run predictions on a dataset that doesn’t contain the label column. Should I add a predict function in the pl.LightningModule class?

import torch
import pytorch_lightning as pl


# Define PyTorch Lightning module
class SentimentClassifier(pl.LightningModule):
    def __init__(self, model):
        super().__init__()
        self.model = model
        self.loss = torch.nn.CrossEntropyLoss()

    def forward(self, input_ids, attention_mask, labels=None):
        # The HF model computes the loss internally when labels are passed
        outputs = self.model(input_ids, attention_mask=attention_mask, labels=labels)
        return outputs

    def training_step(self, batch, batch_idx):
        input_ids = batch['input_ids']
        attention_mask = batch['attention_mask']
        labels = batch['label']
        outputs = self(input_ids, attention_mask, labels)
        loss = outputs.loss
        self.log('train_loss', loss)
        return loss

    def validation_step(self, batch, batch_idx):
        input_ids = batch['input_ids']
        attention_mask = batch['attention_mask']
        labels = batch['label']
        outputs = self(input_ids, attention_mask, labels)
        loss = outputs.loss
        preds = outputs.logits.argmax(-1)
        acc = torch.sum(preds == labels).item() / len(labels)
        self.log('val_loss', loss)
        self.log('val_acc', acc)
        return acc

    def test_step(self, batch, batch_idx):
        input_ids = batch['input_ids']
        attention_mask = batch['attention_mask']
        labels = batch['label']
        outputs = self(input_ids, attention_mask, labels)
        loss = outputs.loss
        preds = outputs.logits.argmax(-1)
        acc = torch.sum(preds == labels).item() / len(labels)
        self.log('test_loss', loss)
        self.log('test_acc', acc)
        return acc

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=2e-5)
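
For reference, here is a minimal sketch of how the module would be wired into a pl.Trainer (the dataloader names and epoch count are placeholders, not taken from the notebook):

# Minimal usage sketch: train_loader / val_loader / test_loader are assumed
# DataLoaders over the tokenized dataset splits.
classifier = SentimentClassifier(model)
trainer = pl.Trainer(max_epochs=3)
trainer.fit(classifier, train_dataloaders=train_loader, val_dataloaders=val_loader)
trainer.test(classifier, dataloaders=test_loader)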

I’m new to PyTorch in general.

Thank you for your time!!! :slight_smile:

Hi,
I haven’t gone through the notebook, just checked the module you sent.

  1. There is no need to define the loss module, as it is already defined and used inside the HF model itself.
     Since you don’t use it anywhere, I assume it’s just leftover code.

  2. For metrics, I would suggest using TorchMetrics, as it plays great with Lightning and makes computing metrics in distributed training easy. With your current implementation there might be a problem if the last batch is shorter than the other batches, since averaging per-batch accuracies gives every batch the same weight regardless of its size (see the first sketch after this list).

  3. Yes, if you want to run inference you should define a predict function (Lightning’s predict_step hook); see the second sketch after this list.
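
For point 2, here is a minimal sketch of how the accuracy could be moved to TorchMetrics. Only the validation side is shown, and it assumes a recent torchmetrics version where the task argument is required:

import pytorch_lightning as pl
import torch
import torchmetrics


class SentimentClassifier(pl.LightningModule):
    def __init__(self, model):
        super().__init__()
        self.model = model
        # TorchMetrics accumulates state across batches and syncs it across devices,
        # so a shorter last batch is weighted correctly in the epoch-level accuracy.
        self.val_acc = torchmetrics.Accuracy(task="multiclass", num_classes=2)

    def forward(self, input_ids, attention_mask, labels=None):
        return self.model(input_ids, attention_mask=attention_mask, labels=labels)

    def validation_step(self, batch, batch_idx):
        outputs = self(batch['input_ids'], batch['attention_mask'], batch['label'])
        preds = outputs.logits.argmax(-1)
        self.val_acc(preds, batch['label'])
        self.log('val_loss', outputs.loss)
        # Logging the metric object lets Lightning call compute()/reset() at epoch end
        self.log('val_acc', self.val_acc, on_epoch=True)

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=2e-5)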

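And for point 3, a minimal sketch of a predict_step added to the module above:

    def predict_step(self, batch, batch_idx, dataloader_idx=0):
        # The unlabeled dataset has no 'label' column, so labels are simply not passed
        outputs = self(batch['input_ids'], batch['attention_mask'])
        return outputs.logits.argmax(-1)

Then trainer.predict(classifier, dataloaders=predict_loader) returns the predicted classes batch by batch (predict_loader being a placeholder for a DataLoader over your tokenized, unlabeled dataset).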

I appreciate the feedback, I’ll look into it and improve :slight_smile: