

8.2 Training A Text Classifier Baseline


What we covered in this video lecture

In this lecture, we finetune a DistilBERT model on the movie review classification task. DistilBERT is 40% smaller than BERT, yet it retains 97% of BERT's language understanding capabilities and is 60% faster.
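As a rough check of the size figure, the sketch below counts parameters from the standard base-model dimensions (30,522-token vocabulary, 512 positions, hidden size 768, feed-forward size 3,072; 12 layers for BERT-base, 6 for DistilBERT-base). It assumes the usual Hugging Face layouts, where BERT has token-type embeddings and a pooler and DistilBERT has neither. The helper `bert_style_param_count` is hypothetical and written only for this illustration.

```python
def bert_style_param_count(vocab_size, max_positions, hidden, ffn, num_layers,
                           type_vocab_size=0, pooler=False):
    """Count parameters of a BERT-style encoder from its config dimensions (hypothetical helper)."""
    # Embeddings: token, position, optional segment (token-type), plus one LayerNorm
    embeddings = (vocab_size + max_positions + type_vocab_size) * hidden + 2 * hidden
    # Self-attention: Q, K, V and output projections, each with weights and biases
    attention = 4 * (hidden * hidden + hidden)
    # Feed-forward network: hidden -> ffn -> hidden, with biases
    feed_forward = (hidden * ffn + ffn) + (ffn * hidden + hidden)
    # Two LayerNorms per block, each with a scale and a shift vector
    layer_norms = 2 * (2 * hidden)
    per_layer = attention + feed_forward + layer_norms
    total = embeddings + num_layers * per_layer
    if pooler:
        # BERT's dense pooler applied to the [CLS] token
        total += hidden * hidden + hidden
    return total

bert_base = bert_style_param_count(30522, 512, 768, 3072, 12, type_vocab_size=2, pooler=True)
distilbert_base = bert_style_param_count(30522, 512, 768, 3072, 6)

print(f"BERT-base: {bert_base:,}")              # BERT-base: 109,482,240
print(f"DistilBERT-base: {distilbert_base:,}")  # DistilBERT-base: 66,362,880
print(f"Size reduction: {1 - distilbert_base / bert_base:.1%}")  # Size reduction: 39.4%
```

The result is consistent with the roughly 40% reduction quoted above. The 97% and 60% figures are measured benchmark results, so a parameter count cannot reproduce them.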

We explore two approaches, finetuning only the last layer and finetuning all layers, and compare the predictive and computational performance of these two paradigms.
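To see why the two paradigms differ so much in computational cost, here is a rough count of trainable parameters in each case. It assumes the Hugging Face `DistilBertForSequenceClassification` layout, where the classification head is a 768 x 768 pre-classifier layer followed by a 768 x 2 output layer, and it treats "finetuning the last layer" as training that head only; the lecture's exact choice of unfrozen layers may differ. The encoder size is the DistilBERT-base parameter count.

```python
# Assumed sizes: DistilBERT-base encoder plus a Hugging Face-style
# classification head (768 -> 768 pre-classifier, then 768 -> NUM_LABELS).
HIDDEN = 768
NUM_LABELS = 2               # positive vs. negative movie review
ENCODER_PARAMS = 66_362_880  # parameter count of the DistilBERT-base encoder

def linear_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out

head_params = linear_params(HIDDEN, HIDDEN) + linear_params(HIDDEN, NUM_LABELS)
all_params = ENCODER_PARAMS + head_params

print(f"Head only: {head_params:,} trainable parameters")   # Head only: 592,130 trainable parameters
print(f"All layers: {all_params:,} trainable parameters")   # All layers: 66,955,010 trainable parameters
print(f"Head share: {head_params / all_params:.2%}")        # Head share: 0.88%
```

In PyTorch, head-only finetuning is commonly done by setting `requires_grad = False` on the encoder's parameters so that the optimizer updates only the head. The forward pass still runs through the full encoder, so the savings come mainly from skipping encoder gradients in the backward pass and from the much smaller optimizer state.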

This code can be used as a template to adapt other pretrained large language models to various text classification tasks.

Additional resources if you want to learn more

In this lecture, we focused on finetuning transformers the conventional way. Recently, researchers have begun to develop parameter-efficient finetuning methods that make finetuning more computationally affordable. If you are interested in learning more about these parameter-efficient finetuning methods, check out my articles on

Moreover, you might be interested in checking out our Lit-LLaMA repository, which contains an implementation of the popular LLaMA language model based on nanoGPT.


Quiz: 8.2 Training A Text Classifier Baseline (Part 1)

Using stop words means we …

  • Correct. We remove common words like “the” and “it”, which usually carry little information for classifying text.
  • Incorrect. We don’t stop processing the text; that would truncate it arbitrarily.
  • Incorrect. We don’t stop processing the text; that would truncate it arbitrarily.
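A minimal sketch of stop-word removal in plain Python. The small stop-word list and the regular-expression tokenizer are illustrative assumptions, not the ones used in the lecture; real projects usually rely on a curated list from a text-processing library.

```python
import re

# Tiny, hand-picked stop-word list (an assumption for illustration only)
STOP_WORDS = {"the", "it", "a", "an", "is", "was", "and", "of", "to", "this"}

def remove_stop_words(text, stop_words=STOP_WORDS):
    """Lowercase, split into word tokens, and drop tokens found in the stop-word list."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [tok for tok in tokens if tok not in stop_words]

review = "The plot was thin, but the acting in this movie is superb."
print(remove_stop_words(review))
# ['plot', 'thin', 'but', 'acting', 'in', 'movie', 'superb']
```

Generic stop-word lists sometimes include negations such as "not", which can flip a review's sentiment, so it is worth checking the list before using it for sentiment classification.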

Quiz: 8.2 Training A Text Classifier Baseline (Part 2)

If we have an input sentence with 10 words, the Bag-of-Words feature vector is 10-dimensional.

  • Incorrect. The size is determined by the unique words in the training set, not the number of words in a training example.
  • Correct. The size is determined by the unique words in the training set, not the number of words in a training example.
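To make the answer concrete, the sketch below builds a Bag-of-Words vocabulary from a tiny, made-up training set and encodes a 10-word sentence. The training sentences, the simple tokenizer, and the helper names (`tokenize`, `build_vocabulary`, `bag_of_words`) are hypothetical. The point is that the vector length equals the vocabulary size (12 here), not the sentence length, and that in this sketch words never seen during training are simply ignored.

```python
import re

def tokenize(text):
    """Lowercase and split into word tokens (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())

def build_vocabulary(texts):
    """Map each unique training-set word to a column index."""
    words = sorted({tok for text in texts for tok in tokenize(text)})
    return {word: idx for idx, word in enumerate(words)}

def bag_of_words(text, vocab):
    """Count occurrences of each vocabulary word; out-of-vocabulary words are ignored."""
    vector = [0] * len(vocab)
    for tok in tokenize(text):
        if tok in vocab:
            vector[vocab[tok]] += 1
    return vector

train_texts = [
    "a great movie with a great cast",
    "a boring movie",
    "the cast was great but the plot was boring and slow",
]
vocab = build_vocabulary(train_texts)

sentence = "great plot and a great cast but a weak ending"  # 10 words
vec = bag_of_words(sentence, vocab)
print(len(tokenize(sentence)), len(vec))  # 10 12
print(vec)                                # [2, 1, 0, 1, 1, 2, 0, 1, 0, 0, 0, 0]
```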