
8.7 A Large Language Model for Classification

What we covered in this video lecture

In this lecture, we are finetuning a DistilBERT model on the movie review classification task. DistilBERT reduces the size of a BERT model by 40%. At the same time, it retains 97% of BERT’s language understanding capabilities and is 60% faster.
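
As a minimal sketch of how such a model can be loaded, assuming the Hugging Face transformers library and the distilbert-base-uncased checkpoint (an illustration, not the exact lecture code):

```python
# Minimal sketch, assuming the Hugging Face `transformers` library and the
# "distilbert-base-uncased" checkpoint; not the exact lecture code.
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased",
    num_labels=2,  # binary sentiment: positive vs. negative review
)

# Tokenize one example review; truncation keeps it within DistilBERT's 512-token limit
inputs = tokenizer(
    "This movie was a pleasant surprise from start to finish.",
    truncation=True,
    return_tensors="pt",
)
logits = model(**inputs).logits
print(logits.shape)  # torch.Size([1, 2]) -- one score per class
```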

We are exploring two approaches, finetuning only the last layer and finetuning all layers, and comparing the predictive and computational performance of these two paradigms.

The code covered in this lecture can be used as a template to adapt other pretrained large language models to various text classification tasks.
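
The two paradigms can be sketched as follows, assuming a Hugging Face model object named `model` like the one in the sketch above; the parameter names ("pre_classifier", "classifier") follow Hugging Face's DistilBERT implementation and should be verified for other models:

```python
# (a) Finetune only the output layer(s): freeze all parameters, then
#     unfreeze the classification head (its parameters contain "classifier"
#     in their names in Hugging Face's DistilBertForSequenceClassification).
for param in model.parameters():
    param.requires_grad = False
for name, param in model.named_parameters():
    if "classifier" in name:
        param.requires_grad = True

# (b) Finetune all layers: leave every parameter trainable (the default),
#     which costs more compute but typically yields better accuracy.
# for param in model.parameters():
#     param.requires_grad = True

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"Trainable parameters: {trainable:,} / {total:,}")
```

Either variant can then be trained with an otherwise identical PyTorch or PyTorch Lightning training loop; only the number of trainable parameters, and therefore the runtime and memory footprint, differs.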

Additional resources if you want to learn more

In this lecture, we focused on finetuning transformers the conventional way. Recently, researchers have begun to develop more parameter-efficient finetuning methods to make finetuning more computationally affordable. If you are interested in learning more about these parameter-efficient finetuning methods, check out my articles on the topic.
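
To give a rough idea of what these methods do, the sketch below shows a generic low-rank adaptation (LoRA) layer: the pretrained weight matrix stays frozen, and only two small low-rank matrices are trained. This is a simplified illustration, not code from the lecture or the linked articles, and the rank and scaling values are arbitrary:

```python
import torch


class LoRALinear(torch.nn.Module):
    """Illustrative LoRA layer: the pretrained linear layer stays frozen,
    and only the low-rank matrices A and B (rank r << d) are trained."""

    def __init__(self, linear: torch.nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.linear = linear
        for param in self.linear.parameters():
            param.requires_grad = False  # freeze the pretrained weights

        in_dim, out_dim = linear.in_features, linear.out_features
        self.A = torch.nn.Parameter(torch.randn(in_dim, rank) * 0.01)
        self.B = torch.nn.Parameter(torch.zeros(rank, out_dim))  # zero init: no change at start
        self.scaling = alpha / rank

    def forward(self, x):
        # Frozen original projection plus the trainable low-rank update
        return self.linear(x) + (x @ self.A @ self.B) * self.scaling


layer = LoRALinear(torch.nn.Linear(768, 768), rank=8)
x = torch.randn(1, 768)
print(layer(x).shape)  # torch.Size([1, 768])
```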

Moreover, you might be interested in checking out our Lit-LLaMA repository, which contains an implementation of the popular LLaMA language model based on nanoGPT.

Quiz: 8.7 A Large Language Model for Classification (Part 1)

What does bidirectional encoding in the context of BERT refer to?

Incorrect. Hint: think of the masked language modeling pretraining task.

Correct. This is achieved through the use of the masked language modeling pretraining task.

Incorrect. BERT was trained primarily on English texts.

Incorrect. BERT is mainly an encoder.

Quiz: 8.7 A Large Language Model for Classification (Part 2)

What is the primary goal of the DistilBERT architecture?

Incorrect. Using computational tricks, DistilBERT is much faster than BERT while offering almost identical modeling performance.

Correct. Using computational tricks, DistilBERT is much faster than BERT while offering almost identical modeling performance.

Incorrect. DistilBERT is smaller than BERT.

Incorrect. DistilBERT is a smaller BERT architecture.
