8.7 A Large Language Model for Classification

Slides

Part 1: Bidirectional Pretraining with BERT

References

DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter

Code

Parts 2 & 3: unit08-large-language-models/8.7-distilbert-finetuning/

What we covered in this video lecture

In this lecture, we finetune a DistilBERT model on the movie review classification task. According to the DistilBERT paper, the distilled model is 40% smaller than BERT while retaining 97% of its language understanding capabilities and running 60% faster.

We explore two approaches, finetuning only the last layer and finetuning all layers, and compare the predictive and computational performance of the two.
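To make the difference between the two approaches concrete, here is a minimal PyTorch sketch of how freezing is typically expressed: parameters with `requires_grad = False` receive no gradient updates. This is not the lecture's code. `TinyClassifier`, `freeze_all_but`, and `count_trainable` are hypothetical names, and the tiny model is only a stand-in so the snippet runs without downloading any weights. With Hugging Face's `DistilBertForSequenceClassification`, the output head consists of the modules named `pre_classifier` and `classifier`, so the same helper could be called with those prefixes.

```python
import torch
from torch import nn


class TinyClassifier(nn.Module):
    """Hypothetical stand-in for a pretrained encoder plus classification head."""

    def __init__(self, vocab_size=100, dim=16, num_classes=2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, dim)
        self.encoder = nn.Linear(dim, dim)
        self.classifier = nn.Linear(dim, num_classes)

    def forward(self, token_ids):
        # Average the token embeddings, then encode and classify.
        hidden = self.embedding(token_ids).mean(dim=1)
        hidden = torch.relu(self.encoder(hidden))
        return self.classifier(hidden)


def freeze_all_but(model, trainable_prefixes):
    """Keep only parameters whose names start with one of the prefixes trainable."""
    prefixes = tuple(trainable_prefixes)
    for name, param in model.named_parameters():
        param.requires_grad = name.startswith(prefixes)


def count_trainable(model):
    """Number of parameters that will receive gradient updates."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)


model = TinyClassifier()

# Approach 1: finetune all layers (the default, nothing is frozen).
total = count_trainable(model)

# Approach 2: finetune only the last layer.
freeze_all_but(model, ["classifier"])
head_only = count_trainable(model)

# Pass only the trainable parameters to the optimizer.
optimizer = torch.optim.Adam(
    [p for p in model.parameters() if p.requires_grad], lr=5e-5
)

print(f"all layers: {total} trainable parameters")
print(f"last layer only: {head_only} trainable parameters")
```

Training only the last layer updates far fewer parameters per step, which is where the computational savings come from; whether the predictive performance holds up is exactly what the comparison in the lecture examines.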

This code can be used as a template to adapt other pretrained large language models to various text classification tasks.

Additional resources if you want to learn more

In this lecture, we focused on finetuning transformers the conventional way. Recently, researchers have begun to develop more parameter-efficient finetuning methods to make finetuning more computationally affordable. If you are interested in learning more about these methods, check out my articles on the topic.

Moreover, you might be interested in checking out our Lit-LLaMA repository, which contains an implementation of the popular LLaMA language model based on nanoGPT.
