8.7 A Large Language Model for Classification
What we covered in this video lecture
In this lecture, we finetune a DistilBERT model on the movie review classification task. DistilBERT reduces the size of a BERT model by 40% while retaining 97% of BERT’s language understanding capabilities and being 60% faster.
We explore two approaches: finetuning only the last layer and finetuning all layers, and we compare the predictive and computational performance of these two paradigms.
This code can be used as a template to adapt other pretrained large language models to various text classification tasks.
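To make the two paradigms concrete, here is a minimal sketch of loading DistilBERT for binary classification and switching between last-layer-only and full finetuning. It uses the Hugging Face transformers API; the checkpoint name, hyperparameters, and freezing strategy shown here are illustrative and may differ from the exact lecture code.

```python
# Sketch: DistilBERT for binary classification with two finetuning strategies.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2
)

FINETUNE_ALL_LAYERS = False  # flip to True to finetune the whole model

if not FINETUNE_ALL_LAYERS:
    # Freeze every parameter, then unfreeze only the classification head
    # (pre_classifier + classifier in DistilBERT's sequence classification model).
    for param in model.parameters():
        param.requires_grad = False
    for param in model.pre_classifier.parameters():
        param.requires_grad = True
    for param in model.classifier.parameters():
        param.requires_grad = True

# Only parameters with requires_grad=True are updated by the optimizer.
optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=5e-5
)

# Example forward/backward pass on a single (hypothetical) movie review.
batch = tokenizer(["A surprisingly touching film."], return_tensors="pt")
labels = torch.tensor([1])
loss = model(**batch, labels=labels).loss
loss.backward()
optimizer.step()
```

Freezing all but the last layers trains far fewer parameters and is correspondingly cheaper, while finetuning all layers typically yields better predictive performance at a higher computational cost.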
Additional resources if you want to learn more
In this lecture, we focused on finetuning transformers the conventional way. Recently, researchers have begun to develop parameter-efficient finetuning methods that make finetuning more computationally affordable. If you are interested in learning more about these parameter-efficient finetuning methods, check out my articles on
- Understanding Parameter-Efficient Finetuning of Large Language Models: From Prefix Tuning to LLaMA-Adapters
- and Parameter-Efficient LLM Finetuning With Low-Rank Adaptation (LoRA).
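To give a flavor of what these methods do, below is a minimal sketch of the low-rank adaptation (LoRA) idea: a frozen pretrained linear layer is augmented with a trainable low-rank update, so only a small number of new parameters are trained. The class name, rank, and scaling used here are illustrative assumptions, not code from the articles above.

```python
# Sketch of LoRA: frozen weight W plus trainable low-rank update (alpha/r) * B @ A.
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    def __init__(self, linear: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.linear = linear
        for param in self.linear.parameters():
            param.requires_grad = False  # pretrained weights stay frozen
        # Low-rank factors: A (rank x in_features), B (out_features x rank)
        self.lora_a = nn.Parameter(torch.randn(rank, linear.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(linear.out_features, rank))
        self.scaling = alpha / rank

    def forward(self, x):
        # Frozen pretrained projection plus the trainable low-rank update.
        return self.linear(x) + self.scaling * (x @ self.lora_a.T @ self.lora_b.T)


layer = LoRALinear(nn.Linear(768, 768))
out = layer(torch.randn(4, 768))  # only lora_a and lora_b receive gradients
```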
Moreover, you might be interested in checking out our Lit-LLaMA repository, which contains an implementation of the popular LLaMA language model based on nanoGPT.