
8.6 Large Language Models

What we covered in this video lecture

The two main ingredients behind the success of transformers are (1) self-attention and (2) self-supervised pretraining (which we covered in Unit 7).

In this lecture, we discuss the self-supervised pretraining objectives of BERT and GPT. We also discuss how to adapt pretrained transformers to new downstream tasks, for example, sentiment classification.
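To make the two pretraining objectives concrete, here is a minimal sketch in PyTorch. It uses made-up token IDs instead of a real tokenizer, and the mask token ID and the roughly 15% masking rate are illustrative assumptions rather than the lecture's exact setup:

```python
import torch

torch.manual_seed(123)
tokens = torch.tensor([12, 5, 83, 7, 42, 19])   # a toy "sentence" of token IDs
MASK_ID = 0                                     # assumed ID of a special [MASK] token

# GPT-style next-token prediction: from the tokens up to position t, predict token t+1.
gpt_inputs, gpt_targets = tokens[:-1], tokens[1:]

# BERT-style masked-word prediction: hide a random ~15% of positions and predict them.
mask = torch.rand(tokens.shape) < 0.15
bert_inputs = torch.where(mask, torch.tensor(MASK_ID), tokens)
bert_targets = torch.where(mask, tokens, torch.tensor(-100))  # -100 is ignored by nn.CrossEntropyLoss

print("GPT :", gpt_inputs.tolist(), "->", gpt_targets.tolist())
print("BERT:", bert_inputs.tolist(), "->", bert_targets.tolist())
```

In both cases, pretraining compares the model's predicted distribution over the vocabulary at each target position against these targets with a cross-entropy loss.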

Additional resources if you want to learn more

If you are interested in additional resources about large language models, I compiled a list of influential papers in my article Understanding Large Language Models — A Cross-Section of the Most Relevant Literature To Get Up to Speed.

Quiz: 8.6 Large Language Models (Part 1)

How is BERT pretrained?

Incorrect. This is a GPT pretraining task.

Correct. This is part of the BERT pretraining task.

Correct. This is part of the BERT pretraining task related to masked word prediction.

Correct. BERT is also pretrained to classify whether two concatenated sentences appear in the correct order (see the sketch after this quiz).

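Following up on this quiz, here is a minimal sketch of how a training example for the second BERT pretraining task can be constructed. The toy word lists and the 50/50 split between correctly ordered and swapped sentence pairs are illustrative assumptions; the prediction itself is made from the output of the [CLS] token.

```python
# Minimal sketch (toy data, no real tokenizer) of building an example for
# BERT's sentence-order task: is the concatenated pair in the correct order?
import random

random.seed(1)
sent_a = ["the", "cat", "sat"]   # first sentence as it appears in the corpus
sent_b = ["on", "the", "mat"]    # the sentence that actually follows it

if random.random() < 0.5:        # assumed 50%: keep the original order (label 1)
    pair, label = sent_a + ["[SEP]"] + sent_b, 1
else:                            # assumed 50%: swap the sentences (label 0)
    pair, label = sent_b + ["[SEP]"] + sent_a, 0

example = ["[CLS]"] + pair + ["[SEP]"]   # the [CLS] output feeds the binary classifier
print(example, "->", label)
```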

Quiz: 8.6 Large Language Models (Part 2)

Which of the following are valid finetuning approaches?

Correct. This is the most expensive approach but often results in the best modeling performance (see the sketch after this quiz).

Correct. This is one of the cheapest approaches to finetune a model.

Incorrect. Only updating the first two layers without adapting the output layers usually works very poorly in practice: it changes the inputs to the output layers, and those output layers never saw such inputs during pretraining.

Correct. This is one way we can finetune the model.

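To make the finetuning approaches from this quiz concrete, here is a minimal PyTorch sketch. The tiny transformer encoder stands in for a pretrained model and the linear layer for a freshly added two-class output head; the learning rates and toy data are arbitrary choices, not values from the lecture.

```python
import torch
import torch.nn as nn

# Stand-in for a pretrained transformer encoder plus a new 2-class output head.
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True), num_layers=2
)
classifier = nn.Linear(64, 2)  # new output layer for, e.g., sentiment classification

# Approach 1: finetune all layers (most expensive, often best performance).
params_all = list(encoder.parameters()) + list(classifier.parameters())
opt_all = torch.optim.Adam(params_all, lr=5e-5)

# Approach 2: freeze the pretrained layers and train only the new output layer (cheapest).
for p in encoder.parameters():
    p.requires_grad = False
opt_head = torch.optim.Adam(classifier.parameters(), lr=1e-3)

# One toy training step for approach 2 (random data, just to show the mechanics).
x, y = torch.randn(8, 16, 64), torch.randint(0, 2, (8,))
logits = classifier(encoder(x).mean(dim=1))     # mean-pool the encoder outputs
loss = nn.functional.cross_entropy(logits, y)
loss.backward()
opt_head.step()
```

Approaches in between these two extremes, such as unfreezing only the last few transformer layers together with the new output layer, trade off training cost against modeling performance.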