
8.6 Large Language Models

What we covered in this video lecture

The two main ingredients behind the success of transformers are (1) self-attention and (2) self-supervised pretraining (which we covered in Unit 7).

In this lecture, we discuss the self-supervised pretraining objectives of BERT and GPT. We also discuss how to adapt pretrained transformers to new downstream tasks, for example, sentiment classification.
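To make the two pretraining objectives concrete, here is a minimal sketch in PyTorch. It uses made-up token IDs instead of a real tokenizer, and the mask token ID and the roughly 15% masking rate are illustrative assumptions rather than the lecture's exact setup:

```python
import torch

torch.manual_seed(123)
tokens = torch.tensor([12, 5, 83, 7, 42, 19])   # a toy "sentence" of token IDs
MASK_ID = 0                                     # assumed ID of a special [MASK] token

# GPT-style next-token prediction: from the tokens up to position t, predict token t+1.
gpt_inputs, gpt_targets = tokens[:-1], tokens[1:]

# BERT-style masked-word prediction: hide a random ~15% of positions and predict them.
mask = torch.rand(tokens.shape) < 0.15
bert_inputs = torch.where(mask, torch.tensor(MASK_ID), tokens)
bert_targets = torch.where(mask, tokens, torch.tensor(-100))  # -100 is ignored by nn.CrossEntropyLoss

print("GPT :", gpt_inputs.tolist(), "->", gpt_targets.tolist())
print("BERT:", bert_inputs.tolist(), "->", bert_targets.tolist())
```

In both cases, pretraining compares the model's predicted distribution over the vocabulary at each target position against these targets with a cross-entropy loss.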

Additional resources if you want to learn more

If you are interested in additional resources about large language models, I compiled a list of influential papers in my article Understanding Large Language Models — A Cross-Section of the Most Relevant Literature To Get Up to Speed.

Quiz: 8.6 Large Language Models (Part 1)

How is BERT pretrained?

Incorrect. This is a GPT pretraining task.

Correct. This is part of the BERT pretraining task.

Correct. This is part of the BERT pretraining task related to masked word prediction.

Correct. BERT is also pretrained to classify whether two concatenated sentences appear in the correct order (see the sketch after this quiz).

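Following up on this quiz, here is a minimal sketch of how a training example for the second BERT pretraining task can be constructed. The toy word lists and the 50/50 split between correctly ordered and swapped sentence pairs are illustrative assumptions; the prediction itself is made from the output of the [CLS] token.

```python
# Minimal sketch (toy data, no real tokenizer) of building an example for
# BERT's sentence-order task: is the concatenated pair in the correct order?
import random

random.seed(1)
sent_a = ["the", "cat", "sat"]   # first sentence as it appears in the corpus
sent_b = ["on", "the", "mat"]    # the sentence that actually follows it

if random.random() < 0.5:        # assumed 50%: keep the original order (label 1)
    pair, label = sent_a + ["[SEP]"] + sent_b, 1
else:                            # assumed 50%: swap the sentences (label 0)
    pair, label = sent_b + ["[SEP]"] + sent_a, 0

example = ["[CLS]"] + pair + ["[SEP]"]   # the [CLS] output feeds the binary classifier
print(example, "->", label)
```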

Quiz: 8.6 Large Language Models (Part 2)

Which of the following are valid finetuning approaches?

Correct. This is the most expensive approach but often results in the best modeling performance (see the sketch after this quiz).

Correct. This is one of the cheapest approaches to finetune a model.

Incorrect. Only updating the first two layers without adapting the output layers usually works very poorly in practice: it changes the inputs to the output layers, and those output layers never saw such inputs during pretraining.

Correct. This is one way we can finetune the model.

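To make the finetuning approaches from this quiz concrete, here is a minimal PyTorch sketch. The tiny transformer encoder stands in for a pretrained model and the linear layer for a freshly added two-class output head; the learning rates and toy data are arbitrary choices, not values from the lecture.

```python
import torch
import torch.nn as nn

# Stand-in for a pretrained transformer encoder plus a new 2-class output head.
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True), num_layers=2
)
classifier = nn.Linear(64, 2)  # new output layer for, e.g., sentiment classification

# Approach 1: finetune all layers (most expensive, often best performance).
params_all = list(encoder.parameters()) + list(classifier.parameters())
opt_all = torch.optim.Adam(params_all, lr=5e-5)

# Approach 2: freeze the pretrained layers and train only the new output layer (cheapest).
for p in encoder.parameters():
    p.requires_grad = False
opt_head = torch.optim.Adam(classifier.parameters(), lr=1e-3)

# One toy training step for approach 2 (random data, just to show the mechanics).
x, y = torch.randn(8, 16, 64), torch.randint(0, 2, (8,))
logits = classifier(encoder(x).mean(dim=1))     # mean-pool the encoder outputs
loss = nn.functional.cross_entropy(logits, y)
loss.backward()
opt_head.step()
```

Approaches in between these two extremes, such as unfreezing only the last few transformer layers together with the new output layer, trade off training cost against modeling performance.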