
8.4 From RNNs to the Transformer Architecture


What we covered in this video lecture

For many years, recurrent neural networks (RNNs) were the main deep learning approach for text data. However, a few years ago, large language transformers began revolutionizing natural language processing. In this lecture, you will learn what transformers are and where the idea for the original attention mechanism in transformers came from.

Additional resources if you want to learn more

While it’s not essential, I highly recommend browsing through the RNN-attention paper (Neural Machine Translation by Jointly Learning to Align and Translate) as well as the original transformer paper (Attention Is All You Need).


Quiz: 8.4 From RNNs to the Transformer Architecture (Part 1)

Which of these tasks are suitable for large language transformers?

Correct. A popular example is AlphaFold.

Correct. A popular example is Google Translate.

Correct. A popular example is ChatGPT.


Quiz: 8.4 From RNNs to the Transformer Architecture (Part 2)

Thinking back to the bag-of-words model, why is this not a common approach for machine translation?

Incorrect. The bag-of-words model can handle large vocabularies by representing them as sparse vectors.

Correct. The bag-of-words model represents text as a set of unique words and their frequencies. It cannot encode grammar or structure.

Incorrect. It is generally not computationally expensive, as it represents text as sparse vectors, reducing the computational complexity.

Incorrect. Bag-of-words can be applied to both short and long texts, but its main limitation is the lack of consideration for word order and structure, not the length of the text.
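The limitation described in the correct answer is easy to demonstrate. The following minimal sketch (plain Python, illustrative only) shows that two sentences with opposite meanings collapse to the same bag-of-words representation, because only unique words and their frequencies survive.

```python
from collections import Counter

def bag_of_words(text):
    """Count word frequencies, discarding word order entirely."""
    return Counter(text.lower().split())

# Two sentences with opposite meanings produce identical representations,
# because bag-of-words keeps only unique words and their counts.
a = bag_of_words("the dog bit the man")
b = bag_of_words("the man bit the dog")
print(a == b)  # True: word order (and thus meaning) is lost
```

This is exactly why the model is unsuited for machine translation: a translation system must preserve grammar and word order, which the representation throws away before the model ever sees the text.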


Quiz: 8.4 From RNNs to the Transformer Architecture (Part 3)

The attention mechanism for RNNs was introduced to address which limitation of the original sequence-to-sequence models?

Correct. The attention mechanism allows the model to focus on different parts of the input sequence at each decoding step, making it easier to capture and retain relevant information from longer input sequences.

Incorrect. It was not specifically designed to address overfitting on small datasets. However, using attention mechanisms might help improve the model’s performance in some cases by allowing it to focus on relevant input information.

Incorrect. In fact, attention mechanisms increase the computational complexity of the model due to the additional calculations involved in computing attention scores.

Incorrect. The attention mechanism’s main purpose is to address the limitation of fixed-size context vectors in sequence-to-sequence models, not capturing local word order information.
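The mechanism in the correct answer can be sketched in a few lines: at each decoding step, the current decoder state is scored against every encoder hidden state, the scores are softmax-normalized into attention weights, and a fresh context vector is formed as the weighted sum, instead of relying on a single fixed-size context vector. The NumPy sketch below is illustrative only; the function name is made up here, and dot-product scoring is a simplification (the RNN-attention paper uses a small additive scoring network).

```python
import numpy as np

def attention_context(decoder_state, encoder_states):
    """One decoding step of a simplified attention mechanism:
    score each encoder hidden state against the current decoder state,
    normalize the scores with softmax, and return the weighted sum
    as the context vector for this step."""
    scores = encoder_states @ decoder_state          # shape: (seq_len,)
    scores = scores - scores.max()                   # for numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()  # softmax -> attention weights
    context = weights @ encoder_states               # shape: (hidden_dim,)
    return context, weights

# Toy example: 4 encoder hidden states of size 3
rng = np.random.default_rng(0)
enc = rng.normal(size=(4, 3))
dec = rng.normal(size=3)
ctx, w = attention_context(dec, enc)
# The weights sum to 1 (up to floating point) and the context vector
# has the same dimensionality as a single encoder hidden state.
```

Because the weights are recomputed at every decoding step, the decoder can attend to different input positions over time, which is what relieves the fixed-size-context bottleneck of the original sequence-to-sequence models.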
