8.5 Understanding Self-Attention
- Attention Is All You Need (2017), the original transformer paper
What we covered in this video lecture
To understand transformer-based large language models, it is essential to understand self-attention, the mechanism that powers these models. Self-attention can be understood as a way to create context-aware text embedding vectors: each token's representation is updated using information from the other tokens in the sequence.
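As a rough illustration of this idea, here is a minimal sketch (not taken from the lecture) of a parameter-free form of self-attention in PyTorch. The token count, embedding size, and input values are made up for illustration; the point is that each output vector is a weighted combination of all input vectors.

```python
import torch

# Hypothetical toy input: 4 tokens, each a 3-dimensional embedding vector.
inputs = torch.tensor(
    [[0.43, 0.15, 0.89],
     [0.55, 0.87, 0.66],
     [0.57, 0.85, 0.64],
     [0.22, 0.58, 0.33]]
)

# 1) Attention scores: pairwise dot-product similarity between all tokens.
scores = inputs @ inputs.T                   # shape: (4, 4)

# 2) Attention weights: softmax-normalize each row so it sums to 1.
weights = torch.softmax(scores, dim=-1)      # shape: (4, 4)

# 3) Context vectors: each token's new embedding is a weighted sum over
#    all input vectors, so it reflects the surrounding context.
context_vectors = weights @ inputs           # shape: (4, 3)
print(context_vectors)
```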
In this lecture, we explain self-attention from the ground up. We start with a simple, parameter-free version of self-attention (as sketched above) to explain the underlying principles. Then, we cover the parameterized self-attention mechanism used in transformers: self-attention with learnable weights, sketched below.
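The following sketch extends the parameter-free version with learnable weight matrices for queries, keys, and values, in the style of scaled dot-product attention. The dimensions, random initialization, and variable names are assumptions for illustration only, not the lecture's exact setup.

```python
import torch

torch.manual_seed(123)

# Assumed toy dimensions: 4 tokens, input embedding dim 3, projection dim 2.
inputs = torch.rand(4, 3)
d_in, d_out = 3, 2

# Trainable weight matrices for queries, keys, and values.
W_q = torch.nn.Parameter(torch.rand(d_in, d_out))
W_k = torch.nn.Parameter(torch.rand(d_in, d_out))
W_v = torch.nn.Parameter(torch.rand(d_in, d_out))

# Project the inputs into query, key, and value spaces.
queries = inputs @ W_q       # (4, 2)
keys    = inputs @ W_k       # (4, 2)
values  = inputs @ W_v       # (4, 2)

# Scaled dot-product attention: scores are divided by sqrt(d_out)
# before the softmax to keep the weights from becoming too peaked.
scores = queries @ keys.T                                # (4, 4)
weights = torch.softmax(scores / d_out ** 0.5, dim=-1)   # (4, 4)

# Context vectors: weighted sums over the value vectors.
context_vectors = weights @ values                       # (4, 2)
print(context_vectors)
```

Because W_q, W_k, and W_v are trainable parameters, the model can learn which parts of the context each token should attend to, rather than relying on raw embedding similarity as in the parameter-free version.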
Additional resources if you want to learn more
This lecture introduced the attention mechanism with conceptual illustrations. If you prefer a code-first approach, also check out my article Understanding and Coding the Self-Attention Mechanism of Large Language Models From Scratch.