
Transformer

The Transformer is a neural network architecture introduced in the paper “Attention Is All You Need” (Vaswani et al., 2017) and now widely used in large language models. It replaces the recurrence of traditional recurrent neural networks (RNNs) with self-attention layers, which capture dependencies between all positions in a sequence and allow those positions to be processed in parallel rather than one step at a time.
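The core operation behind those self-attention layers is scaled dot-product attention. As a rough sketch (in NumPy rather than PyTorch, and with illustrative names not taken from any library), each token's query is compared against every token's key, the resulting scores are normalized with a softmax, and the output is a weighted sum of the values:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # (seq_len, seq_len) pairwise similarity
    # Numerically stable row-wise softmax over the key dimension
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights

# Toy self-attention: 3 tokens with embedding dimension 4. In true
# self-attention, queries, keys, and values are all (learned projections
# of) the same input sequence; here we feed the raw input for brevity.
rng = np.random.default_rng(0)
X = rng.standard_normal((3, 4))
output, weights = scaled_dot_product_attention(X, X, X)
```

Because every token attends to every other token in a single matrix multiplication, the whole sequence is processed at once, which is what gives Transformers their parallelism advantage over RNNs.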

Related content

Optimizing Memory Usage for Training LLMs and Vision Transformers in PyTorch
The NeurIPS 2023 LLM Efficiency Challenge Starter Guide