LoRA

LoRA (Low-Rank Adaptation) reduces the number of trainable parameters in large language models by freezing the original weights and learning small rank-decomposition matrices that are added to them. This significantly shrinks the storage needed for task-specific adaptations, enables efficient task-switching at deployment without adding inference latency, and performs on par with or better than other adaptation methods such as adapters, prefix-tuning, and full fine-tuning. The sketch below illustrates the idea on a single linear layer.
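A minimal sketch of the core mechanism, assuming PyTorch. The `LoRALinear` wrapper, the rank, and the initialization choices here are illustrative assumptions, not a reference implementation: the pretrained weight is frozen, and only the two low-rank matrices A and B are trained.

```python
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    """Frozen linear layer plus a trainable low-rank update: W x + (B A) x."""

    def __init__(self, linear: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.linear = linear
        self.linear.weight.requires_grad_(False)  # freeze the original weights
        if self.linear.bias is not None:
            self.linear.bias.requires_grad_(False)

        in_features, out_features = linear.in_features, linear.out_features
        # Rank-decomposition matrices: these are the only trainable parameters.
        self.lora_A = nn.Parameter(torch.randn(rank, in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(out_features, rank))
        self.scaling = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Frozen path plus the scaled low-rank update.
        return self.linear(x) + (x @ self.lora_A.T @ self.lora_B.T) * self.scaling


# Wrap an existing layer; only the LoRA matrices receive gradients.
layer = LoRALinear(nn.Linear(768, 768), rank=8)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(f"trainable params: {trainable}")  # 12,288 vs. 589,824 in the frozen weight
```

Because the update B A can be merged into the frozen weight after training (or kept separate and swapped per task), switching between task-specific adaptations at deployment requires loading only the small A and B matrices and introduces no extra inference latency.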

Related content

Parameter-Efficient LLM Finetuning With Low-Rank Adaptation (LoRA)
Accelerating Large Language Models with Mixed-Precision Techniques
Finetuning Falcon LLMs More Efficiently With LoRA and Adapters
The NeurIPS 2023 LLM Efficiency Challenge Starter Guide