LoRA: How Low-Rank Adaptation Made Fine-Tuning Accessible to Everyone
Microsoft's 2022 paper introducing LoRA. How injecting trainable low-rank matrices into frozen weights reduced fine-tuning VRAM by 10× — and became the standard way to adapt LLMs.
Full fine-tuning a language model means updating every parameter. For GPT-3's 175 billion parameters, storing gradient updates requires roughly 700GB of optimiser state in mixed precision. Fine-tuning frontier models was impossible for most teams.
In 2022, Edward Hu and colleagues at Microsoft published 'LoRA: Low-Rank Adaptation of Large Language Models'. The proposal: freeze all original model weights and inject small trainable matrices alongside them. Fine-tune a 175B model with the same VRAM as training a 35M model, with comparable task performance. LoRA is now the standard approach for fine-tuning LLMs.
The low-rank hypothesis
Fine-tuning updates have a low intrinsic dimensionality — most of the weight change can be expressed as a product of two much smaller matrices.
Original weight: W ∈ R^(d×k) [frozen during training]
LoRA adds: ΔW = B × A
A ∈ R^(r×k) [trainable, initialised randomly]
B ∈ R^(d×r) [trainable, initialised to zero]
r << d, k [rank, typically 4–64]
Forward pass: h = Wx + BAx
Full fine-tuning: 4096 × 4096 = 16.7M params per matrix
LoRA (rank=8): (4096×8) + (8×4096) = 65.5K → 255× fewer params
You don't need to update all 175B parameters — you need to update the right 1% of them. Fine-tuning updates are low-rank, and low-rank matrix products capture most of the adaptation needed for a new task or domain.
QLoRA: fine-tuning in 4-bit
Tim Dettmers' 2023 QLoRA combined 4-bit quantisation of the base model with LoRA adapters. Fine-tuning LLaMA-65B became possible on a single 48GB GPU — previously impossible even with LoRA alone. Results were close to full fine-tuning at a fraction of the memory cost.
Configuration guide
- Rank r: typically 8–64. Higher rank = more expressivity but more parameters. Start at r=8 for domain adaptation, r=64 for complex tasks.
- Alpha: typically 2× the rank. Controls the scaling of the LoRA update relative to the original weights.
- Target modules: apply to all attention projections (Q, K, V, O). Include FFN layers for better results.
- Dropout: 0.05–0.1 to prevent overfitting on small datasets.
LoRA's limitations
- Cannot teach genuinely new capabilities requiring architectural changes — best for style, format, and domain knowledge adaptation
- Catastrophic forgetting: mitigated but not eliminated when fine-tuning on narrow tasks
- Rank selection: requires experimentation — too low loses information, too high adds overhead without benefit
Explore fine-tuning approaches →: Compare LoRA, full fine-tuning, and prompting for different adaptation tasks.
Try it interactively
GenAI Systems Lab is a free platform for AI engineers — configure real failure modes, break things, and build the judgment that gets you hired.
Open GenAI Systems Lab →