AI Engineering 9 min read

LoRA: How Low-Rank Adaptation Made Fine-Tuning Accessible to Everyone

Microsoft's 2022 paper introducing LoRA. How injecting trainable low-rank matrices into frozen weights reduced fine-tuning VRAM by 10× — and became the standard way to adapt LLMs.

Full fine-tuning a language model means updating every parameter. For GPT-3's 175 billion parameters, storing gradient updates requires roughly 700GB of optimiser state in mixed precision. Fine-tuning frontier models was impossible for most teams.

In 2022, Edward Hu and colleagues at Microsoft published 'LoRA: Low-Rank Adaptation of Large Language Models'. The proposal: freeze all original model weights and inject small trainable matrices alongside them. Fine-tune a 175B model with the same VRAM as training a 35M model, with comparable task performance. LoRA is now the standard approach for fine-tuning LLMs.

The low-rank hypothesis

Fine-tuning updates have a low intrinsic dimensionality — most of the weight change can be expressed as a product of two much smaller matrices.

Original weight: W ∈ R^(d×k)  [frozen during training]
LoRA adds:       ΔW = B × A
  A ∈ R^(r×k)  [trainable, initialised randomly]
  B ∈ R^(d×r)  [trainable, initialised to zero]
  r << d, k    [rank, typically 4–64]

Forward pass: h = Wx + BAx

Full fine-tuning: 4096 × 4096 = 16.7M params per matrix
LoRA (rank=8):   (4096×8) + (8×4096) = 65.5K  →  255× fewer params

You don't need to update all 175B parameters — you need to update the right 1% of them. Fine-tuning updates are low-rank, and low-rank matrix products capture most of the adaptation needed for a new task or domain.

QLoRA: fine-tuning in 4-bit

Tim Dettmers' 2023 QLoRA combined 4-bit quantisation of the base model with LoRA adapters. Fine-tuning LLaMA-65B became possible on a single 48GB GPU — previously impossible even with LoRA alone. Results were close to full fine-tuning at a fraction of the memory cost.

Configuration guide

Rank r: typically 8–64. Higher rank = more expressivity but more parameters. Start at r=8 for domain adaptation, r=64 for complex tasks.
Alpha: typically 2× the rank. Controls the scaling of the LoRA update relative to the original weights.
Target modules: apply to all attention projections (Q, K, V, O). Include FFN layers for better results.
Dropout: 0.05–0.1 to prevent overfitting on small datasets.

LoRA's limitations

Cannot teach genuinely new capabilities requiring architectural changes — best for style, format, and domain knowledge adaptation
Catastrophic forgetting: mitigated but not eliminated when fine-tuning on narrow tasks
Rank selection: requires experimentation — too low loses information, too high adds overhead without benefit

Explore fine-tuning approaches →: Compare LoRA, full fine-tuning, and prompting for different adaptation tasks.

Try it interactively

GenAI Systems Lab is a free platform for AI engineers — configure real failure modes, break things, and build the judgment that gets you hired.

Open GenAI Systems Lab →