GenAI Systems Lab
AI Engineering 9 min read

PEFT Methods Compared: LoRA, Prefix Tuning, Prompt Tuning, and Adapters

The four main parameter-efficient fine-tuning approaches — what each trains, when each wins, and why LoRA dominates most production use cases despite not being the oldest method.

LoRA gets most of the attention in the PEFT ecosystem — and for good reason. But it's not the only parameter-efficient fine-tuning method, and understanding what the others do (and why LoRA beat them in practice) builds a clearer mental model of what PEFT is actually optimising for.

The four main PEFT families: LoRA (low-rank adapter matrices), Prefix Tuning (prepend trainable tokens to the key-value sequence), Prompt Tuning (prepend trainable tokens to the input), and Adapter Layers (insert small bottleneck networks into transformer layers). They all share the same goal — adapt a large pretrained model efficiently — but make different tradeoffs.

LoRA (Low-Rank Adaptation)

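For a frozen weight matrix W₀ ∈ ℝ^(d×k), LoRA injects a trainable low-rank update ΔW = BA, with B ∈ ℝ^(d×r), A ∈ ℝ^(r×k) and r ≪ min(d, k). The layer then computes h = W₀x + BAx, with the update usually scaled by α/r. At inference, BA can be merged into W₀ once, so the merged model has zero inference overhead. Rank r controls capacity: higher rank captures more complex adaptations but costs r(d + k) trainable parameters per adapted matrix.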
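The unmerged and merged forms compute the same output, which is why merging removes the inference cost. The sketch below checks this on a toy 3×3 layer in plain Python with no framework. The helper names, the toy values, and the α/r scaling convention (as described in the LoRA paper) are illustrative assumptions, not any library's API.

```python
# Minimal LoRA sketch in plain Python; shapes and values are illustrative only.
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def matvec(m: Matrix, x: Vector) -> Vector:
    """Matrix-vector product m @ x."""
    return [sum(m_ij * x_j for m_ij, x_j in zip(row, x)) for row in m]


def lora_forward(w0: Matrix, a: Matrix, b: Matrix, x: Vector, alpha: float, r: int) -> Vector:
    """Unmerged path: h = W0 x + (alpha / r) * B (A x). W0 is frozen; only A and B train."""
    base = matvec(w0, x)
    delta = matvec(b, matvec(a, x))
    scale = alpha / r
    return [h + scale * d for h, d in zip(base, delta)]


def merge(w0: Matrix, a: Matrix, b: Matrix, alpha: float, r: int) -> Matrix:
    """Merged weight W = W0 + (alpha / r) * B A, so inference is a single matmul."""
    ba = matmul(b, a)
    scale = alpha / r
    return [[w + scale * d for w, d in zip(w_row, d_row)] for w_row, d_row in zip(w0, ba)]


def lora_param_count(d: int, k: int, r: int) -> int:
    """Trainable parameters for one adapted d x k matrix: B is d x r, A is r x k."""
    return r * (d + k)


# Toy layer: d = k = 3, rank r = 1.
w0 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
a = [[0.5, -1.0, 2.0]]        # r x k = 1 x 3
b = [[1.0], [0.0], [-1.0]]    # d x r = 3 x 1
x = [1.0, 2.0, 3.0]

unmerged = lora_forward(w0, a, b, x, alpha=2.0, r=1)
merged = matvec(merge(w0, a, b, alpha=2.0, r=1), x)
print(unmerged, merged)
print(lora_param_count(4096, 4096, 8))  # vs 4096 * 4096 for the full matrix
```

Both paths produce [10.0, 2.0, -6.0]. In the LoRA paper, A is initialised randomly and B to zero, so ΔW starts at zero and the adapted model initially behaves exactly like the base model.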

Adapter Layers

Insert small bottleneck networks (down-project → nonlinearity → up-project) into each transformer layer. The adapter learns a residual transformation: output = input + adapter(input). Only the adapter weights are trained.
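A minimal sketch of one adapter block in plain Python follows. ReLU as the nonlinearity, the helper names, and the toy sizes are assumptions for illustration; published adapter variants differ in placement and activation. It also counts the cost of a bottleneck of width m on hidden size d: 2dm + d + m (two projections plus their biases).

```python
# Minimal bottleneck-adapter sketch in plain Python; ReLU and the toy sizes are assumptions.
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, x: Vector) -> Vector:
    """Matrix-vector product m @ x."""
    return [sum(m_ij * x_j for m_ij, x_j in zip(row, x)) for row in m]


def relu(v: Vector) -> Vector:
    return [max(0.0, t) for t in v]


def adapter_forward(h: Vector, w_down: Matrix, b_down: Vector,
                    w_up: Matrix, b_up: Vector) -> Vector:
    """Residual bottleneck: output = h + W_up · relu(W_down · h + b_down) + b_up."""
    z = relu([s + b for s, b in zip(matvec(w_down, h), b_down)])  # down-project d -> m
    up = [s + b for s, b in zip(matvec(w_up, z), b_up)]            # up-project m -> d
    return [hi + ui for hi, ui in zip(h, up)]                      # residual add


def adapter_param_count(d: int, m: int) -> int:
    """Down-projection (m x d weights + m biases) plus up-projection (d x m + d)."""
    return 2 * d * m + d + m


# Toy hidden state with d = 4 and a bottleneck of width m = 2.
h = [1.0, -2.0, 0.5, 3.0]
w_down = [[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 1.0]]
b_down = [0.0, -1.0]
b_up = [0.0, 0.0, 0.0, 0.0]

# Zero up-projection: the adapter acts as an identity map.
w_up_init = [[0.0, 0.0] for _ in range(4)]
print(adapter_forward(h, w_down, b_down, w_up_init, b_up))

# Made-up "trained" up-projection: the adapter now adds a learned residual.
w_up = [[1.0, 0.0], [0.0, 0.0], [0.0, 0.0], [0.0, -1.0]]
print(adapter_forward(h, w_down, b_down, w_up, b_up))
print(adapter_param_count(d=768, m=64))
```

Initialising the up-projection at or near zero makes the adapter start close to an identity map, so inserting it does not disturb the pretrained model. Unlike a merged LoRA update, these extra matmuls stay in every forward pass at inference.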

Prefix Tuning

Prepends a sequence of trainable 'virtual tokens' to the key and value tensors at every attention layer. These prefix vectors are never part of the input text — they exist only inside the attention computation, acting as soft prompts that influence every layer's attention.
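The effect of a prefix on one attention layer can be sketched for a single query vector. This is plain Python with a hypothetical helper `attend_with_prefix`; it omits multi-head projections and the MLP reparameterisation that the original prefix-tuning paper uses during training.

```python
# Single-query prefix-tuning sketch in plain Python; names and values are illustrative.
import math
from typing import List

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(q: Vector, keys: List[Vector], values: List[Vector]) -> Vector:
    """Scaled dot-product attention for one query over lists of key/value vectors."""
    scale = math.sqrt(len(q))
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in keys]
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]


def attend_with_prefix(q: Vector, keys: List[Vector], values: List[Vector],
                       prefix_keys: List[Vector], prefix_values: List[Vector]) -> Vector:
    """Prefix tuning at one layer: trainable prefix K/V vectors are prepended to the
    frozen model's K/V. Only prefix_keys and prefix_values would be trained."""
    return attend(q, prefix_keys + keys, prefix_values + values)


q = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0]]    # produced by the frozen model
values = [[1.0, 0.0], [0.0, 1.0]]

no_prefix = attend_with_prefix(q, keys, values, [], [])
with_prefix = attend_with_prefix(q, keys, values, [[100.0, 0.0]], [[5.0, 5.0]])
print(no_prefix, with_prefix)
```

With an empty prefix the output is unchanged; a prefix key that aligns strongly with the query pulls the output toward its prefix value. Because the prefix adds positions to every layer's key/value sequence, attention gets slightly longer at inference, which is the source of its small overhead.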

Prompt Tuning

The simplest of the PEFT methods: prepend a small number of trainable continuous vectors (a "soft prompt") to the input embeddings. Only these prompt embeddings are trained; the entire model stays frozen. At very large model scales (11B+), prompt tuning was shown to approach full fine-tuning quality.
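Prompt tuning changes only what enters the first layer. A minimal sketch, assuming a toy frozen embedding table and a hypothetical `prompt_tuned_inputs` helper: the trainable soft prompt is concatenated in front of the frozen token embeddings, and its parameter count is just prompt length × embedding width.

```python
# Prompt-tuning input sketch in plain Python; the toy embedding table is hypothetical.
from typing import List

Vector = List[float]

# Frozen embedding table: row i is the embedding of token id i (toy width 2).
FROZEN_EMBEDDINGS: List[Vector] = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]


def prompt_tuned_inputs(token_ids: List[int], soft_prompt: List[Vector]) -> List[Vector]:
    """Concatenate trainable soft-prompt vectors in front of frozen token embeddings.
    The result is what the frozen transformer consumes; only soft_prompt is trained."""
    token_embeddings = [FROZEN_EMBEDDINGS[t] for t in token_ids]
    return [list(v) for v in soft_prompt] + token_embeddings


def prompt_param_count(prompt_length: int, d_model: int) -> int:
    """Trainable parameters: one d_model-sized vector per virtual token."""
    return prompt_length * d_model


soft_prompt = [[9.0, 9.0], [8.0, 8.0]]  # two virtual tokens: the only trainable values
inputs = prompt_tuned_inputs([2, 0], soft_prompt)
print(inputs)
print(prompt_param_count(prompt_length=20, d_model=4096))
```

With 20 prompt vectors of width 4096 (hypothetical sizes), that is 81,920 trainable parameters, well under 0.01% of an 11B model.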

Why LoRA dominates in practice

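| Method | Trainable params (% of model) | Inference overhead | Stability | Production use |
|---|---|---|---|---|
| LoRA | 0.1–2% | Zero (after merge) | High | Dominant |
| QLoRA | 0.1–2% | Zero (after merge) | High | Dominant (low VRAM) |
| Adapters | 1–8% | Permanent (~10ms, hardware-dependent) | High | Multi-task setups |
| Prefix Tuning | ~0.1% | Small | Medium | Niche |
| Prompt Tuning | <0.01% | Minimal | Low on small models | Mostly replaced by LoRA |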

LoRA wins on the combination of: zero inference overhead after merging, stable optimisation across model sizes, flexibility in rank selection, and the ability to train on top of quantised models (QLoRA). No other PEFT method checks all four boxes.
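To make the rank trade-off concrete, the arithmetic below counts LoRA's trainable parameters for a hypothetical decoder with 32 layers, hidden size 4096, about 6.7B total parameters, and LoRA applied to the four square attention projections (query, key, value, output). These round numbers are assumptions, not a specific model; models with grouped-query attention, or LoRA on the MLP layers, will give different counts.

```python
# Back-of-the-envelope LoRA parameter count; the model shape below is hypothetical.

TOTAL_PARAMS = 6.7e9  # assumed total size of a ~7B-class model


def lora_trainable_params(n_layers: int, d_model: int, n_targets: int, r: int) -> int:
    """Each targeted d_model x d_model projection gets B (d_model x r) and
    A (r x d_model), i.e. r * (d_model + d_model) trainable parameters."""
    return n_layers * n_targets * r * (d_model + d_model)


for r in (8, 16, 64):
    n = lora_trainable_params(n_layers=32, d_model=4096, n_targets=4, r=r)
    print(f"r={r:>2}: {n:,} trainable params ({100 * n / TOTAL_PARAMS:.3f}% of the model)")
```

Trainable parameters grow linearly with r, so going from r = 8 to r = 64 moves this configuration from roughly 0.13% to 1% of the model, inside the 0.1–2% band in the table, without touching the frozen weights.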
