
PEFT Methods Compared: LoRA, Prefix Tuning, Prompt Tuning, and Adapters

The four main parameter-efficient fine-tuning approaches — what each trains, when each wins, and why LoRA dominates most production use cases despite not being the oldest method.

LoRA gets most of the attention in the PEFT ecosystem — and for good reason. But it's not the only parameter-efficient fine-tuning method, and understanding what the others do (and why LoRA beat them in practice) builds a clearer mental model of what PEFT is actually optimising for.

There are four main PEFT families: LoRA (trainable low-rank matrices added alongside frozen weights), Prefix Tuning (trainable vectors prepended to the keys and values of every attention layer), Prompt Tuning (trainable vectors prepended to the input embeddings), and Adapter Layers (small bottleneck networks inserted into transformer layers). All share the same goal, adapting a large pretrained model efficiently, but they make different tradeoffs.

LoRA (Low-Rank Adaptation)

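LoRA injects trainable low-rank matrices alongside frozen weight matrices: the update is ΔW = BA, where B has shape d × r, A has shape r × k, and the rank r is much smaller than d and k. At inference, the adapter can be merged into the base weights (W + BA), which leaves zero inference overhead. Rank r controls capacity: a higher rank captures more complex adaptations but uses more parameters.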

Parameters: ~0.1–2% of base model (depending on rank and target modules)
Inference overhead: zero after merging; roughly 5% before merging (varies with rank, target modules, and implementation)
Works well for: style, format, domain adaptation, instruction following
Weakness: less effective when the required weight change is not well approximated by a low-rank update
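
The merge claim can be checked numerically. What follows is a minimal sketch in plain Python, assuming toy dimensions and random values. The helper names (matmul, add, rand_matrix) are illustrative and not taken from any library, and the alpha/r scaling factor used in practical LoRA implementations is left out for brevity.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def add(a, b):
    """Element-wise sum of two same-shaped matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def rand_matrix(rows, cols, rng):
    """Random matrix with entries in [-1, 1] (illustrative helper)."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
d_out, d_in, r = 6, 8, 2           # toy sizes; real layers are thousands wide

W = rand_matrix(d_out, d_in, rng)  # frozen pretrained weight
B = rand_matrix(d_out, r, rng)     # trainable, shape (d_out, r)
A = rand_matrix(r, d_in, rng)      # trainable, shape (r, d_in)
x = rand_matrix(d_in, 1, rng)      # one input as a column vector

# Unmerged: the low-rank path adds two small matmuls to every forward pass.
y_unmerged = add(matmul(W, x), matmul(B, matmul(A, x)))

# Merged: fold BA into W once, so inference is a single matmul again.
W_merged = add(W, matmul(B, A))
y_merged = matmul(W_merged, x)

# The two paths agree up to floating-point rounding.
max_diff = max(abs(u[0] - m[0]) for u, m in zip(y_unmerged, y_merged))

lora_params = r * (d_in + d_out)   # values stored in A and B
full_params = d_in * d_out         # values stored in W
```

At these toy sizes A and B hold 28 values against 48 in W, so the saving looks modest. The same formula for a 4096 × 4096 layer at r = 8 gives 65,536 trainable values against roughly 16.8 million.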

Adapter Layers

Insert small bottleneck networks (down-project → nonlinearity → up-project) into each transformer layer. The adapter learns a residual transformation: output = input + adapter(input). Only the adapter weights are trained.

Parameters: ~1–8% of base model
Inference overhead: permanent — the adapter layers are always active, adding latency
Works well for: multi-task learning (one adapter per task, same base model)
Weakness: the permanent inference overhead makes it less attractive for production single-task models
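
The residual form output = input + adapter(input) can be sketched in a few lines. This is an illustrative plain-Python version, assuming a ReLU nonlinearity, no bias terms, and a zero-initialised up-projection (a common choice so that the adapter starts as an identity function). The names linear and adapter_forward are hypothetical.

```python
import random


def linear(weights, vec):
    """Apply a weight matrix (list of rows) to a vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]


def adapter_forward(hidden, w_down, w_up):
    """Bottleneck adapter with a residual connection.

    Computes hidden + up(relu(down(hidden))).
    """
    bottleneck = [max(0.0, z) for z in linear(w_down, hidden)]
    delta = linear(w_up, bottleneck)
    return [h + d for h, d in zip(hidden, delta)]


rng = random.Random(0)
d_model, d_bottleneck = 8, 2  # toy sizes for illustration

# Down-projection: d_model -> d_bottleneck (random init).
w_down = [[rng.uniform(-1.0, 1.0) for _ in range(d_model)]
          for _ in range(d_bottleneck)]
# Up-projection: d_bottleneck -> d_model (zero init, assumed here).
w_up = [[0.0] * d_bottleneck for _ in range(d_model)]

hidden = [rng.uniform(-1.0, 1.0) for _ in range(d_model)]

# With a zero up-projection the adapter contributes nothing yet,
# so the layer output equals its input before any training.
out_at_init = adapter_forward(hidden, w_down, w_up)

# Trainable values per adapter, ignoring biases.
adapter_params = 2 * d_model * d_bottleneck
```

Unlike a LoRA update, the nonlinearity between the two projections means this block cannot be folded into an existing weight matrix, which is why the extra computation stays at inference time.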

Prefix Tuning

Prepends a sequence of trainable 'virtual tokens' to the key and value tensors at every attention layer. These prefix vectors are never part of the input text — they exist only inside the attention computation, acting as soft prompts that influence every layer's attention.

Parameters: prefix length × model dimension × num layers × 2 (K and V) — typically ~0.1% of base model
Inference overhead: small — each attention computation includes the prefix tokens
Works well for: generation tasks, summarisation, translation
Weakness: harder to optimise than LoRA; can be unstable on small models; prefix length is a difficult hyperparameter
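
The parameter formula above is easy to work through as arithmetic. The sketch below assumes a hypothetical model with 7 billion parameters, a hidden size of 4096, and 32 layers, with a prefix of 20 virtual tokens. These numbers are chosen for illustration only, and the function name is made up for this example.

```python
def prefix_tuning_params(prefix_len, d_model, n_layers):
    """Trainable values: one key and one value vector per prefix position per layer."""
    return prefix_len * d_model * n_layers * 2


# Hypothetical model shape, for illustration only.
base_params = 7_000_000_000
prefix_len, d_model, n_layers = 20, 4096, 32

n_prefix = prefix_tuning_params(prefix_len, d_model, n_layers)
fraction = n_prefix / base_params

# The prefix also lengthens what every attention layer attends over:
# each query sees the prefix positions plus the real tokens.
seq_len = 512
attended_positions = prefix_len + seq_len

print(f"{n_prefix:,} trainable values, {fraction:.4%} of the base model")
```

Under these assumptions the prefix holds about 5.2 million values, a little under 0.1% of the base model, and each attention layer attends over 532 positions instead of 512.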

Prompt Tuning

Prompt tuning is the simplest of the PEFT methods. It prepends a small number of trainable continuous vectors (soft prompt tokens) to the input embeddings. Only these prompt vectors are trained; the entire model, including its own token embedding table, stays frozen. At very large model scales (around 11B parameters and above), prompt tuning was shown to approach full fine-tuning quality.

Parameters: prompt length × embedding dimension — typically <0.01% of base model
Inference overhead: the prompt tokens add to sequence length
Works well for: classification tasks on very large models
Weakness: significantly underperforms LoRA on smaller models; unstable optimisation
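
A short sketch makes both the parameter count and the sequence-length cost concrete. It assumes a hypothetical 7-billion-parameter model with an embedding dimension of 4096 and a soft prompt of 20 vectors. The function names are illustrative, and the second half uses a tiny embedding size so that it runs instantly.

```python
def prompt_tuning_params(prompt_len, d_embed):
    """Trainable values: one embedding vector per soft prompt position."""
    return prompt_len * d_embed


def prepend_soft_prompt(soft_prompt, token_embeddings):
    """Place the trainable prompt vectors ahead of the frozen token embeddings."""
    return soft_prompt + token_embeddings


# Hypothetical model shape, for illustration only.
base_params = 7_000_000_000
n_prompt = prompt_tuning_params(prompt_len=20, d_embed=4096)
fraction = n_prompt / base_params

# Toy forward-pass input: 20 soft prompt vectors plus 5 real tokens,
# each with a tiny embedding size of 4.
toy_dim = 4
soft_prompt = [[0.0] * toy_dim for _ in range(20)]      # trainable
token_embeddings = [[1.0] * toy_dim for _ in range(5)]  # from the frozen table
model_input = prepend_soft_prompt(soft_prompt, token_embeddings)

print(f"{n_prompt:,} trainable values, {fraction:.5%} of the base model")
print(f"sequence length seen by the model: {len(model_input)}")
```

Under these assumptions the soft prompt holds 81,920 values, well below 0.01% of the base model, while the model processes 25 positions for a 5-token input.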

Why LoRA dominates in practice

Method	Params	Inference Overhead	Stability	Production Use
LoRA	0.1–2%	Zero (after merge)	High	Dominant
QLoRA	0.1–2%	Zero (after merge)	High	Dominant (low VRAM)
Adapters	1–8%	Permanent (~10ms)	High	Multi-task setups
Prefix Tuning	~0.1%	Small	Medium	Niche
Prompt Tuning	<0.01%	Minimal	Low on small models	Mostly replaced by LoRA

LoRA wins on the combination of: zero inference overhead after merging, stable optimisation across model sizes, flexibility in rank selection, and the ability to train on top of quantised models (QLoRA). No other PEFT method checks all four boxes.
