Full Fine-Tuning vs. PEFT vs. Prompting: The Decision Framework
A practical decision tree for choosing between full fine-tuning, LoRA/QLoRA, and prompt engineering — based on data size, latency requirements, update frequency, and budget.
You've decided fine-tuning is the right approach. Now comes the second decision: which kind of fine-tuning? Full fine-tuning updates every parameter. Parameter-efficient fine-tuning (PEFT) methods like LoRA update a tiny fraction of parameters. And sometimes, prompt engineering with a few-shot example set is close enough that no fine-tuning is warranted at all.
These aren't just options on a scale of 'more compute = better results'. They have different cost profiles, update characteristics, inference tradeoffs, and failure modes. Choosing wrong means wasted GPU hours at best and a model that performs worse than your baseline at worst.
The three approaches compared
| Approach | What Gets Updated | VRAM (70B model) | Quality Ceiling | Best For |
|---|---|---|---|---|
| Prompt engineering | Nothing (inference only) | None for training | Often ~80–90% of fine-tuned quality on format and style tasks | Format, tone, structured output, few-shot tasks |
| LoRA / QLoRA (PEFT) | ~0.5–2% of params (adapter matrices) | ~40–48GB with QLoRA (the 4-bit base weights alone are ~35GB) | Close to full FT on most tasks | Domain adaptation, style, task specialisation |
| Full fine-tuning | 100% of params | 500GB+ (fp16) | Highest possible | Deep domain embedding, distillation, novel capabilities |
When to use prompt engineering (not fine-tuning)
- You need results in days, not weeks
- Your dataset is small (fewer than 500 high-quality examples)
- Your task format is well-defined and expressible in a system prompt
- You need to update behaviour frequently without retraining
- You're testing a product hypothesis before committing engineering time
As a rule of thumb, a well-engineered prompt with 10–20 few-shot examples can often reach roughly 80–90% of the quality a fine-tuned model would achieve on format and style tasks. Measure the gap on your own eval set before deciding fine-tuning is worth the investment.
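One way to make "measure the gap" concrete is to score the prompt baseline on a fixed eval set and compare it against the quality target you need to hit. The sketch below assumes you already have per-example scores between 0 and 1 from whatever metric suits your task; the function name `baseline_gap` and the `target` threshold are illustrative, not part of any library.

```python
from statistics import mean

def baseline_gap(prompt_scores, target):
    """Summarise how far a prompt baseline falls short of a quality target.

    prompt_scores: per-example scores in [0, 1] on a fixed eval set (assumed metric).
    target: the per-example score you need in production (assumed threshold).
    """
    if not prompt_scores:
        raise ValueError("prompt_scores must not be empty")
    mean_score = mean(prompt_scores)
    # Fraction of examples below target: a high value means the gap is
    # consistent across the eval set, not driven by a few outliers.
    frac_below = sum(s < target for s in prompt_scores) / len(prompt_scores)
    return {
        "mean_score": mean_score,
        "gap": target - mean_score,
        "frac_below_target": frac_below,
    }

# Hypothetical scores for four eval examples.
report = baseline_gap([0.9, 0.7, 0.8, 0.6], target=0.8)
print(report)
```

A small mean gap with only a few examples below target usually points to prompt fixes; a large gap spread across most of the eval set is the stronger signal that fine-tuning may pay off.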
When to use LoRA / QLoRA
- You have 500–50K high-quality training examples
- You need consistent behaviour that a system prompt can't reliably enforce
- You're adapting to a specific domain, persona, or task format
- You want to run fine-tuning on a single GPU or small cluster
- You need multiple specialised models from the same base (LoRA adapters can be swapped at inference time)
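To see why LoRA fits on a single GPU, it helps to count the parameters it adds. For one weight matrix of shape `d_out x d_in`, LoRA trains two low-rank factors of shape `d_out x r` and `r x d_in`, so the adapter has `r * (d_in + d_out)` parameters. The sketch below is plain arithmetic; the 4096-wide layer and rank 16 are example values, not a recommendation.

```python
def lora_param_count(d_in, d_out, rank):
    """Trainable parameters LoRA adds to one d_out x d_in weight matrix."""
    return rank * (d_in + d_out)

def lora_fraction(d_in, d_out, rank):
    """Adapter size as a fraction of the frozen matrix it wraps."""
    return lora_param_count(d_in, d_out, rank) / (d_in * d_out)

# Example values (assumed): a 4096 x 4096 projection with rank 16.
added = lora_param_count(4096, 4096, 16)
share = lora_fraction(4096, 4096, 16)
print(added)            # 131072
print(f"{share:.4%}")   # 0.7812%
```

This is the fraction for a single adapted matrix. The share of the whole model depends on which matrices you target and on the rank, and is lower when only some layers carry adapters. The same arithmetic explains why adapters are cheap to store and swap: each one is a small set of extra matrices on top of a shared base.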
When to use full fine-tuning
- You have 50K+ high-quality examples and a specific quality target that LoRA doesn't reach
- You're doing knowledge distillation from a larger teacher model
- You're embedding deep domain knowledge (medical, legal, scientific) that requires broad weight updates
- You have the infrastructure to run multi-GPU training and can absorb the compute cost
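Full fine-tuning a 70B model needs 500GB+ of GPU memory even with weights, gradients, and optimiser state all held in 16-bit precision: roughly 8x A100-80GB cards just to fit the training state, before activations. A standard mixed-precision Adam setup that keeps its optimiser state in fp32 needs about twice that. Unless you have this infrastructure or a compelling reason LoRA won't work, LoRA/QLoRA is almost always the right starting point.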
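The memory figure above comes from simple per-parameter accounting. The sketch below is a back-of-the-envelope estimate that assumes an Adam-style optimiser with two moment buffers per parameter. It ignores activations, framework overhead, and fragmentation, so treat the result as a lower bound. The function names are illustrative.

```python
import math

def training_state_gb(n_params, bytes_weights=2, bytes_grads=2, bytes_optimizer=4):
    """Weights + gradients + optimiser state in GB (1 GB = 1e9 bytes).

    Defaults assume everything is kept in 16-bit: 2 bytes for weights,
    2 for gradients, and 2 + 2 for Adam's two moment buffers.
    Activations are NOT included.
    """
    return n_params * (bytes_weights + bytes_grads + bytes_optimizer) / 1e9

def gpus_needed(memory_gb, gpu_gb=80):
    """Minimum number of cards whose combined memory covers memory_gb."""
    return math.ceil(memory_gb / gpu_gb)

n = 70_000_000_000  # 70B parameters

all_16bit = training_state_gb(n)
print(all_16bit, gpus_needed(all_16bit))  # 560.0 7

# Mixed precision with fp32 optimiser state: 4-byte master weights plus
# two 4-byte Adam moments = 12 bytes of optimiser state per parameter.
mixed = training_state_gb(n, bytes_optimizer=12)
print(mixed, gpus_needed(mixed))          # 1120.0 14
```

Seven 80GB cards cover the all-16-bit state on paper, which is why eight is the practical minimum once activations need room too.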
The decision tree
Do you have a strong prompt baseline?
- No → Build one first. Measure the gap.
- Yes → Is the gap significant and consistent across your eval set?
  - No → Prompt engineering is sufficient.
  - Yes → Do you have 500+ high-quality training examples?
    - No → Collect more data first.
    - Yes → Can LoRA reach your quality target? (run small experiment)
      - Yes → Use LoRA / QLoRA
      - No → Do you have multi-GPU infrastructure?
        - Yes → Full fine-tuning
        - No → Get infrastructure or revisit quality target
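The tree above can also be written as a small function, which is handy if you want to record the decision alongside your eval results. This is a direct transcription of the branches, not a substitute for judgment. The function and argument names are illustrative, and the 500-example threshold is the one used in this article.

```python
def choose_approach(has_prompt_baseline, gap_is_significant, n_examples,
                    lora_hits_target, has_multi_gpu):
    """Walk the decision tree top to bottom and return the recommended next step."""
    if not has_prompt_baseline:
        return "Build a prompt baseline first and measure the gap"
    if not gap_is_significant:
        return "Prompt engineering is sufficient"
    if n_examples < 500:
        return "Collect more data first"
    if lora_hits_target:
        return "Use LoRA / QLoRA"
    if has_multi_gpu:
        return "Full fine-tuning"
    return "Get infrastructure or revisit quality target"

# Example: solid baseline, consistent gap, 3,000 examples, and a small
# LoRA experiment that reached the quality target.
print(choose_approach(True, True, 3000, lora_hits_target=True, has_multi_gpu=False))
# Use LoRA / QLoRA
```

The ordering is deliberate: each check is cheaper than the one after it, so you only pay for a LoRA experiment or multi-GPU training once the earlier, cheaper options are ruled out.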
- LoRA: Low-Rank Adaptation of Large Language Models (Hu et al., 2022)
- QLoRA: Efficient Finetuning of Quantized LLMs (Dettmers et al., 2023)
- The Power of Scale for Parameter-Efficient Prompt Tuning (Lester et al., 2021)
- Chip Huyen: Fine-tuning LLMs