AI Engineering 10 min read

Fine-Tuning vs. RAG vs. Prompt Engineering: When to Use What

The decision framework every AI engineer needs. Cost, latency, data requirements, update frequency, and failure modes for each approach.

When someone asks "should we fine-tune or use RAG?", the honest answer is: it depends on why your model is failing. Fine-tuning and RAG solve completely different problems. Using the wrong one is expensive and doesn't fix the root cause.

The decision framework

If the model fails because...	Use
It doesn't know about recent or private documents	RAG
It doesn't know how to respond in your specific format/tone	Fine-tuning
It hallucinates on domain-specific facts	RAG (add grounding)
It can't follow your multi-step task format reliably	Fine-tuning
It needs to cite sources	RAG
Its few-shot prompt is too expensive to send every call	Fine-tuning (distil the prompt)
It fails on both knowledge and behaviour	RAG + fine-tuning

RAG: retrieval-augmented generation

RAG keeps the base model frozen and injects relevant information at inference time. It's the right choice for private/proprietary knowledge, frequently updated information, and any use case where you need to cite sources.

No training data required — you can start with existing documents
Knowledge updates instantly — re-index the new document, done
Fully auditable — you can trace every answer to its source
Scales to millions of documents with a good vector database

Fine-tuning: adjusting model behaviour

Fine-tuning updates the model's weights to change how it responds, not what it knows. LoRA (Low-Rank Adaptation) and QLoRA are the dominant techniques — they add a small set of trainable parameters to a frozen base model, requiring far less compute and data than full fine-tuning.

You can fine-tune GPT-4o-mini for roughly $1–10 per 1,000 examples. You need ~100–500 high-quality examples for meaningful behaviour change. The output is a model that costs the same to run but behaves differently.

Prompt engineering first — always

Before committing to either RAG or fine-tuning, exhaust prompt engineering. Most "the model doesn't do X" problems can be solved with better prompts, few-shot examples, or chain-of-thought. RAG and fine-tuning have real costs; prompting is free.

Approach	Time to value	Cost	Maintenance
Prompt engineering	Hours	Free	Low
RAG	Days	Index + infra	Medium — keep index fresh
Fine-tuning (LoRA)	Days	$10–$1,000	High — retrain on data changes
Full fine-tune	Weeks	$1K–$100K	Very high

Compare RAG vs. baseline in RAG Lab →: See the direct quality difference between prompting alone and RAG-augmented generation on the same questions.

Try it interactively

GenAI Systems Lab is a free platform for AI engineers — configure real failure modes, break things, and build the judgment that gets you hired.

Open GenAI Systems Lab →