GenAI Systems Lab
AI Engineering

Fine-Tuning vs. RAG vs. Prompt Engineering: When to Use What

The decision framework every AI engineer needs. Cost, latency, data requirements, update frequency, and failure modes for each approach.

When someone asks "should we fine-tune or use RAG?", the honest answer is: it depends on why your model is failing. Fine-tuning and RAG solve completely different problems. Using the wrong one is expensive and doesn't fix the root cause.

The decision framework

| If the model fails because... | Use |
|---|---|
| It doesn't know about recent or private documents | RAG |
| It doesn't know how to respond in your specific format/tone | Fine-tuning |
| It hallucinates on domain-specific facts | RAG (add grounding) |
| It can't follow your multi-step task format reliably | Fine-tuning |
| It needs to cite sources | RAG |
| Its few-shot prompt is too expensive to send every call | Fine-tuning (distil the prompt) |
| It fails on both knowledge and behaviour | RAG + fine-tuning |
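
The table can also be read as a lookup from diagnosed failure to remedy. The sketch below encodes it that way; the `Failure` names and the `recommend` helper are illustrative, not part of any library, and the fallback to prompt engineering when no failure has been diagnosed is an assumption.

```python
from enum import Enum


class Failure(Enum):
    """Failure modes from the decision table (names are illustrative)."""
    MISSING_KNOWLEDGE = "doesn't know recent or private documents"
    WRONG_FORMAT_OR_TONE = "doesn't respond in the required format/tone"
    DOMAIN_HALLUCINATION = "hallucinates on domain-specific facts"
    UNRELIABLE_MULTISTEP = "can't follow the multi-step task format"
    NEEDS_CITATIONS = "needs to cite sources"
    EXPENSIVE_FEW_SHOT = "few-shot prompt is too expensive per call"


# One remedy per failure mode, copied from the table.
REMEDY = {
    Failure.MISSING_KNOWLEDGE: "RAG",
    Failure.WRONG_FORMAT_OR_TONE: "fine-tuning",
    Failure.DOMAIN_HALLUCINATION: "RAG",
    Failure.UNRELIABLE_MULTISTEP: "fine-tuning",
    Failure.NEEDS_CITATIONS: "RAG",
    Failure.EXPENSIVE_FEW_SHOT: "fine-tuning",
}


def recommend(failures: set[Failure]) -> str:
    """Return the approach(es) that cover every observed failure."""
    needed = sorted({REMEDY[f] for f in failures})
    # Assumption: with no diagnosed failure, stay with prompting.
    return " + ".join(needed) if needed else "prompt engineering"


print(recommend({Failure.MISSING_KNOWLEDGE, Failure.WRONG_FORMAT_OR_TONE}))
```

A model that fails on both knowledge and behaviour maps to both remedies, which matches the last row of the table.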

RAG: retrieval-augmented generation

RAG keeps the base model frozen and injects relevant information at inference time. It's the right choice for private/proprietary knowledge, frequently updated information, and any use case where you need to cite sources.
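
As a rough illustration of "inject relevant information at inference time", here is a minimal sketch of the retrieve-then-prompt step. It assumes a tiny in-memory document store (`DOCS`, with made-up contents) and uses bag-of-words cosine similarity as a stand-in for the embedding search a real system would use; the model call itself is left out.

```python
import math
import re
from collections import Counter

# Hypothetical document store: source id -> text.
DOCS = {
    "policy-2024": "Refunds are accepted within 30 days of purchase with a receipt.",
    "shipping": "Standard shipping takes 5 business days within the country.",
    "warranty": "Hardware carries a two year warranty covering manufacturing defects.",
}


def tokenize(text: str) -> Counter:
    """Lowercased word counts; a stand-in for a real embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(question: str, k: int = 2) -> list[str]:
    """Return the ids of the k documents most similar to the question."""
    q = tokenize(question)
    ranked = sorted(DOCS, key=lambda d: cosine(q, tokenize(DOCS[d])), reverse=True)
    return ranked[:k]


def build_prompt(question: str, k: int = 2) -> str:
    """Put retrieved passages, tagged with their ids, ahead of the question."""
    context = "\n".join(f"[{doc_id}] {DOCS[doc_id]}" for doc_id in retrieve(question, k))
    return (
        "Answer using only the sources below and cite the source id in brackets.\n\n"
        f"Sources:\n{context}\n\nQuestion: {question}"
    )


print(build_prompt("How many days do I have for refunds?"))
```

The base model never changes: updating what it "knows" means updating `DOCS`, and the bracketed ids give the model something concrete to cite.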

Fine-tuning: adjusting model behaviour

Fine-tuning updates the model's weights to change how it responds, not what it knows. LoRA (Low-Rank Adaptation) and QLoRA are the dominant techniques: both add a small set of trainable low-rank parameters to a frozen base model (QLoRA additionally keeps the frozen base model in 4-bit quantized form to save memory), requiring far less compute and data than full fine-tuning.
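
To make "a small set of trainable parameters" concrete: LoRA leaves a weight matrix W of shape d_out × d_in frozen and learns two thin matrices, B (d_out × r) and A (r × d_in), whose product is added to W. The sketch below only counts parameters; the 4096 × 4096 layer size and rank 8 are example values, not taken from any particular model.

```python
def full_params(d_out: int, d_in: int) -> int:
    """Trainable parameters when the whole weight matrix is updated."""
    return d_out * d_in


def lora_params(d_out: int, d_in: int, rank: int) -> int:
    """Trainable parameters in the LoRA factors B (d_out x rank) and A (rank x d_in)."""
    return rank * (d_out + d_in)


# Example values (assumptions): one 4096 x 4096 projection, rank 8.
d_out, d_in, rank = 4096, 4096, 8
full = full_params(d_out, d_in)
lora = lora_params(d_out, d_in, rank)

print(f"full fine-tune: {full:,} trainable parameters")
print(f"LoRA (r={rank}): {lora:,} trainable parameters")
print(f"fraction trained: {lora / full:.2%}")
```

For this layer the LoRA factors hold 65,536 parameters against 16,777,216 in the full matrix, about 0.39% of the weights.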

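You can fine-tune GPT-4o-mini for roughly $1–10 per 1,000 examples, depending on example length and the number of training epochs. You need ~100–500 high-quality examples for meaningful behaviour change. The output is a model that behaves differently at a similar running cost: a merged LoRA adapter adds no inference overhead on a self-hosted model, though hosted providers may bill fine-tuned models at a higher per-token rate than the base model.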
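
A back-of-the-envelope check on the training cost, assuming the provider bills per trained token (tokens in the dataset times the number of epochs). The per-million-token price, average example length, and epoch count below are placeholder assumptions; check the provider's current pricing page before budgeting.

```python
def training_cost_usd(
    n_examples: int,
    avg_tokens_per_example: int,
    epochs: int,
    usd_per_million_tokens: float,
) -> float:
    """Estimate fine-tuning cost when billing is per trained token."""
    billed_tokens = n_examples * avg_tokens_per_example * epochs
    return billed_tokens / 1_000_000 * usd_per_million_tokens


# Placeholder inputs (assumptions, not quoted prices):
cost = training_cost_usd(
    n_examples=1_000,
    avg_tokens_per_example=500,
    epochs=3,
    usd_per_million_tokens=3.00,
)
print(f"estimated training cost: ${cost:.2f}")
```

With these inputs the run trains on 1.5 million tokens and comes to $4.50, inside the $1–10 range quoted above; longer examples or more epochs scale the figure linearly.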

Prompt engineering first — always

Before committing to either RAG or fine-tuning, exhaust prompt engineering. Many "the model doesn't do X" problems can be solved with better prompts, few-shot examples, or chain-of-thought. RAG and fine-tuning carry real build and maintenance costs; prompting needs no training run or extra infrastructure, only the tokens you send on each call.
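
For reference, a few-shot prompt is just the instruction, a handful of worked input/output pairs, and the new input left open for the model to complete. The helper below is a minimal sketch; the function name, the `Input:`/`Output:` labels, and the chain-of-thought cue wording are illustrative choices, not a standard.

```python
def build_few_shot_prompt(
    instruction: str,
    examples: list[tuple[str, str]],
    query: str,
    chain_of_thought: bool = False,
) -> str:
    """Assemble instruction, worked examples, and the open query into one prompt."""
    parts = [instruction.strip()]
    if chain_of_thought:
        # Optional cue asking the model to reason before answering.
        parts.append("Reason step by step before giving the final output.")
    for example_input, example_output in examples:
        parts.append(f"Input: {example_input}\nOutput: {example_output}")
    # The final block leaves Output empty for the model to fill in.
    parts.append(f"Input: {query}\nOutput:")
    return "\n\n".join(parts)


prompt = build_few_shot_prompt(
    "Classify the sentiment of each review as positive or negative.",
    [
        ("Great battery life.", "positive"),
        ("Stopped working after a week.", "negative"),
    ],
    "The screen is gorgeous.",
)
print(prompt)
```

Every example is resent on every call, which is why a long few-shot prompt eventually becomes a candidate for distilling into a fine-tuned model.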

| Approach | Time to value | Cost | Maintenance |
|---|---|---|---|
| Prompt engineering | Hours | Free | Low |
| RAG | Days | Index + infra | Medium (keep index fresh) |
| Fine-tuning (LoRA) | Days | $10–$1,000 | High (retrain on data changes) |
| Full fine-tune | Weeks | $1K–$100K | Very high |
