Fine-Tuning vs. RAG vs. Prompt Engineering: When to Use What
The decision framework every AI engineer needs. Cost, latency, data requirements, update frequency, and failure modes for each approach.
When someone asks "should we fine-tune or use RAG?", the honest answer is: it depends on why your model is failing. Fine-tuning and RAG solve completely different problems. Using the wrong one is expensive and doesn't fix the root cause.
The decision framework
| If the model fails because... | Use |
|---|---|
| It doesn't know about recent or private documents | RAG |
| It doesn't know how to respond in your specific format/tone | Fine-tuning |
| It hallucinates on domain-specific facts | RAG (add grounding) |
| It can't follow your multi-step task format reliably | Fine-tuning |
| It needs to cite sources | RAG |
| Its few-shot prompt is too expensive to send every call | Fine-tuning (distil the prompt) |
| It fails on both knowledge and behaviour | RAG + fine-tuning |
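As a rough illustration, the table above can be encoded as a small lookup. This is only a sketch: the `Failure` enum, the two groupings, and the `recommend` function are hypothetical names introduced here, and the fallback to prompt engineering when no failure is listed is an assumption based on the "prompt engineering first" advice in this article.

```python
from enum import Enum
from typing import Iterable


class Failure(Enum):
    """Failure modes from the decision table (names are illustrative)."""
    MISSING_KNOWLEDGE = "doesn't know recent or private documents"
    WRONG_FORMAT_OR_TONE = "doesn't respond in the required format/tone"
    DOMAIN_HALLUCINATION = "hallucinates on domain-specific facts"
    UNRELIABLE_TASK_FORMAT = "can't follow the multi-step task format"
    NEEDS_CITATIONS = "needs to cite sources"
    EXPENSIVE_FEW_SHOT = "few-shot prompt is too expensive per call"


# Knowledge problems point to RAG; behaviour problems point to fine-tuning.
KNOWLEDGE = {
    Failure.MISSING_KNOWLEDGE,
    Failure.DOMAIN_HALLUCINATION,
    Failure.NEEDS_CITATIONS,
}
BEHAVIOUR = {
    Failure.WRONG_FORMAT_OR_TONE,
    Failure.UNRELIABLE_TASK_FORMAT,
    Failure.EXPENSIVE_FEW_SHOT,
}


def recommend(failures: Iterable[Failure]) -> str:
    """Map observed failure modes to the approach suggested by the table."""
    observed = set(failures)
    needs_rag = bool(observed & KNOWLEDGE)
    needs_fine_tuning = bool(observed & BEHAVIOUR)
    if needs_rag and needs_fine_tuning:
        return "RAG + fine-tuning"
    if needs_rag:
        return "RAG"
    if needs_fine_tuning:
        return "Fine-tuning"
    # Assumption: with no diagnosed failure, start with prompting.
    return "Prompt engineering"


print(recommend([Failure.NEEDS_CITATIONS, Failure.WRONG_FORMAT_OR_TONE]))
```

The useful part is less the code than the habit it forces: name the failure mode first, then pick the tool.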
RAG: retrieval-augmented generation
RAG keeps the base model frozen and injects relevant information at inference time. It's the right choice for private/proprietary knowledge, frequently updated information, and any use case where you need to cite sources.
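- No training data required: you can start with existing documents
- Knowledge updates quickly: re-index the new document and the next query can retrieve it
- Auditable: you can trace each answer to the retrieved passages it was given, though you still need to check that the answer is faithful to them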
- Scales to millions of documents with a good vector database
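To make the mechanics concrete, here is a minimal sketch of the retrieve-then-inject loop. It assumes a toy bag-of-words cosine similarity in place of real embeddings and a vector database; `TinyIndex`, `build_prompt`, and the sample documents are hypothetical names invented for illustration, and the final prompt would be sent to whichever frozen model you use.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokens; a stand-in for a real embedding model."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class TinyIndex:
    """Toy in-memory index (hypothetical; a vector database in practice)."""

    def __init__(self):
        self.docs = {}  # doc_id -> (text, token counts)

    def upsert(self, doc_id, text):
        # Re-indexing a document replaces its old version.
        self.docs[doc_id] = (text, Counter(tokenize(text)))

    def search(self, query, k=2):
        q = Counter(tokenize(query))
        scored = sorted(
            ((cosine(q, vec), doc_id, text)
             for doc_id, (text, vec) in self.docs.items()),
            reverse=True,
        )
        return [(doc_id, text) for score, doc_id, text in scored[:k] if score > 0]


def build_prompt(question, hits):
    """Inject retrieved passages, tagged with ids so answers can cite them."""
    sources = "\n".join(f"[{doc_id}] {text}" for doc_id, text in hits)
    return (
        "Answer using only the sources below and cite their ids.\n\n"
        f"Sources:\n{sources}\n\nQuestion: {question}"
    )


index = TinyIndex()
index.upsert("policy-v1", "A refund is accepted within 14 days of purchase.")
index.upsert("shipping", "Standard shipping takes 3 to 5 business days.")

question = "What is the refund window for a purchase?"
hits = index.search(question, k=1)
print(build_prompt(question, hits))

# Updating knowledge is a re-index, not a retraining run.
index.upsert("policy-v1", "A refund is accepted within 30 days of purchase.")
updated = index.search(question, k=1)
```

The model's weights never change here. Only the index and the prompt do, which is why updates are cheap and each answer can be traced back to a document id.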
Fine-tuning: adjusting model behaviour
Fine-tuning updates the model's weights to change how it responds, not what it knows. LoRA (Low-Rank Adaptation) and QLoRA are the dominant techniques — they add a small set of trainable parameters to a frozen base model, requiring far less compute and data than full fine-tuning.
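A quick back-of-the-envelope calculation shows where the savings come from. In LoRA, the update to a frozen weight matrix of shape (d_out, d_in) is factored into two small matrices of shapes (d_out, r) and (r, d_in), so only r × (d_out + d_in) values are trained. The 4096 × 4096 layer size and rank 8 below are illustrative assumptions, not figures from any particular model.

```python
def full_params(d_out, d_in):
    """Trainable weights when the whole matrix is fine-tuned."""
    return d_out * d_in


def lora_params(d_out, d_in, rank):
    """Trainable weights for a LoRA update B @ A,
    with B of shape (d_out, rank) and A of shape (rank, d_in)."""
    return rank * (d_out + d_in)


# Illustrative sizes (assumptions): one square projection layer, rank 8.
d_out = d_in = 4096
rank = 8

full = full_params(d_out, d_in)
lora = lora_params(d_out, d_in, rank)
fraction = lora / full

print(f"{lora:,} of {full:,} weights trainable ({fraction:.2%})")
# prints: 65,536 of 16,777,216 weights trainable (0.39%)
```

QLoRA keeps the same low-rank update but stores the frozen base weights in quantized form, which mainly reduces memory use, not the trainable-parameter count.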
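You can fine-tune GPT-4o-mini for roughly $1–10 per 1,000 examples, depending on how long the examples are. You need ~100–500 high-quality examples for meaningful behaviour change. The output is a model that behaves differently at a similar serving cost: a LoRA adapter merged into self-hosted weights adds no inference overhead, though hosted APIs may bill fine-tuned models at a higher per-token rate than the base model.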
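Combining the two figures above gives a rough training budget. This sketch simply multiplies the article's per-1,000-example range by the dataset size; `training_cost_range` is a hypothetical helper, and real bills depend on token counts and current provider pricing, so treat the output as an order-of-magnitude estimate.

```python
def training_cost_range(n_examples, low_per_1k=1.0, high_per_1k=10.0):
    """Return (low, high) training cost in dollars.

    The default rates are this article's rough $1-10 per 1,000 examples,
    not a quoted provider price.
    """
    return (n_examples * low_per_1k / 1000, n_examples * high_per_1k / 1000)


for n in (100, 500):
    low, high = training_cost_range(n)
    print(f"{n} examples: ${low:.2f} to ${high:.2f}")
```

At these dataset sizes the training run itself is a small expense. Collecting and cleaning the examples is usually where the time goes.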
Prompt engineering first — always
Before committing to either RAG or fine-tuning, exhaust prompt engineering. Most "the model doesn't do X" problems can be solved with better prompts, few-shot examples, or chain-of-thought. RAG and fine-tuning have real setup and maintenance costs; prompting needs no training run or extra infrastructure, only the tokens you send.
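As a starting point, a few-shot prompt with an optional chain-of-thought cue can be assembled in a few lines. This is a sketch only: `build_few_shot_prompt`, the `Input:`/`Output:` layout, and the support-ticket examples are hypothetical choices for illustration, and the resulting string would go to whatever model API you already use.

```python
def build_few_shot_prompt(instruction, examples, query, chain_of_thought=False):
    """Assemble an instruction, worked examples, and the new query into one prompt."""
    parts = [instruction.strip()]
    if chain_of_thought:
        # Optional cue asking the model to reason before answering.
        parts.append("Think step by step, then give the final answer on the last line.")
    for example_input, example_output in examples:
        parts.append(f"Input: {example_input}\nOutput: {example_output}")
    # Leave the final output empty for the model to complete.
    parts.append(f"Input: {query}\nOutput:")
    return "\n\n".join(parts)


prompt = build_few_shot_prompt(
    "Classify the support ticket as 'billing', 'bug', or 'other'.",
    [
        ("I was charged twice this month.", "billing"),
        ("The export button crashes the app.", "bug"),
    ],
    "How do I change my invoice address?",
)
print(prompt)
```

Every example is sent on every call, so the prompt's token cost grows with the number of examples. That is the cost the decision table refers to when it suggests distilling an expensive few-shot prompt into a fine-tuned model.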
| Approach | Time to value | Cost | Maintenance |
|---|---|---|---|
| Prompt engineering | Hours | Token cost only | Low |
| RAG | Days | Index + infra | Medium — keep index fresh |
| Fine-tuning (LoRA) | Days | $10–$1,000 | High — retrain on data changes |
| Full fine-tune | Weeks | $1K–$100K | Very high |
Compare RAG vs. baseline in RAG Lab: see the direct quality difference between prompting alone and RAG-augmented generation on the same questions.