GenAI Systems Lab
AI Engineering 10 min read

Full Fine-Tuning vs. PEFT vs. Prompting: The Decision Framework

A practical decision tree for choosing between full fine-tuning, LoRA/QLoRA, and prompt engineering — based on data size, latency requirements, update frequency, and budget.

You've decided fine-tuning is the right approach. Now comes the second decision: which kind of fine-tuning? Full fine-tuning updates every parameter. Parameter-efficient fine-tuning (PEFT) methods like LoRA update a tiny fraction of parameters. And sometimes, prompt engineering with a few-shot example set is close enough that no fine-tuning is warranted at all.

These aren't just options on a scale of 'more compute = better results'. They have different cost profiles, update characteristics, inference tradeoffs, and failure modes. Choosing wrong means wasted GPU hours at best and a model that performs worse than your baseline at worst.

The three approaches compared

| Approach | What Gets Updated | Training VRAM (70B model) | Quality Ceiling | Best For |
| --- | --- | --- | --- | --- |
| Prompt engineering | Nothing (inference only) | None for training | ~80% of fine-tune quality for most tasks | Format, tone, structured output, few-shot tasks |
| LoRA / QLoRA (PEFT) | ~0.5–2% of params (adapter matrices) | ~48GB with QLoRA (the 4-bit weights alone are ~35GB) | Close to full FT on most tasks | Domain adaptation, style, task specialisation |
| Full fine-tuning | 100% of params | 500GB+ (fp16) | Highest possible | Deep domain embedding, distillation, novel capabilities |

When to use prompt engineering (not fine-tuning)

A well-engineered prompt with 10–20 few-shot examples typically achieves 80–90% of the quality a fine-tuned model would achieve on format and style tasks. Measure the gap before deciding fine-tuning is worth the investment.
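One way to measure that gap is to score the prompt baseline and a candidate (for example, a small pilot fine-tune) on the same eval set and compare them example by example. The sketch below assumes scores in the range 0 to 1 from whatever metric you already use. The function name and both thresholds (`min_gap`, `min_consistency`) are hypothetical and should be tuned to your task.

```python
from statistics import mean


def quality_gap(baseline_scores, candidate_scores,
                min_gap=0.05, min_consistency=0.7):
    """Compare per-example scores of a prompt baseline and a candidate.

    Both lists must be aligned: index i refers to the same eval example.
    The thresholds are illustrative assumptions, not recommended values.
    """
    if not baseline_scores or len(baseline_scores) != len(candidate_scores):
        raise ValueError("score lists must be non-empty and the same length")

    diffs = [c - b for b, c in zip(baseline_scores, candidate_scores)]
    mean_gap = mean(diffs)
    # Share of examples where the candidate strictly beats the baseline.
    consistency = sum(d > 0 for d in diffs) / len(diffs)

    return {
        "mean_gap": mean_gap,
        "consistency": consistency,
        "worth_fine_tuning": mean_gap >= min_gap and consistency >= min_consistency,
    }


if __name__ == "__main__":
    report = quality_gap([0.8, 0.7, 0.9, 0.6], [0.9, 0.9, 0.9, 0.8])
    print(report)
```

Reporting both the mean gap and its consistency guards against a handful of outlier examples driving the decision.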

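When to use LoRA / QLoRA

LoRA / QLoRA fits when a strong prompt baseline still leaves a significant, consistent gap on your eval set, you have 500+ high-quality training examples, and the task is domain adaptation, style, or task specialisation.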
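A rough sketch of why LoRA adapters are so small: for a frozen weight matrix of shape `d_out x d_in`, LoRA trains two low-rank matrices of shapes `d_out x r` and `r x d_in`, so the trainable count is `r * (d_in + d_out)` instead of `d_in * d_out`. The helper names and the layer shapes in the usage example are illustrative assumptions, not the dimensions of any particular model.

```python
def lora_params(d_in, d_out, rank):
    """Trainable parameters LoRA adds to one d_out x d_in weight matrix."""
    # B has shape (d_out, rank) and A has shape (rank, d_in).
    return rank * (d_in + d_out)


def trainable_fraction(layer_shapes, rank, total_params):
    """Fraction of all model parameters that the adapters make trainable.

    layer_shapes: iterable of (d_in, d_out) for every adapted matrix.
    total_params: parameter count of the full (frozen) model.
    """
    adapter_params = sum(lora_params(d_in, d_out, rank)
                         for d_in, d_out in layer_shapes)
    return adapter_params / total_params


if __name__ == "__main__":
    # Hypothetical toy model: two adapted 4096 x 4096 matrices and nothing else.
    shapes = [(4096, 4096), (4096, 4096)]
    total = sum(d_in * d_out for d_in, d_out in shapes)
    print(f"{trainable_fraction(shapes, rank=8, total_params=total):.4%}")
```

The fraction grows with the rank and with the number of matrices you adapt, which is why quoted percentages vary between setups.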

When to use full fine-tuning

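Full fine-tuning a 70B model requires 500GB+ of GPU VRAM in fp16: roughly 8x A100-80GB cards just to fit the weights, gradients, and optimiser state, before counting activations. That figure assumes the optimiser state is also held in 16-bit; standard mixed-precision Adam with fp32 state needs roughly twice as much. Unless you have this infrastructure or a compelling reason LoRA won't work, LoRA/QLoRA is almost always the right starting point.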
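The memory figure can be reproduced with back-of-envelope arithmetic. The sketch below counts bytes per parameter for weights, gradients, and optimiser state, and ignores activations, buffers, and fragmentation. The byte counts are assumptions: 2 + 2 + 4 corresponds to everything in 16-bit with two Adam moments, while 2 + 2 + 12 corresponds to mixed-precision Adam with an fp32 master copy and two fp32 moments.

```python
import math


def training_memory_gb(n_params, weight_bytes=2, grad_bytes=2, optimizer_bytes=4):
    """Estimate training memory in GB (10^9 bytes), excluding activations."""
    bytes_per_param = weight_bytes + grad_bytes + optimizer_bytes
    return n_params * bytes_per_param / 1e9


def gpus_needed(memory_gb, gpu_gb=80):
    """Minimum number of GPUs by raw capacity alone."""
    return math.ceil(memory_gb / gpu_gb)


if __name__ == "__main__":
    n = 70e9  # 70B parameters

    all_16bit = training_memory_gb(n)                       # 8 bytes per param
    mixed_fp32 = training_memory_gb(n, optimizer_bytes=12)  # 16 bytes per param
    frozen_4bit = n * 0.5 / 1e9                             # QLoRA base weights only

    print(all_16bit, gpus_needed(all_16bit))
    print(mixed_fp32, gpus_needed(mixed_fp32))
    print(frozen_4bit)
```

Under these assumptions the all-16-bit case lands at 560GB and the mixed-precision case at 1,120GB, against about 35GB for the frozen 4-bit base weights in QLoRA. Real runs need headroom on top for activations.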

The decision tree

Do you have a strong prompt baseline?
  No → Build one first. Measure the gap.
  Yes → Is the gap significant and consistent across your eval set?
    No → Prompt engineering is sufficient.
    Yes → Do you have 500+ high-quality training examples?
      No → Collect more data first.
      Yes → Can LoRA reach your quality target? (run small experiment)
        Yes → Use LoRA / QLoRA
        No → Do you have multi-GPU infrastructure?
          Yes → Full fine-tuning
          No → Get infrastructure or revisit quality target
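The tree above can be written as a plain function, which makes it easy to apply consistently across projects. This is a minimal sketch: the function and parameter names are hypothetical, and each boolean stands for a judgment you make yourself (for example, from an eval run or a small LoRA experiment). Only the 500-example threshold comes from the tree.

```python
def choose_approach(has_prompt_baseline, gap_is_significant, n_examples,
                    lora_meets_target, has_multi_gpu, min_examples=500):
    """Walk the decision tree top to bottom and return the recommendation."""
    if not has_prompt_baseline:
        return "Build a prompt baseline first and measure the gap"
    if not gap_is_significant:
        return "Prompt engineering is sufficient"
    if n_examples < min_examples:
        return "Collect more data first"
    if lora_meets_target:
        return "Use LoRA / QLoRA"
    if has_multi_gpu:
        return "Full fine-tuning"
    return "Get infrastructure or revisit quality target"


if __name__ == "__main__":
    print(choose_approach(
        has_prompt_baseline=True,
        gap_is_significant=True,
        n_examples=1200,
        lora_meets_target=True,
        has_multi_gpu=False,
    ))
```

The early returns mirror the order of the questions, so later inputs are only consulted once the earlier conditions are met.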
