LIMA: Why 1,000 Examples Beat Millions for Instruction Tuning

Meta AI's 2023 paper showed that supervised fine-tuning on 1,000 carefully chosen examples, with no RLHF, can produce a model competitive with RLHF-trained systems. The insight: data quality beats quantity. Here is what that means for fine-tuning.

The conventional wisdom by early 2023: instruction tuning required scale. More human-preference data, more RLHF iterations. The gap between base models and instruction-following models seemed to be a data volume problem.

In May 2023, Chunting Zhou and colleagues at Meta AI published 'LIMA: Less Is More for Alignment'. They fine-tuned LLaMA 65B on 1,000 carefully selected examples. No RLHF. No preference data. Result: in a human evaluation, LIMA's responses were judged equivalent or preferred to GPT-4's on 43% of prompts, to Bard's on 58%, and to those of DaVinci003 (which was trained with RLHF) on 65%.

LIMA's central claim: almost all of a model's knowledge and capabilities are acquired during pretraining. Alignment — making it helpful and well-formatted — is largely about learning a style of interaction, not new knowledge. You can teach that style with 1,000 examples.

What made the 1,000 examples special

Curated from Stack Exchange, wikiHow, Reddit, and handwritten examples — selected for diversity and quality, not volume
Diverse prompts with no near-duplicates: examples spanned many topics and instruction types
Consistent responses: all answers followed the same interaction style (direct, clear, helpful)
In ablations, quantity alone did not help: scaling the training set from 2,000 to 32,000 Stack Exchange examples left response quality flat, while filtering for quality and broadening prompt diversity improved it.
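
The LIMA authors curated their set largely by hand, so the following is not their method. It is a minimal sketch of one way to screen a candidate pool for near-duplicate instructions before manual review, using Jaccard similarity over word trigrams. The function names, the "instruction" and "response" keys, and the 0.6 threshold are all assumptions chosen for illustration.

```python
import re


def shingles(text: str, n: int = 3) -> set:
    """Return the set of lowercase word n-grams in `text`."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if len(words) < n:
        # Short texts: treat the whole text as a single shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def drop_near_duplicates(examples: list, threshold: float = 0.6) -> list:
    """Greedily keep examples whose instruction is below `threshold`
    similarity to every example already kept.

    Uses O(n^2) comparisons, which is acceptable for a few thousand examples.
    """
    kept = []
    kept_shingles = []
    for ex in examples:
        s = shingles(ex["instruction"])
        if all(jaccard(s, other) < threshold for other in kept_shingles):
            kept.append(ex)
            kept_shingles.append(s)
    return kept


# Hypothetical candidate pool; the first two instructions differ by one word.
candidates = [
    {"instruction": "How do I reverse a linked list in Python?", "response": "(answer A)"},
    {"instruction": "How do I reverse a linked list in Python 3?", "response": "(answer B)"},
    {"instruction": "What is a good way to brine a turkey?", "response": "(answer C)"},
]
curated = drop_near_duplicates(candidates)
print(len(curated))  # 2: the second instruction is dropped as a near-duplicate
```

Lexical overlap only catches surface-level duplicates. Paraphrased instructions on the same topic would pass this filter, so it complements manual review rather than replacing it.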

Implications for fine-tuning in production

Start with 500–1,000 high-quality (instruction, response) pairs before scaling up
Spend labelling budget on quality checks, not quantity
Format consistency matters: training examples that vary in tone and verbosity teach the model inconsistency
RLHF may not be necessary for domain-specific instruction following when the supervised examples meet a high enough quality bar
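
As a rough illustration of the format-consistency point, the sketch below flags training examples whose response length is far from the dataset median so a reviewer can check them for tone and verbosity drift. Word count is only a crude proxy for style, and the function name, the dictionary keys, and the 0.5x and 2x bounds are assumptions, not anything taken from the paper.

```python
from statistics import median


def flag_length_outliers(examples: list, low: float = 0.5, high: float = 2.0) -> list:
    """Return indices of examples whose response word count falls outside
    [low * median, high * median] of the dataset."""
    if not examples:
        return []
    lengths = [len(ex["response"].split()) for ex in examples]
    mid = median(lengths)
    return [i for i, n in enumerate(lengths) if n < low * mid or n > high * mid]


# Hypothetical dataset: placeholder responses of 100, 120, 90, 5, and 400 words.
examples = [
    {"instruction": "Explain DNS.", "response": "word " * 100},
    {"instruction": "Explain TCP.", "response": "word " * 120},
    {"instruction": "Explain TLS.", "response": "word " * 90},
    {"instruction": "Explain HTTP.", "response": "word " * 5},   # much terser than the rest
    {"instruction": "Explain BGP.", "response": "word " * 400},  # much longer than the rest
]
to_review = flag_length_outliers(examples)
print(to_review)  # [3, 4]
```

Flagged examples are candidates for rewriting or removal, not automatic rejects: a short answer can be the right answer for a short question.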

LIMA's results are specific to a base model with strong pretraining (LLaMA 65B). Smaller models with weaker pretraining will likely need more alignment data. The takeaway is that strong pretraining shrinks the amount of alignment data required far more than was assumed, not that 1,000 examples always suffice.
