
LIMA: Why 1,000 Examples Beat Millions for Instruction Tuning

Meta AI's 2023 paper showing that 1,000 carefully chosen examples, with no RLHF, can produce a model competitive with RLHF-trained systems. The insight that data quality beats quantity, and what it means for fine-tuning.

The conventional wisdom by early 2023: instruction tuning required scale. More human-preference data, more RLHF iterations. The gap between base models and instruction-following models seemed to be a data volume problem.

In May 2023, Chunting Zhou and colleagues at Meta AI published 'LIMA: Less Is More for Alignment'. They fine-tuned LLaMA 65B on 1,000 carefully curated prompt-response pairs using a standard supervised loss. No RLHF. No preference data. Result: in a human study, LIMA's responses were judged equivalent to or better than GPT-4's in 43% of cases, rising to 58% against Bard and 65% against DaVinci003, which was trained with human feedback.
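To make the headline metric concrete, here is a minimal sketch of how an "equivalent or preferred" rate could be tallied from pairwise human judgments. The function name `equivalent_or_preferred_rate`, the label strings, and the sample judgments are hypothetical illustrations, not code or data from the paper.

```python
from collections import Counter

VALID_LABELS = {"win", "tie", "loss"}  # hypothetical label set


def equivalent_or_preferred_rate(judgments):
    """Fraction of prompts where the candidate model won or tied.

    `judgments` holds one label per prompt ("win", "tie", or "loss"),
    taken from the candidate model's point of view.
    """
    if not judgments:
        raise ValueError("need at least one judgment")
    counts = Counter(judgments)
    unknown = set(counts) - VALID_LABELS
    if unknown:
        raise ValueError(f"unexpected labels: {sorted(unknown)}")
    # Ties count toward the rate: "equivalent" is part of the metric.
    return (counts["win"] + counts["tie"]) / len(judgments)


# Made-up judgments for ten prompts, for illustration only.
sample = ["win", "tie", "loss", "loss", "win",
          "loss", "tie", "loss", "loss", "loss"]
rate = equivalent_or_preferred_rate(sample)
print(f"equivalent or preferred: {rate:.0%}")
```

Note that a figure reported this way lumps ties together with wins, so it is not the same thing as a strict win rate.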

LIMA's central claim: almost all of a model's knowledge and capabilities are acquired during pretraining. Alignment — making it helpful and well-formatted — is largely about learning a style of interaction, not new knowledge. You can teach that style with 1,000 examples.
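Fine-tuning without RLHF means plain supervised learning: the model is trained to predict each token of a curated response given what came before. The sketch below computes that next-token loss for a single example. Scoring only the response tokens (and skipping the prompt) is a common convention assumed here, not a detail confirmed by the text above, and `response_only_nll` and its inputs are hypothetical names.

```python
import math


def response_only_nll(token_probs, is_response):
    """Mean negative log-likelihood over response tokens only.

    token_probs[i]: probability the model assigned to the i-th target token
    is_response[i]: True if that token belongs to the response,
                    False if it belongs to the prompt (assumed to be skipped)
    """
    if len(token_probs) != len(is_response):
        raise ValueError("inputs must have equal length")
    losses = [-math.log(p) for p, keep in zip(token_probs, is_response) if keep]
    if not losses:
        raise ValueError("no response tokens to score")
    return sum(losses) / len(losses)


# Illustrative numbers only: two prompt tokens, then two response tokens.
probs = [0.9, 0.8, 0.5, 0.25]
mask = [False, False, True, True]
loss = response_only_nll(probs, mask)
print(round(loss, 4))
```

Nothing in this objective rewards one answer over another, as a preference model would. The training signal comes entirely from the curated responses themselves, which is why their quality matters so much.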

What made the 1,000 examples special

Implications for fine-tuning in production

LIMA's results were obtained with a strongly pretrained base model (LLaMA 65B). Smaller models with weaker pretraining may need more alignment data. The takeaway is that strong pretraining reduces the amount of alignment data required far more than was assumed, not that 1,000 examples always suffice.
