GenAI Systems Lab
AI Engineering

Hamel Husain: Evals Are Everything

Hamel wrote the definitive guide to LLM evals. His core thesis: if you don't have evals, you don't have a product. It is required reading before you ship anything.

Who He Is

Hamel Husain is an ML engineer and independent consultant who has worked with GitHub, Airbnb, and a range of AI startups. He is best known for his practical, opinionated writing on LLM evaluation, for his work on fine-tuning LLMs in production, and for contributions to the fast.ai ecosystem such as nbdev.

Core Thesis

If you don't have evals, you don't have a product. Everything else — prompting, fine-tuning, RAG — is secondary to knowing whether your system works.
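As a concrete starting point, the simplest eval is a fixed set of inputs, each paired with an assertion-style check, rerun on every change while the pass rate is tracked. The sketch assumes a hypothetical `generate` function standing in for the system under test. Here it returns canned strings so the example runs offline. `EvalCase`, `run_evals`, and the example cases are illustrative names. They are not part of any library or of Hamel's own code.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class EvalCase:
    name: str
    prompt: str
    check: Callable[[str], bool]  # returns True if the output is acceptable


def generate(prompt: str) -> str:
    # Hypothetical stand-in for the system under test (an LLM call, a RAG pipeline, etc.).
    canned = {
        "Extract the invoice total from: 'Total due: $42.00'": "42.00",
        "Reply with a one-sentence greeting.": "Hello! How can I help you today?",
    }
    return canned.get(prompt, "")


def run_evals(cases: list, system: Callable[[str], str]) -> dict:
    # Run every case through the system and record pass/fail per case name.
    results = {}
    for case in cases:
        output = system(case.prompt)
        try:
            results[case.name] = bool(case.check(output))
        except Exception:
            # A check that crashes counts as a failure instead of aborting the run.
            results[case.name] = False
    return results


cases = [
    EvalCase("invoice_total", "Extract the invoice total from: 'Total due: $42.00'",
             lambda out: out.strip() == "42.00"),
    EvalCase("greeting_is_short", "Reply with a one-sentence greeting.",
             lambda out: 0 < len(out) < 120 and out.count(".") <= 1),
    EvalCase("non_empty_summary", "Summarize ticket 123.",
             lambda out: len(out.strip()) > 0),
]

results = run_evals(cases, generate)
pass_rate = sum(results.values()) / len(results)
for name, ok in results.items():
    print(f"{'PASS' if ok else 'FAIL'}  {name}")
print(f"pass rate: {pass_rate:.0%}")
# PASS  invoice_total
# PASS  greeting_is_short
# FAIL  non_empty_summary
# pass rate: 67%
```

The failing case is the useful part. Each failure is a concrete bug report, and you can rerun the same cases after every prompt, model, or retrieval change.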

Essential Reading

Resource | Format | Why It Matters
Your AI Product Needs Evals | Blog post | The best single piece on why evaluation is the core discipline of AI product engineering.
A Practical Guide to LLM Evals | Guide (hamel.dev) | Step-by-step: what to measure, how to set up an eval harness, and the pitfalls of LLM-as-judge.
Fine-tuning in Practice | Blog series | Real workflows (dataset curation, SFT, DPO, evaluation) rather than theoretical pipelines.
nbdev (fast.ai) | Open source | Notebook-driven development, his preferred environment for rapid ML experimentation.
hamel.dev | Blog | Ongoing practical posts on what actually works in production ML, without the hype.
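One common LLM-as-judge pitfall is trusting a judge that has never been checked against human judgment. Below is a minimal sketch of such a check. It is not necessarily the procedure the guide prescribes. It assumes you already have pass/fail labels from a domain expert and from the judge on the same outputs. It compares them using raw agreement, the judge's precision and recall on failures, and Cohen's kappa, which corrects for agreement expected by chance. `judge_agreement` is a hypothetical helper, and the label lists are made-up illustration data, not results.

```python
def judge_agreement(human: list, judge: list) -> dict:
    """Compare an LLM judge's pass/fail labels with a human expert's labels.

    True means "pass" and False means "fail". "Fail" is treated as the positive
    class, because catching bad outputs is usually what the judge is for.
    """
    if not human or len(human) != len(judge):
        raise ValueError("need two non-empty label lists of equal length")
    n = len(human)
    pairs = list(zip(human, judge))
    tp = sum(1 for h, j in pairs if not h and not j)  # both say fail
    fp = sum(1 for h, j in pairs if h and not j)      # judge fails a good output
    fn = sum(1 for h, j in pairs if not h and j)      # judge passes a bad output
    tn = n - tp - fp - fn                             # both say pass

    observed = (tp + tn) / n
    # Agreement expected by chance, from each rater's own fail rate (Cohen's kappa).
    human_fail = (tp + fn) / n
    judge_fail = (tp + fp) / n
    expected = human_fail * judge_fail + (1 - human_fail) * (1 - judge_fail)
    # Kappa is undefined when both raters always give the same single label.
    kappa = (observed - expected) / (1 - expected) if expected < 1 else float("nan")

    return {
        "agreement": observed,
        "kappa": kappa,
        "fail_precision": tp / (tp + fp) if tp + fp else 0.0,
        "fail_recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Made-up labels for ten outputs, for illustration only.
human = [True, True, False, True, False, True, True, False, True, True]
judge = [True, True, True, True, False, True, False, False, True, True]

for metric, value in judge_agreement(human, judge).items():
    print(f"{metric}: {value:.2f}")
# agreement: 0.80
# kappa: 0.52
# fail_precision: 0.67
# fail_recall: 0.67
```

With these made-up labels, 80% raw agreement looks reassuring. Yet the judge still lets one of the three human-labelled failures through. Kappa and failure recall expose that, while a single agreement number hides it.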

What to Question

Hamel's emphasis on domain-specific fine-tuning is well-placed, but it sometimes understates how far a well-prompted frontier model can go without any fine-tuning. His eval frameworks are also opinionated. They work best for structured tasks such as extraction or classification and need more adaptation for open-ended generation.
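The split between structured and open-ended tasks shows up directly in how checks are written. Below is a minimal sketch that assumes a JSON invoice-extraction task with hypothetical field names. Structured output can be scored field by field against a known answer. For an open-ended answer, these helpers can only screen for coarse red flags. `score_extraction` and `screen_open_ended` are illustrative names. They are not taken from Hamel's frameworks.

```python
import json


def score_extraction(output: str, expected: dict) -> float:
    """Structured task: fraction of expected fields the model got exactly right."""
    try:
        parsed = json.loads(output)
    except json.JSONDecodeError:
        return 0.0  # output that isn't valid JSON scores zero
    if not isinstance(parsed, dict):
        return 0.0
    hits = sum(1 for key, value in expected.items()
               if str(parsed.get(key, "")).strip() == value)
    return hits / len(expected)


def screen_open_ended(output: str, required_terms: list, max_words: int = 150) -> list:
    """Open-ended task: flag only coarse problems, since there is no single right answer.

    An empty result means no red flags, not that the answer is good. Judging quality
    still needs a rubric, human review, or an LLM judge.
    """
    problems = []
    words = output.split()
    if not words:
        problems.append("empty")
    if len(words) > max_words:
        problems.append("too long")
    lowered = output.lower()
    problems += [f"missing '{term}'" for term in required_terms if term.lower() not in lowered]
    return problems


# Hypothetical invoice-extraction case: two of three fields correct.
expected = {"vendor": "Acme Corp", "total": "42.00", "currency": "USD"}
print(round(score_extraction('{"vendor": "Acme Corp", "total": "42.00", "currency": "EUR"}', expected), 2))  # 0.67
print(score_extraction("Sure! The total is $42.", expected))  # 0.0
print(screen_open_ended("Refunds are processed within 5 business days.", ["refund", "business days"]))  # []
```

The asymmetry illustrates the critique. The first function yields a meaningful score with no human in the loop. The second only rules out obvious failures.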
