GenAI Systems Lab
AI Engineering

François Chollet: What LLMs Can't Do

Chollet built ARC-AGI to measure what LLMs systematically fail at. His 'On the Measure of Intelligence' paper is the sharpest critique of benchmark-based AI progress claims.

Who He Is

François Chollet created Keras and spent nearly a decade at Google. He is best known in the AI community for two things: the ARC-AGI benchmark, the most consequential challenge dataset for measuring genuine reasoning in AI systems, and his paper 'On the Measure of Intelligence,' which remains a leading philosophical framework for thinking about what AI systems actually do.

Core Thesis

LLMs are extremely sophisticated interpolation engines. They are not general intelligence, and scaling alone cannot make them so. Measuring progress therefore means measuring generalisation to truly novel tasks, not performance on tasks that resemble the training data.

Essential Reading

| Resource | Format | Why It Matters |
|---|---|---|
| On the Measure of Intelligence (2019) | arXiv paper | The philosophical foundation: defines intelligence as skill-acquisition efficiency, not task performance. |
| ARC-AGI benchmark | GitHub / arcprize.org | The practical test of the thesis: 400 tasks that humans solve easily and most AI systems still fail. |
| ARC Prize 2024 results | Blog post | Where the frontier stands: o3 with huge compute solved 88%, at $1000+/task, and what that result means. |
| The implausibility of intelligence explosion | Blog post | Chollet's case against recursive self-improvement narratives. |
| francois.chollet.work | Blog | Ongoing reflections on AI capabilities and limitations. |

What to Question

Chollet's critique is the most rigorous available, but ARC-AGI measures one specific type of generalisation: visual pattern induction. Many practical AI applications don't need human-level generalisation; they need reliable performance on a narrow distribution. His framework is essential for understanding limitations, but it can understate practical utility.
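To make "visual pattern induction" concrete, here is a minimal sketch of how an ARC-style task is laid out and scored. The `train`/`test` structure with `input`/`output` grids follows the public ARC JSON layout. The task itself, the two candidate rules, and the helper names are invented for illustration and are not part of the benchmark or its tooling.

```python
import json

# A toy task in the ARC JSON layout: "train" holds demonstration pairs,
# "test" holds the pairs to solve. Grids are lists of rows of small ints.
# This task is made up for illustration (hidden rule: mirror each row).
TOY_TASK = json.loads("""
{
  "train": [
    {"input": [[1, 0, 0], [2, 0, 0]], "output": [[0, 0, 1], [0, 0, 2]]},
    {"input": [[0, 3], [4, 0]], "output": [[3, 0], [0, 4]]}
  ],
  "test": [
    {"input": [[5, 0, 0, 0], [0, 6, 0, 0]],
     "output": [[0, 0, 0, 5], [0, 0, 6, 0]]}
  ]
}
""")


def mirror_rows(grid):
    """Hypothetical candidate rule: flip every row left-to-right."""
    return [list(reversed(row)) for row in grid]


def identity(grid):
    """Hypothetical baseline rule: return a copy of the grid unchanged."""
    return [list(row) for row in grid]


def fits_demonstrations(rule, task):
    """True if the rule reproduces every demonstration output exactly."""
    return all(rule(pair["input"]) == pair["output"] for pair in task["train"])


def score(rule, task):
    """Fraction of test pairs reproduced exactly; there is no partial credit."""
    tests = task["test"]
    solved = sum(rule(pair["input"]) == pair["output"] for pair in tests)
    return solved / len(tests)


if __name__ == "__main__":
    for rule in (mirror_rows, identity):
        print(rule.__name__, fits_demonstrations(rule, TOY_TASK), score(rule, TOY_TASK))
```

The solver sees only a handful of demonstrations and must induce the rule before producing the test grid, which is why memorised patterns help so little. This sketch gives each rule a single attempt for simplicity; the official evaluation allows a small number of attempts per test input.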
