ReAct: The Paper That Gave LLMs the Ability to Reason and Act Together
Princeton/Google's 2022 paper interleaving reasoning traces with action steps. The foundation of every modern AI agent framework — and why it outperforms pure reasoning or pure acting alone.
Early LLM agents had a split personality. Chain-of-thought prompting could make models reason well — but the reasoning stayed inside the model's head, untethered to external reality. Action-based prompting could make models act — but without reasoning, they acted blindly, unable to recover from mistakes.
In October 2022, researchers at Princeton and Google published 'ReAct: Synergizing Reasoning and Acting in Language Models'. The insight: interleave reasoning traces and action steps in the same generation. Think, act, observe the result, think about what it means, act again. This paper is the foundation of every modern AI agent framework.
The Thought-Action-Observation loop
Task: What is the current population of the capital of France?
Thought: I need to find the capital of France, then its population.
Action: Search[capital of France]
Observation: Paris is the capital of France.
Thought: Now I need the current population of Paris.
Action: Search[current population of Paris]
Observation: The population of Paris is approximately 2.1 million as of 2024.
Thought: I have all the information needed.
Action: Finish[The current population of Paris, France's capital, is approximately 2.1 million.]
The critical difference from pure chain-of-thought: actions are grounded in external observations. The model can't hallucinate the population of Paris because it retrieved the actual value. Each Observation grounds the next Thought — breaking the confabulation cycle.
Why interleaving matters
The paper tested Act-only, Reason-only, and ReAct. ReAct outperformed both on nearly all benchmarks: HotPotQA, Fever, ALFWorld, WebShop. The reasoning trace serves two purposes: selecting the right action, and helping the model recover when an action produces unexpected results.
Common failure modes of ReAct agents
- Hallucinated observations: the model generates an Observation that isn't from an actual tool call — especially when the tool returns an error
- Reasoning loops: cycles through Thought→Action→Observation without progress, unable to recognise it's stuck
- Incorrect tool selection: reasoning is correct but the model calls the wrong tool or passes wrong parameters
- Poor error recovery: most implementations don't handle tool failures gracefully — the model sees an error and often retries the same action
ReAct in modern frameworks
- LangChain AgentExecutor: orchestrates the loop automatically, parsing tool calls from model output
- LangGraph: represents the ReAct loop as a graph with conditional edges — more control over stopping, human handoff, and error handling
- Anthropic tool use: model outputs tool_use blocks, receives tool_result blocks — Thought is implicit before the tool call
- OpenAI function calling: structured tool calls replace text Actions — reasoning still appears in o1/o3 internal traces
Simulate ReAct loops in the Agents Lab →: Step through Thought-Action-Observation traces, inject failures, and see how agents recover.
Try it interactively
GenAI Systems Lab is a free platform for AI engineers — configure real failure modes, break things, and build the judgment that gets you hired.
Open GenAI Systems Lab →