How to Answer 'How Do Agents Work?' in a Technical Screen
ReAct, tool use, memory, multi-agent orchestration — what a senior interviewer wants to hear, what separates a junior answer from a strong one, and a worked example of walking through an agent architecture under time pressure.
Technical screens for ML engineering roles increasingly include agent questions — 'explain how you'd build an agent to do X' or 'how does the ReAct pattern work?' Most candidates know the buzzwords. Senior interviewers test whether you understand the failure modes.
The baseline answer most candidates give
'An agent is an LLM that can use tools. It reasons about what to do, calls a tool, gets back a result, and uses that result in the next step.' This is correct but shallow. It won't differentiate you.
The ReAct loop — show you know the mechanism
ReAct (Yao et al., 2022) is the foundational agent pattern. The loop is: Thought → Action → Observation, repeated until the agent produces a final answer.
- Thought: the model's explicit reasoning step, generated as text and appended to the context. 'I need to find the current stock price of AAPL.'
- Action: a structured tool call consisting of a tool name and its arguments. 'search_web(query="AAPL stock price")'
- Observation: the tool's return value, appended to the context. 'AAPL: $182.45 as of 14:32 ET'
- The loop repeats until the model produces a final answer rather than another action.
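The key insight in ReAct is interleaving. The Thought step externalizes the model's reasoning into the context window, where it can be inspected, traced, and conditioned on in subsequent steps. Each Observation then grounds the next Thought in real tool output. Because every decision leaves a written trace, agents become debuggable.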
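A minimal sketch of the loop in Python follows. It assumes a scripted stand-in for the model (`scripted_model`) and a canned `search_web` tool, and both are hypothetical. The text-based `Action:` format is one choice among several; many model APIs return tool calls as structured JSON instead. What the sketch is meant to show is the control flow: each step's Thought is appended to the context, the Action is parsed and executed, and the Observation is appended before the next model call.

```python
import re
from typing import Callable

# Hypothetical tool registry: tool name -> function taking one string argument.
TOOLS: dict[str, Callable[[str], str]] = {
    "search_web": lambda query: "AAPL: $182.45 as of 14:32 ET",  # canned result, not a real API
}

# Matches one keyword-argument call, e.g.  Action: search_web(query="AAPL stock price")
ACTION_RE = re.compile(r'Action:\s*(\w+)\(\w+="([^"]*)"\)')


def scripted_model(context: str) -> str:
    """Stand-in for an LLM call; a real agent would send `context` to a model."""
    if "Observation:" not in context:
        return ("Thought: I need to find the current stock price of AAPL.\n"
                'Action: search_web(query="AAPL stock price")')
    return "Thought: The observation answers the question.\nFinal Answer: $182.45"


def react_loop(question: str, model: Callable[[str], str], max_steps: int = 5) -> str:
    context = f"Question: {question}\n"
    for _ in range(max_steps):
        step = model(context)            # a Thought plus either an Action or a Final Answer
        context += step + "\n"           # the Thought is externalized into the context
        if "Final Answer:" in step:
            return step.split("Final Answer:", 1)[1].strip()
        match = ACTION_RE.search(step)
        if match is None:
            observation = "Error: could not parse an action"
        else:
            name, arg = match.groups()
            tool = TOOLS.get(name)
            observation = tool(arg) if tool else f"Error: unknown tool {name!r}"
        context += f"Observation: {observation}\n"  # the tool result feeds the next Thought
    return "Stopped: step budget exhausted"


print(react_loop("What is AAPL trading at?", scripted_model))  # → $182.45
```

Replacing `scripted_model` with a real model call is the only change the loop itself needs. Note that `max_steps` already acts as a step budget, and parse errors and unknown tools come back as Observations instead of crashing the loop.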
Tool design — where most candidates are weak
Good interviewers probe tool design. The questions to answer for each tool: what are its side effects? Is it idempotent? What's the error contract? How does the agent recover from tool failure?
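- Idempotency: read-only operations (search, read; HTTP GET) are safe to retry. Non-idempotent writes (HTTP POST, sending an email, placing an order) must not be retried blindly: confirm first or attach an idempotency key. PUT and DELETE are idempotent under HTTP semantics, but they still have side effects, so high-stakes ones deserve confirmation before the first call.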
- Tool descriptions matter: the agent uses the tool description to decide when to call it. Vague descriptions → wrong tool selection.
- Schema strictness: defining arguments with a strict JSON Schema (types, required fields, enums) reduces malformed tool calls compared with describing them in free text.
- Error handling: tool failures should return structured error objects, not raw exceptions. The agent needs to know if it should retry, choose a different tool, or surface the failure.
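The error contract can be sketched as a wrapper that validates arguments before running a tool and converts every outcome into a dict with an `ok` flag. `ToolSpec`, `TransientError`, and `call_tool` are hypothetical names, and the `params` type map is a deliberately small stand-in for full JSON Schema validation. Failures carry a `retryable` hint, which is true only for transient errors on idempotent tools.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class ToolSpec:
    name: str
    description: str             # what the agent reads when choosing a tool
    params: dict[str, type]      # minimal stand-in for a JSON Schema: required arg -> type
    idempotent: bool             # safe to retry automatically?
    fn: Callable[..., Any]


class TransientError(Exception):
    """Hypothetical error class for failures worth retrying (timeouts, rate limits)."""


def call_tool(spec: ToolSpec, args: dict[str, Any]) -> dict[str, Any]:
    # 1. Validate arguments before executing anything.
    missing = [p for p in spec.params if p not in args]
    unknown = [a for a in args if a not in spec.params]
    wrong_type = [p for p, t in spec.params.items()
                  if p in args and not isinstance(args[p], t)]
    if missing or unknown or wrong_type:
        return {"ok": False, "error": {
            "type": "invalid_arguments",
            "message": f"missing={missing} unknown={unknown} wrong_type={wrong_type}",
            "retryable": False,   # the model must fix the call, not repeat it
        }}
    # 2. Execute, converting exceptions into a structured error object.
    try:
        return {"ok": True, "result": spec.fn(**args)}
    except TransientError as exc:
        return {"ok": False, "error": {
            "type": "transient",
            "message": str(exc),
            "retryable": spec.idempotent,  # only auto-retry calls without side effects
        }}
    except Exception as exc:
        return {"ok": False, "error": {
            "type": "tool_failure", "message": str(exc), "retryable": False}}


search = ToolSpec(
    name="search_web",
    description="Search the web. Use for current facts such as prices or news.",
    params={"query": str},
    idempotent=True,
    fn=lambda query: f"results for {query}",
)
print(call_tool(search, {"query": "AAPL stock price"}))
print(call_tool(search, {"q": "AAPL"}))   # wrong parameter name -> invalid_arguments
```

The agent loop can then branch on `error["type"]` and `error["retryable"]` to decide whether to retry, pick another tool, or surface the failure, without having to parse a stack trace.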
Memory architecture
For complex agents, be ready to discuss memory. One useful breakdown has six types: in-context (what's in the current prompt), episodic (summaries of past interactions), semantic (facts about the user or domain), procedural (how to perform tasks), working (scratch space during a task), and external (databases, vector stores). The categories overlap; for example, episodic and semantic memory are usually stored externally. Most production agents use in-context + external memory, and episodic and semantic memory matter mainly for personalized agents.
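A sketch of the common in-context + external combination follows. `AgentMemory` is a hypothetical class. Its external store is a plain list searched by word overlap, which stands in for a vector store queried by embedding similarity. Recent turns stay in a bounded window, and older but relevant turns are recalled from the store.

```python
from collections import deque


class AgentMemory:
    """In-context window plus an external store (word overlap stands in for vector search)."""

    def __init__(self, window: int = 4):
        self.recent: deque[str] = deque(maxlen=window)  # in-context: last few turns only
        self.external: list[str] = []                   # external: persists beyond the window

    def add_turn(self, turn: str) -> None:
        self.recent.append(turn)       # oldest turn drops out once the window is full
        self.external.append(turn)     # every turn is also archived externally

    def retrieve(self, query: str, k: int = 2) -> list[str]:
        q = set(query.lower().split())
        scored = [(len(q & set(doc.lower().split())), doc) for doc in self.external]
        scored = [s for s in scored if s[0] > 0]         # drop documents with no overlap
        scored.sort(key=lambda s: s[0], reverse=True)    # stable sort keeps ties in order
        return [doc for _, doc in scored[:k]]

    def build_context(self, query: str) -> str:
        # Recall only what is not already in the window, then append recent turns.
        recalled = [d for d in self.retrieve(query) if d not in self.recent]
        return "\n".join(["Recalled:"] + recalled + ["Recent:"] + list(self.recent))


memory = AgentMemory(window=2)
for turn in ["user prefers metric units", "asked about flights", "asked about hotels"]:
    memory.add_turn(turn)
print(memory.build_context("which units does the user prefer"))
```

The preference has already fallen out of the two-turn window, yet it still reaches the prompt through retrieval. Keeping the window small while recovering old facts on demand is the reason for pairing the two types.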
Multi-agent patterns
If the question involves complex workflows, mention orchestration patterns: supervisor (one agent delegates to specialized subagents), pipeline (agents run in sequence, each transforming the previous one's output), and mesh (agents communicate peer-to-peer on collaborative tasks). Each has different failure semantics. The supervisor is a single point of failure, so if it fails, the whole workflow fails. A pipeline failure is recoverable if intermediate state is persisted, because you can resume from the last completed stage.
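The pipeline recovery claim can be made concrete with a sketch that persists each stage's output, so a rerun resumes after the last completed stage. The `checkpoints` dict is a stand-in for durable storage such as a database, with one store per run. The three stage functions are hypothetical placeholders for agent calls, and `draft` fails once to simulate a model timeout.

```python
from typing import Callable

Stage = Callable[[str], str]


def run_pipeline(stages: list[tuple[str, Stage]], task: str,
                 checkpoints: dict[str, str]) -> str:
    """Run agents in sequence, persisting each stage's output under its name."""
    data = task
    for name, stage in stages:
        if name in checkpoints:        # completed on an earlier attempt: reuse the output
            data = checkpoints[name]
            continue
        data = stage(data)             # may raise; earlier outputs remain persisted
        checkpoints[name] = data
    return data


log: list[str] = []                    # records which stages actually ran
attempts = {"draft": 0}


def research(x: str) -> str:
    log.append("research")
    return x + " | notes"


def draft(x: str) -> str:
    log.append("draft")
    attempts["draft"] += 1
    if attempts["draft"] == 1:
        raise RuntimeError("model timeout")   # simulated transient failure
    return x + " | draft"


def review(x: str) -> str:
    log.append("review")
    return x + " | reviewed"


store: dict[str, str] = {}
stages = [("research", research), ("draft", draft), ("review", review)]
try:
    run_pipeline(stages, "task", store)       # fails at "draft"
except RuntimeError:
    pass
result = run_pipeline(stages, "task", store)  # resumes: "research" is not rerun
print(result)  # → task | notes | draft | reviewed
```

A supervisor has no equivalent shortcut: if the agent holding the plan fails, there is no completed stage to resume from unless the plan itself is checkpointed as well.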
Failure modes — what separates senior answers
- Infinite loops: agent takes action → tool returns error → agent retries → loop. Fix with a maximum step budget.
- Context overflow: long agent traces fill the context window. Fix with rolling summarization of old turns.
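- Tool hallucination: model calls a tool that doesn't exist or uses wrong parameter names. Fix with strict schema validation plus rejection sampling: discard invalid calls and resample, or return the validation error so the model can correct itself.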
- Reward hacking: agent achieves the stated goal by a technically correct but unintended path. Fix with explicit success criteria and human review for high-stakes actions.
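Three of these fixes can be sketched as small guards. Every name below is hypothetical. `LoopGuard` enforces the step budget and also flags an action string seen too many times, which is an extra heuristic beyond the step budget itself. `compact` implements rolling summarization with the summarizer passed in, since in practice that would be another LLM call. `sample_valid_call` does rejection sampling against a map from tool names to parameter sets, and assumes every parameter is required.

```python
from typing import Callable


class BudgetExceeded(Exception):
    pass


class LoopGuard:
    """Hard step budget plus a cap on how often the same action string may recur."""

    def __init__(self, max_steps: int = 10, max_repeats: int = 2):
        self.max_steps = max_steps
        self.max_repeats = max_repeats
        self.steps = 0
        self.seen: dict[str, int] = {}

    def check(self, action: str) -> None:
        self.steps += 1
        if self.steps > self.max_steps:
            raise BudgetExceeded(f"step budget of {self.max_steps} exhausted")
        self.seen[action] = self.seen.get(action, 0) + 1
        if self.seen[action] > self.max_repeats:
            raise BudgetExceeded(f"action repeated {self.seen[action]} times: {action}")


def compact(turns: list[str], keep_last: int,
            summarize: Callable[[list[str]], str]) -> list[str]:
    """Rolling summarization: fold all but the last `keep_last` turns into one summary."""
    if len(turns) <= keep_last:
        return list(turns)
    cut = len(turns) - keep_last
    return [summarize(turns[:cut])] + turns[cut:]


def sample_valid_call(sample: Callable[[], dict],
                      tool_params: dict[str, set[str]],
                      max_tries: int = 3) -> dict:
    """Rejection sampling: redraw until the call names a real tool with exactly its params."""
    for _ in range(max_tries):
        call = sample()                         # e.g. one model sample at temperature > 0
        name, args = call.get("name"), call.get("args", {})
        if name in tool_params and set(args) == tool_params[name]:
            return call
    raise ValueError(f"no valid tool call after {max_tries} samples")


history = [f"turn {i}" for i in range(1, 6)]
print(compact(history, 2, lambda old: f"[summary of {len(old)} earlier turns]"))
```

The guards are independent, so the agent loop can call `LoopGuard.check` before each action, apply `compact` when the trace approaches the context limit, and route every proposed tool call through `sample_valid_call`.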
References:
- ReAct: Synergizing Reasoning and Acting in Language Models (Yao et al., 2022)
- Toolformer: Language Models Can Teach Themselves to Use Tools (Schick et al., 2023)
Try it interactively
GenAI Systems Lab is a free platform for AI engineers — configure real failure modes, break things, and build the judgment that gets you hired.