GenAI Systems Lab

Tracing Agent Loops: How to Debug Step-by-Step Execution

What a step trace reveals, how to spot loops, wrong tool calls, and hallucinated observations — and how to use the Agent Loop Simulator to reproduce failures.

An agent produced a wrong answer. You need to find out why. The agent took 14 steps, called 6 different tools, and made 4 LLM calls. Where did it go wrong? Without tracing, this is archaeology. With tracing, it's a 5-minute investigation.

What a trace needs to capture
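
One way to make this concrete is a flat record per step. The sketch below assumes the same fields that the OpenTelemetry example in the next section attaches to its spans (step number, message count, model, token counts, tool name, tool input, result length), plus an error field. `StepRecord` and its field names are illustrative and do not come from any library.

```python
import json
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class StepRecord:
    """Hypothetical per-step trace record; field names are illustrative."""
    step_number: int
    message_count: int
    model: str
    input_tokens: int
    output_tokens: int
    tool_name: Optional[str] = None           # None when the step made no tool call
    tool_input: Optional[str] = None          # serialized arguments sent to the tool
    tool_result_length: Optional[int] = None
    error: Optional[str] = None               # exception text if the step failed

    def to_json(self) -> str:
        # One JSON object per step keeps traces easy to grep and diff.
        return json.dumps(asdict(self), sort_keys=True)


record = StepRecord(
    step_number=3,
    message_count=7,
    model="example-model",  # placeholder model name
    input_tokens=1200,
    output_tokens=85,
    tool_name="search",
    tool_input='{"query": "refund policy"}',
    tool_result_length=412,
)
line = record.to_json()
```

Writing one such line per step gives you something to search even before a tracing backend is set up.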

OpenTelemetry for agents

from opentelemetry import trace
from opentelemetry.trace import Status, StatusCode

tracer = trace.get_tracer("agent")

# call_llm and execute_tool are the application's own async helpers;
# they are not part of OpenTelemetry.
async def agent_step(step_num, messages, available_tools):
    # One parent span per step; the LLM call and tool call nest under it.
    with tracer.start_as_current_span(f"agent_step_{step_num}") as span:
        span.set_attribute("step_number", step_num)
        span.set_attribute("message_count", len(messages))

        # Child span: model and token usage for this step's LLM call.
        with tracer.start_as_current_span("llm_call") as llm_span:
            response = await call_llm(messages, available_tools)
            llm_span.set_attribute("model", response.model)
            llm_span.set_attribute("input_tokens", response.usage.input_tokens)
            llm_span.set_attribute("output_tokens", response.usage.output_tokens)

        # Child span: only created when the model asked for a tool.
        if response.tool_use:
            with tracer.start_as_current_span("tool_call") as tool_span:
                tool_span.set_attribute("tool_name", response.tool_use.name)
                # Attribute values must be primitives, so stringify the input.
                tool_span.set_attribute("tool_input", str(response.tool_use.input))
                try:
                    result = await execute_tool(response.tool_use)
                    tool_span.set_attribute("tool_result_length", len(str(result)))
                except Exception as e:
                    # Mark the span as failed, then let the error propagate.
                    tool_span.set_status(Status(StatusCode.ERROR, str(e)))
                    raise

        return response
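
Once tool names and inputs are recorded per step, spotting a loop can be as simple as counting identical calls. The following is a minimal sketch, assuming the steps have been exported as a list of dicts with `tool_name` and `tool_input` keys that mirror the span attributes above. The function name `find_repeated_tool_calls` and the default threshold of 3 are illustrative choices, not part of OpenTelemetry.

```python
import json
from collections import Counter


def find_repeated_tool_calls(steps, threshold=3):
    """Return (tool_name, tool_input) pairs seen at least `threshold` times.

    `steps` is a list of dicts with "tool_name" and "tool_input" keys
    (a hypothetical export shape). Steps without a tool call are skipped.
    """
    counts = Counter()
    for step in steps:
        name = step.get("tool_name")
        if name is None:
            continue
        # Serialize with sorted keys so equal inputs compare equal
        # regardless of key order.
        key = (name, json.dumps(step.get("tool_input"), sort_keys=True))
        counts[key] += 1
    return [key for key, n in counts.items() if n >= threshold]


steps = [
    {"tool_name": "search", "tool_input": {"q": "order 1234"}},
    {"tool_name": "search", "tool_input": {"q": "order 1234"}},
    {"tool_name": None, "tool_input": None},  # step with no tool call
    {"tool_name": "search", "tool_input": {"q": "order 1234"}},
    {"tool_name": "lookup", "tool_input": {"id": 1234}},
]
repeated = find_repeated_tool_calls(steps)
```

Exact-match counting misses loops where the agent varies its input slightly on each attempt, so treat an empty result as a hint rather than proof that the run did not loop.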

LangSmith for higher-level tracing

For teams using LangChain, LangSmith provides automatic tracing with a visual UI. Every chain, agent step, LLM call, and tool invocation is captured in a tree view. You can replay any trace, compare traces across runs, and annotate specific steps with feedback.

For teams not using LangChain, Langfuse and Arize Phoenix offer similar capabilities with simpler SDKs. Both support the OpenTelemetry standard, so you're not locked into a specific provider.

Debugging checklist for a failed agent run
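
Two of the failure modes named in the introduction, wrong tool calls and hallucinated observations, can be checked mechanically once a run is recorded. Here is a minimal sketch, assuming each step is stored as a dict holding the tool the model requested, what the tool actually returned, and the observation text that was passed back to the model. The `check_run` function and the key names are hypothetical, so adapt them to whatever your trace export looks like.

```python
def check_run(steps, available_tools):
    """Return a list of human-readable findings for one agent run.

    Each step is a dict (hypothetical shape) with:
      "tool_name":   tool the model asked for, or None
      "tool_result": what the tool actually returned, or None
      "observation": tool output as it appeared in the next prompt, or None
    """
    findings = []
    for i, step in enumerate(steps, start=1):
        name = step.get("tool_name")
        # Wrong tool call: the model requested a tool that was never offered.
        if name is not None and name not in available_tools:
            findings.append(f"step {i}: called unknown tool {name!r}")
        # Hallucinated observation: the text the model saw differs from
        # what the tool really returned.
        result = step.get("tool_result")
        observation = step.get("observation")
        if observation is not None and observation != result:
            findings.append(f"step {i}: observation does not match tool result")
    return findings


steps = [
    {"tool_name": "search", "tool_result": "3 results", "observation": "3 results"},
    {"tool_name": "send_email", "tool_result": None, "observation": None},
    {"tool_name": "lookup", "tool_result": "status: shipped",
     "observation": "status: delivered"},
]
findings = check_run(steps, available_tools={"search", "lookup"})
```

Strict equality between observation and result is deliberately crude. If your agent truncates or summarizes tool output before adding it to the prompt, compare against the transformed text instead.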

Trace agent loops: step through agent execution traces in the Agents module.
