GenAI Systems Lab
Agents & Tool Use 11 min read

Tool Use in Production Agents: Idempotency, Side Effects, and Audit Trails

How to classify tool risk (read vs write vs destructive), design idempotency correctly, set retry strategy per tool type, build audit logs for compliance, and prevent an agent from sending the email twice.

Prerequisites: agent architecture basics, function calling basics. After this post you will be able to design production-safe tool use patterns: classify tool risk, enforce idempotency, handle failures without data corruption, and build audit trails.

Tool use is where agents stop being text generators and start affecting the real world. Sending emails, updating records, triggering workflows, calling payment APIs — every tool call that has a side effect is a place where a bug doesn't just produce a wrong answer. It produces a wrong action.

Most tutorials show you how to make tool calls work. This post is about making them safe at production scale.

Read vs Write: The Most Important Classification

The first thing you do with any tool is classify it by risk: read, write, or destructive.

Interview trap: 'Just retry failed tool calls.' This is the most common wrong answer. Retrying a read is safe. Retrying a write that already succeeded sends the email twice, charges the card twice, updates the record twice. Idempotency design is the answer, not naive retries.
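One way to make the classification concrete is a small registry that every tool must appear in before the agent is allowed to call it. The sketch below is illustrative only: the tool names (`search_orders`, `send_email`, `delete_account`) and the helper `can_blind_retry` are hypothetical, and the three risk levels follow the read / write / destructive split described above.

```python
from enum import Enum


class ToolRisk(Enum):
    READ = "read"                # no side effects
    WRITE = "write"              # changes state or sends something
    DESTRUCTIVE = "destructive"  # hard or impossible to undo


# Hypothetical registry: tool name -> risk class.
TOOL_RISK = {
    "search_orders": ToolRisk.READ,
    "send_email": ToolRisk.WRITE,
    "delete_account": ToolRisk.DESTRUCTIVE,
}


def classify(tool_name: str) -> ToolRisk:
    # Unregistered tools fall back to the most restrictive class.
    return TOOL_RISK.get(tool_name, ToolRisk.DESTRUCTIVE)


def can_blind_retry(tool_name: str) -> bool:
    # Only reads are safe to retry without an idempotency key.
    return classify(tool_name) is ToolRisk.READ
```

Defaulting unknown tools to `DESTRUCTIVE` is a design choice: a tool someone forgot to register gets the strictest handling rather than the loosest.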

Idempotency Design

Idempotency means that calling a tool multiple times with the same arguments has the same effect as calling it once. For write tools you have to design this in; it doesn't happen automatically.

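import hashlib
import json
import uuid

class AgentTask:
    def __init__(self, email_api):
        # email_api: client whose send() accepts an idempotency_key (assumed interface)
        self.email_api = email_api
        self.tool_keys = {}  # (tool_name, argument fingerprint) -> idempotency_key

    def get_or_create_key(self, tool_name: str, **args) -> str:
        # Key is created BEFORE the call and reused on every retry of the same call.
        # Fingerprinting the arguments keeps two different emails in one task
        # from sharing a key. In production, store this mapping durably so it
        # survives a process restart.
        fingerprint = hashlib.sha256(json.dumps(args, sort_keys=True).encode()).hexdigest()
        slot = (tool_name, fingerprint)
        if slot not in self.tool_keys:
            self.tool_keys[slot] = str(uuid.uuid4())
        return self.tool_keys[slot]

    def send_email(self, to: str, body: str):
        key = self.get_or_create_key('send_email', to=to, body=body)
        return self.email_api.send(to=to, body=body, idempotency_key=key)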
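The key only prevents duplicates if the receiving API deduplicates on it. As a rough illustration of that other half, here is an in-memory stand-in for such a service. `FakeEmailAPI` is hypothetical and not a real client; a real provider would store keys durably and usually expire them after some window.

```python
class FakeEmailAPI:
    """In-memory stand-in for an email service that deduplicates on idempotency_key."""

    def __init__(self):
        self.sent = []       # emails that were actually delivered
        self._results = {}   # idempotency_key -> response returned the first time

    def send(self, to: str, body: str, idempotency_key: str) -> dict:
        # A repeated key returns the stored response and sends nothing.
        if idempotency_key in self._results:
            return self._results[idempotency_key]
        self.sent.append((to, body))
        result = {"status": "sent", "message_id": len(self.sent)}
        self._results[idempotency_key] = result
        return result


api = FakeEmailAPI()
first = api.send("a@example.com", "hello", idempotency_key="key-1")
retry = api.send("a@example.com", "hello", idempotency_key="key-1")  # simulated retry
```

After the simulated retry, `api.sent` still holds a single email and both calls return the same response, which is the behavior a write tool needs before retries are safe.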

Tool Schema Design

The LLM decides which tool to call and what arguments to pass based on the schema. A bad schema (a vague name, a one-word description, untyped parameters) produces wrong calls.
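To make the difference concrete, here is a hedged sketch of a vague schema next to a more precise one, written in a JSON-Schema style. The exact envelope differs between model providers, and the tool `update_ticket_status`, its fields, and the helper `validate_args` are all hypothetical. The helper checks only a small subset of JSON Schema (required fields, unexpected fields, enums) so that a malformed call is rejected before the tool runs.

```python
VAGUE_SCHEMA = {
    "name": "update",
    "description": "Updates stuff.",
    "parameters": {"type": "object", "properties": {"data": {"type": "string"}}},
}

PRECISE_SCHEMA = {
    "name": "update_ticket_status",
    "description": (
        "Set the status of one existing support ticket. "
        "Does not create tickets and does not notify the customer."
    ),
    "parameters": {
        "type": "object",
        "properties": {
            "ticket_id": {"type": "string", "description": "ID of an existing ticket."},
            "status": {"type": "string", "enum": ["open", "pending", "closed"]},
        },
        "required": ["ticket_id", "status"],
        "additionalProperties": False,
    },
}


def validate_args(schema: dict, args: dict) -> list[str]:
    """Return a list of problems with the proposed arguments (empty if none)."""
    params = schema["parameters"]
    props = params.get("properties", {})
    errors = []
    for name in params.get("required", []):
        if name not in args:
            errors.append(f"missing required argument: {name}")
    for name, value in args.items():
        if name not in props:
            if params.get("additionalProperties") is False:
                errors.append(f"unexpected argument: {name}")
            continue
        allowed = props[name].get("enum")
        if allowed is not None and value not in allowed:
            errors.append(f"{name} must be one of {allowed}")
    return errors
```

The precise version names one action, says what the tool does not do, and constrains `status` to an enum, so the model has less room to guess and the validator has something to check.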

Timeout and Retry Strategy Per Tool Type
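A minimal sketch of what a per-type policy could look like, under stated assumptions: the timeout and attempt numbers are illustrative rather than recommendations, the three risk classes follow the read / write / destructive split used earlier, and the tool function is assumed to accept `timeout` and `idempotency_key` keyword arguments and to raise `TimeoutError` when it times out. Reads retry freely, writes retry only with an idempotency key, and destructive tools get a single attempt.

```python
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class RetryPolicy:
    timeout_s: float
    max_attempts: int
    requires_idempotency_key: bool


# Illustrative values only; tune them per tool and per downstream service.
POLICIES = {
    "read": RetryPolicy(timeout_s=5.0, max_attempts=3, requires_idempotency_key=False),
    "write": RetryPolicy(timeout_s=15.0, max_attempts=3, requires_idempotency_key=True),
    "destructive": RetryPolicy(timeout_s=30.0, max_attempts=1, requires_idempotency_key=True),
}


def call_with_policy(fn, risk, idempotency_key=None, base_delay_s=0.5, sleep=time.sleep):
    """Call fn under the policy for its risk class, retrying only on TimeoutError."""
    policy = POLICIES[risk]
    if policy.requires_idempotency_key and idempotency_key is None:
        raise ValueError(f"{risk} tools must be called with an idempotency key")
    last_error = None
    for attempt in range(1, policy.max_attempts + 1):
        try:
            return fn(timeout=policy.timeout_s, idempotency_key=idempotency_key)
        except TimeoutError as exc:
            last_error = exc
            if attempt < policy.max_attempts:
                # Exponential backoff: base, 2x base, 4x base, ...
                sleep(base_delay_s * 2 ** (attempt - 1))
    raise last_error
```

A timeout on a write is ambiguous: the request may have succeeded before the response was lost. That is why the write policy refuses to run without a key, and why the destructive policy gives up after one attempt and leaves the decision to a human or a higher-level check.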

Audit Logging

In enterprise environments, tool calls must be auditable. This is not optional for compliance.

Production reality: the first time an agent sends an email to the wrong customer, or triggers a payment twice, you will need the audit log to understand exactly what the LLM decided, why it decided it, and what context it had. Build the audit trail before you need it.
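As a rough sketch of what such a record might contain, the example below writes one JSON line per tool call, capturing what was called, with which arguments, the model's stated rationale, and which context it had. Every field name here is an assumption chosen for illustration, not a compliance standard, and the `io.StringIO` buffer stands in for an append-only file or log sink.

```python
import io
import json
import uuid
from datetime import datetime, timezone


def build_audit_record(task_id, tool_name, arguments, model_rationale,
                       context_ids, outcome, idempotency_key=None):
    """Assemble one audit entry for a single tool call (field names are illustrative)."""
    return {
        "audit_id": str(uuid.uuid4()),
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "task_id": task_id,
        "tool_name": tool_name,
        "arguments": arguments,              # what the LLM decided to pass
        "model_rationale": model_rationale,  # why it says it decided that
        "context_ids": context_ids,          # which documents or messages it had
        "idempotency_key": idempotency_key,
        "outcome": outcome,                  # e.g. "success", "timeout", "rejected"
    }


def write_audit_record(stream, record):
    # One JSON object per line keeps the log append-only and easy to search.
    stream.write(json.dumps(record, sort_keys=True) + "\n")


log = io.StringIO()  # stand-in for an append-only file or log sink
record = build_audit_record(
    task_id="task-123",
    tool_name="send_email",
    arguments={"to": "a@example.com", "body": "hello"},
    model_rationale="Customer asked for a confirmation email.",
    context_ids=["msg-1", "order-42"],
    outcome="success",
    idempotency_key="key-1",
)
write_audit_record(log, record)
```

Arguments can contain personal data, so a real deployment would likely need redaction or access controls on this log; that is outside the scope of this sketch.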
