
Async Task Queues for Long-Running Agent Jobs: Celery, SQS, and Exactly-Once Execution

Why synchronous HTTP fails for 60-second tasks. Task state machine design. Exactly-once execution via idempotency keys. Fan-out with Celery chord. Dead letter queues. Result TTL.

Prerequisites: async programming basics, Python task concepts. After this post you will know how to design task queue architecture for long-running agent jobs: state machine design, exactly-once execution, fan-out patterns, dead letter queues, and result lifecycle management.

An agent that takes 60 seconds to complete should not run synchronously behind an HTTP endpoint. API gateways time out. Load balancers drop idle connections. Users hit refresh and trigger duplicates. The solution is an async task queue: the HTTP layer accepts the job and returns a task ID immediately, and background workers execute the agent task.
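The accept-and-return pattern can be sketched with the standard library alone. This is a minimal illustration, not a production design: the in-process queue stands in for a real broker, and `accept_job`, `worker_loop`, `task_status`, and the `/tasks/{id}` URL are hypothetical names chosen for the example.

```python
import queue
import threading
import uuid

# In-process stand-ins for a broker and a status store (assumptions for this sketch).
job_queue: "queue.Queue" = queue.Queue()
task_status = {}

def accept_job(payload: dict):
    """What the HTTP handler does: enqueue the job, then return 202 right away."""
    task_id = str(uuid.uuid4())
    task_status[task_id] = "PENDING"
    job_queue.put((task_id, payload))
    return 202, {"task_id": task_id, "status_url": f"/tasks/{task_id}"}

def worker_loop(run_agent):
    """Background worker: pulls jobs until it receives the None sentinel."""
    while True:
        item = job_queue.get()
        if item is None:
            job_queue.task_done()
            break
        task_id, payload = item
        task_status[task_id] = "RUNNING"
        try:
            run_agent(payload)
            task_status[task_id] = "SUCCESS"
        except Exception:
            task_status[task_id] = "FAILURE"
        finally:
            job_queue.task_done()

# The caller gets its response before any agent work has started.
status_code, body = accept_job({"prompt": "summarize the Q3 report"})

worker = threading.Thread(target=worker_loop, args=(lambda payload: None,), daemon=True)
worker.start()
job_queue.put(None)   # sentinel so the demo worker exits
job_queue.join()      # wait until the worker has processed everything
```

The client then polls the status URL (or receives a webhook) instead of holding a connection open for the full run.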

The hard constraint that makes agent task queues different from standard ones: agent tasks have side effects. Before it fails halfway through, the agent may have already sent an email, updated a CRM record, or charged a payment method. Task queue design for agents is fundamentally about handling partial execution safely.

Task State Machine

Model every agent task as an explicit state machine. The states determine what is safe to do when things go wrong.

PENDING: task queued, not started. Safe to discard if system is overloaded.
RUNNING: worker claimed the task and started executing. Do not re-queue.
SUCCESS: completed. Result available in result backend.
FAILURE: failed after all retries. Routes to dead letter queue.
RETRY: transient failure. Will be re-queued after backoff delay.
REVOKED: cancelled. Workers that pick up a revoked task must discard it immediately.
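One way to make these states enforceable is an explicit transition table. The sketch below is an assumption about which transitions are legal, inferred from the state descriptions above rather than taken from Celery; `TaskState`, `ALLOWED`, and `transition` are illustrative names.

```python
from enum import Enum

class TaskState(Enum):
    PENDING = "PENDING"
    RUNNING = "RUNNING"
    SUCCESS = "SUCCESS"
    FAILURE = "FAILURE"
    RETRY = "RETRY"
    REVOKED = "REVOKED"

# Assumed transition table: terminal states have no outgoing edges.
ALLOWED = {
    TaskState.PENDING: {TaskState.RUNNING, TaskState.REVOKED},
    TaskState.RUNNING: {TaskState.SUCCESS, TaskState.FAILURE,
                        TaskState.RETRY, TaskState.REVOKED},
    TaskState.RETRY: {TaskState.RUNNING, TaskState.REVOKED},
    TaskState.SUCCESS: set(),
    TaskState.FAILURE: set(),
    TaskState.REVOKED: set(),
}

class InvalidTransition(Exception):
    """Raised when code tries to move a task along an edge that is not allowed."""

def transition(current: TaskState, new: TaskState) -> TaskState:
    """Return the new state if the move is legal, otherwise raise."""
    if new not in ALLOWED[current]:
        raise InvalidTransition(f"{current.value} -> {new.value} is not allowed")
    return new

state = TaskState.PENDING
state = transition(state, TaskState.RUNNING)
state = transition(state, TaskState.RETRY)
state = transition(state, TaskState.RUNNING)
state = transition(state, TaskState.SUCCESS)
```

Rejecting illegal moves (for example SUCCESS back to RUNNING) at write time turns a silent double execution into a loud error.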

The case this list does not cover, and a common source of production bugs, is a worker crashing mid-task. The task is still marked RUNNING in the database but the worker is dead. With late acknowledgement, once the broker's visibility timeout expires the message becomes visible again and a new worker picks it up, re-executing from the start. Your agent must therefore be safe to re-run with the original idempotency key.
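A re-run is only safe if side effects that already happened are skipped. One possible approach, sketched here under assumptions, is to checkpoint each side-effecting step under the idempotency key. The in-memory dict stands in for a durable store, and `run_step`, `send_email`, and `update_crm` are hypothetical names.

```python
from typing import Callable, Dict, Tuple

# Stand-in for a durable store such as a database table (assumption for this sketch).
completed_steps: Dict[Tuple[str, str], object] = {}

def run_step(ikey: str, step: str, action: Callable[[], object]) -> object:
    """Run a side-effecting step at most once per (idempotency key, step name)."""
    key = (ikey, step)
    if key in completed_steps:
        return completed_steps[key]      # already done on an earlier attempt
    result = action()
    completed_steps[key] = result        # checkpoint only after the step succeeded
    return result

sent_emails = []
attempts = {"crm": 0}

def send_email():
    sent_emails.append("welcome")
    return "email-sent"

def update_crm():
    attempts["crm"] += 1
    if attempts["crm"] == 1:
        raise RuntimeError("simulated worker crash")
    return "crm-updated"

def agent_job(ikey: str):
    return [
        run_step(ikey, "send_email", send_email),
        run_step(ikey, "update_crm", update_crm),
    ]

# First delivery fails halfway through, after the email has gone out.
try:
    agent_job("job-123")
except RuntimeError:
    pass

# Redelivery with the same key skips the email and finishes the CRM update.
outcome = agent_job("job-123")
```

This still leaves a window between performing a step and recording it. Where the downstream API accepts an idempotency key of its own, passing one along closes that gap.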

Exactly-Once Execution via Idempotency Keys

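Exactly-once delivery cannot be guaranteed when messages, networks, or workers can fail: brokers such as SQS standard queues, and Celery with acks_late, give you at-least-once delivery. Exactly-once execution is still achievable in effect by carrying an idempotency key in the task payload and checking it before doing any work.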

import json

from celery import Celery
from redis import Redis

app = Celery('tasks', broker='redis://localhost:6379/0', backend='redis://localhost:6379/1')
redis_client = Redis()  # Defaults to localhost:6379, db 0

@app.task(bind=True, max_retries=3, acks_late=True)
def run_agent_task(self, payload: dict):
    ikey = payload['idempotency_key']  # Generated by the client before the first HTTP request
    done_key, lock_key = f'agent-done:{ikey}', f'agent-lock:{ikey}'

    # Already completed? Return the stored result instead of re-executing.
    stored = redis_client.get(done_key)
    if stored is not None:
        return json.loads(stored)

    # Distributed lock: one execution per idempotency key at a time.
    # The 300s expiry must exceed the longest expected agent run.
    if not redis_client.set(lock_key, '1', nx=True, ex=300):
        raise self.retry(countdown=5)  # Another worker holds the lock

    try:
        # execute_agent is assumed to be defined elsewhere and to return JSON-serializable data
        result = execute_agent(payload)
        redis_client.set(done_key, json.dumps(result), ex=86400)  # 24h TTL
        return result
    except Exception as exc:
        raise self.retry(exc=exc, countdown=2 ** self.request.retries)
    finally:
        redis_client.delete(lock_key)  # Release the lock on success and on failure

Fan-Out: Parallel Agent Sub-Tasks

Complex agent workflows require parallel execution — a research agent spawning 5 parallel web searches, a document agent running extraction on 20 sections simultaneously. Celery's chord primitive handles this.

group: a set of tasks that execute in parallel.
chord: a group followed by a callback that runs only after all parallel tasks complete. The natural pattern for fan-out + aggregate.
Fan-out budget: cap concurrency. A single agent spawning 100 sub-tasks will starve all other tasks in the queue. Set a hard maximum per task type.

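import logging

from celery import chord, group

logger = logging.getLogger(__name__)
MAX_FANOUT = 10  # Hard cap on parallel searches per research job

@app.task
def search_web(query: str) -> dict:
    return run_web_search(query)  # Assumed to be defined elsewhere

@app.task
def aggregate_results(results: list, task_id: str) -> dict:
    # The chord passes the list of search results as the first argument
    return synthesize(results, task_id)  # Assumed to be defined elsewhere

def run_parallel_research(queries: list, task_id: str):
    # Enforce the cap; queries beyond it are dropped, so log that loudly
    if len(queries) > MAX_FANOUT:
        logger.warning('Dropping %d queries over the fan-out cap', len(queries) - MAX_FANOUT)
    search_group = group(search_web.s(q) for q in queries[:MAX_FANOUT])
    # Returns an AsyncResult for the callback; chords require a result backend
    return chord(search_group)(aggregate_results.s(task_id))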
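The function above enforces the fan-out budget by truncating the query list at 10. If dropping queries is not acceptable, one alternative is to split the list into batches no larger than the cap and dispatch one chord per batch. A minimal sketch of the batching step, where `batched` is a hypothetical helper name:

```python
from typing import Iterator, List

def batched(items: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items`, each at most `size` long."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]

queries = [f"query {i}" for i in range(25)]
batch_sizes = [len(batch) for batch in batched(queries, 10)]
```

Each batch can then be sent through the same group and chord pipeline, one after another, so no more than the cap is in flight at once.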

Dead Letter Queues

A task that fails all retries must not be silently discarded. It moves to a dead letter queue (DLQ) where an engineer can inspect it, understand why it failed, and decide whether to replay after the fix.

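Every production agent queue needs a DLQ.
DLQ routing: configure automatic routing after max retries. On SQS, attach a redrive policy (maxReceiveCount plus the DLQ's ARN) to the source queue. Celery has no built-in DLQ, so route at the broker level (for example a RabbitMQ dead-letter exchange) or publish the failed task to a dedicated queue from the task's on_failure handler. Note that task_reject_on_worker_lost is not a DLQ setting: combined with acks_late, it re-queues a task whose worker process died.
DLQ monitoring: alert when DLQ depth exceeds threshold. A growing DLQ is a systemic failure signal.
DLQ contents: log full task payload, error traceback, attempt count, and timestamps per attempt.
DLQ replay: replay only after fixing the bug. Never replay automatically: manual review of each failed task type is required before re-executing agents with side effects.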
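The routing and contents points can be sketched as two small helpers. This is an illustration under assumptions: `redrive_attributes` only builds the attribute dict for SQS (the actual call, shown in a comment, needs boto3 and AWS credentials), and `DeadLetterRecord` and `build_dead_letter` are hypothetical names for a record you would write to your own DLQ.

```python
import json
import traceback
from dataclasses import asdict, dataclass
from typing import List

def redrive_attributes(dlq_arn: str, max_receive_count: int = 5) -> dict:
    """Build the Attributes dict that attaches a DLQ to an SQS source queue."""
    return {
        "RedrivePolicy": json.dumps({
            "deadLetterTargetArn": dlq_arn,
            "maxReceiveCount": str(max_receive_count),
        })
    }
    # Applied with boto3 roughly as:
    #   sqs.set_queue_attributes(QueueUrl=source_queue_url, Attributes=attrs)

@dataclass
class DeadLetterRecord:
    task_id: str
    payload: dict
    error: str
    error_traceback: str
    attempt_count: int
    attempt_timestamps: List[str]

def build_dead_letter(task_id: str, payload: dict, exc: BaseException,
                      attempt_timestamps: List[str]) -> str:
    """Serialize everything an engineer needs to diagnose and replay the task."""
    record = DeadLetterRecord(
        task_id=task_id,
        payload=payload,
        error=repr(exc),
        error_traceback="".join(
            traceback.format_exception(type(exc), exc, exc.__traceback__)
        ),
        attempt_count=len(attempt_timestamps),
        attempt_timestamps=attempt_timestamps,
    )
    return json.dumps(asdict(record))
```

Keeping the original payload, including its idempotency key, in the record is what makes a later manual replay safe.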

Result Backend and TTL

Redis as result backend: fast and widely used.
TTL: set a TTL on every result (24–48 hours is typical; Celery's result_expires setting controls this and defaults to one day). Without a TTL, task results accumulate indefinitely.
Separate result backend from broker: do not poll the task queue for results. Result retrieval should hit the result backend (Redis) directly, so lookups add no load to the broker.
Large results: for large agent outputs, store the output in S3 or other blob storage and put only the reference in the result backend.
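The large-results rule can be sketched as a pack/unpack pair. Everything here is an assumption for illustration: the 64 KiB threshold, the `agent-results/` key prefix, and the in-memory `blob_store` dict standing in for S3 or another blob store.

```python
import json
import uuid
from typing import Dict

MAX_INLINE_BYTES = 64 * 1024        # Assumed threshold; tune for your backend
blob_store: Dict[str, bytes] = {}   # Stand-in for S3 / blob storage

def pack_result(result: dict) -> dict:
    """Return what should be stored in the result backend for this output."""
    body = json.dumps(result).encode("utf-8")
    if len(body) <= MAX_INLINE_BYTES:
        return {"inline": result}
    key = f"agent-results/{uuid.uuid4()}.json"
    blob_store[key] = body          # Upload the full output to blob storage
    return {"ref": key, "size_bytes": len(body)}

def unpack_result(stored: dict) -> dict:
    """Resolve a stored entry back into the full agent output."""
    if "inline" in stored:
        return stored["inline"]
    return json.loads(blob_store[stored["ref"]])

small = pack_result({"answer": "42"})
large_output = {"report": "x" * 100_000}
large = pack_result(large_output)
```

The blob needs its own expiry (for example a bucket lifecycle rule) that is at least as long as the result TTL, or the reference will outlive the data it points to.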

The key insight: the task queue is the reliability contract between your API and your agent. Every guarantee you make to users — 'your task will complete', 'no duplicate actions', 'failures are recoverable' — is enforced here. A queue with no DLQ, no idempotency, and no visibility into partial completion is not a reliability layer. It is a hope layer.
