GenAI Systems Lab Open interactive version →
Production & LLMOps 12 min read

Async Task Queues for Long-Running Agent Jobs: Celery, SQS, and Exactly-Once Execution

Why synchronous HTTP fails for 60-second tasks. Task state machine design. Exactly-once execution via idempotency keys. Fan-out with Celery chord. Dead letter queues. Result TTL.

Prerequisites: async programming basics, Python task concepts. After this post you will know how to design task queue architecture for long-running agent jobs: state machine design, exactly-once execution, fan-out patterns, dead letter queues, and result lifecycle management.

An agent that takes 60 seconds to complete cannot run synchronously behind an HTTP endpoint. API gateways time out. Load balancers drop idle connections. Users hit refresh and trigger duplicates. The solution: an async task queue where the HTTP layer accepts the job and returns immediately, and background workers execute the agent task.

The hard constraint that makes agent task queues different from standard ones: agent tasks have side effects. Before it fails halfway through, the agent may have already sent an email, updated a CRM record, or charged a payment method. Task queue design for agents is fundamentally about handling partial execution safely.

Task State Machine

Model every agent task as an explicit state machine. The states determine what is safe to do when things go wrong.

The missing state causing most production bugs: worker crashes mid-task. The task is RUNNING in the database but the worker is dead. After a visibility timeout expires, the message becomes visible again and a new worker picks it up — re-executing from the start. Your agent must be safe to re-run with the original idempotency key.

Exactly-Once Execution via Idempotency Keys

Exactly-once delivery is impossible in distributed systems. Exactly-once execution is achievable with idempotency keys in the task payload.

from celery import Celery
from redis import Redis

app = Celery('tasks', broker='redis://localhost:6379/0', backend='redis://localhost:6379/1')
redis_client = Redis()

@app.task(bind=True, max_retries=3, acks_late=True)
def run_agent_task(self, payload: dict):
    ikey = payload['idempotency_key']  # Generated before first HTTP request

    # Already completed? Return stored result.
    if redis_client.get(f'agent-done:{ikey}'):
        return {'status': 'already_completed'}

    # Distributed lock: one execution per idempotency key at a time
    lock = redis_client.set(f'agent-lock:{ikey}', '1', nx=True, ex=300)
    if not lock:
        raise self.retry(countdown=5)  # Another worker has this

    try:
        result = execute_agent(payload)
        redis_client.set(f'agent-done:{ikey}', '1', ex=86400)  # 24h TTL
        return result
    except Exception as exc:
        redis_client.delete(f'agent-lock:{ikey}')
        raise self.retry(exc=exc, countdown=2 ** self.request.retries)

Fan-Out: Parallel Agent Sub-Tasks

Complex agent workflows require parallel execution — a research agent spawning 5 parallel web searches, a document agent running extraction on 20 sections simultaneously. Celery's chord primitive handles this.

from celery import group, chord

@app.task
def search_web(query: str) -> dict:
    return run_web_search(query)

@app.task
def aggregate_results(results: list, task_id: str) -> dict:
    return synthesize(results, task_id)

def run_parallel_research(queries: list, task_id: str):
    # Cap at 10 parallel searches
    search_group = group(search_web.s(q) for q in queries[:10])
    pipeline = chord(search_group)(aggregate_results.s(task_id))
    return pipeline

Dead Letter Queues

A task that fails all retries must not be silently discarded. It moves to a dead letter queue (DLQ) where an engineer can inspect it, understand why it failed, and decide whether to replay after the fix.

Result Backend and TTL

The key insight: the task queue is the reliability contract between your API and your agent. Every guarantee you make to users — 'your task will complete', 'no duplicate actions', 'failures are recoverable' — is enforced here. A queue with no DLQ, no idempotency, and no visibility into partial completion is not a reliability layer. It is a hope layer.

Try it interactively

GenAI Systems Lab is a free platform for AI engineers — configure real failure modes, break things, and build the judgment that gets you hired.

Open GenAI Systems Lab →