GenAI Systems Lab
AI Engineering 8 min read

Structured Outputs: JSON Mode, Tool Calling, and Constrained Decoding

How to reliably get JSON, tables, and typed data from LLMs. JSON mode vs tool calling vs grammar-constrained decoding — what each guarantees and where each breaks.

Getting an LLM to return valid JSON sounds trivial until you've spent three hours debugging why your production pipeline intermittently returns a response that starts with 'Sure! Here's the JSON you asked for:' followed by a code block with a trailing comma. Structured outputs are a solved problem — but only if you pick the right tool.

This post maps the four main approaches: prompting, JSON mode, tool/function calling, and grammar-constrained decoding. Each gives different guarantees, and those differences matter at production scale.

Why LLMs Drift from Format Instructions

LLMs are trained to predict the most probable next token, not to follow schema rules. When you write 'respond with JSON', the model has seen billions of examples of humans responding to similar instructions with helpful prose explanations, code blocks, or hybrid formats. The instruction competes with those learned distributions.

Failure rate on 'just add JSON instructions' prompting is typically 5–15% in production, depending on model and schema complexity. At 10,000 requests/day that's 500–1,500 parse errors. You need a structural guarantee, not a polite request.
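The arithmetic behind those numbers, plus the effect of blind retries, can be sketched in a few lines. This is a back-of-the-envelope illustration: the function names are made up for this post, and the retry estimate assumes each attempt fails independently, which real failures often don't (a schema the model struggles with tends to fail repeatedly).

```python
def daily_parse_errors(requests_per_day: int, failure_rate: float) -> int:
    """Expected number of unparseable responses per day."""
    return round(requests_per_day * failure_rate)


def residual_failure_rate(failure_rate: float, attempts: int) -> float:
    """Chance that every attempt fails, ASSUMING attempts fail independently."""
    return failure_rate ** attempts


print(daily_parse_errors(10_000, 0.05))   # 500
print(daily_parse_errors(10_000, 0.15))   # 1500

# Three attempts at a 15% failure rate still leave roughly 0.34% of requests
# failing, about 34 per day at 10,000 requests/day, and triple the cost
# of every failing request.
print(residual_failure_rate(0.15, 3))
```

Retries shrink the problem but never remove it, which is why the rest of this post focuses on structural guarantees.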

Approach 1: Prompting (No Guarantees)

Prompting-only means asking the model to return JSON in the system or user prompt, with or without a schema. This requires no special API features but gives zero structural guarantees.

system = """You are a data extraction assistant. Always respond with valid JSON.
Schema: {"name": string, "age": integer, "email": string}
Never include any text before or after the JSON object."""

# Works most of the time. Fails 5-15% of the time in production.
# Failures: preamble text, invalid JSON, schema violations

Use prompting only for low-stakes, low-volume scenarios where a parse error is acceptable and retry is cheap.
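If you do rely on prompting, a tolerant parser recovers the most common failure (preamble text or a code fence around otherwise valid JSON). The sketch below uses only the standard library; extract_first_json_object is a hypothetical helper name, not part of any SDK.

```python
import json


def extract_first_json_object(reply: str) -> dict:
    """Return the first decodable JSON object embedded in a model reply.

    Tries to decode starting at each "{" in turn, so leading prose and
    code fences are skipped. Raises ValueError if nothing decodes.
    """
    decoder = json.JSONDecoder()
    start = reply.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start` and
            # ignores whatever text follows it.
            value, _end = decoder.raw_decode(reply, start)
            return value
        except json.JSONDecodeError:
            start = reply.find("{", start + 1)
    raise ValueError("no JSON object found in reply")


reply = 'Sure! Here\'s the JSON you asked for:\n{"name": "John Smith", "age": 34}\nHope that helps!'
print(extract_first_json_object(reply))
```

This rescues preamble and postamble, but not genuinely invalid JSON: a trailing comma still fails. One caveat of this approach is that when the outer object is invalid it can return a valid nested object instead, so validate the result against your schema.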

Approach 2: JSON Mode

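JSON mode (response_format={"type": "json_object"} in the OpenAI API; several other providers offer an equivalent switch) makes the model return syntactically valid JSON. It does NOT guarantee your schema: the model can return valid JSON that doesn't match your expected structure. Two caveats apply on OpenAI: the word "JSON" must appear somewhere in the messages, and a response cut off by the token limit (finish_reason == "length") can still be incomplete JSON.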

import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",
    response_format={"type": "json_object"},
    messages=[
        {"role": "system", "content": "Extract entity data as JSON with fields: name, age, email"},
        {"role": "user", "content": user_input}
    ]
)

data = json.loads(response.choices[0].message.content)
# Guaranteed (when the response is not truncated): valid JSON syntax
# NOT guaranteed: correct fields, correct types, no hallucinated fields

JSON mode prevents syntax errors but not schema errors. You still need Pydantic or Zod validation after parsing. The guarantee is that json.loads() will not throw on a complete (non-truncated) response. It says nothing about data shape.
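To make the gap between "valid JSON" and "valid shape" concrete, here is a standard-library sketch of a shape check. It is a stand-in for what Pydantic or Zod would do, not a replacement; shape_problems and EXPECTED_FIELDS are names invented for this example.

```python
import json

# Field name -> exact Python type expected after json.loads()
EXPECTED_FIELDS = {"name": str, "age": int, "email": str}


def shape_problems(data: object, expected: dict[str, type]) -> list[str]:
    """List every way `data` deviates from a flat object with `expected` fields."""
    if not isinstance(data, dict):
        return [f"expected a JSON object, got {type(data).__name__}"]
    problems = []
    for field, expected_type in expected.items():
        if field not in data:
            problems.append(f"missing field: {field}")
        # Exact type match, so JSON true/false is not accepted as an integer
        elif type(data[field]) is not expected_type:
            problems.append(f"wrong type for {field}: {type(data[field]).__name__}")
    for field in data:
        if field not in expected:
            problems.append(f"unexpected field: {field}")
    return problems


# Perfectly valid JSON that json.loads() accepts without complaint...
raw = '{"name": "John Smith", "age": "34", "nickname": "Johnny"}'
# ...and that is still wrong in three different ways.
print(shape_problems(json.loads(raw), EXPECTED_FIELDS))
# ['wrong type for age: str', 'missing field: email', 'unexpected field: nickname']
```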

Approach 3: Tool Calling / Function Calling

Tool calling (OpenAI function calling, Anthropic tool use) lets you define a JSON Schema for the expected output, and the model is steered to call the tool with arguments that match it. By default this is best-effort: arguments usually conform, but the model can still emit malformed JSON, omit required fields, or invent extra ones. Hard syntax and schema guarantees only arrive with strict mode.

tools = [{
    "type": "function",
    "function": {
        "name": "extract_entity",
        "description": "Extract entity information from text",
        "parameters": {
            "type": "object",
            "properties": {
                "name":  {"type": "string", "description": "Full name"},
                "age":   {"type": "integer", "description": "Age in years"},
                "email": {"type": "string", "format": "email"}
            },
            "required": ["name", "age", "email"],
            "additionalProperties": False
        }
    }
}]

response = client.chat.completions.create(
    model="gpt-4o",
    tools=tools,
    tool_choice={"type": "function", "function": {"name": "extract_entity"}},
    messages=[{"role": "user", "content": user_input}]
)

args = json.loads(response.choices[0].message.tool_calls[0].function.arguments)
# Without strict mode this is best-effort: usually valid JSON that matches the
# schema types with required fields present, but validate before use

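OpenAI's 'Structured Outputs' mode (August 2024) extended this further: when you set "strict": True in the function definition, decoding is constrained so the model can only produce tokens that are valid under your schema. OpenAI describes the mechanism as a context-free grammar compiled from the schema, which is what lets it handle nested and recursive structures. Strict mode accepts only a subset of JSON Schema, and it requires every property to be listed in required and additionalProperties to be false. This is the highest-reliability option in the OpenAI API.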
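A small helper can keep tool definitions strict-compatible. The sketch below assumes the requirements OpenAI documents for strict mode at the time of writing (the strict flag on the function, every property required, additionalProperties set to False); make_strict_tool is a hypothetical helper, and it only handles a flat, top-level object. Nested objects would need the same treatment.

```python
def make_strict_tool(name: str, description: str, properties: dict) -> dict:
    """Build a Chat Completions tool definition with strict mode enabled.

    Assumption: strict mode needs every property listed in "required" and
    "additionalProperties": False on the object schema.
    """
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": description,
            "strict": True,
            "parameters": {
                "type": "object",
                "properties": properties,
                "required": list(properties),      # all fields are mandatory
                "additionalProperties": False,     # no invented fields
            },
        },
    }


extract_entity_tool = make_strict_tool(
    "extract_entity",
    "Extract entity information from text",
    {
        "name": {"type": "string", "description": "Full name"},
        "age": {"type": "integer", "description": "Age in years"},
        "email": {"type": "string", "description": "Email address"},
    },
)
```

The result can be passed as tools=[extract_entity_tool] in the same call shown earlier. Because every field is mandatory under this assumption, optional values are usually modeled as a union with null rather than by leaving the field out.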

Approach 4: Grammar-Constrained Decoding

For self-hosted models, libraries like Outlines, llama.cpp (GBNF grammars), and Guidance provide grammar-constrained decoding: the token sampling is mathematically constrained to only produce tokens that could lead to a valid output under your grammar or schema. Invalid token probabilities are zeroed out at each step.

import outlines
from pydantic import BaseModel

# Written against the Outlines 0.x interface (outlines.models / outlines.generate).
# Later releases reorganized the API, so check the docs for your installed version.
model = outlines.models.transformers("mistralai/Mistral-7B-v0.1")

class Entity(BaseModel):
    name: str
    age: int
    email: str

generator = outlines.generate.json(model, Entity)

# Every completed output is constrained to parse as a valid Entity instance
result = generator("Extract: John Smith, 34, john@example.com")
# result is already a validated Pydantic object, not a string

Outlines/GBNF decoding gives the strongest guarantees but adds ~10-20% latency overhead per token (the masking step). Worth it for complex nested schemas where validation failures are expensive.
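The masking step itself is easy to see in a toy setting. The sketch below is not how Outlines or llama.cpp are implemented: the "grammar" is just a set of two allowed strings, the vocabulary has eight tokens, and fake_logits stands in for a model that stubbornly prefers chatty preamble. All names are invented for illustration.

```python
import math

VALID_OUTPUTS = {'{"ok": true}', '{"ok": false}'}  # the toy "grammar"
VOCAB = ["Sure", "!", "{", '"ok"', ": ", "true", "false", "}"]


def allowed(prefix: str, token: str) -> bool:
    """A token is allowed if prefix + token can still grow into a valid output."""
    candidate = prefix + token
    return any(valid.startswith(candidate) for valid in VALID_OUTPUTS)


def fake_logits(prefix: str) -> list[float]:
    """Stand-in for a model: always scores the chatty tokens highest."""
    return [5.0, 4.0, 3.0, 2.5, 2.0, 1.5, 1.0, 0.5]


def greedy_decode(constrain: bool, max_steps: int = 10) -> str:
    prefix = ""
    for _ in range(max_steps):
        if prefix in VALID_OUTPUTS:
            break  # a complete valid output has been produced
        logits = fake_logits(prefix)
        if constrain:
            # Masking: a logit of -inf means zero probability after softmax
            logits = [
                score if allowed(prefix, token) else -math.inf
                for score, token in zip(logits, VOCAB)
            ]
            if all(score == -math.inf for score in logits):
                raise RuntimeError("grammar allows no token from this prefix")
        best = max(range(len(VOCAB)), key=logits.__getitem__)
        prefix += VOCAB[best]
    return prefix


print(greedy_decode(constrain=False))  # SureSureSureSureSureSureSureSureSureSure
print(greedy_decode(constrain=True))   # {"ok": true}
```

The model's preferences are unchanged in both runs; only the set of tokens it is allowed to pick from differs. Real libraries replace the startswith check with a compiled automaton or grammar so the per-token mask can be computed quickly over a vocabulary of tens of thousands of tokens.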

Validation Patterns

Even with JSON mode or tool calling, always validate the parsed output:

from pydantic import BaseModel, ValidationError
import json

class Entity(BaseModel):
    name: str
    age: int
    email: str

def extract_entity(text: str, max_retries: int = 3) -> Entity:
    # call_llm_with_json_mode is a placeholder for your own API wrapper;
    # it should return the raw response text.
    for attempt in range(max_retries):
        response = call_llm_with_json_mode(text)
        try:
            data = json.loads(response)
            return Entity(**data)
        # TypeError covers valid JSON that is not an object (e.g. a list)
        except (json.JSONDecodeError, ValidationError, TypeError) as e:
            if attempt == max_retries - 1:
                raise
            # On retry, add error context to the prompt
            text = f"{text}\n\nPrevious attempt failed: {e}. Try again."

When Each Approach Wins

| Approach | Syntax guarantee | Schema guarantee | Cost | Best for |
| --- | --- | --- | --- | --- |
| Prompting only | None | None | Lowest | Prototyping, very simple schemas |
| JSON mode | Yes | No | Low | Flexible schemas, catching syntax errors |
| Tool calling (strict) | Yes | Yes | Low | Production on OpenAI/Anthropic APIs |
| Grammar decoding (Outlines) | Yes | Yes (any grammar) | Medium (+latency) | Self-hosted, complex/recursive schemas |

