GenAI Systems Lab
AI Engineering 9 min read

Schema Drift: When Your Structured Output Parser Silently Breaks

How adding one optional field to a function-call schema breaks 12% of responses in ways your monitoring misses. The versioning discipline, canary rollout, and output validation layer that prevent it.

The function-calling schema had worked fine for four months. Then the product team added an optional `urgency_level` field to the ticket creation schema. They tested it: the model produced valid JSON with the new field. They shipped. Two days later, 12% of ticket creation calls were returning a JSON parse error. The model was occasionally emitting `"urgency_level": "high"` followed by a trailing comma before the closing brace. Trailing commas are not valid JSON, so their strict parser rejected the entire response.

This is a schema drift failure: a change to a structured output schema that causes the model to produce invalid output in a fraction of cases. The failure rate is low enough to slip past manual QA but high enough to show up as a meaningful error rate in production.
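A quick calculation shows why a 12% failure rate slips past spot checks. Assuming each response fails independently with the same probability p, a manual check of n responses sees at least one failure with probability 1 - (1 - p)^n. The sketch below evaluates that for the rate from the incident above; the helper names are illustrative.

```python
import math

def p_any_failure(failure_rate: float, n_samples: int) -> float:
    """Probability that n independent samples surface at least one failure."""
    return 1 - (1 - failure_rate) ** n_samples

def samples_needed(failure_rate: float, confidence: float = 0.95) -> int:
    """Smallest n such that P(at least one failure in n samples) >= confidence."""
    return math.ceil(math.log(1 - confidence) / math.log(1 - failure_rate))

print(round(p_any_failure(0.12, 5), 3))  # 0.472
print(samples_needed(0.12))              # 24
```

At p = 0.12, five manual checks miss the problem entirely about 53% of the time, and it takes 24 samples to have a 95% chance of seeing even one failure. Real failures may cluster on particular inputs rather than being independent, so treat these numbers as rough guidance.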

Why schema changes break structured outputs

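LLMs that generate structured output (JSON, YAML, XML) are doing pattern-guided text generation: unless the provider enforces a grammar at decoding time, nothing guarantees syntactically valid output. The model has learned patterns: which field names tend to appear together, what valid JSON syntax looks like, how to handle optional vs. required fields. When you change the schema, you change the pattern the model needs to match, and a small fraction of generations can fall outside it.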
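The incident above mixes two failure classes that are worth separating in your logs: syntax errors, where the output is not parseable JSON at all (the trailing comma), and contract violations, where the JSON parses but breaks the schema. The sketch below classifies raw outputs into those buckets using only the standard library. The `title` field and the allowed urgency values are hypothetical; only `urgency_level` comes from the incident.

```python
import json

# Hypothetical ticket contract: "title" is required, "urgency_level" is the new optional field.
# The allowed urgency values are an assumption for illustration.
URGENCY_VALUES = {"low", "medium", "high"}

def check_ticket(raw: str) -> str:
    """Classify a raw model output as 'ok', 'syntax_error', or 'schema_error'."""
    try:
        obj = json.loads(raw)  # Python's json module rejects trailing commas
    except json.JSONDecodeError:
        return "syntax_error"
    if not isinstance(obj, dict) or not isinstance(obj.get("title"), str):
        return "schema_error"
    urgency = obj.get("urgency_level")  # absent or null both mean "not provided"
    if urgency is not None and (not isinstance(urgency, str) or urgency not in URGENCY_VALUES):
        return "schema_error"
    return "ok"

samples = [
    '{"title": "Login fails"}',                             # optional field omitted
    '{"title": "Login fails", "urgency_level": "high"}',    # new field, well formed
    '{"title": "Login fails", "urgency_level": "high",}',   # trailing comma
    '{"title": "Login fails", "urgency_level": "urgent"}',  # parses, but breaks the contract
]
for s in samples:
    print(check_ticket(s))  # ok, ok, syntax_error, schema_error
```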

Specific failure patterns from schema changes:

Output validation as the first line of defense

Your LLM API call should always be wrapped in a retry loop with schema validation:

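import json
import logging

from pydantic import BaseModel, ValidationError

logger = logging.getLogger(__name__)
MAX_RETRIES = 3

# llm_client is a placeholder for your async LLM client; complete() is assumed to return raw text.
async def call_with_validation(prompt: str, schema: type[BaseModel], retries: int = MAX_RETRIES) -> BaseModel:
    for attempt in range(retries):
        raw = await llm_client.complete(prompt)
        try:
            parsed = json.loads(raw)              # syntax check: trailing commas, truncated output
            return schema.model_validate(parsed)  # contract check: types, required fields, allowed values
        except (json.JSONDecodeError, ValidationError) as e:
            # Log every failure with the raw output: this is the data for diagnosing systematic failures
            logger.warning("attempt %d/%d failed: %s | raw=%r", attempt + 1, retries, e, raw)
            if attempt == retries - 1:
                raise
            # Retry with a corrective prompt that names the error
            prompt += f"\n\nYour previous output failed validation: {e}. Return only JSON that matches the schema."
    raise ValueError("retries must be at least 1")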

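Three retries recover most transient failures. As a rough guide, if failures were independent across attempts, a 12% per-attempt failure rate would fall to about 0.17% after three attempts (0.12^3 ≈ 0.0017). Systematic failures are not independent, so retries mask them rather than fix them. Log every retry with the raw model output: that log is the data you need to diagnose systematic failures.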
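A retry loop can also hide drift from monitoring: when the second attempt succeeds, the caller never sees an error, so a dashboard that tracks only final errors stays flat. One way to surface it, sketched below with a hypothetical `RetryTelemetry` helper and a made-up `ticket.v2` version label, is to aggregate the retry log per schema version and track the first-attempt failure rate separately from the final failure rate.

```python
from dataclasses import dataclass

@dataclass
class SchemaStats:
    calls: int = 0                   # logical calls, one per call_with_validation invocation
    attempts: int = 0                # total LLM requests, including retries
    first_attempt_failures: int = 0  # calls whose first output failed validation
    final_failures: int = 0          # calls that exhausted every retry

class RetryTelemetry:
    """Hypothetical helper: aggregates validation outcomes per schema version."""

    def __init__(self) -> None:
        self.stats: dict[str, SchemaStats] = {}

    def record(self, schema_version: str, attempt_ok: list[bool]) -> None:
        """attempt_ok[i] is True if attempt i produced output that passed validation."""
        if not attempt_ok:
            raise ValueError("a call must make at least one attempt")
        s = self.stats.setdefault(schema_version, SchemaStats())
        s.calls += 1
        s.attempts += len(attempt_ok)
        s.first_attempt_failures += int(not attempt_ok[0])
        s.final_failures += int(not attempt_ok[-1])

    def report(self, schema_version: str) -> dict[str, float]:
        s = self.stats[schema_version]  # KeyError if nothing was recorded for this version
        return {
            "first_attempt_failure_rate": s.first_attempt_failures / s.calls,
            "final_failure_rate": s.final_failures / s.calls,
            "mean_attempts": s.attempts / s.calls,
        }

# Illustrative traffic matching the incident: 12 of 100 calls need one retry.
telemetry = RetryTelemetry()
for _ in range(88):
    telemetry.record("ticket.v2", [True])
for _ in range(12):
    telemetry.record("ticket.v2", [False, True])
print(telemetry.report("ticket.v2"))
```

In this illustrative run the final failure rate is 0 while the first-attempt failure rate is 12%, which is exactly the signal an error-rate alert would miss. Alerting on the first-attempt rate per schema version is one way to catch drift early.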

Schema versioning discipline

Treat LLM output schemas like API contracts:

Every structured output call should have a Pydantic (or equivalent) validation layer between the raw model output and your application logic. The model's output is untrusted until validated — treat it like user input.
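One way to apply the API-contract framing together with a canary rollout is sketched below. Everything in it is an assumption for illustration: the registry layout, the version labels, the `priority` field, the 0.5-point tolerance, and the helper names. The idea is that a schema change, even an additive optional field, ships as a new version alongside the old one, receives a small, deterministic slice of traffic, and is promoted only if its first-attempt validation failure rate is no worse than the stable version's.

```python
import hashlib
from dataclasses import dataclass

@dataclass(frozen=True)
class SchemaVersion:
    name: str
    required: frozenset[str]
    optional: frozenset[str]

# Hypothetical registry: old versions stay registered so you can roll back instantly.
SCHEMAS = {
    "ticket.v1": SchemaVersion("ticket.v1", frozenset({"title", "priority"}), frozenset()),
    "ticket.v2": SchemaVersion("ticket.v2", frozenset({"title", "priority"}), frozenset({"urgency_level"})),
}

def pick_version(request_id: str, stable: str, canary: str, canary_pct: float) -> str:
    """Route roughly canary_pct percent of requests to the canary; the same id always gets the same version."""
    bucket = int(hashlib.sha256(request_id.encode()).hexdigest(), 16) % 100
    return canary if bucket < canary_pct else stable

def should_promote(stable_fail_rate: float, canary_fail_rate: float, tolerance: float = 0.005) -> bool:
    """Promote only if the canary's first-attempt failure rate is within tolerance of stable."""
    return canary_fail_rate <= stable_fail_rate + tolerance

version = pick_version("req-1234", stable="ticket.v1", canary="ticket.v2", canary_pct=5)
schema = SCHEMAS[version]
print(version, sorted(schema.optional))
print(should_promote(0.01, 0.12))  # False: a 12% canary failure rate blocks promotion
```

Hashing the request id instead of sampling randomly keeps routing stable across retries of the same request, so one call never mixes schema versions.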

Try it interactively

GenAI Systems Lab is a free platform for AI engineers — configure real failure modes, break things, and build the judgment that gets you hired.
