
Schema Drift: When Your Structured Output Parser Silently Breaks

How adding one optional field to a function-call schema breaks 12% of responses in ways your monitoring misses. The versioning discipline, canary rollout, and output validation layer that prevent it.

The function-calling schema had worked fine for four months. Then the product team added an optional `urgency_level` field to the ticket creation schema. They tested it: the model produced valid JSON with the new field. They shipped. Two days later, 12% of ticket creation calls were failing with a JSON parse error. The model was occasionally generating `"urgency_level": "high"` followed by a trailing comma before the closing brace. A trailing comma is a syntax error in JSON, so their parser rejected the whole response.

This is schema drift failure: a change to a structured output schema that causes the model to produce invalid output in a fraction of cases. The failure rate is low enough to pass manual QA, high enough to show up as a meaningful error rate in production.
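The trailing-comma failure from the opening example is easy to reproduce with Python's standard `json` module. The snippet below is a minimal sketch; the payload and the `try_parse` helper are hypothetical and only illustrate how a strict parser reacts.

```python
import json

# Hypothetical raw model output: note the trailing comma before the closing brace.
raw = '{"title": "Login page down", "urgency_level": "high",}'


def try_parse(text: str):
    """Return (parsed, error); exactly one of the two is None."""
    try:
        return json.loads(text), None
    except json.JSONDecodeError as exc:
        return None, exc


parsed, error = try_parse(raw)
if error is not None:
    # The standard library parser rejects trailing commas outright.
    print(f"parse failed at character {error.pos}: {error.msg}")
```

Because the parser fails on the whole document, one stray comma discards every field in the response, including the ones that were correct.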

Why schema changes break structured outputs

LLMs that generate structured output (JSON, YAML, XML) are doing constrained text generation. They've learned patterns: which field names tend to appear together, what valid JSON syntax looks like, how to handle optional vs required fields. When you change the schema, you're changing the pattern the model needs to match.

Specific failure patterns from schema changes:

Adding an optional field: the model generates it sometimes, doesn't other times, but occasionally generates it in the wrong position (e.g. after the closing bracket of a nested object)
Changing a field type: the model occasionally reverts to the old type (for example, a number wrapped in quotes where the schema now expects a bare number)
Adding an enum field: the model generates values not in your enum 40% of the time, especially when the enum values are novel domain terms it hasn't seen in training
Nesting a previously flat field: the model generates the old flat structure alongside the new nested one, producing a merge of both structures
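Several of these patterns produce syntactically valid JSON, so a JSON parser alone will not catch them. The sketch below shows one way a Pydantic v2 model could flag them. The `Ticket` and `Assignee` models, their fields, and the `error_types` helper are hypothetical, chosen only to mirror the patterns listed above.

```python
from typing import Literal, Optional

from pydantic import BaseModel, ConfigDict, StrictInt, ValidationError


class Assignee(BaseModel):
    model_config = ConfigDict(extra="forbid")
    team: str


class Ticket(BaseModel):
    # extra="forbid" rejects unknown keys, such as a leftover flat field
    # that should now live inside the nested object.
    model_config = ConfigDict(extra="forbid")
    title: str
    # StrictInt refuses a number wrapped in quotes instead of coercing it.
    priority: StrictInt
    # Literal restricts the field to the declared enum values.
    urgency_level: Optional[Literal["low", "medium", "high"]] = None
    assignee: Optional[Assignee] = None


def error_types(payload: dict) -> list[str]:
    """Return the Pydantic error type of each validation failure, or []."""
    try:
        Ticket.model_validate(payload)
        return []
    except ValidationError as exc:
        return [err["type"] for err in exc.errors()]


# An out-of-enum value, a quoted number, and a flat/nested merge.
bad_enum = error_types({"title": "t", "priority": 2, "urgency_level": "critical"})
bad_type = error_types({"title": "t", "priority": "2"})
merged = error_types(
    {"title": "t", "priority": 2, "team": "billing", "assignee": {"team": "billing"}}
)
```

Each of the three calls returns a non-empty list. By default Pydantic coerces `"2"` to `2` and ignores unknown keys, so without `StrictInt` and `extra="forbid"` the last two payloads would validate silently.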

Output validation as the first line of defense

Your LLM API call should always be wrapped in a retry loop with schema validation:

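import json
import logging

from pydantic import BaseModel, ValidationError

logger = logging.getLogger(__name__)
MAX_RETRIES = 3

# llm_client is assumed to be an async client defined elsewhere in the application.
async def call_with_validation(prompt: str, schema: type[BaseModel], retries: int = MAX_RETRIES) -> BaseModel:
    if retries < 1:
        raise ValueError("retries must be at least 1")
    attempt_prompt = prompt
    for attempt in range(retries):
        raw = await llm_client.complete(attempt_prompt)
        try:
            # Parse, then validate: catches both syntax errors and schema violations
            return schema.model_validate(json.loads(raw))
        except (json.JSONDecodeError, ValidationError) as e:
            # Log the raw output so systematic failures can be diagnosed later
            logger.warning("Attempt %d/%d failed validation: %s | raw=%r", attempt + 1, retries, e, raw)
            if attempt == retries - 1:
                raise
            # Retry from the original prompt plus the latest error only,
            # so corrective notes do not pile up across attempts
            attempt_prompt = f"{prompt}\n\nPrevious attempt produced invalid output: {e}. Please try again."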

Three retries catch ~99% of transient schema failures. Log every retry with the raw model output — this gives you the data to diagnose systematic failures.
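Once raw outputs are logged, grouping failures by a short signature helps separate systematic problems from transient ones. The sketch below is one possible approach, not a prescribed one: `failure_signatures` and the `Ticket` model are hypothetical names, and the signature format (error type plus field path) is an assumption.

```python
import json
from collections import Counter
from typing import Iterable, Literal, Optional, Type

from pydantic import BaseModel, ValidationError


class Ticket(BaseModel):
    # Hypothetical schema used only for this illustration.
    title: str
    urgency_level: Optional[Literal["low", "medium", "high"]] = None


def failure_signatures(raw_outputs: Iterable[str], schema: Type[BaseModel]) -> Counter:
    """Count failures by signature across a batch of logged raw outputs."""
    counts: Counter = Counter()
    for raw in raw_outputs:
        try:
            schema.model_validate(json.loads(raw))
        except json.JSONDecodeError:
            # Syntax-level failure: the output never reached schema validation.
            counts["json_syntax"] += 1
        except ValidationError as exc:
            # Schema-level failure: record the error type and the field path.
            for err in exc.errors():
                field = ".".join(str(part) for part in err["loc"])
                counts[f"{err['type']}:{field}"] += 1
    return counts


logged = [
    '{"title": "a"}',
    '{"title": "a",}',
    '{"title": "a", "urgency_level": "critical"}',
    '{"title": "b", "urgency_level": "critical"}',
]
summary = failure_signatures(logged, Ticket)
```

A signature that dominates the counts and points at the field you just changed suggests a schema problem that retries will only mask.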

Schema versioning discipline

Treat LLM output schemas like API contracts:

Version your schemas: TicketSchema_v2 is a different schema from TicketSchema_v1, not a modification of it
Canary rollouts for schema changes: route 5% of traffic to the new schema, monitor error rate for 24 hours, then roll forward
Prefer backward-compatible changes: adding an optional field is the lowest-risk change, modifying an existing field is riskier, and removing a field or changing its type is riskiest
Eval against schema changes: before shipping a schema change, run 100+ samples through the new schema and manually inspect the failure cases
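The versioning and canary points above can be sketched as two separate Pydantic models plus a deterministic router. This is a minimal illustration under stated assumptions: the field names, the `schema_for_request` helper, and hashing on a request ID are all hypothetical choices, and a real rollout would more likely use your existing feature-flag or traffic-splitting system.

```python
import hashlib
from typing import Literal, Optional, Type

from pydantic import BaseModel


class TicketSchemaV1(BaseModel):
    title: str
    description: str


class TicketSchemaV2(BaseModel):
    # Declared as its own class rather than edited in place, so both
    # versions can run side by side during the canary period.
    title: str
    description: str
    urgency_level: Optional[Literal["low", "medium", "high"]] = None


CANARY_PERCENT = 5  # share of traffic routed to the new schema


def schema_for_request(
    request_id: str, canary_percent: int = CANARY_PERCENT
) -> Type[BaseModel]:
    """Pick a schema version deterministically from the request ID."""
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    # Map the first four bytes of the hash to a bucket in [0, 100).
    bucket = int.from_bytes(digest[:4], "big") % 100
    return TicketSchemaV2 if bucket < canary_percent else TicketSchemaV1


chosen = schema_for_request("req-12345")
```

Hashing the request ID keeps the assignment stable across retries of the same request, so one request's retries all land under the same schema version when you compare error rates between the two.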

Every structured output call should have a Pydantic (or equivalent) validation layer between the raw model output and your application logic. The model's output is untrusted until validated — treat it like user input.
