GenAI Systems Lab Open interactive version →
AI Engineering 9 min read

Context Bleed: When One User's Data Poisons Another's Response

The subtle session isolation failure that affects multi-tenant LLM apps. How system prompt fragments, conversation history, and cached context leak across users — and what it costs you.

In February 2024, a large enterprise SaaS company discovered that some users were seeing fragments of other users' system prompts in their responses. The product used a multi-tenant architecture where each customer had a customized system prompt. A caching bug caused one customer's prompt to leak into another's conversation context. Nobody noticed for three weeks because the leaked fragments were syntactically valid — they just happened to mention a competitor's product name.

Context bleed is the class of failures where information that should be scoped to one session, user, or tenant appears in another's context. It's different from a data breach — there's no unauthorized access, just unintended mixing. But the consequences can be just as serious.

Context bleed failures are especially dangerous because they often pass automated testing. Your test suite uses isolated test users, so it never catches cross-user contamination. The only signals are in production logs — which you may not be watching.

The three mechanisms of context bleed

Context bleed happens through three distinct mechanisms, each requiring a different fix.

1. Prompt caching without proper invalidation

Many production LLM systems cache compiled system prompts to avoid reprocessing them on every request. If that cache uses a key that doesn't include the tenant/user identifier — or if the key is computed incorrectly — two different users will get the same cached prompt.

The insidious part: caching libraries often use content hashes as keys. If two users happen to have the same base system prompt with different variable substitutions, and the variable substitution step runs after the cache lookup, the second user gets the first user's fully-substituted prompt.

2. Conversation history in shared message buffers

Streaming LLM APIs write tokens to a buffer. If that buffer isn't fully flushed and reset between requests — or if a worker process handles a new request before a previous one has fully cleaned up — you can get history fragments from a previous conversation appearing in a new one. This is more common with serverless workers that are reused across requests than with fresh-container deployments.

3. RAG retrieval across tenant boundaries

If your vector store doesn't enforce tenant isolation at query time, a metadata filter bug can make a query retrieve documents from a different tenant's namespace. The model then happily incorporates that information into its response. This is particularly common when metadata filters are optional — a bug where the filter is accidentally omitted will silently expose cross-tenant data.

Detection is hard — here's what works

Standard functional testing won't catch context bleed. What does:

The fix hierarchy

Fix context bleed in this order, from most to least impactful:

The rule of thumb: treat every piece of context you inject as potentially visible to any user of your system. Design your isolation architecture against that threat model.

Try it interactively

GenAI Systems Lab is a free platform for AI engineers — configure real failure modes, break things, and build the judgment that gets you hired.

Open GenAI Systems Lab →