
Missing Context: When RAG Retrieves the Right Chunk but Answers the Wrong Question

Why a high similarity score doesn't mean high relevance. The missing context failure mode, why it's hard to detect, and how to fix it.

You've built a RAG system. It retrieves the right chunk — the one that contains the answer — and the model still gets it wrong. You stare at the trace, confused. The context is there. The model saw it. It just... didn't use it properly.

This is the missing context failure mode: not missing retrieval, but missing the surrounding context that makes the retrieved text mean what it means. It's one of the most demoralising bugs in RAG, because everything looks correct until you read carefully.

Why this happens

Chunking splits documents into retrievable pieces. But documents are not written to be read in chunks — they're written to be read sequentially. A chunk that says "this approach reduced latency by 40%" only makes sense if you know what *approach* was being described in the previous paragraph.

When that previous paragraph sits in a different chunk, one that didn't score high enough to be retrieved, the model tends to fill the gap with a plausible-sounding guess drawn from its training data. It rarely flags the uncertainty. It answers confidently from half the information.

The five patterns

1. Pronoun resolution failure

The retrieved chunk says "it reduces error rates by 30%." The antecedent of "it" (the technique being described) is in an earlier, unretrieved chunk. The model has to guess what "it" refers to, and often guesses wrong.
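
One way to keep the antecedent attached is to return the best-matching sentence together with its neighbours. The sketch below illustrates that idea under stated assumptions: `split_sentences`, `score`, and `sentence_window` are hypothetical helpers, the word-overlap score is a toy stand-in for embedding similarity, and the sample text is invented for illustration.

```python
import re

def split_sentences(text: str) -> list[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def score(query: str, sentence: str) -> int:
    """Toy relevance score (assumption): count of shared lowercase tokens."""
    def tokenize(s: str) -> set[str]:
        return set(re.findall(r"[a-z0-9%]+", s.lower()))
    return len(tokenize(query) & tokenize(sentence))

def sentence_window(text: str, query: str, window: int = 1) -> str:
    """Return the best-matching sentence plus `window` sentences on each side."""
    sentences = split_sentences(text)
    if not sentences:
        return ""
    best = max(range(len(sentences)), key=lambda i: score(query, sentences[i]))
    start = max(0, best - window)
    end = min(len(sentences), best + window + 1)
    return " ".join(sentences[start:end])

# Invented sample document: the antecedent of "It" lives in the first sentence.
doc = (
    "We evaluated request batching on the ingest service. "
    "It reduces error rates by 30%. "
    "The rollout finished in March."
)
print(sentence_window(doc, "how much did error rates drop", window=1))
```

With `window=0` only the pronoun-led sentence comes back. With `window=1` the sentence naming the technique comes back with it, which is the context the model needs to resolve "It".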

2. Dependency on document structure

Tables and lists are the worst offenders. A table row like "Q3 2024 | $4.2M | ↑ 18%" is meaningless without the table headers. If headers and data rows are split across chunks, every data row is uninterpretable.

Never separate table rows from their headers. Use a document parser that identifies table boundaries and, when a table is too large for one chunk, keeps the header row with each data segment. This single rule removes a large class of structured-document RAG failures.
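
The header-with-every-segment rule can be sketched in a few lines. This is a minimal illustration, not a parser: `chunk_table` is a hypothetical helper, it assumes the table has already been extracted into a header string and a list of row strings, and the column names and all rows other than the Q3 2024 one are made-up values.

```python
def chunk_table(header: str, rows: list[str], rows_per_chunk: int = 2) -> list[str]:
    """Split table rows into chunks, repeating the header in every chunk."""
    if rows_per_chunk < 1:
        raise ValueError("rows_per_chunk must be at least 1")
    chunks = []
    for start in range(0, len(rows), rows_per_chunk):
        segment = rows[start:start + rows_per_chunk]
        # Every chunk starts with the header so each row stays interpretable.
        chunks.append("\n".join([header, *segment]))
    return chunks

# Hypothetical header and illustrative rows (only the Q3 row is from the text).
header = "Quarter | Revenue | YoY change"
rows = [
    "Q1 2024 | $3.1M | ↑ 9%",
    "Q2 2024 | $3.6M | ↑ 12%",
    "Q3 2024 | $4.2M | ↑ 18%",
]

for chunk in chunk_table(header, rows, rows_per_chunk=2):
    print(chunk)
    print("---")
```

Repeating the header costs a few tokens per chunk, but it means any single retrieved segment can be read on its own.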

3. Definition/reference split

The first chunk in a document defines a term ("ARPU means Average Revenue Per User"). A later chunk uses that term without redefining it. If only the later chunk is retrieved, the model may misinterpret the term or use a different meaning from its training data.
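
One possible mitigation is to collect definitions at indexing time and prepend them to any chunk that uses the term without defining it. The sketch below assumes definitions follow the simple pattern "TERM means definition." with an all-caps term. Real documents will need a more robust extractor, and both function names are hypothetical.

```python
import re

def extract_definitions(document: str) -> dict[str, str]:
    """Collect '<TERM> means <definition>.' pairs for all-caps terms (assumed pattern)."""
    pattern = re.compile(r"\b([A-Z]{2,})\s+means\s+([^.]+)\.")
    return {term: definition.strip() for term, definition in pattern.findall(document)}

def add_definitions(chunk: str, glossary: dict[str, str]) -> str:
    """Prepend definitions for glossary terms the chunk uses but does not define."""
    notes = []
    for term, definition in glossary.items():
        uses_term = re.search(rf"\b{re.escape(term)}\b", chunk) is not None
        defines_term = f"{term} means" in chunk
        if uses_term and not defines_term:
            notes.append(f"{term}: {definition}")
    if not notes:
        return chunk
    return "Definitions: " + "; ".join(notes) + "\n" + chunk

document = "ARPU means Average Revenue Per User. Later sections report it by region."
glossary = extract_definitions(document)
print(add_definitions("ARPU grew fastest in APAC.", glossary))
```

The definition then travels with the chunk into both the embedding and the prompt, so the model does not have to fall back on whatever meaning of the term it saw in training.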

4. Conditional context

"If the user is on the enterprise plan, the limit is 10,000 requests per day." Retrieved alone, this seems useful. But if the document also says "If the user is on the starter plan, the limit is 100 requests per day" in a different chunk, and both are retrieved for the query "what are my limits?", the model may hallucinate a synthesised non-existent limit.

5. Implicit negation

Section 2 of a document describes a feature. Section 5 says that feature was deprecated in version 3.0. If only Section 2 is retrieved, the model confidently describes a feature that no longer exists.
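
A rough sketch of guarding against this with metadata follows. It assumes each chunk was tagged with a `feature` name at indexing time and that deprecation notices were collected into a separate map. The `Chunk` class, the `filter_deprecated` function, and the feature names are all hypothetical.

```python
from dataclasses import dataclass

Version = tuple[int, int]  # (major, minor), e.g. (3, 0)

@dataclass(frozen=True)
class Chunk:
    section: str
    feature: str  # hypothetical tag assigned at indexing time
    text: str

def filter_deprecated(
    chunks: list[Chunk],
    deprecations: dict[str, Version],
    target: Version,
) -> list[Chunk]:
    """Drop chunks whose feature was deprecated at or before the target version."""
    kept = []
    for chunk in chunks:
        removed_in = deprecations.get(chunk.feature)
        if removed_in is not None and removed_in <= target:
            continue  # feature no longer exists for this version
        kept.append(chunk)
    return kept

# Invented example: Section 2 describes a feature that was deprecated in 3.0.
retrieved = [
    Chunk("Section 2", "bulk_export", "Bulk export lets you download all records."),
    Chunk("Section 4", "search", "Search supports prefix queries."),
]
deprecations = {"bulk_export": (3, 0)}

current = filter_deprecated(retrieved, deprecations, target=(3, 1))
print([c.section for c in current])
```

Dropping the chunk is the bluntest option. Depending on the product, it may be better to keep the chunk and attach the deprecation notice to it, so the model can tell the user that the feature was removed.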

Fixes

Fix	What it solves
Sentence-window retrieval	Retrieve the target chunk + 1–2 sentences before/after. Cheap, effective for pronoun/reference issues.
Parent-document retrieval	Index small chunks; return the full parent section on match. Maintains table/list integrity.
Contextual chunk headers	Prepend a generated context sentence to each chunk before embedding: 'This chunk is from Section 3 of the 2024 Q3 report, discussing APAC revenue...'
Metadata filtering	Add version/date metadata to chunks. Filter retrieved results to the correct document version.
Multi-chunk synthesis	Retrieve top-10, not top-3. Use the model to synthesise across more context before answering.
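
As an illustration of the parent-document row, the sketch below indexes small word-window chunks but returns the whole parent section on a match. It is a toy under stated assumptions: token overlap stands in for vector similarity, `build_index` and `retrieve_parent` are hypothetical names, and the `support` section is invented sample text.

```python
import re

def tokens(text: str) -> set[str]:
    """Lowercase alphanumeric tokens, used for a toy overlap score."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def build_index(sections: dict[str, str], chunk_size: int = 12) -> list[tuple[str, str]]:
    """Split each parent section into small chunks, remembering the parent id."""
    index = []
    for section_id, text in sections.items():
        words = text.split()
        for start in range(0, len(words), chunk_size):
            index.append((section_id, " ".join(words[start:start + chunk_size])))
    return index

def retrieve_parent(query: str, index: list[tuple[str, str]], sections: dict[str, str]) -> str:
    """Match the query against small chunks, then return the full parent section."""
    q = tokens(query)
    section_id, _ = max(index, key=lambda item: len(q & tokens(item[1])))
    return sections[section_id]

sections = {
    "pricing": (
        "Plan limits. If the user is on the enterprise plan, the limit is "
        "10,000 requests per day. If the user is on the starter plan, the "
        "limit is 100 requests per day."
    ),
    "support": (
        "Support hours are 9 to 5 on weekdays. Enterprise customers get a "
        "dedicated channel."
    ),
}

index = build_index(sections)
print(retrieve_parent("starter plan limit", index, sections))
```

The small chunks keep matching precise, while the returned section carries both plan conditions together, so the model sees each limit next to the plan it applies to.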

How to catch this in evals

Build eval examples specifically for this pattern: questions whose answers require understanding context from *outside* the retrieved chunk. Flag cases where the model's answer is plausible but wrong — this is the signature of missing context, not hallucination from thin air.
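
One way to make such eval examples machine-checkable is to label, for each question, both the chunk that states the answer and the chunks needed to interpret it. The sketch below is a hypothetical harness under that assumption: `EvalCase`, its field names, and the chunk ids are invented for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class EvalCase:
    question: str
    answer_chunk: str               # id of the chunk that states the answer
    context_chunks: frozenset[str]  # ids of chunks needed to interpret it
    retrieved: frozenset[str]       # ids the retriever actually returned

def classify(case: EvalCase) -> str:
    """Separate plain retrieval misses from missing-context failures."""
    if case.answer_chunk not in case.retrieved:
        return "retrieval_miss"
    if not case.context_chunks <= case.retrieved:
        return "missing_context"  # answer chunk found, supporting chunk absent
    return "ok"

cases = [
    EvalCase("What reduced latency by 40%?", "c7", frozenset({"c6"}), frozenset({"c7", "c2"})),
    EvalCase("What reduced latency by 40%?", "c7", frozenset({"c6"}), frozenset({"c6", "c7"})),
    EvalCase("What reduced latency by 40%?", "c7", frozenset({"c6"}), frozenset({"c1", "c2"})),
]

for case in cases:
    print(classify(case), sorted(case.retrieved))
```

Cases labelled `missing_context` are the ones worth reading by hand: retrieval metrics count them as hits, yet the answer is likely to be plausible and wrong.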
