Context Tetris: Why What You Put in the Prompt Matters as Much as the Model
How to think about context window real estate — system prompt, examples, retrieved chunks, history, and query. Optimising the slot machine that is your prompt.
Your context window is real estate. Like real estate, it's finite, expensive, and the value of what you put in it varies enormously. Context tetris is the art of fitting the right information into the right amount of space, in the right order, to get the best possible model output.
This isn't just an optimisation problem. It's the thing that separates prompts that consistently work in production from prompts that work on your laptop and break on real data.
The attention gradient
Models don't attend to all context equally. Research consistently shows a U-shaped attention pattern: highest attention at the beginning (the system prompt) and the end (the most recent message) of the context. The middle is where information goes to die. This is the lost-in-the-middle problem — and it means *where* you place information matters as much as *whether* you include it.
If a piece of information is critical for the model's answer, put it at the start or at the end of your context — never buried in the middle. This applies to: key facts from retrieved documents, hard constraints, important instructions, and the question itself.
The order principle
| What to put | Where | Why |
|---|---|---|
| Task instructions | Start of system prompt | Highest attention; sets the frame for everything that follows |
| Hard constraints | Start and end | Reinforce critical constraints at both attention peaks |
| Retrieved context | Before the question | Model reads context then formulates the answer — not the reverse |
| The user's question | Very end | Most recent, highest attention — the model is most focused here |
| Less critical history | Middle | Accepted sacrifice zone — useful but not critical |
| Examples (few-shot) | After instructions | Model benefits from seeing examples close to the task description |
Token compression techniques
Progressive summarisation
For long documents in context: don't include the full text. Summarise at the appropriate level of detail for the query. A query about a document's conclusion needs the conclusion and supporting evidence — not the full methodology section.
Structured over prose
When you control the format of information going into context, prefer structured formats. A table uses fewer tokens than an equivalent paragraph of prose. JSON is dense. Markdown bullets are efficient. Raw text paragraphs are the least token-efficient way to communicate structured information.
Negative space
What you leave out matters as much as what you include. For every piece of context: does the model actually need this to answer correctly? If removing it doesn't change the answer on your eval set, it shouldn't be in the context. Build a habit of ablation: systematically remove context components and check whether quality drops.
The metadata question
Retrieved chunks often come with metadata: source document title, date, author, section heading. This metadata consumes tokens but can dramatically improve answer quality — the model understands what it's reading. The right amount: enough to give context, not so much that it crowds out content. One sentence of context per chunk is usually enough.
Context arrangement experiments →: Test how context order affects model outputs in the Playground module.
Try it interactively
GenAI Systems Lab is a free platform for AI engineers — configure real failure modes, break things, and build the judgment that gets you hired.
Open GenAI Systems Lab →