
Context Tetris: Why What You Put in the Prompt Matters as Much as the Model

How to think about context window real estate: system prompt, examples, retrieved chunks, history, and query. Optimising every slot in your prompt.

Your context window is real estate. Like real estate, it's finite, expensive, and the value of what you put in it varies enormously. Context Tetris is the art of fitting the right information into the right amount of space, in the right order, to get the best possible model output.

This isn't just an optimisation problem. It's the thing that separates prompts that consistently work in production from prompts that work on your laptop and break on real data.

The attention gradient

Models don't use all of their context equally well. The "Lost in the Middle" study (Liu et al., 2023) found a U-shaped pattern: accuracy is highest when the relevant information sits at the beginning of the context (where the system prompt lives) or at the end (the most recent message), and it drops when that information sits in the middle. The middle is where information goes to die. This is the lost-in-the-middle problem, and it means *where* you place information matters as much as *whether* you include it.

If a piece of information is critical for the model's answer, put it at the start or at the end of your context — never buried in the middle. This applies to: key facts from retrieved documents, hard constraints, important instructions, and the question itself.
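When you retrieve several chunks, one way to apply this is to keep the most relevant chunks at the edges of the context block and let the weakest ones fall into the middle. Below is a minimal sketch. It assumes the chunks arrive already sorted by relevance (most relevant first), and `reorder_for_edges` is a hypothetical helper, not a library function.

```python
def reorder_for_edges(chunks_by_relevance):
    """Put the strongest chunks at the start and end, the weakest in the middle.

    Assumes the input list is sorted from most to least relevant.
    """
    front, back = [], []
    for i, chunk in enumerate(chunks_by_relevance):
        # Alternate placement: 1st, 3rd, 5th... go to the front; 2nd, 4th... to the back.
        (front if i % 2 == 0 else back).append(chunk)
    # Reverse the back half so the 2nd-most relevant chunk ends up last.
    return front + back[::-1]


ranked = ["A", "B", "C", "D", "E"]  # A is the most relevant
print(reorder_for_edges(ranked))  # → ['A', 'C', 'E', 'D', 'B']
```

With five chunks ranked A (best) to E, the two strongest chunks sit first and last, and the weakest one lands in the middle, where it costs the least if the model skims over it.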

The order principle

What to put	Where	Why
Task instructions	Start of system prompt	Highest attention; sets the frame for everything that follows
Hard constraints	Start, and restated just before the question	Reinforce critical constraints at both attention peaks
Examples (few-shot)	After instructions	Model benefits from seeing examples close to the task description
Less critical history	Middle	Accepted sacrifice zone: useful but not critical
Retrieved context	Before the question	Model reads the context, then formulates the answer, not the reverse
The user's question	Very end	Most recent, highest attention; the model is most focused here
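To make the placement concrete, here is a sketch that assembles a prompt in the positions the table assigns to each piece. It is illustrative only: `build_prompt` and the section labels are hypothetical, and everything is flattened into one string to make the order visible. With a chat API you would more likely put the instructions and constraints in the system message and the rest in the conversation messages.

```python
def bullet_list(items):
    # Render a list of strings as "- item" lines.
    return "\n".join(f"- {item}" for item in items)


def build_prompt(instructions, constraints, examples, history, retrieved, question):
    """Assemble one prompt string, ordered top to bottom by attention priority.

    Section labels are illustrative; adapt them to your model's chat format.
    """
    sections = [
        instructions,                                      # start: sets the frame
        "Hard constraints:\n" + bullet_list(constraints),  # first attention peak
        "Examples:\n" + "\n\n".join(examples),             # right after the instructions
        "Earlier conversation:\n" + "\n".join(history),    # middle: accepted sacrifice zone
        "Context:\n" + "\n\n".join(retrieved),             # read before the question
        "Remember:\n" + bullet_list(constraints),          # restate at the end peak
        "Question: " + question,                           # very end
    ]
    return "\n\n".join(sections)


prompt = build_prompt(
    instructions="You answer questions about our internal docs.",
    constraints=["Answer only from the context.", "Say 'I don't know' if unsure."],
    examples=["Q: Where is the VPN guide?\nA: In the IT handbook, section 3."],
    history=["User: Hi", "Assistant: Hello! How can I help?"],
    retrieved=["[IT Handbook | Remote access] Use the corporate VPN client."],
    question="How do I connect remotely?",
)
print(prompt)
```

The constraints appear twice on purpose, once near the top and once just before the question, so they sit at both ends of the U-shaped curve. In a real system you would also drop sections that are empty for a given request.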

Token compression techniques

Progressive summarisation

For long documents in context: don't include the full text. Summarise at the appropriate level of detail for the query. A query about a document's conclusion needs the conclusion and supporting evidence — not the full methodology section.
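One way to put this into practice is to keep several versions of each document at decreasing levels of detail and include the most detailed one that fits the space you can spare for it. Here is a minimal sketch. It assumes the summaries are already precomputed (generating them is a separate LLM call, not shown), it uses a whitespace word count as a crude stand-in for a real tokenizer, and `pick_detail_level` is a hypothetical helper.

```python
from typing import Callable, Sequence


def word_count(text: str) -> int:
    # Crude token estimate: whitespace-separated words. Swap in your model's tokenizer.
    return len(text.split())


def pick_detail_level(
    levels: Sequence[str],
    budget_tokens: int,
    count_tokens: Callable[[str], int] = word_count,
) -> str:
    """Return the most detailed version of a document that fits the budget.

    `levels` must be ordered from most to least detailed, e.g.
    [full_text, section_summaries, one_paragraph_summary].
    """
    if not levels:
        raise ValueError("need at least one version of the document")
    for text in levels:
        if count_tokens(text) <= budget_tokens:
            return text
    # Nothing fits: fall back to the shortest version rather than the full text.
    return levels[-1]


versions = [
    "word " * 500,  # stand-in for the full document
    "word " * 80,   # stand-in for a section-level summary
    "word " * 15,   # stand-in for a one-paragraph summary
]
print(word_count(pick_detail_level(versions, budget_tokens=100)))  # → 80
```

The budget itself can depend on the query: a question about the conclusion might get a small budget and a focused summary, while a question about methodology justifies spending more on that section.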

Structured over prose

When you control the format of information going into context, prefer structured formats. A table usually uses fewer tokens than an equivalent paragraph of prose, because it drops the connective words and states each field name once. Compact JSON and Markdown bullets are also efficient, although JSON's quotes and braces cost tokens too. Raw text paragraphs are usually the least token-efficient way to communicate structured information.
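A quick way to check this on your own data is to render the same records both ways and compare their sizes. The sketch below is illustrative only. The plan records and the helper names are hypothetical, and character and word counts are rough proxies, so measure with your model's actual tokenizer (for example, OpenAI's tiktoken) before drawing conclusions.

```python
def records_to_prose(records):
    # Hypothetical prose rendering: one sentence per record.
    return " ".join(
        f"The {r['plan']} plan costs {r['price_usd']} US dollars and includes {r['seats']} seat(s)."
        for r in records
    )


def records_to_table(records):
    # Compact tab-separated table: header row once, then one line per record.
    if not records:
        return ""
    headers = list(records[0].keys())
    lines = ["\t".join(headers)]
    lines += ["\t".join(str(r[h]) for h in headers) for r in records]
    return "\n".join(lines)


plans = [
    {"plan": "Basic", "price_usd": 10, "seats": 1},
    {"plan": "Team", "price_usd": 49, "seats": 10},
    {"plan": "Business", "price_usd": 199, "seats": 50},
]

prose, table = records_to_prose(plans), records_to_table(plans)
# Rough proxies only: character counts and whitespace-separated word counts.
print(len(prose), len(table))
print(len(prose.split()), len(table.split()))
```

The table pays for its column names once, whereas the prose repeats the meaning of every field in every sentence. That repetition is where most of the savings come from, and the gap widens as the number of rows grows.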

Negative space

What you leave out matters as much as what you include. For every piece of context: does the model actually need this to answer correctly? If removing it doesn't change the answer on your eval set, it shouldn't be in the context. Build a habit of ablation: systematically remove context components and check whether quality drops.
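The ablation habit can be written as a small loop. Below is a minimal sketch that removes one context component at a time and reports which ones can go without lowering the score. The `evaluate` callback is hypothetical: it stands in for whatever builds the prompt from the given components, runs it over your eval set, and returns a quality score where higher is better.

```python
from typing import Callable, Dict, List

# Hypothetical callback: build the prompt from these components, run the eval set,
# return a quality score (higher is better).
Evaluator = Callable[[Dict[str, str]], float]


def ablation_candidates(
    components: Dict[str, str], evaluate: Evaluator, tolerance: float = 0.0
) -> List[str]:
    """Return names of components whose removal costs at most `tolerance` in score."""
    baseline = evaluate(components)
    candidates = []
    for name in components:
        # Drop one component at a time and re-score.
        reduced = {k: v for k, v in components.items() if k != name}
        if baseline - evaluate(reduced) <= tolerance:
            candidates.append(name)
    return candidates


# Toy evaluator for illustration only: pretend quality depends on the
# instructions and the retrieved chunks, but not on the boilerplate.
def toy_evaluate(components: Dict[str, str]) -> float:
    return 0.5 * ("instructions" in components) + 0.5 * ("retrieved" in components)


context = {
    "instructions": "Answer using only the context.",
    "boilerplate": "About us: we are a friendly team.",
    "retrieved": "Refunds are available within 30 days.",
}
print(ablation_candidates(context, toy_evaluate))  # → ['boilerplate']
```

Removing one component at a time misses redundancy. If two components carry the same fact, each looks removable on its own but not both together, so remove one candidate and re-run before cutting the next.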

The metadata question

Retrieved chunks often come with metadata: source document title, date, author, section heading. This metadata consumes tokens but can noticeably improve answer quality, because the model knows what it's reading and where it came from. The right amount is enough to give context but not so much that it crowds out the content. One short line of metadata per chunk is usually enough.
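A sketch of the "one short line per chunk" approach follows. The `Chunk` fields, the bracketed header format, and the sample chunk are all assumptions for illustration. Use whichever fields your retriever actually returns.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Chunk:
    # Hypothetical chunk schema; use whatever fields your retriever returns.
    text: str
    title: str
    section: str
    date: str  # e.g. "2024-03-01"


def format_chunk(chunk: Chunk) -> str:
    # One short metadata line, then the chunk content itself.
    header = f"[{chunk.title} | {chunk.section} | {chunk.date}]"
    return f"{header}\n{chunk.text}"


def format_context(chunks: List[Chunk]) -> str:
    # Blank line between chunks so the model can see where each one starts.
    return "\n\n".join(format_chunk(c) for c in chunks)


doc = Chunk(
    text="Refunds are available within 30 days of purchase.",
    title="Refund Policy",
    section="Eligibility",
    date="2024-03-01",
)
print(format_chunk(doc))
```

Fields like author are left out here on purpose. Add a field only if it helps the model answer, for example a date when questions depend on recency.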

Context arrangement experiments: test how context order affects model outputs in the Playground module.

Try it interactively

GenAI Systems Lab is a free platform for AI engineers — configure real failure modes, break things, and build the judgment that gets you hired.
