GenAI Systems Lab
Production & LLMOps 11 min read

How Notion Built AI on a Block-Based Data Model: Architecture Decisions and Lessons

Notion's AI features aren't standard RAG. Their block-based data model created unique chunking challenges, context assembly problems, and retrieval constraints that shaped every architecture decision.

Notion's AI features look like standard RAG from the outside: you ask a question, the system finds relevant content from your workspace, and returns an answer. Under the hood, it's more complicated — and more instructive — than that.

The complication comes from Notion's data model. Every piece of content in Notion is a block: a paragraph block, a heading block, a to-do block, a table block, a database row. Blocks are nested. A page is a block. A sub-page is a block inside a page block. A table row is a block inside a table block inside a page block.

The core challenge: standard chunking algorithms assume flat, linear text. Notion's data is a tree. A chunk boundary that makes sense for flat text will often split a row from its table header, a list item from its parent context, or a database cell from its column label.
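To make the failure concrete, here is a minimal sketch of a fixed-size chunker applied to a small table that has been flattened to text. The table contents, the 40-character chunk size, and the function name are all made up for illustration; real chunkers usually split on tokens or sentences, but the structural problem is the same.

```python
def fixed_size_chunks(text: str, size: int) -> list[str]:
    """Naive chunker: cut every `size` characters, ignoring structure."""
    return [text[i:i + size] for i in range(0, len(text), size)]


# A small table flattened to linear text, header row first (illustrative data).
flat = "\n".join([
    "Task | Owner | Due",
    "Redesign checkout | Sarah | June 15",
    "Fix the auth bug | Priya | June 20",
])

chunks = fixed_size_chunks(flat, size=40)
for chunk in chunks:
    print(repr(chunk))

# Only the first chunk contains the header row. Later chunks hold cell
# values with no column labels, and one row is cut in half.
header_chunks = [i for i, chunk in enumerate(chunks) if "Owner" in chunk]
print(header_chunks)
```

Any chunk after the first holds values like a name or a date with nothing to say which column they came from.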

The block-as-chunk approach

The most natural solution — and what Notion appears to use — is treating blocks as the atomic unit of retrieval. Each block gets embedded individually. The retriever finds the most relevant blocks for a query.

This works well for paragraph blocks. It breaks down for small blocks: a single-sentence to-do item carries very little semantic content on its own. A to-do that says 'Fix the auth bug' gives the embedding nothing about which project, sprint, or team it belongs to, so it rarely surfaces for the queries where it matters.

The solution to short blocks is context enrichment: when embedding a block, prepend its parent chain. A to-do inside 'Q2 Engineering Sprint' inside 'Engineering' gets serialized as 'Engineering > Q2 Engineering Sprint: Fix the auth bug'. Now the embedding carries context.
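A minimal sketch of that serialization, assuming each block holds a reference to its parent. The Block class and its fields are hypothetical stand-ins, not Notion's actual schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Block:
    """Hypothetical stand-in for a workspace block; not Notion's real schema."""
    id: str
    text: str
    parent: Optional["Block"] = None


def parent_chain(block: Block) -> list[str]:
    """Return ancestor titles from the root down to the block's direct parent."""
    titles = []
    node = block.parent
    while node is not None:
        titles.append(node.text)
        node = node.parent
    return list(reversed(titles))


def enriched_text(block: Block) -> str:
    """Serialize a block with its ancestor path so the embedding sees context."""
    chain = parent_chain(block)
    if not chain:
        return block.text
    return " > ".join(chain) + ": " + block.text


engineering = Block("p1", "Engineering")
sprint = Block("p2", "Q2 Engineering Sprint", parent=engineering)
todo = Block("b1", "Fix the auth bug", parent=sprint)

print(enriched_text(todo))
# Engineering > Q2 Engineering Sprint: Fix the auth bug
```

The enriched string is what gets embedded. The original block text can still be stored separately for display.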

Context assembly for generation

Retrieval gives you a set of relevant blocks. The harder problem is assembling them into a coherent context for the LLM.

Isolated blocks are often incoherent. A paragraph from a meeting notes page makes sense only if you know it's from a meeting about the Q3 roadmap. Without that frame, the LLM has to guess. Notion likely assembles context by including: the retrieved block, its parent page title, its position in the page hierarchy, and a few surrounding sibling blocks.

This is a general lesson: retrieval returns the most relevant chunks, but the context passed to the LLM should also include the material around them. The retrieved chunk is a pointer to a region of the document, not the complete context itself.
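One way to treat a retrieved block as a pointer is to expand it into a window of neighbors plus the page frame. The sketch below assumes a page is available as an ordered list of blocks; the Page and PageBlock types, the window parameter, and the output layout are illustrative choices, not a description of Notion's internals.

```python
from dataclasses import dataclass


@dataclass
class PageBlock:
    """Hypothetical flattened view of one block within a page."""
    text: str


@dataclass
class Page:
    title: str
    path: list[str]          # ancestor page titles, root first
    blocks: list[PageBlock]  # blocks in document order


def assemble_context(page: Page, hit_index: int, window: int = 1) -> str:
    """Build an LLM context snippet around a retrieved block.

    Includes the page title, its hierarchy path, and up to `window`
    sibling blocks on each side of the retrieved block.
    """
    start = max(0, hit_index - window)
    end = min(len(page.blocks), hit_index + window + 1)
    lines = [
        f"Page: {page.title}",
        f"Path: {' > '.join(page.path + [page.title])}",
        "Content:",
    ]
    for i in range(start, end):
        marker = ">>" if i == hit_index else "  "  # flag the retrieved block
        lines.append(f"{marker} {page.blocks[i].text}")
    return "\n".join(lines)


page = Page(
    title="Q3 Roadmap Sync",
    path=["Engineering", "Meeting Notes"],
    blocks=[
        PageBlock("Agenda: review Q3 roadmap"),
        PageBlock("We agreed to delay the billing migration."),
        PageBlock("Action: update the timeline doc."),
    ],
)

context = assemble_context(page, hit_index=1)
print(context)
```

The window size is a trade-off: wider windows give the model more frame but spend more of the context budget per retrieved block.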

For database-structured content, context assembly looks different. A database of project tasks needs to be serialized in a way the LLM can reason about: 'Project: Redesign checkout | Status: In progress | Owner: Sarah | Due: June 15'. This is query-specific serialization — you decide what fields to include based on the user's question.
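A rough sketch of query-specific serialization follows. The field selector here is deliberately crude (it keeps any field whose name appears in the question) and is only an assumption for illustration; a production system would likely choose fields with something smarter.

```python
def serialize_row(row: dict[str, str], fields: list[str]) -> str:
    """Render selected fields of a database row as 'Name: value | ...'."""
    return " | ".join(f"{name}: {row[name]}" for name in fields if name in row)


def fields_for_query(query: str, all_fields: list[str], always: list[str]) -> list[str]:
    """Toy selector: keep `always` fields plus any field named in the query."""
    q = query.lower()
    return [f for f in all_fields if f in always or f.lower() in q]


row = {
    "Project": "Redesign checkout",
    "Status": "In progress",
    "Owner": "Sarah",
    "Due": "June 15",
}
all_fields = list(row)

full = serialize_row(row, all_fields)
print(full)
# Project: Redesign checkout | Status: In progress | Owner: Sarah | Due: June 15

fields = fields_for_query("Who is the owner of each project?", all_fields, always=["Project"])
trimmed = serialize_row(row, fields)
print(trimmed)
# Project: Redesign checkout | Owner: Sarah
```

Dropping irrelevant columns matters most when a database has many rows, since every omitted field is saved once per row.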

The delta ingestion problem

Notion workspaces change constantly. Pages are edited, created, and deleted. The naive approach is to re-embed the entire workspace on a schedule, but workspaces can have hundreds of thousands of blocks, and most of them have not changed between runs.

The production-grade solution is delta ingestion: track which blocks changed since the last run using Notion's block update timestamps and webhook events. Re-embed only those blocks. Delete embeddings for deleted blocks. This keeps the index fresh at a fraction of the cost.
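The core loop can be sketched in a few lines. Everything here is an assumption for illustration: the index is a plain dict, fake_embed stands in for a real embedding call, and the last_edited field name is hypothetical. The sketch covers the timestamp-based path only, not webhook handling.

```python
from datetime import datetime, timezone


def fake_embed(text: str) -> list[float]:
    """Placeholder embedding; swap in a real embedding model call."""
    return [float(len(text))]


def delta_sync(blocks: dict, index: dict, last_run: datetime) -> dict:
    """Bring `index` in line with `blocks`, touching only what changed.

    blocks: block_id -> {"text": str, "last_edited": datetime} (current state)
    index:  block_id -> embedding vector (mutated in place)
    """
    stats = {"embedded": 0, "deleted": 0, "skipped": 0}

    # Drop embeddings for blocks that no longer exist.
    for block_id in list(index):
        if block_id not in blocks:
            del index[block_id]
            stats["deleted"] += 1

    # Re-embed new blocks and blocks edited since the last run.
    for block_id, block in blocks.items():
        if block_id not in index or block["last_edited"] > last_run:
            index[block_id] = fake_embed(block["text"])
            stats["embedded"] += 1
        else:
            stats["skipped"] += 1
    return stats


utc = timezone.utc
last_run = datetime(2024, 6, 1, tzinfo=utc)
index = {"a": [1.0], "b": [2.0], "gone": [3.0]}
blocks = {
    "a": {"text": "unchanged paragraph", "last_edited": datetime(2024, 5, 20, tzinfo=utc)},
    "b": {"text": "edited paragraph", "last_edited": datetime(2024, 6, 2, tzinfo=utc)},
    "c": {"text": "new to-do", "last_edited": datetime(2024, 6, 3, tzinfo=utc)},
}

stats = delta_sync(blocks, index, last_run)
print(stats)
# {'embedded': 2, 'deleted': 1, 'skipped': 1}
```

One caveat with the enrichment approach described earlier: if a parent page is renamed, its children's enriched text changes even though their own timestamps may not, so a fuller version would also need to re-embed descendants of changed ancestors.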

Access control as a retrieval constraint

Notion has granular access control: pages can be shared with specific people, teams, or the entire workspace. The AI system must respect these permissions at query time — you can't return content from a page the user doesn't have access to.

This is typically implemented as metadata filtering on the vector index. Every embedded block carries its page_id and the associated permissions. At query time, the retriever adds a filter: only return blocks from pages this user can access. The user's permission set is part of the retrieval query itself, not a step applied to the results afterward.

Filtering after retrieval (post-hoc filtering) is a common mistake. If you retrieve top-k=20 blocks and then filter 15 out due to permissions, you've paid the latency for 20 results and are left with only 5 blocks of context. Always filter at the retrieval layer, before re-ranking.
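The difference shows up even on a toy index. The sketch below uses brute-force cosine similarity over a four-item list; the page ids, vectors, and search function are invented for illustration. A real vector database would push the filter into the index query instead of scanning, but the ordering of filter and top-k is the point.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(query_vec, index, k, allowed_pages=None):
    """Top-k by cosine similarity; if allowed_pages is given, filter first."""
    candidates = [
        item for item in index
        if allowed_pages is None or item["page_id"] in allowed_pages
    ]
    ranked = sorted(candidates, key=lambda it: cosine(query_vec, it["vec"]), reverse=True)
    return ranked[:k]


index = [
    {"id": "b1", "page_id": "hr-salaries", "vec": [1.0, 0.0]},
    {"id": "b2", "page_id": "hr-salaries", "vec": [0.9, 0.1]},
    {"id": "b3", "page_id": "eng-wiki", "vec": [0.7, 0.3]},
    {"id": "b4", "page_id": "eng-wiki", "vec": [0.5, 0.5]},
]
query = [1.0, 0.0]
allowed = {"eng-wiki"}  # pages this user can read

# Post-hoc: retrieve top-2, then drop what the user cannot see.
post = [it["id"] for it in search(query, index, k=2) if it["page_id"] in allowed]

# Pre-filter: permissions are part of the retrieval query.
pre = [it["id"] for it in search(query, index, k=2, allowed_pages=allowed)]

print(post)  # []
print(pre)   # ['b3', 'b4']
```

With post-hoc filtering the two best matches both sit on a page the user cannot read, so nothing survives. With the filter applied first, the same k yields two usable blocks.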

What Notion's AI features reveal about RAG at scale

Three hard lessons from Notion's architecture that apply broadly:
