GenAI Systems Lab

Chunking Strategies for RAG: Fixed, Semantic, and Hierarchical

Why chunk size is one of the most impactful RAG config decisions. Fixed-size vs. sentence vs. semantic chunking, with real retrieval quality differences.

Chunking is how you turn a large document into retrievable pieces. It sounds like a preprocessing detail. It is actually one of the most impactful configuration decisions in any RAG system.

Chunk too small and you lose context — the retrieved passage doesn't contain enough surrounding information for the model to answer. Chunk too large and you dilute relevance — the retrieved passage contains the answer buried in noise.

Fixed-size chunking

Split every document into chunks of N tokens with an overlap of M tokens. Fast, predictable, no dependencies. This is the default in most RAG tutorials and it's good enough to get started.

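from langchain.text_splitter import RecursiveCharacterTextSplitter

# Measure length in tokens (via tiktoken) rather than characters, so the
# sizes below really are tokens. The encoding name is an assumption; pick
# the one that matches your embedding model. Requires `tiktoken`.
splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
    encoding_name="cl100k_base",
    chunk_size=512,      # max tokens per chunk
    chunk_overlap=64,    # tokens shared between consecutive chunks
)
chunks = splitter.split_text(document)  # document: the raw text as a str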
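
To make the N-and-M arithmetic concrete, here is a minimal dependency-free sketch of a sliding window over a token list. The function name `fixed_size_chunks` and the synthetic token list are illustrative assumptions, not part of any library; each window advances by `chunk_size - overlap` tokens.

```python
def fixed_size_chunks(tokens, chunk_size=512, overlap=64):
    """Split a token list into windows of chunk_size that overlap by `overlap` tokens."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    stride = chunk_size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(tokens), stride):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reached the end of the input
    return chunks


# Synthetic tokens stand in for the output of a real tokenizer.
tokens = [f"t{i}" for i in range(1000)]
chunks = fixed_size_chunks(tokens, chunk_size=512, overlap=64)
print([len(c) for c in chunks])
```

With 1,000 tokens, a 512-token window, and a 64-token overlap, the stride is 448, so the windows start at tokens 0, 448, and 896.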

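Fixed chunking ignores document structure: it can split mid-sentence, mid-table, or mid-code-block. (RecursiveCharacterTextSplitter prefers paragraph and line breaks, but it still falls back to arbitrary cuts when a unit exceeds the size limit.) If your documents have structure, this can destroy it. A table split across three chunks is hard to retrieve correctly, because no single chunk holds both the header row and the values it labels.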

Semantic chunking

Instead of counting tokens, detect natural topic boundaries. Embed consecutive sentences and measure cosine similarity. When similarity drops sharply, you've hit a topic boundary — split there. This produces semantically coherent chunks at the cost of more computation at index time.
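
The boundary-detection idea can be sketched as follows. This is a minimal illustration under stated assumptions, not a production implementation: `toy_embed` is a hypothetical bag-of-words stand-in for a real embedding model, and the fixed `threshold` of 0.2 is an arbitrary choice (a percentile of the observed similarities is another common way to define a "sharp drop").

```python
import math
import re
from collections import Counter


def toy_embed(sentence):
    """Hypothetical stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z]+", sentence.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def semantic_chunks(sentences, embed=toy_embed, threshold=0.2):
    """Start a new chunk wherever adjacent-sentence similarity falls below threshold."""
    if not sentences:
        return []
    vectors = [embed(s) for s in sentences]
    chunks = [[sentences[0]]]
    for prev, cur, sentence in zip(vectors, vectors[1:], sentences[1:]):
        if cosine(prev, cur) < threshold:
            chunks.append([sentence])    # similarity dropped: treat as a topic boundary
        else:
            chunks[-1].append(sentence)  # same topic: extend the current chunk
    return [" ".join(c) for c in chunks]


sentences = [
    "The cache stores recent query results.",
    "Cache entries expire after recent query results go stale.",
    "Invoices are sent to customers monthly.",
    "Customers pay invoices within thirty days.",
]
for chunk in semantic_chunks(sentences):
    print(chunk)
```

In a real pipeline, `embed` would call your embedding model once per sentence at index time, which is where the extra computation comes from.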

Hierarchical (parent-child) chunking

Store two chunk sizes: small child chunks for retrieval and large parent chunks for context. At query time, retrieve the small chunk (high precision), then fetch its parent and send the full parent to the LLM (full context).

Parent-child chunking often outperforms fixed chunking because it splits responsibility: the retriever sees small, precise chunks, while the generator sees full, contextual passages. The cost is extra storage and a child-to-parent lookup at query time.
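
A minimal sketch of the parent-child pattern, assuming whitespace-separated words as a stand-in for tokens and a toy word-overlap scorer in place of a real vector search. The names `build_parent_child_index` and `retrieve_parent` are hypothetical, chosen for illustration.

```python
def build_parent_child_index(document, parent_size=400, child_size=100):
    """Split into parent chunks, then split each parent into child chunks.

    Sizes are in whitespace-separated words, standing in for tokens.
    Returns (parents, children); each child records the id of its parent.
    """
    words = document.split()
    parents, children = {}, []
    for pid, start in enumerate(range(0, len(words), parent_size)):
        parent_words = words[start:start + parent_size]
        parents[pid] = " ".join(parent_words)
        for cstart in range(0, len(parent_words), child_size):
            children.append({
                "parent_id": pid,
                "text": " ".join(parent_words[cstart:cstart + child_size]),
            })
    return parents, children


def retrieve_parent(query, parents, children):
    """Match the query against small children, then return the best child's parent."""
    query_words = set(query.lower().split())
    best = max(children, key=lambda c: len(query_words & set(c["text"].lower().split())))
    return parents[best["parent_id"]]


document = " ".join(
    ["filler"] * 450 + ["refund", "policy", "lasts", "thirty", "days"] + ["padding"] * 345
)
parents, children = build_parent_child_index(document)
context = retrieve_parent("refund policy", parents, children)
print(len(parents), len(children), len(context.split()))
```

The match happens against a 100-word child, but the text handed to the generator is the 400-word parent that contains it.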

Choosing your chunk size

| Document type | Recommended chunk size | Overlap (tokens) | Strategy |
|---|---|---|---|
| Q&A / FAQ | 128–256 tokens | 16 | Fixed (each Q&A is self-contained) |
| Technical docs | 512 tokens | 64 | Fixed or parent-child |
| Legal / contracts | 256–512 tokens | 64 | Semantic (preserve clauses) |
| Code | Function-level | 0 | Split on function/class boundaries |
| Earnings reports | Parent-child | N/A | Section headers as parents |
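
For the code row, splitting on function and class boundaries can be sketched with Python's standard `ast` module. This is one possible approach for Python source only, and the helper name `split_python_source` is a hypothetical choice; other languages would need their own parser.

```python
import ast


def split_python_source(source):
    """Return one chunk per top-level function or class definition."""
    tree = ast.parse(source)
    chunks = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # get_source_segment returns the exact source text of the node
            chunks.append(ast.get_source_segment(source, node))
    return chunks


source = '''
import os

def load(path):
    return open(path).read()

class Index:
    def add(self, chunk):
        self.items.append(chunk)
'''
for chunk in split_python_source(source):
    print(chunk, end="\n---\n")
```

Note that this sketch drops module-level statements such as imports; whether to attach them to each chunk or index them separately is a design choice left open here.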
