Hybrid Search: Combining BM25 and Vector Retrieval

Why pure semantic search misses exact matches and pure keyword search misses meaning, and how hybrid search with Reciprocal Rank Fusion (RRF) improves on both.

Pure semantic search misses exact matches. If a user asks "what is the CVE-2024-1234 vulnerability?", a dense vector retriever may return loosely security-related chunks rather than the one that contains that exact CVE ID. Pure keyword search misses meaning: to BM25, "car" and "automobile" are unrelated terms, because it only matches the tokens that literally appear in the query.

Hybrid search combines both. Run dense retrieval and sparse (keyword) retrieval in parallel, then fuse the two ranked lists. The combination typically outperforms either approach alone, because each retriever covers the other's blind spots.

Dense vs. sparse retrieval

Property	Dense (vector)	Sparse (BM25/TF-IDF)
Best for	Semantic similarity, paraphrases	Exact matches, rare terms, IDs
Misses	Rare words, IDs, code, model names	Paraphrases, synonyms, meaning
Speed	Fast with ANN index	Very fast — inverted index
Index size	Large (float32 vectors)	Compact (sparse integers)
Training needed	Yes — embedding model	No — pure statistics
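
To make the sparse column concrete, here is a minimal sketch of BM25 scoring in plain Python. It assumes a naive whitespace tokenizer and common default parameters (k1=1.5, b=0.75); the function names and the three sample documents are illustrative, not taken from any particular library. It shows the exact CVE ID matching only the document that contains it, while the query "car" scores zero against a document about an "automobile".

```python
import math
from collections import Counter

def tokenize(text):
    # Naive lowercase whitespace tokenizer (an assumption for this sketch);
    # production systems use a proper analyzer.
    return text.lower().split()

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score every document in docs against query with a basic Okapi BM25."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs
    # Document frequency: how many documents contain each term.
    df = Counter(term for t in tokenized for term in set(t))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue  # a term with no lexical overlap contributes nothing
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

# Illustrative corpus, invented for this example.
docs = [
    "patch notes for cve-2024-1234 in the parser",
    "general advice on securing web servers",
    "my automobile needs new tyres",
]
cve_scores = bm25_scores("cve-2024-1234", docs)  # only docs[0] scores above zero
car_scores = bm25_scores("car", docs)            # every score is zero
```

A dense retriever would likely place the automobile document near the query "car", which is the gap that hybrid search closes.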

Reciprocal Rank Fusion (RRF)

RRF is a widely used fusion algorithm. Each candidate document's score is the sum of 1/(k + rank) over every retriever that returned it, where rank is the document's 1-based position in that retriever's list and k is a smoothing constant (typically 60). Because it is rank-based rather than score-based, it does not require normalising the outputs of different retrievers.
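
As a worked instance of the formula, assume k = 60 and 1-based ranks. The documents A and B are hypothetical: A is ranked 1st by the dense retriever and 3rd by the sparse one, while B is ranked 2nd by the dense retriever and is absent from the sparse list.

```latex
\mathrm{RRF}(d) = \sum_{r \in R} \frac{1}{k + \mathrm{rank}_r(d)}

\mathrm{RRF}(A) = \frac{1}{60 + 1} + \frac{1}{60 + 3} \approx 0.0164 + 0.0159 = 0.0323

\mathrm{RRF}(B) = \frac{1}{60 + 2} \approx 0.0161
```

A document that appears in both lists collects two terms, so it outranks one that sits slightly higher in a single list.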

def rrf_fusion(dense_results, sparse_results, k=60):
    """
    dense_results, sparse_results: lists of (doc_id, score) sorted by score desc
    Returns merged list sorted by RRF score desc
    """
    scores = {}
    for rank, (doc_id, _) in enumerate(dense_results):
        scores[doc_id] = scores.get(doc_id, 0) + 1 / (k + rank + 1)
    for rank, (doc_id, _) in enumerate(sparse_results):
        scores[doc_id] = scores.get(doc_id, 0) + 1 / (k + rank + 1)

    return sorted(scores.items(), key=lambda x: x[1], reverse=True)
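
A usage sketch, under stated assumptions: `dense_search` and `sparse_search` are hypothetical stand-ins that return hard-coded results in place of a real vector store and BM25 index, and the document IDs and scores are invented for illustration. The fusion function is repeated in compact form so the block runs on its own. It shows the two retrievers running in parallel before their lists are fused.

```python
from concurrent.futures import ThreadPoolExecutor

def rrf_fusion(dense_results, sparse_results, k=60):
    # Same logic as the function above, repeated so this sketch is self-contained.
    scores = {}
    for results in (dense_results, sparse_results):
        for rank, (doc_id, _) in enumerate(results):
            scores[doc_id] = scores.get(doc_id, 0) + 1 / (k + rank + 1)
    return sorted(scores.items(), key=lambda x: x[1], reverse=True)

def dense_search(query):
    # Hypothetical vector-store call: (doc_id, cosine similarity), best first.
    return [("sec-overview", 0.82), ("cve-1234-advisory", 0.79), ("tls-guide", 0.75)]

def sparse_search(query):
    # Hypothetical BM25 call: (doc_id, BM25 score), best first.
    return [("cve-1234-advisory", 12.4), ("changelog", 3.1)]

query = "what is the CVE-2024-1234 vulnerability?"

# Submit both retrievers at once, then fuse when both have returned.
with ThreadPoolExecutor(max_workers=2) as pool:
    dense_future = pool.submit(dense_search, query)
    sparse_future = pool.submit(sparse_search, query)
    fused = rrf_fusion(dense_future.result(), sparse_future.result())

top_doc_id = fused[0][0]  # the advisory, since it appears in both lists
```

The raw scores (cosine similarity versus BM25) are on different scales, but only the ranks are used, so no normalisation step is needed.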

When hybrid search pays off most

Technical documentation: contains model names, error codes, function signatures — exact match is critical
Legal / medical: specific terminology, case numbers, drug names must match precisely
Multi-language corpora: semantic search underperforms on languages with little training data; BM25 needs no trained model, only a tokenizer suited to each language
Product catalogues: SKUs, barcodes, exact product names need keyword matching

Weaviate and Qdrant have hybrid search built in. With pgvector, combine vector search with Postgres full-text search (tsvector). Pinecone's sparse-dense index supports hybrid queries natively. In each case the infrastructure already exists, so the routing logic on your side is minimal.
