Hybrid Search: Combining BM25 and Vector Retrieval
Why pure semantic search misses exact matches, and pure keyword search misses meaning. How hybrid search with Reciprocal Rank Fusion (RRF) beats both.
Pure semantic search misses exact matches. If a user asks "what is the CVE-2024-1234 vulnerability?", a dense vector retriever tends to return loosely security-related chunks rather than the one that contains that exact CVE ID. Pure keyword search misses meaning: to BM25, "car" and "automobile" are unrelated terms because they share no tokens.
Hybrid search combines both. Run dense retrieval and sparse (keyword) retrieval in parallel, then fuse the two ranked lists. The combination typically outperforms either approach alone.
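As a rough sketch of the "run in parallel" step, the snippet below issues both searches concurrently with a thread pool. The functions `dense_search` and `sparse_search` are hypothetical stand-ins that return canned results; in a real system they would call a vector store and a keyword index.

```python
from concurrent.futures import ThreadPoolExecutor


def dense_search(query, top_k=5):
    # Hypothetical stand-in for a vector-store query (similarity scores).
    return [("doc_7", 0.82), ("doc_2", 0.79), ("doc_9", 0.75)][:top_k]


def sparse_search(query, top_k=5):
    # Hypothetical stand-in for a BM25 / inverted-index query (BM25 scores).
    return [("doc_2", 11.4), ("doc_4", 9.1)][:top_k]


def retrieve_both(query, top_k=5):
    """Issue both searches concurrently and return the two ranked lists."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        dense_future = pool.submit(dense_search, query, top_k)
        sparse_future = pool.submit(sparse_search, query, top_k)
        return dense_future.result(), sparse_future.result()


dense_hits, sparse_hits = retrieve_both("what is the CVE-2024-1234 vulnerability?")
print(dense_hits)
print(sparse_hits)
```

Note that the two lists carry scores on different scales (similarities near 1 versus unbounded BM25 scores), so they cannot simply be concatenated and sorted by score. That is the problem the fusion step has to solve.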
Dense vs. sparse retrieval
| Property | Dense (vector) | Sparse (BM25/TF-IDF) |
|---|---|---|
| Best for | Semantic similarity, paraphrases | Exact matches, rare terms, IDs |
| Misses | Rare words, IDs, code, model names | Paraphrases, synonyms, meaning |
| Speed | Fast with ANN index | Very fast — inverted index |
| Index size | Large (float32 vectors) | Compact (sparse integers) |
| Training needed | Yes — embedding model | No — pure statistics |
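To make the sparse column concrete, here is a minimal BM25 scorer in pure Python. It is a sketch, not a production implementation: the tokenizer, the parameter defaults `k1=1.5` and `b=0.75`, the IDF variant, and the three sample documents are all assumptions chosen for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Keep hyphens so identifiers such as "cve-2024-1234" stay one token.
    return re.findall(r"[a-z0-9-]+", text.lower())


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score every document in `docs` against `query` with Okapi BM25."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs
    # Document frequency: number of documents that contain each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue  # no lexical overlap, so no contribution
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [
    "CVE-2024-1234 is a buffer overflow in the parser module.",
    "General guidance on patching security vulnerabilities.",
    "The car would not start this morning.",
]

print(bm25_scores("CVE-2024-1234", docs))  # only the first document scores above zero
print(bm25_scores("automobile", docs))     # all zeros: no document contains the token
```

The scorer needs no training, only term statistics, which is why the exact ID is found immediately while the "automobile" query cannot reach the document about a car.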
Reciprocal Rank Fusion (RRF)
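RRF is a widely used fusion algorithm. For each candidate document, its score is the sum of 1/(k + rank) across all retrievers, where rank is the document's 1-based position in that retriever's list and k is a smoothing constant (typically 60). A document that a retriever did not return contributes nothing for that retriever. RRF is rank-based, not score-based, so it doesn't require normalising the outputs of different retrievers.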
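Written out as a formula, with a small worked example. The symbols R (the set of retrievers) and d (a document) are notation introduced here for illustration, and the ranks in the example are made up.

```latex
\[
\mathrm{RRF}(d) = \sum_{r \in R} \frac{1}{k + \mathrm{rank}_r(d)}
\]

% Example with k = 60.
% d_1 is ranked 2nd by the dense retriever and 5th by the sparse retriever.
% d_2 is ranked 1st by one retriever and is absent from the other.
\[
\mathrm{RRF}(d_1) = \frac{1}{62} + \frac{1}{65} \approx 0.0315
\qquad
\mathrm{RRF}(d_2) = \frac{1}{61} \approx 0.0164
\]
```

With k = 60 the individual terms are close in size, so a document that appears in both lists at moderate ranks scores higher than one that tops a single list.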
```python
def rrf_fusion(dense_results, sparse_results, k=60):
    """
    dense_results, sparse_results: lists of (doc_id, score) sorted by score desc
    Returns merged list of (doc_id, rrf_score) sorted by RRF score desc
    """
    scores = {}
    # enumerate() is 0-based, so rank + 1 is the 1-based rank that RRF uses.
    # The retrievers' raw scores are ignored; only positions matter.
    for rank, (doc_id, _) in enumerate(dense_results):
        scores[doc_id] = scores.get(doc_id, 0) + 1 / (k + rank + 1)
    for rank, (doc_id, _) in enumerate(sparse_results):
        scores[doc_id] = scores.get(doc_id, 0) + 1 / (k + rank + 1)
    return sorted(scores.items(), key=lambda x: x[1], reverse=True)
```
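Since the score is a sum "across all retrievers", the same idea extends to any number of ranked lists. The sketch below is one possible generalisation; the function name `rrf_fuse` and the document IDs are invented for the example, and it takes plain lists of IDs because RRF never looks at the raw scores.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Fuse any number of ranked doc_id lists (best first) with RRF."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical results for the query "what is the CVE-2024-1234 vulnerability?"
dense = ["sec_overview", "patching_guide", "cve_1234_advisory"]
sparse = ["cve_1234_advisory", "changelog"]

fused = rrf_fuse([dense, sparse])
for doc_id, score in fused:
    print(f"{doc_id:20s} {score:.5f}")
```

In this toy case the advisory is only third in the dense list, but because the keyword retriever ranks it first, it ends up at the top of the fused list with a score of 1/63 + 1/61.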
When hybrid search pays off most
- Technical documentation: contains model names, error codes, function signatures — exact match is critical
- Legal / medical: specific terminology, case numbers, drug names must match precisely
- Multi-language corpora: embedding models tend to underperform on low-resource languages, while BM25 needs no training and depends only on a suitable tokenizer
- Product catalogues: SKUs, barcodes, exact product names need keyword matching
In Weaviate and Qdrant, hybrid search is built in. With pgvector, combine vector similarity with Postgres full-text search (tsvector). Pinecone's sparse-dense index supports hybrid queries natively. The routing logic is simple because the infrastructure is already there.
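For the pgvector route, one common pattern is to compute both rankings in SQL and fuse them with RRF in the same query. The sketch below only builds the query and its parameters; it has not been run against a database here. The table `chunks`, its columns `embedding` and `tsv`, the `'english'` text-search configuration, and all parameter names are assumptions. `<=>` is pgvector's cosine-distance operator, and `plainto_tsquery`, `ts_rank_cd` and `@@` come from Postgres full-text search.

```python
# Hypothetical schema: chunks(id, content, embedding vector, tsv tsvector).
HYBRID_SQL = """
WITH dense AS (
    SELECT id,
           ROW_NUMBER() OVER (ORDER BY embedding <=> %(qvec)s::vector) AS rnk
    FROM chunks
    ORDER BY embedding <=> %(qvec)s::vector
    LIMIT %(candidates)s
),
sparse AS (
    SELECT id,
           ROW_NUMBER() OVER (ORDER BY ts_rank_cd(tsv, query) DESC) AS rnk
    FROM chunks, plainto_tsquery('english', %(qtext)s) AS query
    WHERE tsv @@ query
    ORDER BY ts_rank_cd(tsv, query) DESC
    LIMIT %(candidates)s
)
SELECT COALESCE(dense.id, sparse.id) AS id,
       COALESCE(1.0 / (%(k)s + dense.rnk), 0.0)
     + COALESCE(1.0 / (%(k)s + sparse.rnk), 0.0) AS rrf_score
FROM dense
FULL OUTER JOIN sparse ON dense.id = sparse.id
ORDER BY rrf_score DESC
LIMIT %(top_k)s;
"""


def hybrid_params(query_text, query_vector, k=60, candidates=20, top_k=10):
    """Build the parameter dict for HYBRID_SQL."""
    # pgvector accepts the text form "[x1,x2,...]", which the query casts to vector.
    vector_literal = "[" + ",".join(str(float(x)) for x in query_vector) + "]"
    return {
        "qtext": query_text,
        "qvec": vector_literal,
        "k": k,
        "candidates": candidates,
        "top_k": top_k,
    }


params = hybrid_params("CVE-2024-1234", [0.1, 0.2, 0.3])
print(params["qvec"])
# With a psycopg cursor this would be executed as: cursor.execute(HYBRID_SQL, params)
```

The `FULL OUTER JOIN` keeps documents that appear in only one of the two candidate lists, and `COALESCE` gives the missing side a contribution of zero, matching the RRF definition.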