GenAI Systems Lab

Hybrid Search: Combining BM25 and Vector Retrieval

Why pure semantic search misses exact matches, and pure keyword search misses meaning. How hybrid search with RRF fusion beats both.

Pure semantic search misses exact matches. If a user asks "what is the CVE-2024-1234 vulnerability?", a dense vector retriever tends to return vaguely security-related chunks rather than the one chunk that contains that exact CVE ID. Pure keyword search misses meaning: to BM25, "car" and "automobile" are unrelated terms.
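The keyword side of that trade-off is easy to see in a toy example. The sketch below is a minimal BM25 scorer, assuming a plain whitespace tokenizer and the commonly used defaults k1=1.5 and b=0.75; the function name `bm25_scores` and the three sample documents are made up for illustration and are not from any particular library.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score every document against the query with a basic Okapi BM25 formula."""
    tokenized = [doc.lower().split() for doc in docs]  # assumption: whitespace tokens
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue  # no lexical overlap means no contribution at all
            df = sum(1 for other in tokenized if term in other)
            idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1)
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

docs = [
    "the automobile would not start this morning",
    "patch notes for CVE-2024-1234 in the login service",
    "my car would not start this morning",
]
car_scores = bm25_scores("car", docs)
cve_scores = bm25_scores("CVE-2024-1234", docs)
```

Here `car_scores` is zero for the "automobile" document even though it means the same thing, while `cve_scores` is non-zero only for the document containing the exact ID.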

Hybrid search combines both. Run dense retrieval and sparse (keyword) retrieval in parallel, then fuse the results. The combination consistently outperforms either approach alone.
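The "run in parallel" step might look roughly like this. It is a sketch under assumptions: `dense_search` and `sparse_search` are hypothetical stand-ins that return hard-coded ranked IDs, where a real system would call a vector index and a BM25 index. The fusion step is left out here because it is covered in the RRF section.

```python
from concurrent.futures import ThreadPoolExecutor

def dense_search(query, top_n):
    # Hypothetical stand-in for a vector index lookup; returns ranked doc ids.
    return ["doc_b", "doc_a", "doc_d"][:top_n]

def sparse_search(query, top_n):
    # Hypothetical stand-in for a BM25 lookup; returns ranked doc ids.
    return ["doc_c", "doc_a", "doc_b"][:top_n]

def retrieve_candidates(query, top_n=3):
    """Run both retrievers concurrently and return their ranked lists."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        dense_future = pool.submit(dense_search, query, top_n)
        sparse_future = pool.submit(sparse_search, query, top_n)
        # result() blocks until each retriever has finished.
        return dense_future.result(), sparse_future.result()

dense_hits, sparse_hits = retrieve_candidates("what is the CVE-2024-1234 vulnerability?")
```

Threads are a reasonable fit when both retrievers are network calls, since total latency is then close to the slower of the two rather than their sum.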

Dense vs. sparse retrieval

Property        | Dense (vector)                      | Sparse (BM25/TF-IDF)
Best for        | Semantic similarity, paraphrases    | Exact matches, rare terms, IDs
Misses          | Rare words, IDs, code, model names  | Paraphrases, synonyms, meaning
Speed           | Fast with ANN index                 | Very fast (inverted index)
Index size      | Large (float32 vectors)             | Compact (sparse integers)
Training needed | Yes (embedding model)               | No (pure statistics)
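To put a number on the "large" index size, here is a back-of-the-envelope calculation. The corpus size of one million chunks and the 768-dimensional embeddings are assumptions chosen for illustration, and the figure covers raw vector storage only, not any ANN index overhead.

```python
def dense_index_bytes(num_vectors, dim, bytes_per_value=4):
    """Raw storage for float32 embeddings (4 bytes per value), excluding ANN overhead."""
    return num_vectors * dim * bytes_per_value

# Assumed corpus: 1,000,000 chunks embedded into 768 dimensions.
size_gb = dense_index_bytes(1_000_000, 768) / 1e9
print(f"{size_gb:.3f} GB")  # prints "3.072 GB"
```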

Reciprocal Rank Fusion (RRF)

RRF is the standard fusion algorithm. Each candidate document's score is the sum of 1/(k + rank) over all retrievers that returned it, where rank is the document's 1-based position in that retriever's list and k is a smoothing constant (typically 60). Because RRF uses ranks rather than raw scores, it doesn't require normalising the outputs of different retrievers.
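Written out, with R as the set of retrievers and rank_r(d) as the 1-based position of document d in retriever r's list (this notation is introduced here for illustration), the score and a small worked case with k = 60 look like this:

```latex
\mathrm{RRF}(d) = \sum_{r \in R} \frac{1}{k + \mathrm{rank}_r(d)}

% Document ranked 1st by the dense retriever and 3rd by the sparse retriever:
\mathrm{RRF}(d) = \frac{1}{60 + 1} + \frac{1}{60 + 3} \approx 0.01639 + 0.01587 = 0.03226

% Document ranked 1st by only one retriever:
\mathrm{RRF}(d') = \frac{1}{60 + 1} \approx 0.01639
```

A document that appears in both lists, even at modest ranks, scores higher than one that tops a single list.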

def rrf_fusion(dense_results, sparse_results, k=60):
    """
    dense_results, sparse_results: lists of (doc_id, score) sorted by score desc
    Returns merged list of (doc_id, rrf_score) sorted by RRF score desc
    """
    scores = {}
    # Raw retriever scores are ignored; only each document's position matters.
    for results in (dense_results, sparse_results):
        # enumerate() is 0-based, so rank + 1 is the 1-based rank in 1/(k + rank).
        for rank, (doc_id, _) in enumerate(results):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1 / (k + rank + 1)

    return sorted(scores.items(), key=lambda x: x[1], reverse=True)
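Since the definition sums over all retrievers, the same idea extends to any number of ranked lists. The variant below is a sketch rather than part of the original function: `rrf_fuse_many` and the chunk IDs are hypothetical names, and it takes plain lists of doc IDs instead of (doc_id, score) pairs.

```python
from collections import defaultdict

def rrf_fuse_many(rankings, k=60):
    """rankings: list of ranked doc-id lists, best first. Returns (doc_id, score) pairs."""
    scores = defaultdict(float)
    for ranking in rankings:
        # start=1 gives the 1-based rank used in 1/(k + rank).
        for position, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + position)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical results for the CVE query: dense ranks a general overview first,
# sparse ranks the chunk with the exact ID first.
dense = ["chunk_sec_overview", "chunk_cve_1234", "chunk_patching"]
sparse = ["chunk_cve_1234", "chunk_changelog"]
fused = rrf_fuse_many([dense, sparse])
top_id = fused[0][0]  # "chunk_cve_1234"
```

The chunk that both retrievers returned ends up first, ahead of the overview chunk that only the dense retriever ranked highly.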

When hybrid search pays off most

Weaviate and Qdrant have hybrid search built in. With pgvector, combine vector search with Postgres full-text search (tsvector). Pinecone's sparse-dense index supports hybrid natively. The routing logic is trivial because the infrastructure is already there.
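For the pgvector route, one possible shape is a single SQL statement that ranks by vector distance and by full-text relevance, then applies RRF in the final SELECT. Everything schema-related here is an assumption: a hypothetical `chunks` table with an `id`, a pgvector `embedding` column, and a precomputed `content_tsv` tsvector column. The helper only builds the query and its parameters (in the `%(name)s` style used by psycopg); executing it is left to the caller.

```python
HYBRID_SQL = """
WITH dense AS (
    SELECT id,
           ROW_NUMBER() OVER (ORDER BY embedding <=> %(query_vec)s::vector) AS rank
    FROM chunks
    ORDER BY embedding <=> %(query_vec)s::vector
    LIMIT %(top_n)s
),
sparse AS (
    SELECT id,
           ROW_NUMBER() OVER (ORDER BY ts_rank_cd(content_tsv, q) DESC) AS rank
    FROM chunks, plainto_tsquery('english', %(query_text)s) AS q
    WHERE content_tsv @@ q
    ORDER BY ts_rank_cd(content_tsv, q) DESC
    LIMIT %(top_n)s
)
SELECT COALESCE(dense.id, sparse.id) AS id,
       COALESCE(1.0 / (%(k)s + dense.rank), 0.0)
     + COALESCE(1.0 / (%(k)s + sparse.rank), 0.0) AS rrf_score
FROM dense
FULL OUTER JOIN sparse ON dense.id = sparse.id
ORDER BY rrf_score DESC
LIMIT %(limit)s
"""

def build_hybrid_query(query_text, query_embedding, top_n=20, k=60, limit=5):
    """Return (sql, params) for a hybrid query against the assumed `chunks` table."""
    # pgvector accepts a bracketed text literal such as '[0.1,0.2]' cast to vector.
    vector_literal = "[" + ",".join(str(x) for x in query_embedding) + "]"
    params = {
        "query_text": query_text,
        "query_vec": vector_literal,
        "top_n": top_n,
        "k": k,
        "limit": limit,
    }
    return HYBRID_SQL, params

sql, params = build_hybrid_query("CVE-2024-1234", [0.1, 0.2])
```

The FULL OUTER JOIN keeps documents that only one of the two retrievers returned, and COALESCE gives the missing side a contribution of zero.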
