What is Hybrid Search?
Hybrid search runs keyword search (BM25) and semantic search in parallel, then merges the two result lists into one using Reciprocal Rank Fusion (RRF). In practice it usually outperforms either approach alone.
Why not just pick one?
Keyword search and semantic search fail in opposite situations:
- Keyword search finds exact matches but misses synonyms. Searching "login" won't find a document about "authentication" — the words don't overlap. This is the vocabulary mismatch problem.
- Semantic search finds conceptual matches but can miss exact identifiers. Searching "parseJSON" might return results about "data serialization" instead of the exact function.
Hybrid search covers both failure modes. The keyword side catches exact matches. The semantic side catches conceptual matches. Documents found by both systems rank highest.
How does it work?
- Run both searches — The query goes to a BM25 engine and an embedding-based engine simultaneously.
- Get two ranked lists — BM25 returns documents ranked by keyword relevance. Semantic search returns documents ranked by cosine similarity.
- Merge with RRF — RRF converts each result's rank position into a score and sums the scores across both lists. A document that appears in both lists collects a contribution from each, so it tends to end up with the highest combined score.
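The merge step can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: it assumes each engine returns a list of document IDs ordered best first, and it scores each document as the sum of 1 / (k + rank) over the lists it appears in. The constant k = 60 is the value commonly used with RRF, not something specified above, and the function name and document IDs are hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse several ranked lists of document IDs (best match first).

    k=60 is an assumed, commonly used smoothing constant.
    Returns (doc_id, fused_score) pairs, highest score first.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        # Ranks are 1-based: the top result contributes 1 / (k + 1).
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by ID so output is deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical result lists from the two engines.
bm25_results = ["doc_auth", "doc_login", "doc_session"]
semantic_results = ["doc_login", "doc_oauth", "doc_auth"]

fused = reciprocal_rank_fusion([bm25_results, semantic_results])
for doc_id, score in fused:
    print(f"{doc_id}: {score:.4f}")
```

Here doc_login and doc_auth appear in both lists, so they land ahead of doc_oauth and doc_session, which each appear in only one.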
Why RRF instead of averaging scores?
BM25 scores and cosine similarity scores are on completely different scales — BM25 might produce 12.5 while cosine similarity gives 0.73. You can't average them without normalization. RRF sidesteps this by ignoring raw scores entirely. It only uses rank positions, which makes it simple and robust.
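A small experiment makes the scale problem concrete. The scores below are invented for illustration, and k = 60 is an assumed, commonly used RRF constant. With a naive average the BM25 numbers dominate, so the fused order simply copies the BM25 order. The rank-based fusion weighs both lists evenly and does not change when the BM25 scores are rescaled.

```python
def to_ranks(scores):
    """Map each document to its 1-based rank (highest score = rank 1)."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {doc: pos for pos, doc in enumerate(ordered, start=1)}


def rrf_order(score_maps, k=60):
    """Order documents by summed 1 / (k + rank); raw scores are ignored."""
    rank_maps = [to_ranks(scores) for scores in score_maps]
    docs = score_maps[0].keys()
    fused = {doc: sum(1.0 / (k + ranks[doc]) for ranks in rank_maps) for doc in docs}
    return sorted(fused, key=fused.get, reverse=True)


# Made-up scores for three documents on two very different scales.
bm25 = {"a": 12.5, "b": 9.0, "c": 3.1}
cosine = {"a": 0.61, "b": 0.73, "c": 0.70}

# Naive averaging: BM25 swamps cosine similarity.
averaged = {doc: (bm25[doc] + cosine[doc]) / 2 for doc in bm25}
averaged_order = sorted(averaged, key=averaged.get, reverse=True)

# Rank-based fusion, before and after rescaling BM25 by 100x.
fused_order = rrf_order([bm25, cosine])
bm25_rescaled = {doc: score * 100 for doc, score in bm25.items()}
fused_order_rescaled = rrf_order([bm25_rescaled, cosine])

print(averaged_order)        # ['a', 'b', 'c'] (same as the BM25 order)
print(fused_order)           # ['b', 'a', 'c']
print(fused_order_rescaled)  # ['b', 'a', 'c'] (unchanged by rescaling)
```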
When is hybrid search worth the cost?
Hybrid search requires maintaining two indexes: an inverted index for keywords and a vector index for embeddings. That's more storage, more indexing time, and more query-time computation.
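To make the two-index cost tangible, here is a toy sketch of both structures built over the same documents. Everything in it is an assumption for illustration: the documents are invented, and toy_embed is a hypothetical stand-in that buckets terms into a fixed-size vector. It carries no semantic meaning. A real system would call an embedding model and store the vectors in a dedicated vector index.

```python
from collections import defaultdict

# Invented sample documents.
docs = {
    "d1": "reset your login password",
    "d2": "authentication tokens expire after an hour",
}

# Index 1: inverted index mapping each term to the documents containing it.
inverted_index = defaultdict(set)
for doc_id, text in docs.items():
    for term in text.lower().split():
        inverted_index[term].add(doc_id)


def toy_embed(text, dims=8):
    """Hypothetical placeholder for an embedding model (not semantic).

    Each term is bucketed by the sum of its character codes so the
    result is deterministic across runs.
    """
    vector = [0.0] * dims
    for term in text.lower().split():
        vector[sum(map(ord, term)) % dims] += 1.0
    return vector


# Index 2: vector index mapping each document to its embedding.
vector_index = {doc_id: toy_embed(text) for doc_id, text in docs.items()}

print(sorted(inverted_index["login"]))  # ['d1']
print(len(vector_index["d1"]))          # 8
```

Every document is processed and stored twice, once per index, which is where the extra storage and indexing time come from.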
The payoff is consistent. In benchmarks across code search, documentation search, and general text retrieval, hybrid search improves relevance by 10-30% over the best single method. If search quality matters, the extra infrastructure is worth it.