
How Hybrid Search Works — Combining Keywords and Meaning
Hybrid search runs BM25 and semantic search in parallel, then merges their results using Reciprocal Rank Fusion. It finds both exact keyword matches and conceptual matches — something neither approach achieves alone.
If you've read the first two lessons in this path, you understand the tradeoff: BM25 is precise with exact terms but blind to meaning. Semantic search understands meaning but can miss the exact identifier you're looking for. Hybrid search eliminates the tradeoff.
The Problem Each Side Misses
Consider the query: "authentication middleware"
BM25 searches the inverted index and finds files containing those exact words:
- src/middleware/authentication.rs — the AuthenticationMiddleware struct
- docs/middleware.md — section titled "Authentication Middleware"
- tests/auth_middleware_test.rs — test file with both words in the name
Semantic search converts the query into a vector using an embedding model and finds files with similar meaning:
- src/handlers/login.rs — handles the login flow
- src/guards/session.rs — validates session tokens before requests proceed
- src/middleware/auth_guard.rs — checks authorization headers
BM25 missed the login handler and session guard because they never use the word "authentication." Semantic search missed AuthenticationMiddleware because the embedding model focuses on conceptual similarity, not exact identifiers.
A developer searching for "authentication middleware" probably wants all of these results. Hybrid search finds them.
How It Works — Two Parallel Paths
Hybrid search runs two independent retrieval paths against the same corpus:
Path 1: BM25. The query goes through a tokenizer, gets matched against the inverted index, and produces a ranked list of results scored by term frequency and IDF. This takes under 1 millisecond on a corpus of thousands of documents.
Path 2: Semantic. The query is fed through an embedding model (we use Nomic-embed-text-v1.5 via ONNX Runtime) to produce a vector. That vector is compared against pre-computed vectors for every chunk in the index using cosine similarity. This produces a second ranked list. Query embedding takes ~187ms; the similarity search is fast after that.
Both paths run concurrently. The total latency is the slower of the two, not the sum. Now you have two ranked lists. The question becomes: how do you merge them?
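The two-path shape can be sketched in a few lines. This is a minimal illustration using Python threads and stubbed retrieval functions; both function bodies are hypothetical placeholders standing in for the real inverted-index and vector-index lookups:

```python
from concurrent.futures import ThreadPoolExecutor

# Both bodies are stand-ins; real implementations would query the
# inverted index and the vector index respectively.
def bm25_search(query):
    return ["authentication.rs", "middleware.md", "auth_middleware_test.rs"]

def semantic_search(query):
    return ["login.rs", "session.rs", "auth_guard.rs"]

def run_both(query):
    # Launch both paths concurrently; total latency is the slower
    # path, not the sum of the two.
    with ThreadPoolExecutor(max_workers=2) as pool:
        keyword = pool.submit(bm25_search, query)
        semantic = pool.submit(semantic_search, query)
        return keyword.result(), semantic.result()

keyword_hits, semantic_hits = run_both("authentication middleware")
```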
Reciprocal Rank Fusion
Raw scores from BM25 and semantic search are not comparable. A BM25 score of 4.2 and a cosine similarity of 0.83 exist on completely different scales. You can't just add them.
Reciprocal Rank Fusion (RRF) avoids this problem entirely by ignoring scores and using only rank positions (Cormack et al., 2009). The formula:
RRF_score(doc) = Σ 1 / (k + rank_in_list)
The sum is over every ranked list the document appears in. The constant k is typically 60. It dampens the difference between adjacent ranks — rank 1 vs rank 2 is a small difference, not a 2x difference.
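The formula translates directly into code. A sketch (the function name is ours):

```python
def rrf_merge(ranked_lists, k=60):
    # Each input list is ordered best-first; ranks are 1-based.
    # A document's fused score sums 1/(k + rank) over every list
    # it appears in; lists it is absent from contribute nothing.
    scores = {}
    for results in ranked_lists:
        for rank, doc in enumerate(results, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=scores.get, reverse=True)
```

A document near the top of both lists accumulates two substantial terms; a document in only one list gets a single term. That asymmetry is the whole mechanism.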
Worked Example
Take the query "authentication middleware" and suppose BM25 and semantic search return these top 5 results:
| Rank | BM25 result | Semantic result |
|---|---|---|
| 1 | authentication.rs | login.rs |
| 2 | middleware.md | session.rs |
| 3 | auth_middleware_test.rs | auth_guard.rs |
| 4 | config.rs | authentication.rs |
| 5 | routes.rs | middleware.md |
Now compute RRF scores for each document (k = 60):
authentication.rs — rank 1 in BM25, rank 4 in semantic:
1/(60+1) + 1/(60+4) = 0.01639 + 0.01563 = 0.03202
middleware.md — rank 2 in BM25, rank 5 in semantic:
1/(60+2) + 1/(60+5) = 0.01613 + 0.01538 = 0.03151
login.rs — absent from BM25, rank 1 in semantic:
1/(60+1) = 0.01639
auth_middleware_test.rs — rank 3 in BM25, absent from semantic:
1/(60+3) = 0.01587
session.rs — absent from BM25, rank 2 in semantic:
1/(60+2) = 0.01613
auth_guard.rs — absent from BM25, rank 3 in semantic:
1/(60+3) = 0.01587
The merged ranking:
| RRF Rank | Document | RRF Score | Why |
|---|---|---|---|
| 1 | authentication.rs | 0.03202 | Appeared in both lists |
| 2 | middleware.md | 0.03151 | Appeared in both lists |
| 3 | login.rs | 0.01639 | Semantic #1, but only one list |
| 4 | session.rs | 0.01613 | Semantic #2, but only one list |
| 5 | auth_middleware_test.rs | 0.01587 | BM25 #3, but only one list |
| 6 | auth_guard.rs | 0.01587 | Semantic #3, tied with rank 5 |
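The worked example can be checked mechanically. The snippet below recomputes every score from the two ranked lists (note that auth_guard.rs, semantic #3, ties auth_middleware_test.rs at 0.01587):

```python
K = 60
bm25 = ["authentication.rs", "middleware.md", "auth_middleware_test.rs",
        "config.rs", "routes.rs"]
semantic = ["login.rs", "session.rs", "auth_guard.rs",
            "authentication.rs", "middleware.md"]

# Accumulate 1/(K + rank) for each document, over both lists.
scores = {}
for results in (bm25, semantic):
    for rank, doc in enumerate(results, start=1):
        scores[doc] = scores.get(doc, 0.0) + 1.0 / (K + rank)

merged = sorted(scores, key=scores.get, reverse=True)
for doc in merged[:5]:
    print(f"{doc}: {scores[doc]:.5f}")
```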
The pattern: a document ranked modestly in both lists beats a document ranked #1 in only one list. This is why RRF works — agreement between independent systems is a strong signal of relevance.
The Evidence
We tested this on a corpus of 825 real files split into 7,492 chunks, using 20 queries with known expected results. The full benchmark data is public.
| Approach | Recall@1 | Recall@5 | MRR | Query time |
|---|---|---|---|---|
| BM25 alone (Tantivy) | 12/20 | 18/20 | 0.727 | <1ms |
| Best semantic alone (Nomic) | 13/20 | 18/20 | 0.754 | 187ms |
| Hybrid (BM25 + Nomic + RRF) | — | — | higher than both | ~190ms |
BM25 and Nomic each found 18 of 20 expected results in the top 5 — but they weren't the same 18. BM25 caught exact identifier matches that semantic missed. Semantic caught conceptual matches that BM25 missed. Hybrid found both sets.
The query time cost is minimal. BM25 runs in under a millisecond. The embedding takes ~187ms. Since they run in parallel, total latency is ~190ms — essentially the cost of semantic search alone, but with BM25's precision added for free.
What Makes a Good Hybrid Implementation
Index once, query twice. Build the BM25 index and the vector index in a single indexing pass. Each document gets tokenized for the inverted index and embedded for the vector index at the same time.
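The single-pass shape might look like this. Every object here is a hypothetical stand-in to show the structure, not a real API:

```python
# Stand-ins for the two indexes: a real system would use an inverted
# index (e.g. a full-text search library) and a vector store.
bm25_index = {}    # doc_id -> token list
vector_index = {}  # doc_id -> embedding vector

def embed(text):
    # Stand-in embedding; real code would run the embedding model once.
    return [float(len(text))]

def index_document(doc_id, text):
    # One indexing pass: the same text is tokenized for BM25 and
    # embedded for the vector index at the same time.
    bm25_index[doc_id] = text.lower().split()
    vector_index[doc_id] = embed(text)

index_document("authentication.rs", "struct AuthenticationMiddleware")
```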
Both queries run in parallel. The BM25 lookup and the vector similarity search are independent. Run them concurrently and merge when both finish.
RRF merge is fast. It's arithmetic on rank positions — no matrix math, no model inference. The merge step adds microseconds.
The embedding model matters. Our benchmark showed a 76% spread between the best model (Nomic, MRR 0.754) and the worst (Snowflake, MRR 0.429). Choosing the right model is more important than tuning hyperparameters.
Chunking strategy affects semantic results. Chunks that are too small lose context — a function signature without its body is hard to embed meaningfully. Chunks that are too large dilute the signal with noise. We split by ## headings in documentation and by top-level definitions in code. The right boundary depends on your content.
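For the documentation side, splitting at ## headings can be sketched as follows. This is one possible boundary choice, not our exact implementation; real code would also handle deeper heading levels and fenced code blocks:

```python
def chunk_by_headings(text):
    # Split a markdown document into chunks at "## " headings.
    # Everything before the first heading becomes its own chunk.
    chunks, current = [], []
    for line in text.splitlines():
        if line.startswith("## ") and current:
            chunks.append("\n".join(current))
            current = []
        current.append(line)
    if current:
        chunks.append("\n".join(current))
    return chunks
```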
BM25 doesn't need chunking. Full-text search works at the document level. The inverted index already handles term positions. Chunking is a semantic search concern because embedding models have limited context windows.
When to Use Each Mode
Hybrid is the default. But there are cases where a single mode is better:
BM25 only — when you know the exact identifier. Searching for resolve_index_dir or TCP_NODELAY should match the exact string. An embedding model might return results about "directory resolution" or "TCP configuration" that are conceptually related but not what you need.
Semantic only — when you're exploring and don't know the terminology. "How does the app handle network failures?" is a natural language question where exact keyword matching is unlikely to find the right code.
Hybrid — everything else. Most real searches benefit from both. "Error handling" should find files literally named error_handler.rs and files about exception management, retry logic, and fault tolerance.
What This Path Covered
This is the final lesson in the search engineering path. Here's how the three lessons connect:
- How BM25 Works — the ranking function that scores keyword relevance using term frequency and inverse document frequency.
- How Semantic Search Works — how embeddings represent meaning as vectors and cosine similarity finds related content.
- How Hybrid Search Works (this lesson) — running both in parallel and merging with RRF.
Hybrid search is not a compromise between two imperfect approaches. It is strictly better than either one alone when implemented well. The cost — maintaining two indexes and running two queries — is small. The quality improvement is measurable. Our benchmark proves it.