Search Engineering FAQ
What is the difference between BM25 and semantic search?
BM25 matches exact keywords using term frequency and inverse document frequency. Semantic search matches meaning using neural network embeddings. BM25 finds "authentication" when you search "authentication." Semantic search finds "login handler" too, because it understands the concepts are related. Neither approach is strictly better — BM25 is precise for exact terms and identifiers, semantic search handles paraphrasing, synonyms, and natural language questions. BM25 needs no model, no GPU, and runs in under a millisecond. Semantic search requires an embedding model and takes longer, but finds results that share no keywords with the query. Most production systems use both together as hybrid search.
See How BM25 Works and How Semantic Search Works for the full walkthroughs.
Is BM25 still relevant in 2026?
Yes. BM25 is fast, requires no GPU, and excels at exact matches — function names, error codes, identifiers. It scored MRR 0.727 in our benchmark on 825 real files, better than 4 of 6 embedding models we tested. Elasticsearch, Tantivy, Apache Lucene, and most search infrastructure still use BM25 as the primary ranking function. The best approach for most applications is hybrid search: BM25 + semantic combined, where BM25 handles precision and embeddings handle recall.
See BM25 vs Semantic Search — We Benchmarked 6 Models for the full data.
What is the difference between BM25 and TF-IDF?
BM25 improves on TF-IDF with two key changes: term frequency saturation (the 10th mention of a word matters less than the 1st) and document length normalization (long documents don't unfairly dominate short ones). Both use IDF to weigh rare words higher. The practical difference is that TF-IDF gives linearly increasing scores as term frequency grows, which lets long, repetitive documents dominate results. BM25's saturation curve caps this effect — after a term appears several times, additional mentions barely change the score. BM25 also normalizes for document length using a parameter called b, so a 10-line file mentioning "error" 3 times can outrank a 1000-line file mentioning it 5 times. TF-IDF was the standard in the 1970s-1990s; BM25 replaced it and remains the default in Elasticsearch, Lucene, and Tantivy.
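Both effects can be seen in the core BM25 term score. The sketch below uses the common default parameters (k1 = 1.2, b = 0.75) and fixes IDF at 1 for illustration; the constants and document lengths are hypothetical, not from the benchmark:

```python
def bm25_term_score(tf, doc_len, avg_doc_len, idf=1.0, k1=1.2, b=0.75):
    """One term's contribution to one document under BM25.

    The denominator grows with tf, so each extra occurrence adds
    less (saturation). b scales the length penalty: b=0 ignores
    document length, b=1 normalizes fully.
    """
    length_norm = 1 - b + b * (doc_len / avg_doc_len)
    return idf * (tf * (k1 + 1)) / (tf + k1 * length_norm)

# Saturation: ten mentions score nowhere near 10x one mention.
one = bm25_term_score(tf=1, doc_len=100, avg_doc_len=100)
ten = bm25_term_score(tf=10, doc_len=100, avg_doc_len=100)
print(one, ten)  # ten is well under 2x one

# Length normalization: a 10-line file with 3 mentions of "error"
# outranks a 1000-line file with 5 mentions.
short = bm25_term_score(tf=3, doc_len=10, avg_doc_len=100)
long_ = bm25_term_score(tf=5, doc_len=1000, avg_doc_len=100)
print(short > long_)  # True
```

With TF-IDF, the long file's five mentions would score higher; BM25's length normalization reverses that ordering.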
See How BM25 Works for the formula breakdown.
Which embedding model should I use?
It depends on your use case. Nomic-embed-text-v1.5 won our benchmark with MRR 0.754 on 825 real files, beating the five other models tested: BGE-small, BGE-base, AllMiniLM, GTE-small, and Snowflake Arctic. It runs locally via ONNX Runtime with no API calls. For code search specifically, 768-dimension models outperformed 384-dimension ones in our tests — the extra dimensions capture more structural information about code. If you need the simplest possible setup, AllMiniLM-L6-v2 is widely supported but scored lower (MRR 0.612).
See BM25 vs Semantic Search — We Benchmarked 6 Models for model-by-model results.
How does hybrid search work?
Run BM25 and semantic search in parallel on the same query. Merge the two ranked lists using Reciprocal Rank Fusion (RRF). Documents appearing in both lists get boosted; documents in only one list still appear but rank lower. The result is better than either approach alone — in our benchmark, BM25 and Nomic each found 18 of 20 expected results, but not the same 18. BM25 caught exact identifier matches that semantic missed. Semantic caught conceptual matches that BM25 missed. Hybrid found both sets. Since BM25 runs in under 1ms and the semantic query takes ~187ms, running them in parallel means hybrid adds almost no latency over semantic search alone.
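The merge step can be sketched in a few lines. The two result lists below are hypothetical stand-ins for what the BM25 and semantic retrievers would return, not benchmark output:

```python
def rrf_merge(ranked_lists, k=60):
    """Fuse ranked result lists with Reciprocal Rank Fusion.

    Each list is an ordered sequence of doc IDs, best first. A doc's
    fused score is the sum of 1/(k + rank) over every list it appears
    in, so agreement between retrievers boosts it; docs in only one
    list still get a score and still appear.
    """
    scores = {}
    for results in ranked_lists:
        for rank, doc in enumerate(results, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25_results = ["auth.rs", "login.rs", "session.rs"]
semantic_results = ["login.rs", "oauth.rs", "auth.rs"]

# auth.rs and login.rs appear in both lists, so they rise to the top;
# session.rs and oauth.rs survive with single-list scores.
print(rrf_merge([bm25_results, semantic_results]))
```

Because the merge uses only rank positions, the BM25 and semantic retrievers never need to agree on a score scale.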
See How Hybrid Search Works for worked examples with real scores.
What is Reciprocal Rank Fusion (RRF)?
RRF combines multiple ranked lists into one by scoring each document as the sum of 1/(k + rank) across all lists. A document ranked #2 in both BM25 and semantic results beats a document ranked #1 in only one list — agreement between independent systems is a strong relevance signal. The constant k is typically 60, which dampens the difference between adjacent ranks so position 1 vs position 2 is a small gap, not a 2x gap. RRF's key advantage is that it uses only rank positions, not raw scores, so you never have to normalize BM25 scores against cosine similarity values.
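The arithmetic behind both claims is small enough to check directly:

```python
k = 60  # the typical RRF constant

# Ranked #2 in both the BM25 and semantic lists:
both_at_2 = 1 / (k + 2) + 1 / (k + 2)

# Ranked #1 in only one list:
one_at_1 = 1 / (k + 1)

print(both_at_2, one_at_1)   # ~0.0323 vs ~0.0164
print(both_at_2 > one_at_1)  # True: agreement beats a single top rank

# Dampening: with k=60, rank 1 scores only ~1.6% more than rank 2,
# not twice as much as raw 1/rank would give.
print((1 / (k + 1)) / (1 / (k + 2)))  # ≈ 1.016
```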
See How Hybrid Search Works — the RRF section for a worked example.
What is MRR and how do I use it?
MRR (Mean Reciprocal Rank) measures how high the first correct result appears, averaged across a set of queries. MRR 1.0 means the right answer is always the first result. MRR 0.5 means it's typically second. To calculate it: for each query, find the position of the first correct result, take its reciprocal (1/position), then average across all queries. Use MRR to compare search configurations against each other — run the same queries through both setups and the one with higher MRR produces better rankings.
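The calculation is a one-liner once you have, for each query, the position of the first correct result. The positions below are made up for illustration:

```python
def mean_reciprocal_rank(first_correct_positions):
    """Average of 1/position across queries.

    Pass None for a query whose correct result never appeared;
    it contributes 0 to the sum but still counts in the average.
    """
    total = sum(1.0 / pos for pos in first_correct_positions if pos is not None)
    return total / len(first_correct_positions)

# Three queries: the right answer appeared at positions 1, 2, and 4.
print(mean_reciprocal_rank([1, 2, 4]))  # (1 + 0.5 + 0.25) / 3 ≈ 0.583
```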
See BM25 vs Semantic Search — We Benchmarked 6 Models for MRR applied to a real evaluation.
What is the difference between Recall@K and MRR?
MRR measures where the first correct result appears — position matters. Recall@K measures whether the correct result appears in the top K at all — yes or no. Use MRR when ranking order matters, such as when a user reads the first result and ignores the rest. Use Recall@K when you just need the answer somewhere in the list, such as when a system processes all K results before acting. In our benchmark, BM25 had Recall@5 of 18/20 but MRR of 0.727 — meaning it found the right file in the top 5 for 90% of queries, but not always in position 1.
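The contrast shows up clearly on the same data. Given the position of the first correct result per query (hypothetical numbers), the two metrics can disagree:

```python
def recall_at_k(first_correct_positions, k):
    """Fraction of queries whose first correct result is in the top k."""
    hits = sum(1 for pos in first_correct_positions if pos is not None and pos <= k)
    return hits / len(first_correct_positions)

# Four queries; query 3's correct file was ranked 6th.
positions = [1, 2, 6, 1]

print(recall_at_k(positions, k=5))  # 0.75 -- the rank-6 hit misses the top 5

mrr = sum(1 / pos for pos in positions) / len(positions)
print(mrr)  # ≈ 0.667 -- position matters, so rank 6 still drags the average
```

Note that the rank-6 result is invisible to Recall@5 but still contributes a little to MRR, which is why the two metrics answer different questions.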
See How Hybrid Search Works — The Evidence for both metrics applied side by side.
What is chunking and why does it matter?
Chunking splits large documents into smaller pieces before indexing. A 500-line file becomes 10 chunks of 50 lines. Without chunking, semantic search returns an entire file instead of pointing to the relevant section — and embedding models have limited context windows, so large files get truncated. Chunk size affects precision: too small loses context (a function signature without its body), too large dilutes the embedding with irrelevant content. Good chunk boundaries follow the structure of your content — split documentation by headings, split code by top-level definitions. BM25 does not need chunking — it scores whole documents directly, because the inverted index already maps each term to the documents containing it.
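The simplest baseline is a fixed-size line window, sketched below. Real indexers prefer the structural boundaries described above (headings, top-level definitions); this version just illustrates the 500-lines-into-10-chunks split:

```python
def chunk_by_lines(text, lines_per_chunk=50):
    """Split a document into fixed-size line chunks before indexing.

    A naive baseline: no overlap, no awareness of headings or
    definitions -- chunks can cut a function in half.
    """
    lines = text.splitlines()
    return [
        "\n".join(lines[i:i + lines_per_chunk])
        for i in range(0, len(lines), lines_per_chunk)
    ]

doc = "\n".join(f"line {n}" for n in range(500))  # a 500-line file
chunks = chunk_by_lines(doc)
print(len(chunks))  # 10 chunks of 50 lines each
```

Each chunk is then embedded and indexed separately, so a query can point at the relevant 50-line region instead of the whole file.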
See How Hybrid Search Works for how chunking strategy interacts with hybrid search.
Can I run embedding models locally without an API?
Yes. ONNX Runtime runs embedding models on CPU without external API calls. You download the model file once (typically 50-250 MB depending on the model) and run inference locally. Our benchmark runs entirely locally — no OpenAI, no cloud services. Nomic-embed-text-v1.5 indexes 825 files split into 7,492 chunks, with query embedding taking ~187ms on a MacBook. This makes it practical for offline-first tools and privacy-sensitive applications where sending code to an external API is not acceptable.
See How Semantic Search Works for how embedding models work under the hood.