What is Vocabulary Mismatch?

Vocabulary mismatch is when a search query and a relevant document use different words for the same concept. The document has the answer, but keyword search can't find it because the words don't overlap.

What does it look like?

Query: "login error"
Relevant doc: "authentication failure handling"

Overlap: zero words
BM25 score: 0

The user is looking for exactly what this document covers. But BM25 compares exact terms, and "login" doesn't match "authentication." "Error" doesn't match "failure." The document scores zero and never appears in results.
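The zero score can be reproduced with a minimal sketch of BM25 scoring. This is a simplified illustration, not a production implementation: the whitespace tokenizer, the parameter defaults `k1=1.5` and `b=0.75`, and the second document in the toy corpus are assumptions added for contrast.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization, no stemming.
    return text.lower().split()


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document against the query with a basic BM25 formula."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs

    # Document frequency: in how many documents each term appears.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue  # a term the document lacks contributes nothing
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [
    "authentication failure handling",    # the relevant doc from the example
    "login error troubleshooting guide",  # hypothetical doc sharing the query's words
]
scores = bm25_scores("login error", docs)
print(scores)
```

Only the second document gets a positive score. The first shares no term with the query, so every term is skipped and its score stays at zero.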

This happens constantly in real codebases:

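| Query term | Document term  | Same concept? |
|------------|----------------|---------------|
| login      | authentication | Yes           |
| error      | exception      | Yes           |
| config     | settings       | Yes           |
| repo       | repository     | Yes           |
| bug        | defect         | Yes           |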

Why keyword search can't fix this

You could try stemming ("running" becomes "run"), but that only handles different forms of the same word, not synonyms. You could try synonym dictionaries, but they are hard to maintain: every domain has its own vocabulary, new terms appear constantly, and any pair missing from the dictionary still fails to match.
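A small sketch shows both workarounds and where they stop helping. Everything here is hypothetical: `naive_stem` is a crude suffix stripper standing in for a real stemmer, and `SYNONYMS` is a hand-written dictionary invented for this example.

```python
def naive_stem(word):
    # Crude suffix stripping for illustration only; real stemmers are more careful.
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


# Hypothetical hand-maintained synonym dictionary.
SYNONYMS = {
    "login": {"authentication"},
    "error": {"failure", "exception"},
}


def expand(terms):
    """Add every known synonym of each term to the term set."""
    expanded = set(terms)
    for term in terms:
        expanded |= SYNONYMS.get(term, set())
    return expanded


query = {naive_stem(t) for t in "login errors".split()}
doc = {naive_stem(t) for t in "authentication failure handling".split()}

stem_overlap = query & doc             # stemming alone: still no shared terms
synonym_overlap = expand(query) & doc  # the dictionary bridges the gap
uncovered = expand({"config"})         # a term nobody added stays unexpanded
```

Stemming turns "errors" into "error" but cannot connect "login" to "authentication". The dictionary does, yet only for the pairs someone remembered to add: "config" comes back unchanged, so a document about "settings" would still be missed.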

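The fundamental issue is that full-text search matches terms as strings, not as meanings. After tokenization and normalization, a query term either appears in a document or it doesn't. The index has no notion of what a word means or which words are related.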

How semantic search solves it

Semantic search converts text to vectors that represent meaning. "Login" and "authentication" produce similar vectors because they appear in similar contexts during embedding model training. The cosine similarity between them is high, so the document is matched even though it shares no words with the query.
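The comparison itself is simple arithmetic. The sketch below computes cosine similarity over made-up three-dimensional vectors. The values in `TOY_EMBEDDINGS` are invented for illustration and do not come from any real embedding model, which would typically produce hundreds of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: dot product over product of norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical vectors: related words are given similar directions by hand.
TOY_EMBEDDINGS = {
    "login": [0.9, 0.1, 0.0],
    "authentication": [0.8, 0.2, 0.1],
    "repository": [0.0, 0.1, 0.9],
}

sim_related = cosine_similarity(
    TOY_EMBEDDINGS["login"], TOY_EMBEDDINGS["authentication"]
)
sim_unrelated = cosine_similarity(
    TOY_EMBEDDINGS["login"], TOY_EMBEDDINGS["repository"]
)
print(sim_related, sim_unrelated)
```

With these toy values the related pair scores close to 1 and the unrelated pair close to 0, which is the property a retrieval system relies on when it ranks documents by vector similarity.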

Why hybrid search exists

Vocabulary mismatch is the core reason hybrid search combines both approaches. Keyword search excels at exact matches: searching for parseJSON should find that exact function. Semantic search excels at conceptual matches: searching for "data serialization" should find code about JSON parsing. Neither alone covers both cases. Together, they do.
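One common way to merge the two result lists is reciprocal rank fusion (RRF). The text above does not say which fusion method a given system uses, so treat this as one possible sketch: the file names are hypothetical, and `k=60` is a conventional default rather than a required value.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge ranked lists: each document earns 1 / (k + rank) per list it appears in."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical results from the two retrievers for the same query.
keyword_ranking = ["parse_json.py", "json_utils.py"]
semantic_ranking = ["serializer.py", "parse_json.py"]

fused = reciprocal_rank_fusion([keyword_ranking, semantic_ranking])
print(fused)
```

A document that appears in both lists accumulates score from each, so it rises to the top, while documents found by only one retriever still make it into the fused list.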