
How Semantic Search Works — Finding Meaning, Not Keywords
Semantic search finds documents by meaning instead of matching keywords, using neural networks to convert text into vectors and comparing those vectors with cosine similarity.
In the previous lesson, we saw how BM25 scores documents by counting word occurrences. That works well when the user types the exact words in the document. But what happens when they don't?
The Problem BM25 Can't Solve
You have a codebase with a file called auth_middleware.rs. It contains functions for validating tokens, checking permissions, and rejecting unauthorized requests. Someone searches for "login." BM25 finds nothing — the word "login" never appears in that file.
The user and the document are talking about the same concept. But they're using different words, and BM25 only matches words.
This isn't a rare edge case. It happens constantly:
- Searching "crash" when the document says "panic"
- Searching "remove duplicates" when the function is called dedup
- Searching "how to protect API endpoints" when the document is titled "Authentication Middleware"
The gap between the words someone uses and the words a document contains is called the vocabulary mismatch problem. Semantic search exists to close that gap.
The Core Idea
Instead of comparing words, compare meanings. To do that, you need a way to represent meaning as something a computer can compare. That representation is a vector — a list of numbers.
The process has two parts:
- A neural network converts text into a vector. This vector is called an embedding. The network has been trained so that text with similar meaning produces similar numbers.
- You compare vectors using cosine similarity. Vectors that point in a similar direction get a high score. Vectors that point in different directions get a low score.
That's the entire idea. Everything else is detail.
Step by Step
Here's what happens when you build a semantic search system and run a query against it.
At index time
Every document in your corpus gets converted to a vector and stored. This is the indexing step.
```
"Authentication middleware for validating JWT tokens"
        │
        ▼
  Neural network
        │
        ▼
[0.021, -0.184, 0.337, 0.092, ..., -0.041]  ← 384 numbers
```
The output is a list of 384 numbers (or 768, or 1024, depending on the model). Each number captures some aspect of the text's meaning. No single number maps to a specific concept like "authentication" — the meaning is distributed across all the numbers together.
For long documents, you split the text into smaller pieces first. This is called chunking. Most embedding models work best on passages of a few hundred words, not entire documents.
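A minimal chunker can be a few lines of Python. This sketch splits on word count with a fixed overlap between neighboring chunks so that a sentence straddling a boundary still appears whole in at least one chunk; real systems often chunk by tokens or sentences instead, and the `chunk_size` and `overlap` values here are illustrative, not recommendations.

```python
def chunk_words(text: str, chunk_size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into overlapping chunks of roughly chunk_size words."""
    words = text.split()
    if len(words) <= chunk_size:
        return [text] if words else []
    chunks = []
    step = chunk_size - overlap  # advance less than chunk_size so chunks overlap
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last chunk already reached the end of the text
    return chunks
```

Each chunk is then embedded and indexed separately, and search results point back to the chunk (and its parent document) rather than to the whole file.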
At query time
The user's query goes through the same neural network to produce a vector.
```
"login"
        │
        ▼
  Same neural network
        │
        ▼
[0.019, -0.178, 0.341, 0.088, ..., -0.039]  ← 384 numbers
```
Notice that "login" and "authentication middleware for validating JWT tokens" produced similar vectors — because the model learned that these concepts are related.
Comparing vectors
Now you compare the query vector against every document vector using cosine similarity. This measures the angle between two vectors: identical direction = 1.0, perpendicular = 0.0, opposite = -1.0.
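Cosine similarity is simple enough to write from scratch: the dot product of the two vectors, divided by the product of their magnitudes. A pure-Python sketch:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: dot product over magnitudes."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

print(cosine_similarity([1.0, 2.0], [2.0, 4.0]))    # same direction  → ≈ 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))    # perpendicular   → ≈ 0.0
print(cosine_similarity([1.0, 2.0], [-1.0, -2.0]))  # opposite        → ≈ -1.0
```

Because cosine similarity ignores vector length and looks only at direction, a long document and a short query can still score highly if they point the same way in the embedding space.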
| Document | Cosine similarity to "login" |
|---|---|
| "Authentication middleware for validating JWT tokens" | 0.82 |
| "Database connection pooling configuration" | 0.14 |
| "User session management and cookie handling" | 0.71 |
| "Sorting algorithms in Rust" | 0.03 |
Rank by score, highest first. The authentication document wins — even though it never contains the word "login."
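Putting the pieces together, the whole retrieval step is a brute-force loop: score every document vector against the query vector, then sort. In this sketch the 3-dimensional vectors are hand-written stand-ins for real 384-dimensional embeddings (a real system would get them from the model), so the exact scores differ from the table above; only the ranking behavior is the point.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

# Toy 3-d stand-ins for real embeddings (illustration only).
doc_vectors = {
    "Authentication middleware for validating JWT tokens": [0.9, 0.3, 0.1],
    "Database connection pooling configuration":           [0.1, 0.9, 0.2],
    "User session management and cookie handling":         [0.7, 0.5, 0.1],
    "Sorting algorithms in Rust":                          [0.0, 0.1, 0.9],
}

query_vector = [0.95, 0.2, 0.05]  # pretend embedding of "login"

# Score every document against the query, highest similarity first.
ranked = sorted(doc_vectors.items(),
                key=lambda item: cosine(query_vector, item[1]),
                reverse=True)
for title, vec in ranked:
    print(f"{cosine(query_vector, vec):.2f}  {title}")
```

The authentication document comes out on top even though the vectors share no words with the query. At scale, this linear scan is what approximate nearest neighbor indexes exist to speed up.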
What the Model Actually Learns
The neural network isn't following hand-coded synonym lists. Nobody told it that "login" and "authentication" are related. It learned this from data.
During training, the model sees millions of text pairs that are known to be related — search queries paired with the documents people actually clicked, question-answer pairs, paraphrases of the same sentence. From these examples, the model adjusts its internal weights so that related text produces similar vectors and unrelated text produces different vectors.
After training, the model carries a compressed understanding of how words and concepts relate to each other. "Authentication" and "login" consistently appeared in similar contexts during training, so they end up near each other in vector space. "Authentication" and "sorting algorithm" did not, so they end up far apart.
This is what makes semantic search powerful and also what makes it fragile. The model can only represent relationships it saw during training.
Where Semantic Search Wins
Semantic search handles things that BM25 cannot:
Synonym matching. "Error" and "exception" and "panic" and "fault" all end up near each other in vector space. A search for any one of them finds documents containing the others.
Conceptual queries. Searching "how to protect API endpoints" finds a document titled "Rate Limiting and Authentication Middleware" — even with zero word overlap. The model understands the conceptual relationship between protection, rate limiting, and authentication.
Natural language questions. Users can search with full questions like "why does my connection drop after 30 seconds" and find documents about TCP keepalive timeouts and idle connection limits. BM25 would try to match "drop," "30," "seconds" — and likely miss the relevant documents entirely.
Cross-lingual potential. Multilingual embedding models can match a query in one language against documents in another, because both get mapped to the same vector space. This requires a model trained on multilingual data.
Where Semantic Search Loses
Semantic search has real weaknesses, and pretending otherwise leads to bad search systems.
Exact identifiers. If someone searches for resolve_index_dir, they want exactly that function — not something conceptually similar. The embedding model might treat this as a bag of sub-words ("resolve," "index," "dir") and return documents about resolving DNS or directory listings. BM25 matches the exact string.
Rare terms the model never saw. An internal code name like xk7_handler or a domain-specific acronym like PCBA probably never appeared in the model's training data. The model has no useful representation for it. It will produce a vector, but that vector won't capture the term's actual meaning — it'll just be noise.
Error codes and UUIDs. Searching for E0308 or 550e8400-e29b-41d4-a716-446655440000 is an exact-match problem. Semantic search adds nothing here and may actively hurt by returning conceptually related but wrong results.
Speed. BM25 uses an inverted index — a data structure built for fast lookup. Semantic search compares the query vector against every document vector in the index. Even with optimizations like approximate nearest neighbor search, it's slower than BM25, especially on large corpora.
Compute cost. Every query must pass through the neural network to produce a vector. Running inference on a neural network costs more CPU (or GPU) than tokenizing a query and looking up terms in an inverted index. Models exported to ONNX format can run efficiently on CPU, but the cost is still higher than BM25.
The Tradeoff
BM25 and semantic search have complementary strengths:
| Capability | BM25 | Semantic search |
|---|---|---|
| Exact string matching | Excellent | Poor |
| Synonym/concept matching | None | Good |
| Rare identifiers | Works perfectly | Unreliable |
| Natural language queries | Limited | Strong |
| Speed | Milliseconds | Tens of milliseconds |
| Infrastructure | Inverted index, no model | Neural network + vector index |
Neither one is strictly better. BM25 is fast and precise when the user's words match the document's words. Semantic search understands meaning when they don't.
That's why production search systems don't choose one — they combine both. The query runs through BM25 and through semantic search in parallel. The two result lists are merged using Reciprocal Rank Fusion. This is called hybrid search, and it's the subject of the next lesson.
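Reciprocal Rank Fusion itself is only a few lines. Each result list contributes 1 / (k + rank) to a document's fused score, so a document ranked well by both BM25 and semantic search beats one ranked well by only one of them. A sketch, using the conventional constant k = 60 and 1-based ranks:

```python
def reciprocal_rank_fusion(result_lists: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked result lists: each list adds 1 / (k + rank) per document."""
    scores: dict[str, float] = {}
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical result lists from the two retrievers:
bm25_results = ["doc_exact_match", "doc_shared", "doc_b"]
semantic_results = ["doc_synonym", "doc_shared", "doc_c"]
print(reciprocal_rank_fusion([bm25_results, semantic_results]))
```

Note that RRF only needs ranks, not scores, which is exactly why it works for merging BM25 and cosine similarity: the two produce scores on incomparable scales, but their rankings can be fused directly.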
Next Steps
- What is an Embedding? — how neural networks compress meaning into numbers
- What is Cosine Similarity? — the math behind comparing vectors
- How Hybrid Search Works — combining BM25 and semantic search for the best of both
- BM25 vs Semantic Search — We Benchmarked 6 Models — real numbers on a real codebase