What is Approximate Nearest Neighbor (ANN)?

Approximate nearest neighbor (ANN) search finds the closest vectors to a query vector without comparing every single one in the index. It trades a tiny amount of accuracy for massive speed gains — typically finding 95-99% of the true nearest neighbors in a fraction of the time.

Why not just compare everything?

The brute-force approach calculates cosine similarity between your query vector and every document vector. This is called exact nearest neighbor search, and it works fine for small datasets:

1,000 documents × 768 dimensions = ~1ms     (fine)
100,000 documents                 = ~100ms   (noticeable)
10,000,000 documents              = ~10s     (unusable)
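The brute-force baseline described above is easy to sketch. This is a minimal NumPy illustration, not a production implementation; the function name `exact_search` and the synthetic data are made up for the example:

```python
import numpy as np

def exact_search(query, docs, k=5):
    """Brute-force exact nearest neighbor: cosine similarity against every vector."""
    # Normalize so a dot product equals cosine similarity.
    q = query / np.linalg.norm(query)
    d = docs / np.linalg.norm(docs, axis=1, keepdims=True)
    scores = d @ q                  # one similarity per document: O(n * dims)
    return np.argsort(-scores)[:k]  # indices of the k most similar documents

rng = np.random.default_rng(0)
docs = rng.standard_normal((1000, 768))
query = docs[42] + 0.01 * rng.standard_normal(768)  # noisy copy of doc 42
print(exact_search(query, docs, k=3))  # doc 42 should rank first
```

The cost is one pass over all n vectors per query, which is exactly the linear growth the timings above show.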

At scale, exact search is too slow. ANN algorithms use clever index structures to skip most comparisons while still finding the closest vectors.

How does it work?

ANN algorithms build an index that organizes vectors into a navigable structure during indexing time. At query time, the algorithm traverses this structure to quickly narrow down candidates. Two common approaches:

HNSW (Hierarchical Navigable Small World) — Builds a multi-layer graph where each vector connects to its nearby neighbors. At query time, the algorithm enters at the top (sparsest) layer and greedily hops to ever-closer neighbors, dropping down a layer whenever it can no longer improve, until it reaches the bottom layer where every vector lives. Like navigating a city by first picking the right neighborhood, then the right street, then the right building.
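The greedy hop at the heart of HNSW can be sketched on a single-layer neighbor graph. This is a simplified illustration, not the real algorithm: actual HNSW builds the graph incrementally and repeats this descent across layers, and the `greedy_graph_search` name and crude k-NN graph construction here are assumptions for the demo:

```python
import numpy as np

def greedy_graph_search(query, vectors, neighbors, entry=0):
    """Greedy hop: from the current node, move to any neighbor that is
    closer to the query; stop when no neighbor improves."""
    current = entry
    best_dist = np.linalg.norm(vectors[current] - query)
    improved = True
    while improved:
        improved = False
        for n in neighbors[current]:
            d = np.linalg.norm(vectors[n] - query)
            if d < best_dist:
                current, best_dist = n, d
                improved = True
                break  # hop immediately to the closer node
    return current

rng = np.random.default_rng(1)
vectors = rng.standard_normal((200, 32))
# Crude k-NN graph: link each vector to its 8 nearest neighbors.
dists = np.linalg.norm(vectors[:, None, :] - vectors[None, :, :], axis=-1)
neighbors = {i: list(np.argsort(dists[i])[1:9]) for i in range(200)}

query = vectors[123] + 0.01 * rng.standard_normal(32)
found = greedy_graph_search(query, vectors, neighbors, entry=0)
print(found, np.linalg.norm(vectors[found] - query))
```

Each hop strictly decreases the distance to the query, so the search always terminates, and the layered structure of real HNSW is what keeps it from getting stuck in poor local minima.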

IVF (Inverted File Index) — Divides the vector space into clusters. At query time, it identifies which clusters are closest to the query, then only searches vectors within those clusters. Like checking only the relevant sections of a library instead of every shelf.
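An IVF-style index can be sketched in a few lines of NumPy: cluster at index time, then probe only the nearest clusters at query time. The quick-and-dirty k-means loop and the `ivf_search` function are simplified stand-ins for what libraries like FAISS do:

```python
import numpy as np

rng = np.random.default_rng(0)
docs = rng.standard_normal((2000, 32)).astype(np.float32)

# Index time: partition vectors into clusters with a few crude k-means steps.
n_clusters = 20
centroids = docs[rng.choice(len(docs), n_clusters, replace=False)].copy()
for _ in range(5):
    assign = np.argmin(np.linalg.norm(docs[:, None] - centroids[None], axis=-1), axis=1)
    for c in range(n_clusters):
        members = docs[assign == c]
        if len(members):
            centroids[c] = members.mean(axis=0)
# Recompute the assignment against the final centroids so the lists match them.
assign = np.argmin(np.linalg.norm(docs[:, None] - centroids[None], axis=-1), axis=1)
cluster_lists = {c: np.flatnonzero(assign == c) for c in range(n_clusters)}

def ivf_search(query, nprobe=3, k=5):
    """Scan only the nprobe clusters whose centroids are closest to the query."""
    order = np.argsort(np.linalg.norm(centroids - query, axis=1))
    candidates = np.concatenate([cluster_lists[c] for c in order[:nprobe]])
    dists = np.linalg.norm(docs[candidates] - query, axis=1)
    return candidates[np.argsort(dists)[:k]]

print(ivf_search(docs[7], nprobe=3, k=3))  # doc 7 itself should come back first
```

With nprobe=3 out of 20 clusters, the query touches roughly 15% of the vectors instead of all of them; real systems push that fraction far lower.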

What's the tradeoff?

ANN might miss a vector that's truly the closest if it happens to be in a cluster or graph region the algorithm didn't explore. In practice, this matters less than it sounds. With proper tuning, ANN finds 95%+ of the true nearest neighbors while searching less than 1% of the total vectors.

You can control this tradeoff. Most ANN implementations expose a search-time parameter (ef in HNSW, nprobe in IVF) that controls how many candidates to examine. Higher values mean better accuracy but slower queries.
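That knob is easy to see in a toy experiment: build a clustered index, then measure recall@10 while varying the number of clusters probed. This is a self-contained NumPy sketch under made-up data and names; the one guaranteed point is that probing every cluster recovers exact search:

```python
import numpy as np

rng = np.random.default_rng(0)
docs = rng.standard_normal((5000, 32)).astype(np.float32)
queries = rng.standard_normal((50, 32)).astype(np.float32)

# Toy IVF-style index: assign every vector to its nearest of 50 "centroids".
centroids = docs[rng.choice(len(docs), 50, replace=False)]
assign = np.argmin(np.linalg.norm(docs[:, None] - centroids[None], axis=-1), axis=1)

def topk(q, idx, k):
    """Exact top-k among the candidate indices idx."""
    d = np.linalg.norm(docs[idx] - q, axis=1)
    return set(idx[np.argsort(d)[:k]].tolist())

def recall_at_k(nprobe, k=10):
    """Fraction of the true top-k neighbors recovered when probing nprobe clusters."""
    hits = 0
    for q in queries:
        exact = topk(q, np.arange(len(docs)), k)
        probe = np.argsort(np.linalg.norm(centroids - q, axis=1))[:nprobe]
        candidates = np.flatnonzero(np.isin(assign, probe))
        hits += len(exact & topk(q, candidates, k))
    return hits / (k * len(queries))

for nprobe in (1, 5, 50):
    print(f"nprobe={nprobe:2d}  recall@10 = {recall_at_k(nprobe):.2f}")
```

Sweeping this parameter on a sample of real queries is the standard way to pick an operating point on the accuracy/latency curve.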

Why it matters for search

Every semantic search system at scale uses ANN. When you search a codebase and get results in milliseconds, it's because an ANN index is doing the heavy lifting — finding the most relevant embeddings without scanning every document.