Vector Search Explained: Dense vs Sparse vs Hybrid

5 min read · Updated Jun 29, 2026

Vector search is how a retrieval system finds text by meaning instead of exact words: you turn each document into a vector, turn the query into a vector, and return the documents whose vectors sit closest. It's the retrieval engine underneath most RAG systems — and the stage where teams most often ship something that almost works.

The catch is that "find the nearest vectors" describes only dense search, which is one of three approaches. Understanding dense vs sparse vs hybrid is the difference between a demo that answers softball questions and a system that reliably finds the right chunk when a user types a product code or an error number.

Dense search: meaning, not tokens

A dense embedding model maps text to a fixed-length vector — say 1,536 floats — where semantically similar passages land near each other. "How do I reset my password?" and "steps to recover account access" share almost no words but produce nearby vectors, so dense search retrieves both. That semantic reach is the whole appeal.

q = embed("how do I reset my password")     # → [0.013, -0.21, ...] (1536 dims)
hits = vector_store.search(q, k=10)          # nearest neighbors by cosine distance

Similarity is usually cosine similarity: the cosine of the angle between two vectors, ignoring their length (cosine distance is one minus that value). Two passages about the same topic point in the same direction even if one is longer. At scale you don't compare the query against every vector, because that scan is linear in corpus size and slow, so stores use an approximate nearest-neighbor index. The dominant one is HNSW (Hierarchical Navigable Small World), introduced by Malkov and Yashunin and built into pgvector, Qdrant, and most vector databases.
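To make the "direction, not length" point concrete, here is a minimal sketch of cosine similarity with an exact top-k scan. The toy vectors and the names `cosine_similarity` and `brute_force_search` are illustrative assumptions; a real store replaces the scan with an ANN index.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def brute_force_search(query, vectors, k=10):
    """Exact top-k by scanning every stored vector (the linear scan an ANN index avoids)."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Toy 3-dimensional vectors; real embeddings have hundreds or thousands of dimensions.
vectors = {
    "reset-password": [0.9, 0.1, 0.0],
    "recover-account": [1.8, 0.2, 0.0],  # same direction, twice the length
    "billing-faq": [0.0, 0.2, 0.9],
}
top = brute_force_search([1.0, 0.1, 0.0], vectors, k=2)
```

The two account-related vectors differ only in length, so their cosine similarity is 1 (up to floating-point rounding), and both outrank the billing vector for this query.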

Where dense search fails is the exact-token long tail. Embeddings smear rare strings together: a model that never deeply learned ERR_2041 or useDeferredValue will place them somewhere vague, and your query for that exact token retrieves neighbors that merely feel related.

Sparse search: exact terms, decades of proof

Sparse search is keyword matching — the lineage that powers Elasticsearch and Postgres full-text search. Each document is a high-dimensional vector that is mostly zero, with non-zero weights only for the terms it actually contains. The scoring function almost everyone uses is BM25, formalized in Robertson and Zaragoza's "The Probabilistic Relevance Framework".

-- Postgres full-text search over a tsvector column.
-- Note: ts_rank_cd is cover-density ranking, not BM25; core Postgres has no built-in BM25.
SELECT id, ts_rank_cd(search_vector, query) AS score
FROM chunks, plainto_tsquery('english', 'ERR_2041 rate limit') AS query
WHERE search_vector @@ query
ORDER BY score DESC
LIMIT 10;

BM25 nails what dense search fumbles: exact identifiers, error codes, names, acronyms. It rewards documents containing the query's rare terms and discounts common ones. Its blind spot is the mirror image of dense search's strength — it has no notion of meaning. "Reset password" and "recover account access" are, to BM25, unrelated.
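How BM25 rewards rare terms and discounts common ones is easier to see in code. The following is a minimal sketch of one commonly used BM25 variant (with a +1 inside the IDF logarithm); the whitespace tokenization, the toy documents, and the defaults `k1=1.2` and `b=0.75` are assumptions for illustration, not tuned values.

```python
import math
from collections import Counter

def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Score each tokenized document against the query with a BM25 formula."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: in how many documents each term appears.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            # Rare terms get a larger IDF; the +1 keeps the value non-negative.
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            # Term frequency saturates via k1; b normalizes by document length.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

docs = [
    "err_2041 means the rate limit was exceeded".split(),
    "how to raise the rate limit for your account".split(),
    "reset your password from the login page".split(),
]
scores = bm25_scores(["err_2041", "rate", "limit"], docs)
```

The first document wins because it contains the rare identifier as well as the shared terms, while the password document scores zero: it shares no tokens with the query, whatever its meaning.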

Dense and sparse fail on opposite inputs. Dense misses exact tokens; sparse misses paraphrases. This is exactly why combining them isn't a nice-to-have — each one covers the other's blind spot.

Hybrid search: run both, then fuse

Hybrid search runs dense and sparse retrieval in parallel and merges the two ranked lists. The hard part is fusion: the scores aren't comparable — cosine distance and BM25 live on different scales — so you can't just add them. The standard fix is Reciprocal Rank Fusion (RRF), from Cormack, Clarke, and Buettcher, which ignores raw scores and combines ranks instead:

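def rrf(dense_ids, sparse_ids, k=60):
    """Fuse two ranked ID lists with reciprocal rank fusion."""
    scores = {}
    # Ranks start at 1, so the top hit in each list contributes 1 / (k + 1).
    for rank, doc_id in enumerate(dense_ids, start=1):
        scores[doc_id] = scores.get(doc_id, 0) + 1 / (k + rank)
    for rank, doc_id in enumerate(sparse_ids, start=1):
        scores[doc_id] = scores.get(doc_id, 0) + 1 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=scores.get, reverse=True)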

A document ranked highly by either retriever floats up; one ranked highly by both dominates. The constant k (commonly 60) dampens how much the very top ranks outweigh the rest. Because RRF needs only the ordering from each retriever, you can bolt it onto systems whose scores you can't normalize.
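Because only the ordering matters, the same idea extends to any number of retrievers. A minimal sketch, assuming ranks are counted from 1; `rrf_fuse` and the toy ID lists are illustrative names rather than a library API.

```python
from collections import defaultdict

def rrf_fuse(ranked_lists, k=60):
    """Fuse any number of ranked ID lists with reciprocal rank fusion."""
    scores = defaultdict(float)
    for ids in ranked_lists:
        # Each list contributes 1 / (k + rank) for every ID it contains.
        for rank, doc_id in enumerate(ids, start=1):
            scores[doc_id] += 1 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

dense_ids = ["a", "b", "c"]
sparse_ids = ["b", "d", "a"]
fused = rrf_fuse([dense_ids, sparse_ids])
```

Here "b" (ranked 2nd and 1st) edges out "a" (ranked 1st and 3rd), and both beat "c" and "d", which each appear in only one list.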

pgvector plus Postgres full-text search gives you both retrievers in one database, which is why hybrid no longer requires a separate vector store for most workloads.

Hybrid is not the finish line

Fusion gives you a high-recall candidate set — the right chunk is in the list. It does not guarantee the right chunk is at the top, and the model mostly reads the top few. That last-mile precision is the job of reranking with a cross-encoder: pull 20–50 candidates from hybrid search, then reorder them with a cross-encoder before generation. The full ordering — chunking, retrieval, reranking, eval — is laid out in the RAG guide and the deep dive on chunking strategies.
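The retrieve-then-rerank step can be sketched as a function that takes the fused candidates and a pairwise scorer. Everything named here is hypothetical: in practice `score_pair` would wrap a cross-encoder model, and `toy_score` is only a word-overlap stand-in so the example runs without one.

```python
def rerank(query, candidates, score_pair, top_n=5):
    """Reorder (doc_id, text) candidates by a pairwise relevance score."""
    scored = [(score_pair(query, text), doc_id) for doc_id, text in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc_id for _, doc_id in scored[:top_n]]

def toy_score(query, text):
    """Stand-in scorer: fraction of query words found in the passage. NOT a cross-encoder."""
    query_words = set(query.lower().split())
    return len(query_words & set(text.lower().split())) / len(query_words)

# Candidates as they might come back from hybrid search, in fused order.
candidates = [
    ("c1", "billing questions and refunds"),
    ("c2", "reset your password from the login page"),
    ("c3", "password rules and length limits"),
]
top_ids = rerank("reset password", candidates, toy_score, top_n=2)
```

Keeping the scorer as a parameter means the candidate set from hybrid search stays unchanged while the reranking model can be swapped or evaluated separately.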

Default to hybrid + reranking. Pure dense search is the right baseline only when your corpus has no exact-token queries at all — and most real corpora do.

Watch for drift

Vector search assumes the index reflects today's data and today's embedding model. Swap the model, or let the corpus drift away from what you indexed, and recall decays without a single error in your logs. Knowing when to rebuild the index is its own skill, covered in embedding drift: when and how to re-index your vector store.
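Since this decay produces no errors, one way to notice it is to track recall on a small fixed set of queries with known answers. A minimal sketch, assuming a hand-built golden set and a `search` callable that returns ranked chunk IDs; both are hypothetical, as is the in-memory stand-in index.

```python
def recall_at_k(golden, search, k=10):
    """Fraction of golden queries whose expected chunk appears in the top k results."""
    hits = sum(1 for query, expected_id in golden if expected_id in search(query)[:k])
    return hits / len(golden)

# Stand-in for a real retriever: maps a query to a ranked list of chunk IDs.
fake_index = {
    "reset password": ["c2", "c3"],
    "ERR_2041": ["c9", "c4"],
}

def search(query):
    return fake_index.get(query, [])

golden = [("reset password", "c2"), ("ERR_2041", "c7")]
recall = recall_at_k(golden, search, k=2)
```

Running the same check on a schedule, and after any embedding model change, turns silent recall loss into a number that can be compared against an earlier baseline.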

Takeaways

  • Dense search finds meaning and misses exact tokens; sparse (BM25) finds exact tokens and misses paraphrases. Their failures are mirror images.
  • Hybrid runs both and fuses the results, usually with reciprocal rank fusion, which combines ranks rather than incomparable scores.
  • Hybrid maximizes recall; a reranker is still what delivers precision at the top of the prompt.
  • You can do all of this in Postgres with pgvector plus full-text search — no separate vector database required to start.
  • Re-index when you change embedding models or the corpus drifts, or recall silently erodes.

Keep going in the AI Engineering cluster, and start from the RAG guide if you want the whole pipeline in order.
