Embedding Drift: When (and How) to Re-Index Your Vector Store

A RAG system that launches with great retrieval tends to get worse over time, and almost nobody notices until users complain. There's no crash, no error log, no failing test, just answers that slide from relevant to mediocre. The culprit is usually some flavor of embedding drift: the relationship between your vectors, your data, and your queries has shifted out from under the index you built six months ago.

"Drift" gets used loosely, so let's be precise, because the three kinds have completely different fixes. Conflating them is why teams either re-index constantly for no reason or never re-index when they desperately should. Once you can name what's drifting, the response is straightforward.

Three things people call "drift"

Data drift is the easy one: your corpus changes. New documents arrive, old ones get edited, products get discontinued. The embedding model is fine; the index just doesn't reflect current reality. This needs incremental upserts and deletes, not a full rebuild.

Query drift is sneakier: the distribution of what users ask shifts. You built a support bot around billing questions; six months later half the traffic is about a new feature your corpus barely covers. The vectors are unchanged, but the queries now land in sparse regions of your embedding space. The fix is content, not re-embedding — you have a coverage gap.

Model drift is the expensive one: you change the embedding model itself, or the provider silently updates it. Embeddings from two different models live in incompatible vector spaces — a cosine similarity between them is meaningless. This is the only kind that forces a full re-index, and it's the one people fear, so let's be clear about when it actually applies.

You cannot mix embeddings from different models in one index. If you change your embedding model — or the provider versions it — every vector must be regenerated. A "small model upgrade" is a full re-index, full stop. Pin your model version explicitly so this never happens by surprise.

Detect drift before users do

The cardinal sin is having no signal at all. You need a retrieval quality metric you track continuously, the same way you'd track p99 latency. Two practical approaches:

A golden eval set. Maintain a fixed set of representative queries, each with known-relevant document IDs. Run it on a schedule and measure recall@k and Mean Reciprocal Rank. When the numbers slide, something drifted — and you'll know before the support tickets pile up.

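def recall_at_k(retriever, eval_set, k=5):
    """Mean recall@k over (query, relevant_ids) pairs; relevant_ids must be non-empty."""
    if not eval_set:
        raise ValueError("eval_set is empty")
    total = 0.0
    for query, relevant_ids in eval_set:
        relevant = set(relevant_ids)
        retrieved = {doc.id for doc in retriever.search(query, k=k)}
        # Share of this query's known-relevant docs that made the top k.
        total += len(relevant & retrieved) / len(relevant)
    return total / len(eval_set)
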
# Run nightly; alert if it drops below a baseline threshold.
score = recall_at_k(retriever, GOLDEN_QUERIES, k=5)
if score < 0.85:
    alert(f"Retrieval recall@5 dropped to {score:.2f} — investigate drift")
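
The recall check above says whether relevant documents show up at all; Mean Reciprocal Rank adds how high the first one lands. Here is a minimal sketch, assuming the same hypothetical retriever.search(query, k=...) interface returning objects with an .id attribute, and the same (query, relevant_ids) eval set shape. The function name and the default k=10 are illustrative choices, not something prescribed by a library.

```python
def mean_reciprocal_rank(retriever, eval_set, k=10):
    """Average of 1/rank of the first relevant hit; a query scores 0 if none is in the top k."""
    if not eval_set:
        raise ValueError("eval_set is empty")
    total = 0.0
    for query, relevant_ids in eval_set:
        relevant = set(relevant_ids)
        # Ranks are 1-based: a relevant doc in first position contributes 1.0.
        for rank, doc in enumerate(retriever.search(query, k=k), start=1):
            if doc.id in relevant:
                total += 1.0 / rank
                break
    return total / len(eval_set)
```

Tracking both numbers is useful because they fail differently: recall can hold steady while MRR drops, which suggests the right documents are still retrieved but are being pushed down the ranking.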

Monitor the query distribution. Embed incoming queries and track their distance to your existing corpus centroids over time. A rising fraction of queries that are far from any indexed content is the fingerprint of query drift — a coverage gap, not a model problem.
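
One way to turn that idea into a number is sketched below, using only the standard library. It assumes you have already computed corpus centroids somehow (for example by clustering your document vectors) and that query vectors come from the same embedding model. The function names and the 0.5 threshold are placeholders; the threshold in particular needs calibrating against your own data.

```python
import math

def cosine_distance(a, b):
    """1 - cosine similarity for two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def far_query_fraction(query_vectors, centroids, threshold=0.5):
    """Fraction of queries whose nearest centroid is farther than `threshold`."""
    if not query_vectors:
        return 0.0
    far = 0
    for q in query_vectors:
        nearest = min(cosine_distance(q, c) for c in centroids)
        if nearest > threshold:
            far += 1
    return far / len(query_vectors)
```

Computed per day or per week over a sample of real queries, this gives a single series to chart and alert on. The absolute value matters less than the trend.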

These two signals together tell you which drift you have: golden-set recall falling while the corpus is stable points at model or chunking issues; query-distance rising points at a content gap.
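
That rule of thumb can be written down as a small triage function. This is a sketch under stated assumptions: the function name, the tolerance defaults, and the returned labels are all hypothetical, and the "data drift" branch is an inference from the definitions earlier in this article rather than a rule stated above.

```python
def diagnose_drift(recall_drop, far_fraction_rise, corpus_changed,
                   recall_tolerance=0.05, far_tolerance=0.10):
    """Map the two monitoring signals to a likely cause.

    recall_drop: baseline recall minus current recall (positive means worse).
    far_fraction_rise: current far-query fraction minus its baseline.
    corpus_changed: whether documents were added, edited, or removed recently.
    """
    recall_fell = recall_drop > recall_tolerance
    queries_moved = far_fraction_rise > far_tolerance
    if queries_moved:
        return "query drift: coverage gap, add content"
    if recall_fell and not corpus_changed:
        return "model or chunking issue: investigate before re-indexing"
    if recall_fell and corpus_changed:
        # Assumption: a recall drop alongside corpus churn points at stale vectors.
        return "data drift: check that incremental sync is keeping up"
    return "no action"
```

The output is a starting point for investigation, not a verdict; more than one kind of drift can be present at once.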

Re-indexing without downtime

When you do need to rebuild, almost always because of a model change, the dangerous move is mutating your live index in place. Half-old, half-new vectors in the same index come from incompatible spaces, so rankings are nonsense while the job runs. Use blue-green indexing instead: build the new index alongside the old one, validate it, then atomically switch reads over.

from datetime import date

# 1. Build the new index under a versioned name; the old index keeps serving.
new_index = f"docs_v{MODEL_VERSION}_{date.today():%Y%m%d}"
create_index(new_index, dim=NEW_MODEL_DIM)

for batch in iter_documents(batch_size=256):
    vectors = embed(batch.texts, model=NEW_MODEL)   # new model, new space
    upsert(new_index, ids=batch.ids, vectors=vectors, metadata=batch.meta)

# 2. Validate the new index against the golden set BEFORE cutover.
#    Raise explicitly: a bare assert is skipped when Python runs with -O.
score = recall_at_k(retriever_for(new_index), GOLDEN_QUERIES)
if score < BASELINE:
    raise RuntimeError(f"{new_index} recall {score:.2f} is below baseline; aborting cutover")

# 3. Atomically repoint the read alias. Rollback = repoint back.
swap_alias("docs_current", to=new_index)

The alias indirection is the whole trick: your application always reads from docs_current, and cutover is a single metadata flip. If the new index underperforms, you repoint the alias back to the old one in seconds. Most vector stores support this pattern; with pgvector you can lean on Postgres transactions and table swaps to get the same atomic guarantee.
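
To make the indirection concrete, here is an in-process illustration of what an alias layer does. It is a teaching sketch only: in a real deployment the alias lives in the vector store's own metadata (or in Postgres), not in application memory, and the AliasRegistry class and its method names are hypothetical.

```python
import threading

class AliasRegistry:
    """Maps a stable alias to a concrete index name and remembers the previous target."""

    def __init__(self):
        self._lock = threading.Lock()
        self._current = {}
        self._previous = {}

    def resolve(self, alias):
        """Return the index name that reads for this alias should go to."""
        with self._lock:
            return self._current[alias]

    def swap(self, alias, to):
        """Repoint the alias, keeping the old target around for rollback."""
        with self._lock:
            if alias in self._current:
                self._previous[alias] = self._current[alias]
            self._current[alias] = to

    def rollback(self, alias):
        """Repoint the alias at its previous target; raises KeyError if there is none."""
        with self._lock:
            self._current[alias] = self._previous.pop(alias)
```

Readers call resolve("docs_current") on every request and never hold a concrete index name, which is what makes the cutover a single step. Keep the old index around until the new one has served real traffic for a while; rollback only works if there is something to roll back to.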

Never re-embed only your documents and forget your queries. At query time you must embed the incoming query with the same model that produced the index. A mismatch here is a silent, total retrieval failure that passes every smoke test on cached results.
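
A cheap guard against that failure is to record the model identifier on the index when you build it and compare it at query time. The sketch below assumes the index exposes a metadata dictionary with an "embedding_model" key; that key, the exception class, and the function name are hypothetical, so adapt them to whatever your store lets you attach.

```python
class ModelMismatchError(RuntimeError):
    """Raised when the query embedder differs from the model that built the index."""

def check_query_model(index_metadata, query_model):
    """Fail loudly instead of silently returning meaningless rankings."""
    index_model = index_metadata.get("embedding_model")
    if index_model is None:
        raise ModelMismatchError("index has no recorded embedding model")
    if index_model != query_model:
        raise ModelMismatchError(
            f"index built with {index_model!r}, query embedded with {query_model!r}"
        )
```

Running this once at service startup, and again after every alias swap, turns a silent quality failure into an immediate and obvious error.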

Incremental updates for data drift

For ordinary data drift you don't rebuild anything — you keep the index fresh with upserts. The trick is knowing what changed so you only re-embed deltas. Track a content hash per document and only re-embed when it moves:

import hashlib
 
def content_hash(text: str) -> str:
    return hashlib.sha256(text.encode()).hexdigest()
 
def sync_document(doc):
    new_hash = content_hash(doc.text)
    if get_stored_hash(doc.id) == new_hash:
        return  # unchanged — skip, save the embedding call
    vector = embed(doc.text, model=PINNED_MODEL)
    upsert(INDEX, id=doc.id, vector=vector, metadata={"hash": new_hash})

This keeps embedding cost proportional to change, not corpus size, and lets you run sync continuously. Tools like LlamaIndex ship ingestion pipelines with built-in document management and dedup hashing so you don't hand-roll this.
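
Upserts handle new and edited documents, but data drift also includes documents that disappear, such as the discontinued products mentioned earlier. A minimal sketch of the pruning side follows, assuming you can list the IDs currently in the index and the IDs in the source of truth. The function names and the delete_fn callback are hypothetical stand-ins for your store's delete call.

```python
def find_stale_ids(indexed_ids, source_ids):
    """IDs present in the index but no longer in the source corpus."""
    return set(indexed_ids) - set(source_ids)

def prune_index(indexed_ids, source_ids, delete_fn, batch_size=100):
    """Delete stale vectors in batches; returns how many IDs were removed."""
    stale = sorted(find_stale_ids(indexed_ids, source_ids))
    for start in range(0, len(stale), batch_size):
        # delete_fn receives a list of IDs, e.g. a wrapper around the store's delete API.
        delete_fn(stale[start:start + batch_size])
    return len(stale)
```

Without this step, retrieval keeps surfacing content that no longer exists, which looks like a quality regression even though every remaining vector is correct.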

Choosing a cadence

Don't re-index on a calendar; re-index on a signal. Wire your golden-set scores and query-distance metrics into the same alerting you use for the rest of production. Incremental data sync runs continuously and cheaply. A full model re-index is a deliberate, planned migration you trigger when (a) you're intentionally upgrading the embedding model, or (b) your retrieval metrics degrade in a way new content can't explain. Everything else is noise.

Takeaways

"Drift" is three different problems — data, query, and model — with three different fixes; name yours first.
Only a model change forces a full re-index; data drift wants incremental upserts and query drift wants new content.
Track retrieval quality continuously with a golden eval set; treat recall@k like a production SLO.
Re-index blue-green behind a read alias so cutover and rollback are atomic and downtime-free.
Pin your embedding model version and always embed queries with the same model that built the index.
