Projection Drift

When a derived read model or aggregate becomes inconsistent with the source-of-truth data it was computed from.

Concepts Covered

Projection drift
Source of truth
Derived read models
Counter correctness
Event replay
Reconciliation
Repair jobs
Acceptable error

Definition

Projection drift happens when a derived view no longer matches the source-of-truth data it was computed from.

Example:

source of truth: 1,000 active like edges
counter projection: 997 likes

The counter has drifted by three.

A projection is any computed view used for reads: counters, timelines, dashboards, search indexes, inboxes, unread counts, analytics tables, or materialized views.

The Pain That Forces Drift Handling

Derived projections exist because reading from the source of truth every time can be too slow or expensive.

An Instagram-like system might store the real like relationship as:

likes(user_id, post_id)

But counting that table on every post view is expensive. So the system maintains a derived counter:

post_like_counts(post_id, count)

That makes reads fast, but it creates a new correctness question:

Does the count still match the real likes?
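Here is a minimal sketch of answering that question for one post. It uses Python's built-in sqlite3 module with an in-memory database. The schema follows the two tables above. The column types, the post id 42, and the function name like_count_drift are assumptions for illustration. The data reproduces the 1,000 versus 997 example from the definition.

```python
import sqlite3

# Hypothetical schema following the two tables described in the text.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE likes (
        user_id INTEGER NOT NULL,
        post_id INTEGER NOT NULL,
        PRIMARY KEY (user_id, post_id)   -- one like edge per user per post
    );
    CREATE TABLE post_like_counts (
        post_id INTEGER PRIMARY KEY,
        count   INTEGER NOT NULL
    );
    """
)

# Source of truth: 1,000 like edges on post 42.
conn.executemany(
    "INSERT INTO likes (user_id, post_id) VALUES (?, ?)",
    [(user_id, 42) for user_id in range(1000)],
)
# Derived counter that has drifted to 997.
conn.execute("INSERT INTO post_like_counts (post_id, count) VALUES (42, 997)")


def like_count_drift(conn: sqlite3.Connection, post_id: int) -> int:
    """Return (true like count - projected count); 0 means no drift.

    A missing counter row is treated as a projected count of 0.
    """
    (truth,) = conn.execute(
        "SELECT COUNT(*) FROM likes WHERE post_id = ?", (post_id,)
    ).fetchone()
    row = conn.execute(
        "SELECT count FROM post_like_counts WHERE post_id = ?", (post_id,)
    ).fetchone()
    projected = row[0] if row is not None else 0
    return truth - projected


print(like_count_drift(conn, 42))  # 3
```

A positive result means the counter is too low. The COUNT(*) here is exactly the expensive query the counter exists to avoid. A check like this therefore fits a background job, not the read path.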

Over time, retries, missed events, duplicate events, failed workers, manual fixes, and code bugs can make the projection diverge from the truth.

Mental Model

The source of truth is the durable fact. The projection is a convenient view.

Projection drift is the gap between them.

truth -> events -> projection

Every arrow can fail. An event may not be published. A consumer may process it twice. A deployment may contain a bug. A backfill may skip a partition. A manual database correction may not emit the same event path.

If a system uses derived projections, it should also have a plan for detecting and repairing drift.

Why Drift Happens

Common causes:

missed events
duplicate events
consumer bugs
events applied out of order
partial outages
manual data changes
replaying old events with new logic
schema migrations that change meaning
race conditions between source writes and projection updates

Drift is not a rare edge case in long-running systems. It is a normal risk of keeping the same facts in more than one place.

Example: Like Counter Drift

Suppose a user likes a post:

1. Insert like edge succeeds.
2. LikeCreated event is published.
3. Counter consumer increments count.

If step 2 fails, the source of truth has the like but the counter never increments.

If step 3 runs twice, the counter increments twice.

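If an unlike event is processed before the like event it undoes, the result depends on the implementation. A consumer that clamps the count at zero ends one too high. A consumer that allows negative values shows a negative count until the like arrives.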

The user sees the projection, not the source table, so even small drift can damage trust if users notice impossible numbers, such as a negative like count.
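The failure modes in this example can be simulated with two hypothetical consumers, sketched in Python. naive_count applies +1 or -1 per event and clamps at zero. versioned_count is one way to build an idempotent consumer. It assumes each event carries a per-edge version number assigned by the writer, which the text does not specify, and it keeps only the newest state per (user, post) edge. LikeDeleted is an assumed name for the unlike event, and all events are assumed to belong to one post.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LikeEvent:
    kind: str      # "LikeCreated" (from the text) or "LikeDeleted" (assumed name)
    user_id: int
    post_id: int
    version: int   # assumed per-(user, post) sequence number assigned by the writer


def naive_count(events: list[LikeEvent]) -> int:
    """Apply +1/-1 per event for a single post, clamping at zero."""
    count = 0
    for event in events:
        if event.kind == "LikeCreated":
            count += 1
        else:
            count = max(0, count - 1)
    return count


def versioned_count(events: list[LikeEvent]) -> int:
    """Keep only the newest state per like edge, then count edges that are liked.

    Duplicates and older (out-of-order) events are skipped because their
    version is not newer than the one already applied for that edge.
    """
    latest: dict[tuple[int, int], tuple[int, bool]] = {}
    for event in events:
        edge = (event.user_id, event.post_id)
        applied = latest.get(edge)
        if applied is not None and applied[0] >= event.version:
            continue
        latest[edge] = (event.version, event.kind == "LikeCreated")
    return sum(1 for _, liked in latest.values() if liked)


# Duplicate delivery: one real like, delivered twice (truth: 1).
like = LikeEvent("LikeCreated", user_id=1, post_id=42, version=1)
duplicated = [like, like]

# Out of order: user 7 liked then unliked, but the unlike arrived first (truth: 0).
reordered = [
    LikeEvent("LikeDeleted", user_id=7, post_id=42, version=2),
    LikeEvent("LikeCreated", user_id=7, post_id=42, version=1),
]

# Missed event: the like edge was stored but LikeCreated was never published (truth: 1).
missed: list[LikeEvent] = []

print(naive_count(duplicated), versioned_count(duplicated))  # 2 1
print(naive_count(reordered), versioned_count(reordered))    # 1 0
print(naive_count(missed), versioned_count(missed))          # 0 0
```

The versioned consumer absorbs duplicates and reordering, at the cost of storing state per edge. It cannot recover the missed event, because nothing it receives says the like exists. Only comparing against the source of truth finds that kind of drift.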

Repair Strategies

Strategy	Idea	Tradeoff
Recompute from source tables	Count truth directly and overwrite projection	Accurate but expensive
Replay event log	Rebuild projection from historical events	Requires reliable event history
Compare samples	Check parts of the dataset for mismatch	Cheaper but incomplete
Incremental reconciliation	Repair partitions, keys, or time windows gradually	Needs scheduling and tracking
Dual calculation	Compare old and new projection logic during migration	Extra cost
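The following Python sketch combines the first and fourth rows. It performs incremental reconciliation: it recomputes counts from the likes table in batches of post_ids and overwrites any counter that disagrees. It assumes the likes(user_id, post_id) and post_like_counts(post_id, count) tables described above, stored in SQLite via Python's sqlite3 module. The function names, batch size, and checkpoint handling are hypothetical.

```python
import sqlite3
from typing import Optional


def reconcile_batch(
    conn: sqlite3.Connection, after_post_id: int, batch_size: int
) -> tuple[int, Optional[int]]:
    """Recompute like counts for the next batch of post_ids and overwrite mismatches.

    Returns (records_repaired, last_post_id_checked), or (0, None) when no keys remain.
    """
    # Check keys from both tables so missing and orphaned counter rows are found too.
    post_ids = [
        row[0]
        for row in conn.execute(
            """
            SELECT post_id FROM (
                SELECT post_id FROM likes
                UNION
                SELECT post_id FROM post_like_counts
            )
            WHERE post_id > ?
            ORDER BY post_id
            LIMIT ?
            """,
            (after_post_id, batch_size),
        )
    ]
    if not post_ids:
        return 0, None

    repaired = 0
    for post_id in post_ids:
        (truth,) = conn.execute(
            "SELECT COUNT(*) FROM likes WHERE post_id = ?", (post_id,)
        ).fetchone()
        row = conn.execute(
            "SELECT count FROM post_like_counts WHERE post_id = ?", (post_id,)
        ).fetchone()
        if row is None or row[0] != truth:
            # Overwrite the projection with the value recomputed from the source.
            conn.execute(
                "INSERT OR REPLACE INTO post_like_counts (post_id, count) VALUES (?, ?)",
                (post_id, truth),
            )
            repaired += 1
    conn.commit()
    return repaired, post_ids[-1]


def reconcile_all(conn: sqlite3.Connection, batch_size: int = 100) -> int:
    """Walk every post_id in batches; a real job would persist the checkpoint."""
    checkpoint = -1  # assumes post_ids are non-negative integers
    total_repaired = 0
    while True:
        repaired, last_checked = reconcile_batch(conn, checkpoint, batch_size)
        total_repaired += repaired
        if last_checked is None:
            return total_repaired
        checkpoint = last_checked


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE likes (user_id INTEGER, post_id INTEGER, PRIMARY KEY (user_id, post_id));
    CREATE TABLE post_like_counts (post_id INTEGER PRIMARY KEY, count INTEGER NOT NULL);
    """
)
# Post p has p likes, so posts 1-4 have 1, 2, 3 and 4 likes.
conn.executemany(
    "INSERT INTO likes (user_id, post_id) VALUES (?, ?)",
    [(user, post) for post in range(1, 5) for user in range(post)],
)
# Post 3 is too low, post 4 has no counter row, post 9 has a counter but no likes.
conn.executemany(
    "INSERT INTO post_like_counts (post_id, count) VALUES (?, ?)",
    [(1, 1), (2, 2), (3, 2), (9, 5)],
)
print(reconcile_all(conn, batch_size=2))  # 3
```

This sketch ignores the race between source writes and projection updates. If a like is written after the COUNT(*) but before the overwrite, the overwrite can erase the live consumer's increment. A production job would need per-key locking, a compare-and-set, or a re-check.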

The repair method depends on the business cost of being wrong.

For like counts, small temporary drift may be acceptable. For payments, balances, inventory, or permissions, drift may be unacceptable and require stronger consistency boundaries.

Operational Reality

Important signals:

drift detected by reconciliation jobs
projection freshness
consumer error rate
duplicate event rate
event replay failures
repair job duration
number of repaired records
largest drift by key
stuck partitions
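Two of these signals, drift detected and largest drift by key, fall out of the comparison step of a reconciliation job. The Python sketch below assumes the job has already loaded recomputed true values and projected values for a batch of keys into dictionaries. The DriftReport fields and the post:N key format are hypothetical.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class DriftReport:
    keys_checked: int
    drifted_keys: int            # "drift detected by reconciliation jobs"
    total_abs_drift: int
    largest_drift_key: Optional[str]
    largest_drift: int           # "largest drift by key"


def drift_report(truth: Mapping[str, int], projection: Mapping[str, int]) -> DriftReport:
    """Compare recomputed truth with projected values for one batch of keys.

    Keys missing from either side are treated as 0.
    """
    keys = sorted(set(truth) | set(projection))  # sorted so ties resolve deterministically
    drifted = 0
    total = 0
    worst_key: Optional[str] = None
    worst = 0
    for key in keys:
        diff = abs(truth.get(key, 0) - projection.get(key, 0))
        if diff:
            drifted += 1
            total += diff
            if diff > worst:
                worst_key, worst = key, diff
    return DriftReport(len(keys), drifted, total, worst_key, worst)


report = drift_report(
    truth={"post:1": 1000, "post:2": 50, "post:3": 7},
    projection={"post:1": 997, "post:2": 50, "post:4": 2},
)
print(report)
# DriftReport(keys_checked=4, drifted_keys=3, total_abs_drift=12, largest_drift_key='post:3', largest_drift=7)
```

Recording these numbers per run, alongside repair job duration and the number of repaired records, makes drift something that can be graphed and alerted on rather than discovered by users.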

The key product question is: how wrong can this projection be, for how long, before it becomes unacceptable?

That answer determines whether the system needs strict transactions, frequent reconciliation, approximate counters, user-visible freshness labels, or manual repair tools.
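One way to make that answer explicit is to write it down as a per-projection tolerance and check each key against it. The following is a hypothetical Python sketch. The DriftTolerance fields and the example thresholds are assumptions, not recommendations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DriftTolerance:
    max_abs_drift: int             # allowed |truth - projection| for one key
    max_relative_drift: float      # allowed drift as a fraction of the true value
    max_staleness_seconds: float   # how long the projection may lag the source


def is_acceptable(
    truth: int, projected: int, staleness_seconds: float, tol: DriftTolerance
) -> bool:
    """A key is acceptable if it is fresh enough and within either drift bound."""
    if staleness_seconds > tol.max_staleness_seconds:
        return False
    drift = abs(truth - projected)
    if truth:
        relative = drift / abs(truth)
    else:
        relative = float("inf") if drift else 0.0
    return drift <= tol.max_abs_drift or relative <= tol.max_relative_drift


# Example thresholds only; real values are a product decision.
LIKE_COUNTS = DriftTolerance(max_abs_drift=10, max_relative_drift=0.01, max_staleness_seconds=3600)
BALANCES = DriftTolerance(max_abs_drift=0, max_relative_drift=0.0, max_staleness_seconds=0)

print(is_acceptable(1000, 997, 120, LIKE_COUNTS))         # True
print(is_acceptable(1000, 997, 7200, LIKE_COUNTS))        # False: too stale
print(is_acceptable(500_000, 497_000, 60, LIKE_COUNTS))   # True: 0.6% off
print(is_acceptable(1000, 997, 0, BALANCES))              # False
```

Accepting a key when either bound holds lets large counters tolerate proportionally larger absolute drift. A zero tolerance, as for BALANCES, means any drift fails the check. That is a signal the projection needs the stronger consistency boundaries mentioned earlier rather than periodic repair.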

Knowledge links

Use these links to understand what to know first, where this idea appears, and what to study next.

Prerequisites

Read these first if this topic feels unfamiliar.

Derived Projections
Event Streams

Used In Systems

System studies where this idea appears in context.

Instagram Likes System

Related Concepts

Core ideas that connect to this topic.

Distributed Counters
Idempotent Consumers

Related Patterns

Reusable architecture moves built from these ideas.

Reconciliation Job