Projection Drift

When a derived read model or aggregate becomes inconsistent with the source-of-truth data it was computed from.

Intermediate · 4 min read · Reliability, Operations, Tradeoffs
Derived Projections · Reconciliation · Repair Jobs · Counter Drift · Event Replay

Concepts Covered

  • Projection drift
  • Source of truth
  • Derived read models
  • Counter correctness
  • Event replay
  • Reconciliation
  • Repair jobs
  • Acceptable error

Definition

Projection drift happens when a derived view no longer matches the source-of-truth data it was computed from.

Example:

source of truth: 1,000 active like edges
counter projection: 997 likes

The counter has drifted by three.

A projection is any computed view used for reads: counters, timelines, dashboards, search indexes, inboxes, unread counts, analytics tables, or materialized views.

The Pain That Forces Drift Handling

Derived projections exist because reading from the source of truth every time can be too slow or expensive.

An Instagram-like system might store the real like relationship as:

likes(user_id, post_id)

But counting that table on every post view is expensive. So the system maintains a derived counter:

post_like_counts(post_id, count)

That makes reads fast, but it creates a new correctness question:

Does the count still match the real likes?

Over time, retries, missed events, duplicate events, failed workers, manual fixes, and code bugs can make the projection diverge from the truth.
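The question "does the count still match the real likes?" can be answered mechanically by comparing the two tables. The following is a minimal sketch using an in-memory SQLite database; the column types, sample rows, and the find_drift helper are illustrative assumptions, not part of any particular system.

```python
import sqlite3

# Hypothetical schema modeled on the two tables named above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE likes (
        user_id INTEGER,
        post_id INTEGER,
        PRIMARY KEY (user_id, post_id)
    );
    CREATE TABLE post_like_counts (
        post_id INTEGER PRIMARY KEY,
        count   INTEGER NOT NULL
    );
""")

# Source of truth: post 1 has three likes, post 2 has one.
conn.executemany(
    "INSERT INTO likes VALUES (?, ?)",
    [(10, 1), (11, 1), (12, 1), (10, 2)],
)
# Projection: post 1 missed an increment, post 2 is correct.
conn.executemany(
    "INSERT INTO post_like_counts VALUES (?, ?)",
    [(1, 2), (2, 1)],
)

def find_drift(conn):
    """Return (post_id, projected, actual) for each counter that is wrong.

    Only posts that already have a counter row are checked; a post with
    likes but no counter row would need a separate query.
    """
    return conn.execute("""
        SELECT c.post_id, c.count, COUNT(l.user_id) AS actual
        FROM post_like_counts AS c
        LEFT JOIN likes AS l ON l.post_id = c.post_id
        GROUP BY c.post_id, c.count
        HAVING c.count != COUNT(l.user_id)
        ORDER BY c.post_id
    """).fetchall()

drifted = find_drift(conn)
print(drifted)  # [(1, 2, 3)]
```

Running the full comparison is exactly the expensive count the projection was built to avoid, so in practice a check like this would likely run offline or over a subset of posts.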

Mental Model

The source of truth is the durable fact. The projection is a convenient view.

Projection drift is the gap between them.

truth -> events -> projection

Every arrow can fail. An event may not be published. A consumer may process it twice. A deployment may contain a bug. A backfill may skip a partition. A manual database correction may not emit the same event path.

If a system uses derived projections, it should also have a plan for detecting and repairing drift.

Why Drift Happens

Common causes:

  • missed events
  • duplicate events
  • consumer bugs
  • events applied out of order
  • partial outages
  • manual data changes
  • replaying old events with new logic
  • schema migrations that change meaning
  • race conditions between source writes and projection updates

Drift is not a rare edge case in long-running systems. It is a normal risk of keeping multiple representations of the same facts.

Example: Like Counter Drift

Suppose a user likes a post:

1. Insert like edge succeeds.
2. LikeCreated event is published.
3. Counter consumer increments count.

If step 2 fails, the source of truth has the like but the counter never increments.

If step 3 runs twice, the counter increments twice.

If an unlike event is processed before the like event it reverses, the counter may end up wrong, depending on the implementation.

The user sees the projection, not the source table. So even small drift can damage trust if users notice impossible numbers.
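The three failure cases above can be simulated in a few lines. This sketch assumes two hypothetical consumer implementations: a naive one that adds or subtracts blindly, and a clamped one that refuses to go below zero. Neither is presented as the recommended design; the point is that the same delivered events produce different drift depending on the implementation.

```python
def apply_naive(events):
    """Naive consumer: +1 per like, -1 per unlike, no deduplication."""
    count = 0
    for kind in events:
        count += 1 if kind == "like" else -1
    return count

def apply_clamped(events):
    """Variant that never lets the counter drop below zero."""
    count = 0
    for kind in events:
        count = count + 1 if kind == "like" else max(0, count - 1)
    return count

# Each scenario: (true number of active likes, events the consumer saw).
scenarios = {
    # Step 2 failed: the like exists but no event was delivered.
    "missed publish": (1, []),
    # Step 3 ran twice for a single like.
    "duplicate delivery": (1, ["like", "like"]),
    # The user liked and then unliked, but the events arrived reversed.
    "out of order": (0, ["unlike", "like"]),
}

# Drift = projected count minus true count, for (naive, clamped).
drift = {
    name: (apply_naive(events) - truth, apply_clamped(events) - truth)
    for name, (truth, events) in scenarios.items()
}

for name, (naive, clamped) in drift.items():
    print(f"{name}: naive drift {naive:+d}, clamped drift {clamped:+d}")
```

In the out-of-order case the naive counter happens to land on the right value, while the clamped counter ends up one too high because the early unlike was discarded at zero.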

Repair Strategies

Strategy                     | Idea                                                  | Tradeoff
Recompute from source tables | Count truth directly and overwrite projection         | Accurate but expensive
Replay event log             | Rebuild projection from historical events             | Requires reliable event history
Compare samples              | Check parts of the dataset for mismatch               | Cheaper but incomplete
Incremental reconciliation   | Repair partitions, keys, or time windows gradually    | Needs scheduling and tracking
Dual calculation             | Compare old and new projection logic during migration | Extra cost
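As a rough illustration of three of these strategies (recompute, replay, and incremental reconciliation), the sketch below uses in-memory data. The event tuple layout, the assumption that each event carries a unique event_id and that the log is in order, and all function names are hypothetical.

```python
from collections import Counter

def recompute_from_source(like_edges):
    """Count the truth directly: one entry per (user_id, post_id) edge."""
    return Counter(post_id for _user_id, post_id in like_edges)

def replay_event_log(events):
    """Rebuild counts from history, skipping repeated event ids.

    Assumes events are (event_id, kind, user_id, post_id) in log order.
    """
    seen, active = set(), set()
    for event_id, kind, user_id, post_id in events:
        if event_id in seen:
            continue  # duplicate delivery recorded in the log
        seen.add(event_id)
        if kind == "like":
            active.add((user_id, post_id))
        else:
            active.discard((user_id, post_id))
    return Counter(post_id for _user_id, post_id in active)

def reconcile(projection, truth, keys):
    """Overwrite the projection for one batch of keys.

    Returns {key: (old_value, new_value)} for the keys that were repaired.
    """
    repaired = {}
    for key in keys:
        expected = truth.get(key, 0)
        current = projection.get(key, 0)
        if current != expected:
            repaired[key] = (current, expected)
            projection[key] = expected
    return repaired

like_edges = {(10, 1), (11, 1), (12, 1), (10, 2)}
events = [
    (1, "like", 10, 1), (2, "like", 11, 1), (2, "like", 11, 1),
    (3, "like", 12, 1), (4, "like", 10, 2),
    (5, "like", 13, 2), (6, "unlike", 13, 2),
]

truth = recompute_from_source(like_edges)
replayed = replay_event_log(events)

# Drifted projection: post 1 is low, post 3 has likes that no longer exist.
projection = {1: 2, 2: 1, 3: 4}
first_batch = reconcile(projection, truth, [1, 2])
second_batch = reconcile(projection, truth, [3])
print(first_batch, second_batch, projection)
```

Splitting the keys into batches is what makes the reconciliation incremental: each batch can be scheduled and tracked separately, at the cost of the projection being only partly repaired in between.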

The repair method depends on the business cost of being wrong.

For like counts, small temporary drift may be acceptable. For payments, balances, inventory, or permissions, drift may be unacceptable and require stronger consistency boundaries.

Operational Reality

Important signals:

  • drift detected by reconciliation jobs
  • projection freshness
  • consumer error rate
  • duplicate event rate
  • event replay failures
  • repair job duration
  • number of repaired records
  • largest drift by key
  • stuck partitions
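Several of these signals fall directly out of a reconciliation pass. The sketch below shows one possible way to summarize a batch of comparison results; the ReconcileResult record, the key format, and the metric names are assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass
class ReconcileResult:
    key: str        # hypothetical projection key, e.g. "post:1"
    projected: int  # value currently stored in the projection
    actual: int     # value recomputed from the source of truth

def summarize(results):
    """Turn raw comparison results into a few drift signals."""
    drifts = [
        (abs(r.projected - r.actual), r.key)
        for r in results
        if r.projected != r.actual
    ]
    largest, largest_key = max(drifts) if drifts else (0, None)
    return {
        "checked": len(results),
        "drifted": len(drifts),
        "drift_rate": len(drifts) / len(results) if results else 0.0,
        "largest_drift": largest,
        "largest_drift_key": largest_key,
    }

results = [
    ReconcileResult("post:1", projected=997, actual=1000),
    ReconcileResult("post:2", projected=50, actual=50),
    ReconcileResult("post:3", projected=12, actual=10),
    ReconcileResult("post:4", projected=7, actual=7),
]
summary = summarize(results)
print(summary)
```

A summary like this could be emitted after every reconciliation run so that a rising drift rate or a single badly drifted key is visible before users notice it.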

The key product question is: how wrong can this projection be, for how long, before it becomes unacceptable?

That answer determines whether the system needs strict transactions, frequent reconciliation, approximate counters, user-visible freshness labels, or manual repair tools.
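One way to make that product question concrete is to write the answer down as an explicit budget per projection. The sketch below is only an illustration: the DriftBudget type and the numeric limits are invented for the example, not recommended values.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DriftBudget:
    max_abs_error: int   # how wrong the projection may be
    max_seconds: float   # how long it may stay wrong

def within_budget(budget, abs_error, seconds_drifted):
    """True while the projection is exact or still acceptably wrong."""
    if abs_error == 0:
        return True
    return (
        abs_error <= budget.max_abs_error
        and seconds_drifted <= budget.max_seconds
    )

# Hypothetical budgets: like counts tolerate small, short-lived drift;
# balances tolerate none.
like_counts = DriftBudget(max_abs_error=5, max_seconds=3600)
balances = DriftBudget(max_abs_error=0, max_seconds=0)

print(within_budget(like_counts, abs_error=3, seconds_drifted=600))   # True
print(within_budget(like_counts, abs_error=3, seconds_drifted=7200))  # False
print(within_budget(balances, abs_error=1, seconds_drifted=0))        # False
```

A budget of zero on both axes is a signal that a derived projection with asynchronous updates may be the wrong tool, and that the value needs a stricter consistency boundary.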
