Reconciliation Job

Periodically compare source-of-truth data with derived state and repair drift caused by missed, duplicated, or incorrectly processed updates.

Concepts Covered

  • Source-of-truth comparison
  • Projection drift
  • Repair jobs
  • Reconciliation windows
  • Safe correction
  • Operational confidence
  • Drift metrics
  • Rebuild safety

1. Intent

A Reconciliation Job detects and repairs differences between source-of-truth data and derived state.

It accepts a practical reality: derived projections can drift, so production systems need repair paths.

The intent is not to excuse sloppy write paths. The intent is to make long-running systems repairable when duplicate events, missed events, bugs, or manual operations create inconsistency.

2. The Problem Without This Pattern

Suppose a like count projection says:

post_like_counts = 997

but the edge store has:

1,000 active post_likes rows

Without reconciliation, the system may keep showing the wrong count forever.

The same problem appears in:

  • unread message counts
  • inbox projections
  • analytics aggregates
  • search indexes
  • notification state
  • sharded counter totals

Derived state is fast because it is precomputed. That speed has a cost: it can become wrong unless the system has a way to compare it against truth.

3. How The Pattern Works

A reconciliation job usually:

1. Selects a partition, object, or time window.
2. Reads source-of-truth data.
3. Reads the derived projection.
4. Compares expected and actual values.
5. Emits a correction or overwrites the projection.
6. Records metrics and audit logs.
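The six steps can be sketched as a single function over one partition. This is a minimal sketch, assuming an in-memory dict stands in for the projection store and a caller-supplied `read_source` function recomputes the true value for a key; `reconcile_partition`, `Correction`, and `RunReport` are hypothetical names, not a specific library.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List, Optional


@dataclass
class Correction:
    """One audit-log entry (step 6): which key was repaired and from what."""
    key: int
    expected: int
    actual: Optional[int]  # None means the projection row was missing


@dataclass
class RunReport:
    """Per-run metrics for one partition (step 6)."""
    scanned: int = 0
    drifted: int = 0
    corrections: List[Correction] = field(default_factory=list)


def reconcile_partition(
    keys: Iterable[int],                # step 1: the selected partition
    read_source: Callable[[int], int],  # step 2: recompute the value from truth
    projection: Dict[int, int],         # step 3: the derived state being checked
) -> RunReport:
    report = RunReport()
    for key in keys:
        expected = read_source(key)
        actual = projection.get(key)
        report.scanned += 1
        if expected != actual:          # step 4: compare
            report.drifted += 1
            projection[key] = expected  # step 5: overwrite the projection
            report.corrections.append(Correction(key, expected, actual))
    return report
```

Passing the source reader in as a function keeps the loop independent of where truth lives, whether that is a SQL count, an edge-store scan, or a replayed event stream.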

The job may run continuously, periodically, or only during incidents.

Example for likes:

expected = count active post_likes where post_id = 42
actual = post_like_counts[42]
if expected != actual:
  update post_like_counts[42] = expected
  record correction
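The likes pseudocode translates almost line for line into Python. In this sketch an in-memory list of hypothetical `PostLike` rows stands in for the edge store and a dict stands in for `post_like_counts`; a real job would run the count as a query against the edge store.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class PostLike:
    """A row in a hypothetical post_likes edge store (the source of truth)."""
    post_id: int
    user_id: int
    active: bool  # False once the user has un-liked the post


def expected_like_count(post_likes: List[PostLike], post_id: int) -> int:
    # Count only active edges for this post; inactive rows are un-likes.
    return sum(1 for like in post_likes if like.post_id == post_id and like.active)


def reconcile_post(post_likes: List[PostLike], post_like_counts: Dict[int, int],
                   corrections: List[str], post_id: int) -> None:
    expected = expected_like_count(post_likes, post_id)
    actual = post_like_counts.get(post_id, 0)  # a missing row is treated as 0
    if expected != actual:
        post_like_counts[post_id] = expected
        # Record the correction so operators can audit what changed.
        corrections.append(f"post {post_id}: {actual} -> {expected}")
```

With 1,000 active rows for post 42 and a stored count of 997, one call sets the count to 1,000 and logs `post 42: 997 -> 1000`; a second call finds no drift and logs nothing.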

Large systems rarely scan everything at once. They reconcile by partition, key range, tenant, post ID, conversation ID, or time window.
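Reconciling by key range with a checkpoint lets a large key space be covered over many short runs, and lets a crashed run resume where it stopped instead of rescanning from the start. A minimal sketch, assuming integer keys and a dict standing in for durable checkpoint storage; `key_ranges`, `run_incremental`, and the `next_key` checkpoint field are hypothetical.

```python
from typing import Callable, Dict, Iterator, Optional, Tuple


def key_ranges(start: int, end: int, size: int) -> Iterator[Tuple[int, int]]:
    """Split [start, end) into half-open ranges of at most `size` keys."""
    lo = start
    while lo < end:
        hi = min(lo + size, end)
        yield lo, hi
        lo = hi


def run_incremental(
    start: int,
    end: int,
    size: int,
    checkpoint: Dict[str, int],                  # durable storage in a real job
    reconcile_range: Callable[[int, int], None],
    max_ranges: Optional[int] = None,            # per-run budget; None means no limit
) -> bool:
    """Reconcile ranges in key order, resuming after the last completed range.

    Returns True once the whole key space [start, end) has been covered.
    """
    resume_from = checkpoint.get("next_key", start)
    done = 0
    for lo, hi in key_ranges(resume_from, end, size):
        if max_ranges is not None and done >= max_ranges:
            return False  # budget spent; the next run picks up from the checkpoint
        reconcile_range(lo, hi)
        checkpoint["next_key"] = hi  # advance only after the range succeeds
        done += 1
    return True
```

Advancing the checkpoint only after a range finishes means a crash mid-range repeats that range, which is safe as long as the per-range repair is idempotent (overwriting with a recomputed value is).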

4. When To Use It

Use reconciliation when:

  • reads are served from derived projections rather than the source of truth
  • event delivery can duplicate or miss work
  • counters or read models can drift
  • correctness can be restored from a source of truth
  • silent drift is worse than delayed repair
  • the projection is user-visible or business-critical enough to audit

Good examples:

  • like counts from like edges
  • unread counts from messages and read cursors
  • analytics rollups from raw events
  • search index documents from source records
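For unread counts, the true value is recomputed from messages and read cursors. A minimal sketch, assuming each conversation's messages carry increasing sequence numbers and each reader has a cursor holding the highest sequence number read; the data shapes and `expected_unread` are hypothetical, and a real system might also exclude the reader's own messages.

```python
from typing import Dict, List, Tuple

# Hypothetical source-of-truth shapes:
#   messages[conversation_id] -> sequence numbers of messages in that conversation
#   read_cursors[(user_id, conversation_id)] -> highest sequence number the user has read
Messages = Dict[str, List[int]]
Cursors = Dict[Tuple[str, str], int]


def expected_unread(messages: Messages, read_cursors: Cursors,
                    user_id: str, conversation_id: str) -> int:
    # A missing cursor means the user has read nothing yet (cursor 0).
    cursor = read_cursors.get((user_id, conversation_id), 0)
    return sum(1 for seq in messages.get(conversation_id, []) if seq > cursor)
```

A reconciliation job would compare this value with the stored unread count for each `(user_id, conversation_id)` pair in the partition it is scanning.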

5. When Not To Use It

Reconciliation is not a replacement for reliable write paths.

Avoid relying only on repair if:

  • wrong values are safety-critical
  • repairs cannot determine the true value
  • scanning the source of truth is too expensive to do without careful partitioning
  • users cannot tolerate temporary inconsistency
  • correction itself could violate business rules

Some workflows need stronger consistency up front. Payments, authorization, inventory reservation, and safety decisions may not be good candidates for "fix it later."

6. Data And Operational Model

Reconciliation needs:

  • clear source of truth
  • projection ownership
  • partitioning strategy
  • correction mechanism
  • audit trail
  • drift metrics
  • retry policy
  • safe scheduling

Operators should monitor:

  • drift count
  • repair count
  • scan duration
  • correction failures
  • age of unreconciled data
  • largest drift by key
  • database load caused by reconciliation
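Several of these metrics fall out of a single pass over the comparison results. A minimal sketch, assuming numeric projections and a scan that yields `(key, expected, actual)` triples; `summarize_drift` and `DriftSummary` are hypothetical names, and scan duration, correction failures, and database load would come from the job runner and the database rather than from this function.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass
class DriftSummary:
    scanned: int
    drift_count: int                  # keys whose projection disagreed with truth
    total_abs_drift: int              # sum of |expected - actual| over drifted keys
    largest_drift_key: Optional[str]  # None when nothing drifted
    largest_drift: int


def summarize_drift(rows: Iterable[Tuple[str, int, int]]) -> DriftSummary:
    """Aggregate (key, expected, actual) triples from one scan into metrics."""
    scanned = drift_count = total = 0
    worst_key: Optional[str] = None
    worst = 0
    for key, expected, actual in rows:
        scanned += 1
        delta = abs(expected - actual)
        if delta:
            drift_count += 1
            total += delta
            if delta > worst:
                worst_key, worst = key, delta
    return DriftSummary(scanned, drift_count, total, worst_key, worst)
```

Tracking the largest drift by key alongside the count helps distinguish many tiny off-by-one errors from a single badly broken object.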

The job should avoid overloading the source database. It may need rate limits, off-peak scheduling, read replicas, or incremental checkpoints.
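A token bucket is one common way to implement the rate limit: the job spends one token per source read and waits when the bucket is empty. This is a sketch under the assumption that one read costs one token; `TokenBucket` is a hypothetical class, and the clock and sleep functions are injectable so the throttle can be tested without real waiting.

```python
import time
from typing import Callable


class TokenBucket:
    """Caps reads per second against the source database.

    Tokens accrue at `rate` per second up to `capacity`; each source read
    spends one token, and `acquire` sleeps until a token is available.
    """

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.clock = clock
        self.sleep = sleep
        self.last = clock()

    def _refill(self) -> None:
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now

    def acquire(self) -> None:
        self._refill()
        if self.tokens < 1:
            # Wait just long enough for one whole token to accumulate.
            self.sleep((1 - self.tokens) / self.rate)
            self._refill()
        self.tokens -= 1
```

In the reconciliation loop, `acquire()` would be called before each source read; the capacity allows short bursts while the rate bounds the sustained load on the database.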

7. Failure Modes

  • Job reads stale source data.
  • Job overloads the database.
  • Correction logic is wrong.
  • Repair fights with live updates.
  • Drift is detected but not alerted.
  • Job only samples and misses important objects.
  • Reconciliation overwrites newer projection state with older computed state.
  • No audit trail exists for what was corrected.
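One way to avoid two of these failures (repair fighting live updates, and overwriting newer projection state with older computed state) is to recompute truth as of the same log position the projection claims to have applied, and to write the correction only if that position is unchanged. This sketch assumes the source of truth can be read as an ordered log of `(seq, key, delta)` writes, that the projection stores a hypothetical `applied_seq` watermark, and that the consumer applies events for a key in log order, so a gap below the watermark means a lost event rather than a late one. `truth_at`, `apply_live`, `compare_and_set`, and `reconcile_key` are illustrative names; in a real store the compare-and-set would be a conditional update in the database.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical source of truth: an ordered log of (seq, key, delta) writes,
# e.g. +1 for a like and -1 for an unlike.
Log = List[Tuple[int, str, int]]


@dataclass
class ProjectionRow:
    value: int
    applied_seq: int  # highest log seq the live consumer has folded into value


def truth_at(log: Log, key: str, seq: int) -> int:
    """Recompute the true value for key as of log position seq (inclusive)."""
    return sum(delta for s, k, delta in log if k == key and s <= seq)


def apply_live(rows: Dict[str, ProjectionRow], seq: int, key: str, delta: int) -> None:
    """Live write path: fold one event in and advance the watermark."""
    row = rows.setdefault(key, ProjectionRow(0, 0))
    row.value += delta
    row.applied_seq = max(row.applied_seq, seq)


def compare_and_set(rows: Dict[str, ProjectionRow], key: str,
                    expected_seq: int, value: int) -> bool:
    """Write value only if no newer event has been applied since the snapshot."""
    row = rows[key]
    if row.applied_seq != expected_seq:
        return False  # live updates moved on; the recomputed value is stale
    row.value = value
    return True


def reconcile_key(rows: Dict[str, ProjectionRow], log: Log, key: str) -> str:
    snapshot = rows[key]
    seq, value = snapshot.applied_seq, snapshot.value
    # Compare against truth at the same position the projection reflects, so
    # events not yet applied by the consumer are excluded from the expected value.
    expected = truth_at(log, key, seq)
    if expected == value:
        return "ok"
    if compare_and_set(rows, key, seq, expected):
        return "repaired"
    return "skipped"  # retry on a later pass instead of fighting the live path
```

Returning "skipped" rather than retrying immediately keeps the repair from racing a hot key; the next pass will see a newer watermark and compare again.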

8. Tradeoffs

Benefit                        | Cost
Repairs derived state          | Extra read and compute load
Finds silent bugs              | Needs careful scheduling
Improves confidence            | Correction logic can be risky
Supports eventual consistency  | Does not prevent initial drift
Enables operational repair     | Requires clear source of truth

Reconciliation is the repair story that makes eventually consistent projections trustworthy over time.
