Reconciliation Job
Periodically compare source-of-truth data with derived state and repair drift caused by missed, duplicated, or incorrectly processed updates.
Concepts Covered
- Source-of-truth comparison
- Projection drift
- Repair jobs
- Reconciliation windows
- Safe correction
- Operational confidence
- Drift metrics
- Rebuild safety
1. Intent
A Reconciliation Job detects and repairs differences between source-of-truth data and derived state.
It accepts a practical reality: derived projections can drift, so production systems need repair paths.
The intent is not to excuse sloppy write paths. The intent is to make long-running systems repairable when duplicate events, missed events, bugs, or manual operations create inconsistency.
2. The Problem Without This Pattern
Suppose a like count projection says:
post_like_counts = 997
but the edge store has:
1,000 active post_likes rows
Without reconciliation, the system may keep showing the wrong count forever.
The same problem appears in:
- unread message counts
- inbox projections
- analytics aggregates
- search indexes
- notification state
- sharded counter totals
Derived state is fast because it is precomputed. That speed has a cost: it can become wrong unless the system has a way to compare it against truth.
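To make the drift concrete, the following is a minimal sketch of how a counter projection can fall behind. It assumes a hypothetical consumer that increments a counter once per delivered like event, and that three events were never delivered. The dropped user IDs and variable names are illustrative only.

```python
# Source of truth: one edge per (user_id, post_id) like on post 42.
edges = {(user_id, 42) for user_id in range(1000)}

# Hypothetical event consumer: increments the projection once per event.
# Assumption for illustration: these three events were never delivered.
dropped = {7, 311, 902}

projected_count = 0
for user_id, post_id in sorted(edges):
    if user_id in dropped:
        continue  # missed event: the projection never sees this like
    projected_count += 1

true_count = len(edges)
drift = true_count - projected_count
print(true_count, projected_count, drift)  # 1000 997 3
```

Nothing in the projection itself signals that it is wrong. Only a comparison against the edges reveals the gap.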
3. How The Pattern Works
A reconciliation job usually:
1. Selects a partition, object, or time window.
2. Reads source-of-truth data.
3. Reads the derived projection.
4. Compares expected and actual values.
5. Emits a correction or overwrites the projection.
6. Records metrics and audit logs.
The job may run continuously, periodically, or only during incidents.
Example for likes:
expected = count active post_likes where post_id = 42
actual = post_like_counts[42]
if expected != actual:
    update post_like_counts[42] = expected
    record correction
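The pseudocode above could be sketched in runnable form as follows. This is one possible shape, not a definitive implementation: it uses an in-memory SQLite database, and the schema (an `active` flag on `post_likes`, a `corrections` audit table) and the function name `reconcile_post` are assumptions made for the example.

```python
import sqlite3


def reconcile_post(conn, post_id):
    """Compare counted edges with the stored projection and repair drift.

    Returns the applied correction (expected - actual), or 0 if nothing changed.
    """
    expected = conn.execute(
        "SELECT COUNT(*) FROM post_likes WHERE post_id = ? AND active = 1",
        (post_id,),
    ).fetchone()[0]
    row = conn.execute(
        "SELECT like_count FROM post_like_counts WHERE post_id = ?", (post_id,)
    ).fetchone()
    actual = row[0] if row else 0
    if expected == actual:
        return 0
    # Overwrite the projection and record an audit row in one transaction.
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO post_like_counts (post_id, like_count) "
            "VALUES (?, ?)",
            (post_id, expected),
        )
        conn.execute(
            "INSERT INTO corrections (post_id, old_count, new_count) "
            "VALUES (?, ?, ?)",
            (post_id, actual, expected),
        )
    return expected - actual


# Hypothetical schema and data reproducing the 997 vs 1,000 example.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE post_likes (user_id INTEGER, post_id INTEGER, active INTEGER,
                             PRIMARY KEY (user_id, post_id));
    CREATE TABLE post_like_counts (post_id INTEGER PRIMARY KEY,
                                   like_count INTEGER);
    CREATE TABLE corrections (post_id INTEGER, old_count INTEGER,
                              new_count INTEGER);
""")
conn.executemany(
    "INSERT INTO post_likes VALUES (?, 42, 1)", [(u,) for u in range(1000)]
)
conn.execute("INSERT INTO post_like_counts VALUES (42, 997)")
conn.commit()

delta = reconcile_post(conn, 42)
print(delta)  # 3
```

Writing the correction and the audit row in the same transaction means a repair is never applied without a record of it.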
Large systems rarely scan everything at once. They reconcile by partition, key range, tenant, post ID, conversation ID, or time window.
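As a rough illustration of reconciling by key range, the sketch below splits a post ID space into fixed-size batches and records a checkpoint after each one. The helper names (`key_ranges`, `reconcile_range`) and the in-memory dictionaries standing in for the source and the projection are assumptions for the example. A real job would more likely query the keys present in each range than visit every integer.

```python
def key_ranges(min_id, max_id, batch_size):
    """Yield inclusive (start, end) ranges that cover [min_id, max_id]."""
    start = min_id
    while start <= max_id:
        end = min(start + batch_size - 1, max_id)
        yield start, end
        start = end + 1


def reconcile_range(source_counts, projection, start, end):
    """Repair every key in [start, end]; return the corrected keys."""
    corrected = []
    for post_id in range(start, end + 1):
        expected = source_counts.get(post_id, 0)
        if projection.get(post_id, 0) != expected:
            projection[post_id] = expected
            corrected.append(post_id)
    return corrected


# Hypothetical data: counts computed from the source of truth vs. the projection.
source_counts = {1: 5, 2: 0, 3: 12, 7: 4}
projection = {1: 5, 3: 11, 7: 6}

checkpoint = 0  # last post ID that has been fully reconciled
all_corrected = []
for start, end in key_ranges(1, 8, batch_size=3):
    all_corrected += reconcile_range(source_counts, projection, start, end)
    checkpoint = end  # a real job would persist this durably

print(all_corrected, checkpoint)  # [3, 7] 8
```

Small batches keep each unit of work cheap and let the job resume from the last checkpoint instead of restarting the whole scan.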
4. When To Use It
Use reconciliation when:
- derived projections matter
- event delivery can duplicate or miss work
- counters or read models can drift
- correctness can be restored from a source of truth
- silent drift is worse than delayed repair
- the projection is user-visible or business-critical enough to audit
Good examples:
- like counts from like edges
- unread counts from messages and read cursors
- analytics rollups from raw events
- search index documents from source records
5. When Not To Use It
Reconciliation is not a replacement for reliable write paths.
Avoid relying only on repair if:
- wrong values are safety-critical
- repairs cannot determine the true value
- scanning the source of truth is too expensive, even with careful partitioning
- users cannot tolerate temporary inconsistency
- correction itself could violate business rules
Some workflows need stronger consistency up front. Payments, authorization, inventory reservation, and safety decisions may not be good candidates for "fix it later."
6. Data And Operational Model
Reconciliation needs:
- clear source of truth
- projection ownership
- partitioning strategy
- correction mechanism
- audit trail
- drift metrics
- retry policy
- safe scheduling
Operators should monitor:
- drift count
- repair count
- scan duration
- correction failures
- age of unreconciled data
- largest drift by key
- database load caused by reconciliation
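Several of these signals can be derived from the comparison results of a single run. The following is a minimal sketch, assuming a hypothetical `Comparison` record per scanned key. The field names and the metric names in the returned dictionary are illustrative, not a standard.

```python
from dataclasses import dataclass


@dataclass
class Comparison:
    key: str
    expected: int   # value computed from the source of truth
    actual: int     # value found in the projection
    repaired: bool  # whether the correction was applied successfully


def summarize(comparisons):
    """Aggregate per-key comparison results into run-level drift metrics."""
    drifted = [c for c in comparisons if c.expected != c.actual]
    largest = max(
        drifted, key=lambda c: abs(c.expected - c.actual), default=None
    )
    return {
        "scanned": len(comparisons),
        "drift_count": len(drifted),
        "repair_count": sum(1 for c in drifted if c.repaired),
        "correction_failures": sum(1 for c in drifted if not c.repaired),
        "largest_drift_key": largest.key if largest else None,
        "largest_drift": abs(largest.expected - largest.actual) if largest else 0,
    }


results = [
    Comparison("post:42", 1000, 997, repaired=True),
    Comparison("post:43", 10, 10, repaired=False),   # no drift, nothing to repair
    Comparison("post:44", 5, 25, repaired=False),    # drift, repair failed
]
metrics = summarize(results)
print(metrics["drift_count"], metrics["largest_drift_key"])  # 2 post:44
```

Emitting these values on every run makes it possible to alert on rising drift rather than discovering it from user reports.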
The job should avoid overloading the source database. It may need rate limits, off-peak scheduling, read replicas, or incremental checkpoints.
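One simple way to combine a rate limit with an incremental checkpoint is sketched below. It assumes keys are sortable and that a fixed pause per key is an acceptable throttle. The function name `throttled_scan` and its parameters are hypothetical, and the `sleep` argument is injectable only so the example can run without waiting.

```python
import time


def throttled_scan(keys, reconcile_key, max_keys_per_second,
                   checkpoint=None, sleep=time.sleep):
    """Reconcile sorted keys after `checkpoint`, pausing to cap the scan rate.

    Returns the last key processed, which the caller can persist and pass
    back as `checkpoint` to resume after a crash or a scheduled pause.
    """
    interval = 1.0 / max_keys_per_second
    last = checkpoint
    for key in sorted(keys):
        if checkpoint is not None and key <= checkpoint:
            continue  # already reconciled in an earlier run
        reconcile_key(key)
        last = key
        sleep(interval)  # fixed pause per key keeps the rate under the cap
    return last


# Usage with stand-ins: record processed keys and requested pauses.
seen = []
pauses = []
last = throttled_scan(
    [5, 1, 3, 9], seen.append, max_keys_per_second=100,
    checkpoint=3, sleep=pauses.append,
)
print(seen, last)  # [5, 9] 9
```

Because the pause is added on top of the time each key takes to process, the effective rate stays below `max_keys_per_second`. A token bucket would be a closer fit if bursts are acceptable.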
7. Failure Modes
- Job reads stale source data.
- Job overloads the database.
- Correction logic is wrong.
- Repair fights with live updates.
- Drift is detected but not alerted.
- Job only samples and misses important objects.
- Reconciliation overwrites newer projection state with older computed state.
- No audit trail exists for what was corrected.
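Two of these failure modes, repairs fighting with live updates and older computed state overwriting newer projection state, can be reduced with a version-guarded write. The sketch below is one possible approach under stated assumptions: the `Projection` class is an in-memory stand-in for a row with an optimistic version column, and `repair` and `racy_count` are hypothetical names. In a database, the guard would typically be a conditional update on the version.

```python
class Projection:
    """In-memory stand-in for a projection row with an optimistic version."""

    def __init__(self, value, version=0):
        self.value = value
        self.version = version

    def apply_live_update(self, delta):
        self.value += delta
        self.version += 1

    def compare_and_set(self, expected_version, new_value):
        """Write only if no live update landed since the version was read."""
        if self.version != expected_version:
            return False
        self.value = new_value
        self.version += 1
        return True


def repair(projection, compute_expected, max_attempts=3):
    """Retry the repair until the guarded write succeeds or attempts run out."""
    for _ in range(max_attempts):
        version = projection.version  # snapshot before reading the source
        expected = compute_expected()
        if projection.value == expected:
            return True
        if projection.compare_and_set(version, expected):
            return True
    return False


source = {"likes": 1000}
proj = Projection(value=997)


def racy_count():
    # Simulate a live like landing while the job is computing: the source
    # and the projection both advance after the version snapshot was taken.
    count = source["likes"]
    if proj.version == 0:
        source["likes"] += 1
        proj.apply_live_update(+1)
    return count


ok = repair(proj, racy_count)
print(ok, proj.value, source["likes"])  # True 1001 1001
```

Without the version check, the first attempt would have written 1,000 and silently dropped the like that arrived during the scan.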
8. Tradeoffs
| Benefit | Cost |
|---|---|
| Repairs derived state | Extra read and compute load |
| Finds silent bugs | Needs careful scheduling |
| Improves confidence | Correction logic can be risky |
| Supports eventual consistency | Does not prevent initial drift |
| Enables operational repair | Requires clear source of truth |
Reconciliation is the repair story that makes eventually consistent projections trustworthy over time.