Concepts

Distributed Counters

Counter designs that avoid a single hot bottleneck when many machines update or read the same logical count.

intermediate4 min readUpdated unknownCapacityDataReliabilityTradeoffs
Sharded CountersHot RowsEventual ConsistencyCounter RepairWrite Contention

Concepts Covered

  • Logical counters
  • Hot counter rows
  • Write contention
  • Sharded counters
  • Compacted totals
  • Counter drift
  • Repair jobs
  • Source truth vs aggregate projections

Definition

A distributed counter represents one logical count while storing or updating that count across multiple rows, shards, partitions, or machines.

The purpose is to avoid forcing every update through one bottleneck.

Examples:

  • likes on a viral post
  • views on a popular video
  • unread messages in a busy conversation
  • active users in a region
  • downloads of a popular file

The Pain That Forces Distributed Counters

The simplest counter is one row:

post_id -> like_count
post_42 -> 10001

Every new like updates the same value:

UPDATE post_counts
SET like_count = like_count + 1
WHERE post_id = 'post_42';

For normal posts, this may be fine. For viral posts, the same row receives a huge number of concurrent writes. The database must coordinate those updates, lock or serialize changes, replicate them, and keep the value correct.

The counter row becomes hot.

The Naive Failure

Imagine a celebrity post receiving thousands of likes per second.

1. Every like tries to update post_42.like_count.
2. The row becomes locked or heavily contended.
3. Write latency increases.
4. Client retries begin.
5. Retries add more writes to the same hot row.
6. The counter path slows or fails.

The problem is not counting as a concept. The problem is forcing all writes for one popular object through one physical place.

Mental Model

A distributed counter splits one logical count into smaller physical pieces.

Instead of:

post_42 -> 10001

Use:

post_42, shard_0 -> 2300
post_42, shard_1 -> 2601
post_42, shard_2 -> 2550
post_42, shard_3 -> 2550

The logical count is the sum of the shards.

Sharded Counter Mechanics

Write path:

1. A like event arrives.
2. The system chooses a counter shard.
3. It increments only that shard.
4. Reads either sum shards or read a compacted total.

Shard selection can be random, hash-based, or based on writer identity. The goal is to spread writes across multiple physical rows or partitions.

Read path options:

Read strategyBenefitCost
Sum all shards on readFreshest aggregateMore read work
Maintain compacted totalFast readsTotal may lag
Cache aggregateVery fast readsCache invalidation and staleness

Source Truth Versus Counter Projection

For important relationships, the counter should usually not be the only truth.

In a like system:

source truth: user_id liked post_id
projection: post_id has N likes

The user-to-post edge answers whether a specific user liked a post. The counter answers the aggregate count quickly.

If the counter drifts, the system can rebuild it from source edges or events.

What Distributed Counters Guarantee

Distributed counters can increase write throughput and reduce hot-row contention.

They do not automatically guarantee:

  • perfectly fresh counts
  • no drift
  • no duplicate increments
  • no missed decrements
  • cheap reads
  • simple repair

Most distributed counters are eventually consistent projections. They need reconciliation.

Operational Reality

Watch:

  • writes per counter key
  • shard imbalance
  • read latency for aggregate counts
  • compacted total lag
  • counter drift
  • reconciliation corrections
  • retry and duplicate event rate
  • shard-count migration progress

Failure modes:

  • Duplicate events inflate counts.
  • Missed events undercount.
  • Hot posts overwhelm one shard if shard selection is poor.
  • Compacted totals lag behind shard updates.
  • Repair jobs disagree with live projections.
  • Reads become expensive because too many shards must be summed.

Knowledge links

Use these links to understand what to know first, where this idea appears, and what to study next.

Prerequisites

Read these first if this topic feels unfamiliar.

Used In Systems

System studies where this idea appears in context.

Related Concepts

Core ideas that connect to this topic.

Related Patterns

Reusable architecture moves built from these ideas.