Group Message Fan-Out

The scaling problem created when one group message expands into many recipient, device, receipt, push, and projection updates.

Concepts Covered

  • Group recipient expansion
  • Fan-out-on-write
  • Fan-out-on-read
  • Write amplification
  • Large group isolation
  • Hot conversations
  • Membership snapshots
  • Delivery backpressure

Definition

Group message fan-out is the process of expanding one group message into delivery work for every eligible recipient and device.

A one-to-one message usually has a small delivery surface: one recipient and a handful of devices. A group message does not. One message sent to a group with 100,000 members may create recipient delivery tasks, device delivery tasks, push tasks, unread projection updates, and receipt events for each of those members.

This is why group messaging is one of the main scaling pressures in chat systems.

The Pain That Forces Fan-Out Strategy

The sender experiences one action:

send "hello" to group_99

The backend sees a distributed workload:

store canonical message
find eligible members
expand to devices
enqueue delivery
update inbox projections
update unread counts
send push notifications
track receipts

If all of that happens synchronously before acknowledging the sender, one large group can make sending slow or unreliable.

The system needs to separate the small source-of-truth write from the large fan-out workload.

Mental Model

The message is small. The blast radius is not.

1 logical message -> N recipients -> M devices -> many derived updates

Group fan-out is a write amplification problem. One user action creates many backend writes and background tasks.

The larger the group, the more the system must care about isolation, batching, retries, idempotency, and backpressure.
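
To make the amplification concrete, here is a rough back-of-the-envelope sketch. The function name, the per-member device count, and the offline fraction are illustrative assumptions, not measurements from any real system; the point is only that derived work grows linearly with group size.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FanOutEstimate:
    recipient_tasks: int
    device_tasks: int
    push_tasks: int
    projection_updates: int

    @property
    def total(self) -> int:
        return (self.recipient_tasks + self.device_tasks
                + self.push_tasks + self.projection_updates)


def estimate_fan_out(members: int, devices_per_member: float,
                     offline_fraction: float) -> FanOutEstimate:
    """Rough count of derived work items created by one group message."""
    # Assumption: the sender is not counted as a recipient.
    recipients = members - 1
    device_tasks = round(recipients * devices_per_member)
    # Assumption: only offline devices need a push notification.
    push_tasks = round(device_tasks * offline_fraction)
    # Assumption: one inbox/unread projection update per recipient.
    return FanOutEstimate(recipients, device_tasks, push_tasks, recipients)


small = estimate_fan_out(members=10, devices_per_member=2.0, offline_fraction=0.5)
large = estimate_fan_out(members=100_000, devices_per_member=2.0, offline_fraction=0.5)
print(small.total, large.total)  # 45 499995
```

Under these assumptions a 10-member group produces 45 derived work items, while a 100,000-member group produces roughly half a million from the same single send.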

Canonical Message First

A common design is:

1. Store canonical message.
2. Acknowledge sender once durable.
3. Fan out delivery work asynchronously.

The canonical message record says the message exists. Fan-out workers decide how it reaches recipients.

This gives the system room to retry delivery, resume failed expansion, and isolate large group work without holding the sender's acknowledgement until every recipient has been reached.
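
The three steps can be sketched as follows. This is a minimal in-memory illustration, not a production design: `MessageService`, its dict "store", and its deque "queue" are hypothetical stand-ins for a durable database and a durable work queue.

```python
import uuid
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    message_id: str
    group_id: str
    sender_id: str
    body: str


class MessageService:
    def __init__(self):
        self.messages = {}            # stand-in for a durable message store
        self.fan_out_queue = deque()  # stand-in for a durable work queue

    def send(self, group_id, sender_id, body):
        """Store the canonical message, schedule fan-out, and acknowledge."""
        msg = Message(str(uuid.uuid4()), group_id, sender_id, body)
        self.messages[msg.message_id] = msg        # small source-of-truth write
        self.fan_out_queue.append(msg.message_id)  # fan-out happens later
        return msg.message_id                      # ack does not wait for delivery

    def run_fan_out_once(self, deliver):
        """Process one queued message; leave it queued if delivery raises."""
        if not self.fan_out_queue:
            return False
        message_id = self.fan_out_queue[0]  # peek, so a failure can be retried
        deliver(self.messages[message_id])
        self.fan_out_queue.popleft()        # remove only after success
        return True


service = MessageService()
ack_id = service.send("group_99", "user_1", "hello")

delivered = []
service.run_fan_out_once(delivered.append)
```

The sender's cost is one store write and one enqueue regardless of group size. Because the worker removes the item only after `deliver` succeeds, a crash mid-expansion leaves the work queued, which is why the delivery step itself needs to be idempotent.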

Fan-Out-On-Write

Fan-out-on-write creates recipient-specific records when the message is sent.

Benefits:

  • recipient reads are fast
  • offline queues are explicit
  • delivery state is easier to track per recipient
  • unread counters and inboxes can be precomputed

Costs:

  • large groups create massive write bursts
  • one hot group can starve normal traffic
  • retries can duplicate work without idempotency
  • membership changes must be evaluated carefully

Fan-out-on-write works well for small and medium groups, but it needs safeguards for very large groups.
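
One common safeguard is to key each recipient record by an idempotency key so a retried fan-out does not create duplicates. The sketch below assumes a hypothetical in-memory `inbox` dict keyed by `(message_id, recipient_id)`; in a real store the same effect would typically come from a unique constraint or conditional write.

```python
def fan_out_on_write(inbox, message_id, recipients):
    """Create one delivery record per recipient; safe to re-run after a retry.

    Returns the number of records newly created by this call.
    """
    created = 0
    for recipient_id in recipients:
        key = (message_id, recipient_id)  # idempotency key
        if key in inbox:
            continue  # already written by an earlier attempt
        inbox[key] = {"state": "pending"}
        created += 1
    return created


inbox = {}
first_attempt = fan_out_on_write(inbox, "m1", ["alice", "bob", "carol"])
retry_attempt = fan_out_on_write(inbox, "m1", ["alice", "bob", "carol"])
```

The first attempt creates three records; the retry creates none, so the inbox still holds exactly one record per recipient.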

Fan-Out-On-Read

Fan-out-on-read stores the group message once and computes recipient visibility when a user opens or syncs the conversation.

Benefits:

  • sending to huge groups is cheaper
  • storage amplification is lower
  • the canonical message is easier to manage

Costs:

  • reads become more complex
  • offline notifications still need separate handling
  • membership history must be queried correctly
  • unread counts may need approximations or derived projections

Many systems use a hybrid: eager fan-out for normal groups, lazy or batched fan-out for massive groups.
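
A hybrid might be sketched like this. The threshold value, the function names, and the per-conversation sequence numbers are assumptions made for illustration; the read path shows how visibility and an unread count can be derived at sync time instead of being written per recipient.

```python
LARGE_GROUP_THRESHOLD = 1_000  # hypothetical cutoff; tune per system


def choose_strategy(member_count):
    """Eager fan-out for normal groups, lazy fan-out for massive ones."""
    if member_count >= LARGE_GROUP_THRESHOLD:
        return "fan_out_on_read"
    return "fan_out_on_write"


def sync_conversation(group_log, last_seen_seq, joined_at_seq):
    """Fan-out-on-read: compute what one user should see when they sync.

    group_log is a list of (seq, body) tuples stored once per group.
    The user sees messages newer than their read cursor and not older
    than the point at which they joined.
    """
    return [
        (seq, body)
        for seq, body in group_log
        if seq > last_seen_seq and seq >= joined_at_seq
    ]


log = [(1, "a"), (2, "b"), (3, "c"), (4, "d")]
late_joiner_view = sync_conversation(log, last_seen_seq=0, joined_at_seq=3)
unread_count = len(sync_conversation(log, last_seen_seq=3, joined_at_seq=1))
```

Here the send path wrote four log entries in total, no matter how many members the group has; the cost moved to each reader's sync.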

Membership Correctness

Group fan-out must answer:

Who was allowed to receive this message at the time it was sent?

This is harder than it sounds. Members can join, leave, be removed, change devices, or have privacy settings updated while fan-out is running.

The system may need a membership snapshot, sequence number, or time-bounded membership query so fan-out does not accidentally deliver messages to users who should not receive them.
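
One way to express the sequence-number approach is to stamp each message with the membership sequence at send time and have fan-out workers resolve membership as of that sequence. The event format and function name below are hypothetical, and the sketch assumes events are ordered by sequence number.

```python
def members_as_of(events, as_of_seq):
    """Replay membership events up to and including as_of_seq.

    events is a list of (seq, user_id, action) tuples sorted by seq,
    where action is "join" or "leave".
    """
    members = set()
    for seq, user_id, action in events:
        if seq > as_of_seq:
            break  # later changes must not affect this message
        if action == "join":
            members.add(user_id)
        elif action == "leave":
            members.discard(user_id)
    return members


events = [
    (1, "alice", "join"),
    (2, "bob", "join"),
    (3, "bob", "leave"),
    (4, "carol", "join"),
]

# A message stamped with membership sequence 2 still reaches bob,
# even if the fan-out worker runs after bob has left at sequence 3.
recipients_at_2 = members_as_of(events, as_of_seq=2)
recipients_at_3 = members_as_of(events, as_of_seq=3)
```

Whether a member who left after the send should still receive the message is a product decision; the sketch only shows how to make the answer deterministic across retries.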

Large Group Isolation

Large groups should not be allowed to harm one-to-one messaging.

Mitigations include:

  • separate worker pools for large groups
  • per-conversation fan-out rate limits
  • batch recipient expansion
  • queue priorities
  • bulkheads between normal delivery and large-group delivery
  • lazy inbox projection for huge groups
  • backpressure when group fan-out lag grows

This is operationally important. A celebrity group, public channel, or viral message can create enough delivery work to affect the whole platform if it shares the same unbounded worker pool.
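
A bulkhead with backpressure can be sketched with two bounded queues. `FanOutRouter`, its lane names, and its capacities are illustrative assumptions; a real deployment would more likely use separate topics or worker pools, but the isolation property is the same.

```python
import queue


class FanOutRouter:
    """Route fan-out work into separate bounded lanes by group size."""

    def __init__(self, large_threshold, normal_capacity, large_capacity):
        self.large_threshold = large_threshold
        self.lanes = {
            "normal": queue.Queue(maxsize=normal_capacity),
            "large": queue.Queue(maxsize=large_capacity),
        }

    def classify(self, member_count):
        return "large" if member_count >= self.large_threshold else "normal"

    def submit(self, message_id, member_count):
        """Enqueue fan-out work; return False when the lane is full."""
        lane = self.lanes[self.classify(member_count)]
        try:
            lane.put_nowait(message_id)
            return True
        except queue.Full:
            # Backpressure: the caller can delay, batch, or fall back
            # to lazy projection instead of growing the queue unbounded.
            return False


router = FanOutRouter(large_threshold=1_000, normal_capacity=2, large_capacity=1)
accepted_large = router.submit("m1", member_count=50_000)
rejected_large = router.submit("m2", member_count=50_000)
accepted_small = router.submit("m3", member_count=5)
```

The second large-group message is rejected because its lane is full, while the small-group message is still accepted: the saturated large lane does not consume normal-lane capacity.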

Operational Reality

Important signals:

  • fan-out lag by conversation size
  • recipient expansion duration
  • delivery queue depth
  • retries per message
  • duplicate recipient records
  • worker saturation by group class
  • unread projection lag
  • push provider throttling
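
As an example of the first signal, fan-out lag can be reported as the age of the oldest pending fan-out job per group-size bucket. The bucket boundaries and function names here are assumptions for illustration, not recommended values.

```python
from collections import defaultdict


def size_bucket(member_count):
    """Hypothetical size classes used to label the metric."""
    if member_count < 100:
        return "small"
    if member_count < 10_000:
        return "medium"
    return "large"


def fan_out_lag_by_bucket(pending, now):
    """Return the age in seconds of the oldest pending job per bucket.

    pending is an iterable of (member_count, enqueued_at_seconds).
    """
    lag = defaultdict(float)
    for member_count, enqueued_at in pending:
        bucket = size_bucket(member_count)
        lag[bucket] = max(lag[bucket], now - enqueued_at)
    return dict(lag)


pending_jobs = [(5, 99.0), (50_000, 40.0), (20_000, 70.0)]
lag = fan_out_lag_by_bucket(pending_jobs, now=100.0)
```

Splitting the metric by bucket makes it visible when large-group lag is growing while small-group delivery stays healthy, or when the two start to move together because isolation has failed.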

Failure modes:

  • one group creates a hot partition
  • fan-out retries duplicate recipient delivery records
  • delivery workers spend all capacity on large groups
  • unread counters drift because projection work lags
  • membership changes are applied incorrectly during fan-out
  • push notification tasks overwhelm provider limits
