Offline Delivery

The mechanisms that let disconnected users receive accepted messages later through durable logs, cursors, queues, and sync.

Concepts Covered

  • Offline users
  • Durable message history
  • Device cursors
  • Per-device queues
  • Sync checkpoints
  • Push notification handoff
  • Backlog growth
  • Reconnect storms

Definition

Offline delivery is the part of a messaging system that lets users receive messages that were sent while they were disconnected.

This is not an edge case. In mobile messaging, offline behavior is normal. Phones lose signal, switch networks, enter low-power modes, close background connections, or sit unused for days.

A serious messaging system cannot assume the recipient is online when the sender presses send.

The core rule is simple: if a message was accepted durably, recipient devices need a way to discover it later.

The Pain That Forces Offline Delivery

Realtime delivery feels like the whole product when both users are online.

sender -> gateway -> recipient device

But that path fails whenever the recipient is not reachable:

  • phone is off
  • app is suspended
  • device changed networks
  • gateway connection dropped
  • user has multiple devices and only some are online
  • push notification is delayed

A fragile system tries to push the message once and then forgets it. That loses messages when the recipient disconnects at the wrong time.

A durable system writes the message first, then attempts realtime delivery:

accepted message -> durable message log -> realtime push if online
                                      -> offline sync if not online

The durable message log is the recovery path. The gateway is an optimization.
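
The write-first ordering can be sketched in a few lines. This is a minimal illustration, not a reference implementation: `MessageLog`, `Gateway`, and `accept_message` are hypothetical names, and an in-memory dict stands in for real durable storage.

```python
from dataclasses import dataclass, field


@dataclass
class MessageLog:
    """In-memory stand-in for a durable, append-only conversation log."""
    entries: dict = field(default_factory=dict)  # conversation_id -> [(sequence, body)]

    def append(self, conversation_id: str, body: str) -> int:
        log = self.entries.setdefault(conversation_id, [])
        sequence = len(log) + 1  # server-assigned, increasing per conversation
        log.append((sequence, body))
        return sequence


@dataclass
class Gateway:
    """Tracks which devices currently hold a live connection (assumed behavior)."""
    online: set = field(default_factory=set)
    pushed: list = field(default_factory=list)

    def try_push(self, device_id: str, sequence: int, body: str) -> bool:
        if device_id not in self.online:
            return False
        self.pushed.append((device_id, sequence, body))
        return True


def accept_message(log, gateway, conversation_id, recipients, body):
    # 1. Durable write first. Only after this succeeds is the message "accepted".
    sequence = log.append(conversation_id, body)
    # 2. Realtime push is best effort. A miss is not an error, because the
    #    device can find the message in the log when it syncs later.
    missed = [d for d in recipients if not gateway.try_push(d, sequence, body)]
    return sequence, missed


log = MessageLog()
gateway = Gateway(online={"phone-a"})
sequence, missed = accept_message(log, gateway, "c_10", ["phone-a", "tablet-b"], "hello")
print(sequence, missed)  # 1 ['tablet-b']
```

The offline device is not handled specially at send time. It simply appears in `missed`, and the message is already in the log waiting for it.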

Mental Model

Offline delivery is not "store a push notification."

It is:

store durable message history
track what each device has seen
let devices ask for what they missed

Push can wake a device, but sync delivers the truth.

Cursors And Checkpoints

A common offline sync model uses cursors. A device remembers the last server-assigned sequence number it received for each conversation.

conversation_id -> last_received_sequence
c_10            -> 84211
c_44            -> 12008

When the device reconnects, it asks:

give me messages after sequence 84211 in conversation c_10

The server returns the messages the device missed, subject to membership and retention rules.

This design works well when messages are stored in a queryable conversation log. It makes recovery understandable: the question becomes "what has this device already seen?"
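
A cursor-based sync request might look like the sketch below. The data layout and the names `CONVERSATION_LOG`, `MEMBERS`, `messages_after`, and `sync` are assumptions made for illustration. The membership check only looks at current membership, which is simpler than the membership-history and retention rules a real system would apply.

```python
from bisect import bisect_right

# Hypothetical log: conversation_id -> list of (sequence, body), sorted by sequence.
CONVERSATION_LOG = {
    "c_10": [(84210, "a"), (84211, "b"), (84212, "c"), (84213, "d")],
    "c_44": [(12008, "x")],
}
MEMBERS = {"c_10": {"alice"}, "c_44": {"alice", "bob"}}


def messages_after(user_id, conversation_id, last_received_sequence):
    """Return messages with sequence > cursor, but only for conversation members."""
    if user_id not in MEMBERS.get(conversation_id, set()):
        return []
    log = CONVERSATION_LOG.get(conversation_id, [])
    sequences = [seq for seq, _ in log]
    start = bisect_right(sequences, last_received_sequence)
    return log[start:]


def sync(user_id, cursors):
    """Fetch missed messages per conversation and return the advanced cursors."""
    new_cursors = dict(cursors)
    received = {}
    for conversation_id, last_seq in cursors.items():
        missed = messages_after(user_id, conversation_id, last_seq)
        received[conversation_id] = missed
        if missed:
            new_cursors[conversation_id] = missed[-1][0]
    return received, new_cursors


received, cursors = sync("alice", {"c_10": 84211, "c_44": 12008})
print(received["c_10"])  # [(84212, 'c'), (84213, 'd')]
print(cursors)           # {'c_10': 84213, 'c_44': 12008}
```

Because the request is "everything after sequence N", repeating it after a failed response is harmless: the device gets the same messages again and advances its cursor only once they are stored locally.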

Per-Device Queue Versus Message Log Sync

There are two common models.

Model             How it works                                               Tradeoff
Per-device queue  Create pending delivery rows for each device               Precise but can create lots of state
Message log sync  Device reads from canonical message history using cursors  Efficient but needs good query and membership logic

Many production designs combine both. They keep durable message history as truth, then maintain delivery records or projections for product-specific state such as delivered receipts, unread counts, and push tasks.
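
The difference in state between the two models shows up even in a toy example. The sketch below uses hypothetical device names and in-memory structures. It models one conversation delivered to three devices and is meant only to show the shape of each model.

```python
from collections import defaultdict

devices = ["phone-a", "tablet-a", "laptop-b"]
messages = [(1, "hi"), (2, "are you there?"), (3, "ping")]

# Model 1: per-device queue. Each accepted message fans out into one
# pending row per device, so state grows as messages x devices.
pending = defaultdict(list)  # device_id -> [sequence, ...]
for sequence, _ in messages:
    for device_id in devices:
        pending[device_id].append(sequence)


def ack(device_id, sequence):
    """Delivery is confirmed row by row, so outstanding work is known exactly."""
    pending[device_id].remove(sequence)


ack("phone-a", 1)
ack("phone-a", 2)

# Model 2: message log sync. One shared log plus a single integer per device.
cursors = {device_id: 0 for device_id in devices}
cursors["phone-a"] = 2


def missing(device_id):
    """Derive what a device still needs by comparing the log to its cursor."""
    return [seq for seq, _ in messages if seq > cursors[device_id]]


queue_rows = sum(len(rows) for rows in pending.values())
print(queue_rows)          # 7 rows still pending out of 9 written
print(pending["phone-a"])  # [3]
print(missing("phone-a"))  # [3]
```

Both models agree on what `phone-a` still needs. The queue answers that question by lookup, while the log model answers it by query, which is where the query and membership logic in the table comes from.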

Push Is Not The Source Of Truth

Push notifications are a wake-up mechanism, not the message database.

If the recipient is offline, the system may send a push notification through an external provider. That provider may delay, drop, throttle, or collapse notifications. The app should still sync from the backend when opened.

This matters for reliability. A failed push should not mean the message is lost. It usually means the user might not be notified immediately, but the message remains available when the device reconnects.
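
On the client side, this usually means treating a push as a hint to sync rather than as the message itself. The following is a sketch under that assumption. `Backend`, `Device`, `on_push`, and `on_app_open` are illustrative names, not any push provider's API.

```python
class Backend:
    """Hypothetical backend holding the durable log for one conversation."""

    def __init__(self):
        self.log = []  # list of (sequence, body)

    def accept(self, body):
        sequence = len(self.log) + 1
        self.log.append((sequence, body))
        return sequence

    def messages_after(self, sequence):
        return [entry for entry in self.log if entry[0] > sequence]


class Device:
    def __init__(self, backend):
        self.backend = backend
        self.cursor = 0
        self.inbox = []

    def sync(self):
        for sequence, body in self.backend.messages_after(self.cursor):
            self.inbox.append(body)
            self.cursor = sequence

    def on_push(self, payload):
        # The payload is only a hint that something changed.
        # Message content always comes from the backend via sync.
        self.sync()

    def on_app_open(self):
        # Sync on open even if no push ever arrived.
        self.sync()


backend = Backend()
device = Device(backend)

backend.accept("m1")                      # push for m1 is dropped by the provider
backend.accept("m2")
device.on_push({"hint": "new messages"})  # push for m2 arrives
after_push = list(device.inbox)

backend.accept("m3")                      # push dropped again
device.on_app_open()
print(after_push)    # ['m1', 'm2']
print(device.inbox)  # ['m1', 'm2', 'm3']
```

The dropped push for `m1` costs nothing but notification latency. The next successful wake-up, or the next app open, recovers it through the cursor.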

Offline Backlog

Backlog grows when users stay offline or when delivery workers fall behind.

Important questions:

  • How much history is retained?
  • Are messages paginated during sync?
  • What happens when a user rejoins after months?
  • Can the app sync conversation summaries before full message history?
  • Are large media files downloaded automatically or lazily?
  • How does sync resume after partial failure?

The system should avoid dumping an enormous backlog to a reconnecting device all at once. Sync needs pagination, prioritization, and resumability.
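
Pagination and resumability can share one mechanism: the cursor doubles as the checkpoint. The sketch below assumes a log sorted by sequence. `fetch_page`, `sync_in_pages`, and the `max_pages` parameter (used here only to simulate an interruption) are hypothetical. Prioritization, such as syncing recent conversations first, is left out.

```python
def fetch_page(log, after_sequence, limit):
    """Server side: return up to `limit` messages after the cursor, plus has_more."""
    page = [entry for entry in log if entry[0] > after_sequence][:limit]
    last = page[-1][0] if page else after_sequence
    has_more = any(seq > last for seq, _ in log)
    return page, has_more


def sync_in_pages(log, cursor, limit, max_pages=None):
    """Client side: pull pages until caught up, checkpointing after each page."""
    received = []
    pages = 0
    while max_pages is None or pages < max_pages:
        page, has_more = fetch_page(log, cursor, limit)
        received.extend(page)
        if page:
            cursor = page[-1][0]  # checkpoint: a later attempt resumes from here
        pages += 1
        if not has_more:
            break
    return received, cursor


log = [(seq, f"msg-{seq}") for seq in range(1, 11)]  # 10 messages of backlog

# First attempt is interrupted after two pages of three messages.
first, checkpoint = sync_in_pages(log, cursor=0, limit=3, max_pages=2)
# Second attempt resumes from the checkpoint, not from the start.
rest, final_cursor = sync_in_pages(log, cursor=checkpoint, limit=3)
print(len(first), checkpoint)    # 6 6
print(len(rest), final_cursor)   # 4 10
```

A bounded page size keeps each response small no matter how long the device was away, and the per-page checkpoint means a partial failure never forces a restart from sequence zero.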

Operational Reality

Important signals:

  • sync request rate
  • oldest undelivered message age
  • backlog size by user and device
  • cursor advancement rate
  • duplicate delivery count
  • reconnect storm volume
  • sync pagination latency
  • push-to-sync conversion rate
  • messages unavailable due to retention
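
Two of these signals, backlog size by device and oldest undelivered message age, can be derived directly from the log and the device cursors. The sketch below assumes "undelivered" means "beyond the device's cursor" and uses made-up timestamps. A real system may define delivery differently, for example through explicit receipts.

```python
# Hypothetical log rows for one conversation: (sequence, accepted_at_unix_seconds)
log = [(1, 1000), (2, 1060), (3, 1120), (4, 1180)]
cursors = {"phone-a": 4, "tablet-a": 1, "laptop-b": 3}


def backlog_size(cursor):
    """Number of messages the device has not yet received."""
    return sum(1 for seq, _ in log if seq > cursor)


def oldest_undelivered_age(cursor, now):
    """Seconds since the oldest message beyond the cursor was accepted."""
    pending = [accepted_at for seq, accepted_at in log if seq > cursor]
    return now - min(pending) if pending else 0


now = 1200
backlog = {device: backlog_size(c) for device, c in cursors.items()}
ages = {device: oldest_undelivered_age(c, now) for device, c in cursors.items()}
print(backlog)  # {'phone-a': 0, 'tablet-a': 3, 'laptop-b': 1}
print(ages)     # {'phone-a': 0, 'tablet-a': 140, 'laptop-b': 20}
```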

Failure modes:

  • A message is pushed but not durably stored.
  • A device reconnects and receives duplicates because cursors are wrong.
  • A device misses messages because membership history was not checked correctly.
  • Offline queues grow without bounds.
  • Push notification retries overload an external provider.
  • A reconnect storm causes every client to run expensive sync at once.
