Offline Delivery
The mechanisms that let disconnected users receive accepted messages later through durable logs, cursors, queues, and sync.
Concepts Covered
- Offline users
- Durable message history
- Device cursors
- Per-device queues
- Sync checkpoints
- Push notification handoff
- Backlog growth
- Reconnect storms
Definition
Offline delivery is the part of a messaging system that lets users receive messages that were sent while their devices were disconnected.
This is not an edge case. In mobile messaging, offline behavior is normal. Phones lose signal, switch networks, enter low-power modes, close background connections, or sit unused for days.
A serious messaging system cannot assume the recipient is online when the sender presses send.
The core rule is simple: if a message was accepted durably, recipient devices need a way to discover it later.
The Pain That Forces Offline Delivery
Realtime delivery feels like the whole product when both users are online.
sender -> gateway -> recipient device
But that path fails whenever the recipient is not reachable:
- phone is off
- app is suspended
- device changed networks
- gateway connection dropped
- user has multiple devices and only some are online
- push notification is delayed
A fragile system tries to push the message once and then forgets it. That loses messages when the recipient disconnects at the wrong time.
A durable system writes the message first, then attempts realtime delivery:
accepted message -> durable message log -> realtime push if online
                                        -> offline sync if not online
The durable message log is the recovery path. The gateway is an optimization.
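The write-first ordering above can be sketched as follows. This is a minimal in-memory illustration, not a production design: `MessageLog`, `Gateway`, and `accept_message` are hypothetical names, and a Python dict stands in for durable storage.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    conversation_id: str
    sequence: int
    body: str


@dataclass
class MessageLog:
    """In-memory stand-in for a durable, per-conversation message log."""
    entries: dict = field(default_factory=dict)

    def append(self, conversation_id: str, body: str) -> Message:
        conversation = self.entries.setdefault(conversation_id, [])
        # Assumption: the server assigns a per-conversation sequence number.
        msg = Message(conversation_id, len(conversation) + 1, body)
        conversation.append(msg)
        return msg


class Gateway:
    """Hypothetical realtime gateway that only knows about connected devices."""

    def __init__(self) -> None:
        self.online = {}  # device_id -> messages pushed over the live connection

    def connect(self, device_id: str) -> None:
        self.online[device_id] = []

    def try_push(self, device_id: str, msg: Message) -> bool:
        inbox = self.online.get(device_id)
        if inbox is None:
            return False  # device is offline; it will catch up through sync
        inbox.append(msg)
        return True


def accept_message(log, gateway, conversation_id, body, recipient_devices):
    # 1. Durable write first: after this point the message can be recovered.
    msg = log.append(conversation_id, body)
    # 2. Best-effort realtime push: a failure here does not lose the message.
    pushed = {d: gateway.try_push(d, msg) for d in recipient_devices}
    return msg, pushed
```

In this sketch a failed push only affects latency. The offline device can still find the message in the log when it next syncs.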
Mental Model
Offline delivery is not "store a push notification."
It is:
store durable message history
track what each device has seen
let devices ask for what they missed
Push can wake a device, but sync delivers the truth.
Cursors And Checkpoints
A common offline sync model uses cursors. A device remembers, for each conversation, the last server-assigned sequence number it received.
conversation_id -> last_received_sequence
c_10 -> 84211
c_44 -> 12008
When the device reconnects, it asks:
give me messages after sequence 84211 in conversation c_10
The server returns missing messages, subject to membership and retention rules.
This design works well when messages are stored in a queryable conversation log. It makes recovery understandable: the question becomes "what has this device already seen?"
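A cursor-based read can be sketched like this, assuming each conversation log is kept sorted by sequence number. `LOG`, `MEMBERS`, and `messages_after` are hypothetical names; the membership check is deliberately simplistic and retention rules are not modeled.

```python
from bisect import bisect_right

# Hypothetical stores: conversation_id -> ordered (sequence, body) pairs,
# and conversation_id -> set of current member user ids.
LOG = {
    "c_10": [(84210, "a"), (84211, "b"), (84212, "c"), (84213, "d")],
    "c_44": [(12008, "x")],
}
MEMBERS = {"c_10": {"alice"}, "c_44": {"alice", "bob"}}


def messages_after(user_id, conversation_id, last_received_sequence):
    """Return the messages this user's device has not yet received."""
    # Membership is checked before anything is read from the log.
    if user_id not in MEMBERS.get(conversation_id, set()):
        return []
    entries = LOG.get(conversation_id, [])
    sequences = [seq for seq, _ in entries]
    # bisect_right finds the first entry strictly after the cursor.
    start = bisect_right(sequences, last_received_sequence)
    return entries[start:]
```

For example, `messages_after("alice", "c_10", 84211)` returns the entries with sequences 84212 and 84213, and a non-member gets an empty list.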
Per-Device Queue Versus Message Log Sync
There are two common models.
| Model | How it works | Tradeoff |
|---|---|---|
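| Per-device queue | Create a pending delivery row for each message and each recipient device | Precise per-device state, but the number of rows grows with messages times devices |
| Message log sync | Device reads from canonical message history using cursors | Compact state (one cursor per device and conversation), but depends on correct query and membership logic |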
Many production designs combine both. They keep durable message history as truth, then maintain delivery records or projections for product-specific state such as delivered receipts, unread counts, and push tasks.
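To make the state tradeoff concrete, here is a toy comparison of the two models. Both classes are hypothetical, hold everything in memory, and count "state" as stored rows plus cursors. It is a rough illustration, not a measurement.

```python
from collections import defaultdict


class PerDeviceQueue:
    """Fan-out on write: one pending row per (device, message)."""

    def __init__(self):
        self.pending = defaultdict(list)  # device_id -> pending sequences

    def enqueue(self, devices, sequence):
        for device_id in devices:
            self.pending[device_id].append(sequence)

    def ack(self, device_id, sequence):
        self.pending[device_id].remove(sequence)

    def state_size(self):
        return sum(len(rows) for rows in self.pending.values())


class LogSync:
    """Single shared log plus one cursor per device."""

    def __init__(self):
        self.log = []      # sequences in order
        self.cursors = {}  # device_id -> last received sequence

    def append(self, sequence):
        self.log.append(sequence)

    def pending_for(self, device_id):
        cursor = self.cursors.get(device_id, 0)
        return [seq for seq in self.log if seq > cursor]

    def ack(self, device_id, sequence):
        # A cursor only moves forward.
        self.cursors[device_id] = max(self.cursors.get(device_id, 0), sequence)

    def state_size(self):
        return len(self.log) + len(self.cursors)
```

With three devices and four messages, the queue model holds twelve pending rows before any acknowledgement, while the log model holds four log entries and no cursors yet.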
Push Is Not The Source Of Truth
Push notifications are a wake-up mechanism, not the message database.
If the recipient is offline, the system may send a push notification through an external provider. That provider may delay, drop, throttle, or collapse notifications. The app should still sync from the backend when opened.
This matters for reliability. A failed push should not mean the message is lost. It usually means the user might not be notified immediately, but the message remains available when the device reconnects.
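The idea that a push is only a hint can be sketched as a client that runs the same sync on every push and on every app open. `Backend`, `Client`, and `send` are hypothetical, and the `push_delivered` flag simulates an external provider that may drop notifications.

```python
class Backend:
    """Hypothetical backend holding the durable log for one conversation."""

    def __init__(self):
        self.log = []  # index i holds the message with sequence i + 1

    def accept(self, body):
        self.log.append(body)
        return len(self.log)

    def fetch_after(self, cursor):
        return [(i + 1, body) for i, body in enumerate(self.log) if i + 1 > cursor]


class Client:
    def __init__(self, backend):
        self.backend = backend
        self.cursor = 0
        self.inbox = []

    def sync(self):
        for sequence, body in self.backend.fetch_after(self.cursor):
            self.inbox.append(body)
            self.cursor = sequence

    def on_push(self, _payload):
        # The push payload is treated as a wake-up hint, never as the message.
        self.sync()

    def on_app_open(self):
        # Always sync on open, whether or not any push arrived.
        self.sync()


def send(backend, client, body, push_delivered):
    backend.accept(body)  # durable write happens regardless of push outcome
    if push_delivered:
        client.on_push({"hint": "new_messages"})
```

If both pushes for the first two messages are dropped, the client's inbox stays empty until `on_app_open()` runs, at which point both messages arrive from the backend.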
Offline Backlog
Backlog grows when users stay offline or when delivery workers fall behind.
Important questions:
- How much history is retained?
- Are messages paginated during sync?
- What happens when a user rejoins after months?
- Can the app sync conversation summaries before full message history?
- Are large media files downloaded automatically or lazily?
- How does sync resume after partial failure?
The system should avoid dumping an enormous backlog to a reconnecting device all at once. Sync needs pagination, prioritization, and resumability.
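Pagination and resumability can be sketched together: the client advances its checkpoint after each message it stores, so a dropped connection resumes from the last checkpoint instead of starting over. The function names, the `state` dict, and the `fail_after_pages` parameter (used only to simulate a drop) are all illustrative assumptions.

```python
def fetch_page(log, after, limit):
    """Server side: return up to `limit` messages after the cursor."""
    page = [entry for entry in log if entry[0] > after][:limit]
    has_more = bool(page) and page[-1][0] < log[-1][0]
    return page, has_more


def sync(log, state, limit, fail_after_pages=None):
    """Client side: pull pages until caught up, checkpointing as it goes."""
    pages = 0
    while True:
        if fail_after_pages is not None and pages == fail_after_pages:
            raise ConnectionError("simulated connection drop")
        page, has_more = fetch_page(log, state["cursor"], limit)
        for sequence, body in page:
            state["received"].append(body)
            # Checkpoint after each stored message so a retry resumes here.
            state["cursor"] = sequence
        pages += 1
        if not has_more:
            return
```

A real client would persist `state` to local storage in the same transaction as the message write. Prioritization, such as syncing conversation summaries first, is not shown here.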
Operational Reality
Important signals:
- sync request rate
- oldest undelivered message age
- backlog size by user and device
- cursor advancement rate
- duplicate delivery count
- reconnect storm volume
- sync pagination latency
- push-to-sync conversion rate
- messages unavailable due to retention
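Two of these signals, backlog size by device and oldest undelivered message age, can be derived directly from the log and the device cursors. The sketch below assumes a single conversation, a log of `(sequence, accepted_at_seconds)` pairs in sequence order, and a hypothetical `backlog_signals` helper.

```python
def backlog_signals(log, cursors, now):
    """Compute per-device backlog size and oldest undelivered age in seconds."""
    signals = {}
    for device_id, cursor in cursors.items():
        # Everything after the device's cursor is still undelivered to it.
        pending = [(seq, accepted_at) for seq, accepted_at in log if seq > cursor]
        signals[device_id] = {
            "backlog_size": len(pending),
            "oldest_undelivered_age": (now - pending[0][1]) if pending else 0.0,
        }
    return signals
```

A device whose cursor matches the end of the log reports a backlog of zero, which makes lagging devices easy to spot.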
Failure modes:
- A message is pushed but not durably stored.
- A device reconnects and receives duplicates because cursors are wrong.
- A device misses messages because membership history was not checked correctly.
- Offline queues grow without bounds.
- Push notification retries overload an external provider.
- A reconnect storm causes every client to run expensive sync at once.
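Two of these failure modes have common client-side mitigations, sketched below as assumptions rather than as the approach of any particular system. Applying messages idempotently by sequence number makes duplicates harmless, and a randomized delay before syncing (exponential backoff with full jitter, a technique not named in the text above) spreads a reconnect storm over time. `ConversationView` and `sync_delay` are hypothetical names.

```python
import random


class ConversationView:
    """Client-side store that tolerates duplicate deliveries."""

    def __init__(self):
        self.messages = {}  # sequence -> body

    def apply(self, sequence, body):
        # Applying the same sequence twice is a no-op.
        if sequence in self.messages:
            return False
        self.messages[sequence] = body
        return True


def sync_delay(attempt, base=1.0, cap=60.0, rng=random):
    """Seconds to wait before sync attempt number `attempt` (0-based)."""
    # Full jitter: pick uniformly between 0 and the capped exponential bound.
    return rng.uniform(0.0, min(cap, base * (2 ** attempt)))
```

The `base` and `cap` values are placeholders. Suitable numbers depend on how much sync load the backend can absorb.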