Delivery Guarantees
The reliability contract that defines whether messages can be lost, duplicated, retried, acknowledged, or eventually delivered.
Concepts Covered
- At-most-once delivery
- At-least-once delivery
- Exactly-once semantics
- Durable acceptance
- Acknowledgements
- Retries
- Duplicate prevention
- User-visible delivery states
Definition
A delivery guarantee is the reliability contract a system makes about what can happen after work is accepted.
In a messaging system, this means answering questions like:
- Can an accepted message be lost?
- Can a recipient receive the same message twice?
- What does "sent" mean?
- What does "delivered" mean?
- Does the sender wait for every recipient device?
- What happens if a worker crashes halfway through delivery?
Delivery guarantees are not just implementation details. They shape user trust. A chat product can tolerate a receipt arriving late. It cannot casually lose a message after telling the sender it was accepted.
The Pain That Forces Delivery Guarantees
Naive messaging systems often mix several meanings into one word: "sent."
But a message moves through multiple stages:
client creates message
-> server accepts message
-> message is stored durably
-> delivery worker processes it
-> gateway pushes it
-> recipient device receives it
-> recipient reads it
If the UI shows one checkmark after the first network request returns, what does that checkmark actually mean? Did the server store the message? Did a recipient receive it? Did only a gateway see it? Did the message survive a crash?
Delivery guarantees force the system to define these boundaries clearly.
The Three Classic Guarantees
| Guarantee | Meaning | Common consequence |
|---|---|---|
| At-most-once | Try once; do not retry when the outcome is uncertain | Work can be lost |
| At-least-once | Retry until success is observed | Duplicates are possible |
| Exactly-once | Each logical operation affects the system once | Usually built from idempotency and deduplication |
"Exactly once" is often misunderstood. In distributed systems, networks fail and components retry, and a sender that times out cannot tell whether its request was lost or only the acknowledgement was. A practical design therefore usually uses at-least-once delivery underneath, then adds idempotency so repeated attempts do not create repeated logical effects.
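As a rough illustration of that combination, the sketch below retries a delivery until an acknowledgement is observed, while the receiving side applies each message idempotently by ID. The names `FlakyInbox` and `deliver_at_least_once` are hypothetical, and the scripted acknowledgement outcomes stand in for a real, unreliable network.

```python
class FlakyInbox:
    """Hypothetical recipient: always stores the message, but its ack may be lost."""

    def __init__(self, ack_outcomes):
        # Scripted ack results (assumption): False means the ack was lost in transit.
        self._ack_outcomes = iter(ack_outcomes)
        self.messages = {}      # message_id -> body (the logical effect)
        self.attempts_seen = 0  # how many physical deliveries arrived

    def receive(self, message_id, body):
        self.attempts_seen += 1
        # Idempotent apply: a repeated message_id does not create a second entry.
        self.messages.setdefault(message_id, body)
        # Once the script is exhausted, acks succeed.
        return next(self._ack_outcomes, True)


def deliver_at_least_once(inbox, message_id, body, max_attempts=10):
    """Retry until an ack is observed; return the number of attempts used."""
    for attempt in range(1, max_attempts + 1):
        if inbox.receive(message_id, body):
            return attempt
    raise RuntimeError(f"no ack for {message_id} after {max_attempts} attempts")


inbox = FlakyInbox(ack_outcomes=[False, False, True])
attempts = deliver_at_least_once(inbox, "m-1", "hello")
print(attempts, inbox.attempts_seen, len(inbox.messages))
```

The recipient saw three physical deliveries but holds one logical message, which is the effect people usually mean by "exactly once."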
Accepted Is Not Delivered
A chat system should separate these states:
- pending: client has not received server acceptance
- accepted: server durably stored the message
- delivered: recipient account or device acknowledged receipt
- read: recipient viewed the message according to product rules
- failed: system could not accept or deliver under policy
The exact UI labels are product decisions, but the engineering boundary matters.
If the server says a message was accepted, the message should be durable enough to survive gateway crashes, worker retries, and recipient offline periods.
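One way to keep these boundaries honest is to model the states explicitly and reject transitions the product does not allow. The sketch below is a minimal illustration: the `ALLOWED` transition table is an assumption (a real product may, for example, allow retries out of `failed`), and the dictionary passed to `accept_message` is only a stand-in for a real durable write.

```python
from enum import Enum


class MessageState(Enum):
    PENDING = "pending"
    ACCEPTED = "accepted"
    DELIVERED = "delivered"
    READ = "read"
    FAILED = "failed"


# Assumed transition rules for illustration; actual rules are a product decision.
ALLOWED = {
    MessageState.PENDING: {MessageState.ACCEPTED, MessageState.FAILED},
    MessageState.ACCEPTED: {MessageState.DELIVERED, MessageState.FAILED},
    MessageState.DELIVERED: {MessageState.READ},
    MessageState.READ: set(),
    MessageState.FAILED: set(),
}


def transition(current, target):
    """Return the new state, or raise if the move is not allowed."""
    if target not in ALLOWED[current]:
        raise ValueError(f"illegal transition: {current.value} -> {target.value}")
    return target


def accept_message(durable_store, message_id, body):
    """Report 'accepted' only after the durable write has returned."""
    durable_store[message_id] = body  # stand-in for a real durable write
    return transition(MessageState.PENDING, MessageState.ACCEPTED)


store = {}
state = accept_message(store, "m-1", "hello")
state = transition(state, MessageState.DELIVERED)
print(state.value)
```

Ordering the durable write before the acknowledgement is the point of the sketch: the sender never sees "accepted" for a message that exists only in memory.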
Why At-Least-Once Is Common
At-least-once delivery chooses retry over silent loss.
Suppose a delivery worker sends a message to a gateway and then crashes before recording success.
Did the device receive the message? The worker may not know.
Retrying is safer than giving up, but a retry can deliver the same message twice unless the recipient deduplicates or the delivery-state update is idempotent.
This is why delivery systems need stable identifiers:
message_id + recipient_user_id + device_id
That key lets the system recognize:
- this delivery task already exists
- this device already acknowledged this message
- this retry is the same logical work
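A minimal sketch of how such a key might be used follows. `DeliveryKey` and `DeliveryLedger` are hypothetical names, and the in-memory dictionary stands in for whatever durable store a real system would use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeliveryKey:
    """Stable identity for one delivery: message + recipient + device."""
    message_id: str
    recipient_user_id: str
    device_id: str


class DeliveryLedger:
    """In-memory stand-in for durable delivery state (assumption)."""

    def __init__(self):
        self._status = {}  # DeliveryKey -> "pending" or "acked"

    def enqueue(self, key):
        """Return True if new work was created, False if the task already exists."""
        if key in self._status:
            return False
        self._status[key] = "pending"
        return True

    def record_ack(self, key):
        """Return True for the first ack, False for a repeated one."""
        if self._status.get(key) == "acked":
            return False
        self._status[key] = "acked"
        return True

    def needs_delivery(self, key):
        """True only while a task exists and has not been acknowledged."""
        return self._status.get(key) == "pending"


ledger = DeliveryLedger()
key = DeliveryKey("m-1", "user-42", "phone-a")
print(ledger.enqueue(key))         # first enqueue creates the task
print(ledger.enqueue(key))         # a retry maps to the same logical work
print(ledger.record_ack(key))      # first device ack
print(ledger.needs_delivery(key))  # no further pushes needed
```

Because the dataclass is frozen, two keys with the same fields compare equal and hash the same, so a retried task lands on the existing ledger entry instead of creating a new one.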
Acknowledgements
Acknowledgements are signals from one part of the system to another. They are not all equal.
Important acknowledgement types:
- Server acceptance acknowledgement to the sender.
- Gateway write acknowledgement from gateway to delivery worker.
- Device receipt acknowledgement from client to backend.
- Read acknowledgement from client to receipt service.
- Consumer checkpoint acknowledgement to a stream or queue.
A gateway saying "I wrote the event to the socket" is weaker than a device saying "I processed the message." A read receipt is different from a delivery receipt.
Precise language prevents false confidence.
Failure Modes
Common failures:
- The server acknowledges before durably storing the message.
- The client retries without an idempotency key and creates duplicates.
- The delivery worker retries push notifications without deduplication.
- The UI treats "sent to gateway" as "delivered to recipient."
- Poison messages retry forever and block a queue.
- Retries happen too aggressively, without backoff or jitter, and create overload.
Dead-letter queues are useful for delivery tasks that repeatedly fail. They prevent one bad item from blocking a whole queue, while giving operators a place to inspect and repair the failed tasks.
Operational Reality
Important signals:
- accepted message count
- delivery task retry rate
- duplicate delivery attempts
- acknowledgement latency
- undelivered backlog age
- dead-letter queue depth
- device receipt lag
- gateway push failures
- messages stuck in uncertain states
The product question is: what should the user believe at each state? The engineering job is to make that belief honest.