Delivery Guarantees

The reliability contract that defines whether messages can be lost, duplicated, retried, acknowledged, or eventually delivered.

Concepts Covered

  • At-most-once delivery
  • At-least-once delivery
  • Exactly-once semantics
  • Durable acceptance
  • Acknowledgements
  • Retries
  • Duplicate prevention
  • User-visible delivery states

Definition

A delivery guarantee is the reliability contract a system makes about what can happen after work is accepted.

In a messaging system, this means answering questions like:

  • Can an accepted message be lost?
  • Can a recipient receive the same message twice?
  • What does "sent" mean?
  • What does "delivered" mean?
  • Does the sender wait for every recipient device?
  • What happens if a worker crashes halfway through delivery?

Delivery guarantees are not just implementation details. They shape user trust. A chat product can tolerate a receipt arriving late. It cannot casually lose a message after telling the sender it was accepted.

The Pain That Forces Delivery Guarantees

Naive messaging systems often mix several meanings into one word: "sent."

But a message moves through multiple stages:

client creates message
  -> server accepts message
  -> message is stored durably
  -> delivery worker processes it
  -> gateway pushes it
  -> recipient device receives it
  -> recipient reads it

If the UI shows one checkmark after the first network request returns, what does that checkmark actually mean? Did the server store the message? Did a recipient receive it? Did only a gateway see it? Did the message survive a crash?

Delivery guarantees force the system to define these boundaries clearly.

The Three Classic Guarantees

Guarantee      | Meaning                                        | Common consequence
At-most-once   | Try once; do not retry after uncertainty       | Work can be lost
At-least-once  | Retry until success is observed                | Duplicates are possible
Exactly-once   | Each logical operation affects the system once | Usually built from idempotency and deduplication

"Exactly once" is often misunderstood as a property of the transport. In distributed systems, networks fail, and a sender cannot tell a lost request from a lost acknowledgement, so components retry. A practical design usually uses at-least-once delivery underneath, then adds idempotency so repeated attempts do not create repeated logical effects.
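The idea of layering idempotency on top of at-least-once delivery can be sketched as follows. This is a minimal in-memory illustration, not a production design: `IdempotentInbox`, `handle`, and `deliver_at_least_once` are hypothetical names, and a real system would keep the set of seen IDs in durable storage.

```python
from dataclasses import dataclass, field


@dataclass
class IdempotentInbox:
    """Applies each message once, keyed by message_id (hypothetical sketch)."""
    seen: set[str] = field(default_factory=set)
    applied: list[str] = field(default_factory=list)

    def handle(self, message_id: str, body: str) -> bool:
        """Return True if the message was applied, False if it was a duplicate."""
        if message_id in self.seen:
            return False  # a retry of work already done: no new logical effect
        self.seen.add(message_id)
        self.applied.append(body)
        return True


def deliver_at_least_once(inbox: IdempotentInbox, message_id: str,
                          body: str, attempts: int) -> list[bool]:
    """Simulate a sender that keeps retrying because it never observes success."""
    return [inbox.handle(message_id, body) for _ in range(attempts)]


inbox = IdempotentInbox()
results = deliver_at_least_once(inbox, "m-1", "hello", attempts=3)
print(results)        # first attempt applies, the retries are recognized
print(inbox.applied)  # the logical effect happened once
```

The transport delivered the message three times, but the recipient applied it once. That is the usual meaning of "exactly once" in practice.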

Accepted Is Not Delivered

A chat system should separate these states:

pending    -> client has not received server acceptance
accepted   -> server durably stored the message
delivered  -> recipient account or device acknowledged receipt
read       -> recipient viewed the message according to product rules
failed     -> system could not accept or deliver under policy

The exact UI labels are product decisions, but the engineering boundary matters.

If the server says a message was accepted, the message should be durable enough to survive gateway crashes, worker retries, and recipient offline periods.
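One way to keep these boundaries honest is to model the states explicitly and reject transitions that skip a boundary. The sketch below assumes a simple forward-only transition table; the table itself, `ALLOWED`, and `advance` are illustrative assumptions, and a real product may permit other paths (for example, a read receipt that arrives before a delivery receipt).

```python
from enum import Enum


class MessageState(Enum):
    PENDING = "pending"
    ACCEPTED = "accepted"
    DELIVERED = "delivered"
    READ = "read"
    FAILED = "failed"


# Assumed transition table for illustration; product rules may differ.
ALLOWED = {
    MessageState.PENDING: {MessageState.ACCEPTED, MessageState.FAILED},
    MessageState.ACCEPTED: {MessageState.DELIVERED, MessageState.FAILED},
    MessageState.DELIVERED: {MessageState.READ},
    MessageState.READ: set(),
    MessageState.FAILED: set(),
}


def advance(current: MessageState, target: MessageState) -> MessageState:
    """Move to target if allowed; repeating the current state is a no-op."""
    if target == current:
        return current  # duplicate acknowledgement, nothing changes
    if target not in ALLOWED[current]:
        raise ValueError(f"illegal transition {current.value} -> {target.value}")
    return target


state = MessageState.PENDING
state = advance(state, MessageState.ACCEPTED)  # server stored the message
state = advance(state, MessageState.ACCEPTED)  # retried ack, still accepted
print(state.value)
```

With this shape, a message cannot be shown as delivered unless it was first recorded as accepted.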

Why At-Least-Once Is Common

At-least-once delivery chooses retry over silent loss.

Suppose a delivery worker sends a message to a gateway and then crashes before recording success.

Did the device receive the message? The worker may not know.

Retrying is safer than giving up, but retrying can duplicate delivery unless the recipient, or the code that updates delivery state, handles repeated attempts idempotently.

This is why delivery systems need stable identifiers:

message_id + recipient_user_id + device_id

That key lets the system recognize:

this delivery task already exists
this device already acknowledged this message
this retry is the same logical work
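The composite key above can be sketched as a small in-memory ledger. `DeliveryKey` and `DeliveryLedger` are hypothetical names, and the dictionary stands in for whatever durable store the system actually uses.

```python
from typing import NamedTuple


class DeliveryKey(NamedTuple):
    message_id: str
    recipient_user_id: str
    device_id: str


class DeliveryLedger:
    """Tracks one delivery task per (message, recipient, device). Sketch only."""

    def __init__(self) -> None:
        self._tasks: dict[DeliveryKey, str] = {}  # key -> "queued" or "acked"

    def enqueue(self, key: DeliveryKey) -> bool:
        """Create the task unless it already exists. True if newly created."""
        if key in self._tasks:
            return False  # this delivery task already exists
        self._tasks[key] = "queued"
        return True

    def acknowledge(self, key: DeliveryKey) -> bool:
        """Record a device acknowledgement. True only for the first one."""
        if self._tasks.get(key) != "queued":
            return False  # unknown task, or this device already acknowledged
        self._tasks[key] = "acked"
        return True

    def needs_delivery(self, key: DeliveryKey) -> bool:
        """A retry should push again only while the task is still queued."""
        return self._tasks.get(key) == "queued"


ledger = DeliveryLedger()
key = DeliveryKey("m-1", "user-7", "phone-a")
print(ledger.enqueue(key))         # new task
print(ledger.enqueue(key))         # same logical work, not duplicated
print(ledger.acknowledge(key))     # first device ack
print(ledger.needs_delivery(key))  # a later retry can skip the push
```

Including `device_id` in the key means the same message to a second device of the same user is tracked as separate work.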

Acknowledgements

Acknowledgements are signals from one part of the system to another. They are not all equal.

Important acknowledgement types:

  • Server acceptance acknowledgement to the sender.
  • Gateway write acknowledgement from gateway to delivery worker.
  • Device receipt acknowledgement from client to backend.
  • Read acknowledgement from client to receipt service.
  • Consumer checkpoint acknowledgement to a stream or queue.

A gateway saying "I wrote the event to the socket" is weaker than a device saying "I processed the message." A read receipt is different from a delivery receipt.

Precise language prevents false confidence.
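A small sketch can make the difference in strength concrete: the user-visible state is derived only from acknowledgements strong enough to justify it. The `Ack` names and the `user_visible_state` mapping are assumptions for illustration, not a prescribed policy.

```python
from enum import Enum


class Ack(Enum):
    SERVER_ACCEPTED = "server_accepted"  # server durably stored the message
    GATEWAY_WROTE = "gateway_wrote"      # gateway wrote the event to a socket
    DEVICE_RECEIVED = "device_received"  # client confirmed it processed the message
    READ = "read"                        # client reported the message as viewed


def user_visible_state(acks: set[Ack]) -> str:
    """Map observed acknowledgements to the strongest state they justify."""
    if Ack.READ in acks:
        return "read"
    if Ack.DEVICE_RECEIVED in acks:
        return "delivered"
    if Ack.SERVER_ACCEPTED in acks:
        return "accepted"
    return "pending"


# A gateway write alone does not upgrade the state to "delivered".
print(user_visible_state({Ack.SERVER_ACCEPTED, Ack.GATEWAY_WROTE}))
print(user_visible_state({Ack.SERVER_ACCEPTED, Ack.DEVICE_RECEIVED}))
```

The gateway acknowledgement is still useful to the delivery worker, for example to decide whether to retry soon, but in this sketch it never changes what the sender is told.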

Failure Modes

Common failures:

  • The server acknowledges before durably storing the message.
  • The client retries without an idempotency key and creates duplicates.
  • The delivery worker retries push notifications without deduplication.
  • The UI treats "sent to gateway" as "delivered to recipient."
  • Poison messages retry forever and block a queue.
  • Retries happen too aggressively and create overload.

Dead-letter queues are useful for delivery tasks that repeatedly fail. They prevent one bad item from blocking a whole queue, while giving operators a place to inspect and repair the failed tasks.
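The combination of bounded retries, backoff, and a dead-letter queue can be sketched as below. The limits (`MAX_ATTEMPTS`, `BASE_DELAY_S`, `MAX_DELAY_S`), the `DeliveryError` type, and the `drain` function are hypothetical; the sketch records the delay it would wait instead of sleeping, and omits jitter for brevity.

```python
from collections import deque

MAX_ATTEMPTS = 3     # assumed retry budget per task
BASE_DELAY_S = 1.0   # assumed first backoff delay
MAX_DELAY_S = 30.0   # assumed cap so delays do not grow without bound


class DeliveryError(Exception):
    """Raised by the (hypothetical) send function when delivery fails."""


def backoff_delay(attempt: int) -> float:
    """Exponential backoff: 1s, 2s, 4s, ... capped at MAX_DELAY_S."""
    return min(MAX_DELAY_S, BASE_DELAY_S * 2 ** (attempt - 1))


def drain(queue: deque, dead_letters: list, send) -> list:
    """Process tasks until the queue is empty; return the planned retry delays."""
    planned = []
    while queue:
        task_id, attempt = queue.popleft()
        try:
            send(task_id)
        except DeliveryError:
            if attempt >= MAX_ATTEMPTS:
                dead_letters.append(task_id)  # park it for operators
            else:
                planned.append((task_id, backoff_delay(attempt)))
                queue.append((task_id, attempt + 1))  # retry behind other work
    return planned


def flaky_send(task_id: str) -> None:
    if task_id == "poison":
        raise DeliveryError(task_id)


queue = deque([("poison", 1), ("ok", 1)])
dead = []
delays = drain(queue, dead, flaky_send)
print(delays)  # retries planned for the failing task
print(dead)    # the task that exhausted its budget
```

Requeueing a failed task at the back lets healthy tasks proceed, and the retry budget ensures the poison task eventually leaves the queue.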

Operational Reality

Important signals:

  • accepted message count
  • delivery task retry rate
  • duplicate delivery attempts
  • acknowledgement latency
  • undelivered backlog age
  • dead-letter queue depth
  • device receipt lag
  • gateway push failures
  • messages stuck in uncertain states
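Two of these signals, undelivered backlog age and delivery task retry rate, can be computed from basic task records. The `TaskRecord` fields and both function names below are assumptions made for illustration; a real system would more likely derive these from its metrics pipeline.

```python
from dataclasses import dataclass


@dataclass
class TaskRecord:
    accepted_at: float  # seconds since epoch when the message was accepted
    attempts: int       # delivery attempts made so far
    delivered: bool     # whether a device acknowledged receipt


def oldest_undelivered_age(tasks: list[TaskRecord], now: float) -> float:
    """Age in seconds of the oldest task still waiting for delivery."""
    pending = [t.accepted_at for t in tasks if not t.delivered]
    return now - min(pending) if pending else 0.0


def retry_rate(tasks: list[TaskRecord]) -> float:
    """Fraction of tasks that needed more than one attempt."""
    if not tasks:
        return 0.0
    return sum(1 for t in tasks if t.attempts > 1) / len(tasks)


tasks = [
    TaskRecord(accepted_at=100.0, attempts=1, delivered=True),
    TaskRecord(accepted_at=110.0, attempts=3, delivered=False),
    TaskRecord(accepted_at=150.0, attempts=1, delivered=False),
]
print(oldest_undelivered_age(tasks, now=200.0))
print(retry_rate(tasks))
```

Backlog age is often more telling than backlog size: a small queue whose oldest item is hours old points to stuck work rather than high load.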

The product question is: what should the user believe at each state? The engineering job is to make that belief honest.
