Transactional Outbox

Reliably publish events that correspond to committed database changes by storing publish intent inside the same local database transaction.

Concepts Covered

  • Local database transactions
  • Cross-system consistency gaps
  • Durable publish intent
  • Outbox tables
  • Background publishers
  • At-least-once delivery
  • Idempotent consumers
  • Outbox table retention

1. Intent

The Transactional Outbox pattern exists because a service usually cannot commit a database write and publish an event to Kafka or another message broker as a single atomic operation.

The database and the broker are different systems. The database has its own transaction log, commit rules, replication behavior, and failure modes. Kafka or another broker has its own acknowledgements, partitions, retries, and availability profile. Unless the system uses a distributed transaction protocol across both systems, there is no single transaction that can guarantee:

commit database row AND publish broker event atomically

That gap is the entire reason the outbox pattern exists.

The pattern changes the problem. Instead of trying to write to the database and publish to the broker in one unsafe request path, the service writes two rows inside one local database transaction:

  • the business state
  • the intent to publish an event about that business state

After the transaction commits, a separate background publisher reads the outbox row and publishes it to the broker.

2. The Problem Without This Pattern

Imagine an Instagram-style like service. A user taps the like button. The service needs to do two things:

  1. Store that the user liked the post.
  2. Publish a LikeCreated event so counters, notifications, ranking, and analytics can react.

A naive implementation might look like this:

insert like into database
publish LikeCreated event to Kafka
return success

The problem is the failure window between the database write and the broker publish.

Suppose the database insert succeeds, but the service crashes before publishing to Kafka. Now the like exists in the database, but downstream systems never hear about it. The counter does not update. The notification system does not know. Analytics misses the event. Ranking features may be stale.

The source-of-truth state changed, but the event-driven world did not receive the change.

Reversing the order is not safe either:

publish LikeCreated event to Kafka
insert like into database
return success

If the publish succeeds but the database insert fails, downstream systems react to a like that does not exist. The notification system might notify someone about a like that was never committed. A counter might increment even though the source edge is absent.

Both orders are broken for the same reason: the service is coordinating two independent systems without one shared transaction.

3. How The Pattern Works

The Transactional Outbox pattern keeps the request path inside one transactional boundary: the local database.

Instead of publishing directly to Kafka during the request, the service writes an outbox row in the same database transaction as the business write.

BEGIN TRANSACTION;

INSERT INTO likes (
  user_id,
  post_id,
  created_at
) VALUES (
  'user_7',
  'post_42',
  now()
);

INSERT INTO outbox_events (
  event_id,
  aggregate_id,
  event_type,
  payload,
  created_at,
  processed_at
) VALUES (
  'evt_1001',
  'post_42',
  'LikeCreated',
  '{"user_id":"user_7","post_id":"post_42"}',
  now(),
  NULL
);

COMMIT;

Now the important guarantee is local and concrete:

If the like row commits, the outbox event row commits too.
If the transaction rolls back, neither row exists.

The outbox itself is not Kafka. It is a durable staging area inside the database that bridges the transactional database world and the asynchronous event-driven world.

After commit, a separate background publisher continuously reads unprocessed outbox rows:

select unprocessed outbox rows
publish each event to Kafka
mark rows as processed
repeat
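
One iteration of that loop can be written directly against the outbox table. A minimal Postgres-flavored sketch (the batch size is illustrative, and the row-claiming concerns covered in section 6 are omitted here):

SELECT event_id, event_type, payload
FROM outbox_events
WHERE processed_at IS NULL
ORDER BY created_at
LIMIT 100;

-- publish each returned row to the broker, then,
-- for each event the broker acknowledged:
UPDATE outbox_events
SET processed_at = now()
WHERE event_id = 'evt_1001';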

The full flow looks like this:

sequenceDiagram
  participant API as Like API
  participant DB as Database
  participant Publisher as Outbox Publisher
  participant Broker as Kafka / Broker
  participant Consumers as Consumers

  API->>DB: Begin transaction
  API->>DB: Insert like row
  API->>DB: Insert outbox event row
  API->>DB: Commit transaction
  Publisher->>DB: Read unprocessed outbox rows
  Publisher->>Broker: Publish LikeCreated
  Broker->>Consumers: Deliver event
  Publisher->>DB: Mark outbox row processed

The API no longer needs Kafka to be healthy at the exact moment the user likes a post. If Kafka is temporarily unavailable, the outbox rows remain in the database. The publisher can retry later.

4. When To Use It

Use the Transactional Outbox pattern when a committed database change must reliably produce an event.

Good use cases:

  • A like should emit events for counters, notifications, ranking, and analytics.
  • An order should emit OrderCreated for payment or fulfillment workflows.
  • A user signup should emit UserRegistered for email, onboarding, or CRM systems.
  • A message write should emit delivery work for asynchronous workers.
  • A payment state change should emit events for ledgers, receipts, or risk systems.

The pattern is useful when downstream systems would become incorrect if they silently missed committed changes.

It is especially important when:

  • the database is the source of truth
  • downstream projections are event-driven
  • direct broker publish happens outside the database transaction
  • losing events would create durable product inconsistency
  • retrying publication is acceptable

5. When Not To Use It

The pattern may be unnecessary if losing the event is acceptable.

For example, a best-effort analytics ping may not need an outbox if the product can tolerate occasional loss. A debug log event may not need this reliability either.

It may also be unnecessary if the event can be reconstructed cheaply from periodic scans. For example, a nightly batch job could rebuild a low-priority report from source tables.

Do not add an outbox just because the architecture uses events. Add it when the event is part of the correctness contract.

Also be honest about the operational cost. An outbox introduces:

  • an outbox table
  • a publisher process
  • retry behavior
  • monitoring
  • retention or cleanup
  • duplicate publish handling
  • schema evolution concerns

If nobody owns those operations, the outbox can become another reliability problem.

6. Data And Operational Model

A practical outbox row usually contains enough information to publish, retry, inspect, and clean up events.

outbox_events
- event_id
- aggregate_type
- aggregate_id
- event_type
- payload
- status
- created_at
- processed_at
- attempt_count
- next_attempt_at
- last_error
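
Expressed as Postgres-flavored DDL, that shape might look like the sketch below (the types, defaults, and index are assumptions layered on the field list, not prescribed by the pattern):

CREATE TABLE outbox_events (
  event_id        TEXT PRIMARY KEY,
  aggregate_type  TEXT NOT NULL,
  aggregate_id    TEXT NOT NULL,
  event_type      TEXT NOT NULL,
  payload         JSONB NOT NULL,
  status          TEXT NOT NULL DEFAULT 'pending',
  created_at      TIMESTAMPTZ NOT NULL DEFAULT now(),
  processed_at    TIMESTAMPTZ,
  attempt_count   INT NOT NULL DEFAULT 0,
  next_attempt_at TIMESTAMPTZ,
  last_error      TEXT
);

-- A partial index keeps the publisher's polling query cheap
-- as processed rows accumulate.
CREATE INDEX outbox_unprocessed_idx
  ON outbox_events (created_at)
  WHERE processed_at IS NULL;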

The publisher needs a safe way to claim work. If multiple publisher instances run, they must not all publish the same row at the same time unless duplicate publish is acceptable and consumers are idempotent.
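
In Postgres, one common claiming approach is row locking with FOR UPDATE SKIP LOCKED, which lets concurrent publisher instances skip rows that another instance has already locked. A hedged sketch:

BEGIN;

SELECT event_id, event_type, payload
FROM outbox_events
WHERE processed_at IS NULL
ORDER BY created_at
LIMIT 100
FOR UPDATE SKIP LOCKED;

-- publish the claimed rows and mark them processed,
-- then commit to release the row locks
COMMIT;

Holding the claiming transaction open while publishing keeps the row locks alive, which is simple but ties lock duration to broker latency; an alternative is marking rows with a claimed status plus a timeout.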

Common publisher responsibilities:

  • read unprocessed rows
  • publish events to the broker
  • mark rows as processed after broker acknowledgement
  • retry transient failures with backoff (see the sketch after this list)
  • record permanent failures for inspection
  • expose lag and failure metrics
  • clean up or archive old processed rows
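
For the retry responsibility, a hedged sketch of recording one failed publish attempt with exponential backoff (the error text and the doubling policy are illustrative; the columns come from the row model above):

UPDATE outbox_events
SET attempt_count   = attempt_count + 1,
    next_attempt_at = now() + interval '1 second' * power(2, attempt_count),
    last_error      = 'broker timeout'
WHERE event_id = 'evt_1001';

The publisher's polling query then adds a condition like next_attempt_at IS NULL OR next_attempt_at <= now() so failed rows are retried only after their backoff window has passed.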

Important metrics (the first two are sketched in a query after this list):

  • oldest unprocessed outbox row age
  • number of unprocessed rows
  • publish success rate
  • publish failure rate
  • retry count
  • outbox table size
  • publisher lag
  • dead-lettered or stuck events
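
The first two metrics fall directly out of the outbox table. A Postgres-flavored sketch:

SELECT
  count(*)                AS unprocessed_rows,
  max(now() - created_at) AS oldest_unprocessed_age
FROM outbox_events
WHERE processed_at IS NULL;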

Outbox retention matters. If processed rows are never deleted or archived, the outbox table becomes a hidden storage and query-performance problem.
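
A minimal cleanup sketch; the seven-day window is an illustrative retention choice, not part of the pattern:

DELETE FROM outbox_events
WHERE processed_at IS NOT NULL
  AND processed_at < now() - interval '7 days';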

7. Failure Modes

The outbox pattern prevents lost publish intent, but it does not remove all failure modes.

Important failures:

  • The publisher publishes an event but crashes before marking the row processed.
  • The publisher retries and publishes the same event more than once.
  • Consumers are not idempotent and duplicate side effects appear.
  • The outbox table grows without retention.
  • Multiple publishers claim the same row without safe coordination.
  • Event payload schemas change in a way consumers cannot handle.
  • Publisher lag grows and downstream projections become stale.
  • The database becomes overloaded because outbox polling is too aggressive.

The most common misunderstanding is thinking the outbox gives exactly-once delivery everywhere. It does not.

The outbox guarantees that if the business transaction commits, the event intent is durably recorded. Publishing may still happen more than once. Consumers still need idempotency.
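
One common way to make a consumer idempotent is a dedup table keyed by event_id, written in the same transaction as the consumer's side effects. A hedged sketch (the processed_events table and its name are illustrative, not part of the pattern):

CREATE TABLE processed_events (
  event_id     TEXT PRIMARY KEY,
  processed_at TIMESTAMPTZ NOT NULL DEFAULT now()
);

-- Inside the consumer's transaction: insert first, and apply
-- side effects only if this insert actually added a row.
-- A redelivered event hits the primary key and is skipped.
INSERT INTO processed_events (event_id)
VALUES ('evt_1001')
ON CONFLICT (event_id) DO NOTHING;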

8. Tradeoffs

Benefit | Cost
Keeps business write and publish intent in one local transaction | Adds an outbox table
Prevents committed changes from silently losing events | Adds a publisher workflow
Decouples request success from broker availability | Events can be delayed
Makes publishing retryable | Consumers must handle duplicates
Supports event-driven projections | Requires retention and monitoring

The key tradeoff is latency versus reliability. Direct publishing may look simpler and faster, but it creates a dangerous consistency gap. The outbox adds an asynchronous step so the system can retry safely when the broker, network, or publisher fails.

For user-facing flows, this is usually the right tradeoff. The user action can commit quickly, and downstream systems can catch up.
