Saga Pattern

Coordinate a multi-step distributed workflow using local transactions and compensating actions instead of one large distributed transaction.

Concepts Covered

  • Distributed workflows
  • Local transactions
  • Compensation
  • Orchestration
  • Choreography
  • Partial failure
  • Workflow state
  • Idempotent steps

1. Intent

The Saga Pattern coordinates a business workflow that spans multiple services without using one global distributed transaction.

Each step commits locally. If a later step fails, compensating actions try to undo or offset earlier steps.

The intent is to make long-running distributed workflows explicit and recoverable when one atomic transaction across every system is unavailable, undesirable, or too expensive.

2. The Problem Without This Pattern

Some workflows touch multiple systems:

reserve inventory
charge payment
create order
send confirmation

A single database transaction cannot easily cover all of these if they live in separate services.

If step three fails after steps one and two succeed, the system needs a recovery plan:

inventory reserved
payment charged
order creation fails

Without a saga, the system can get stuck in a partial state. The user may be charged without an order, inventory may remain reserved forever, or operators may need manual repair.

3. How The Pattern Works

A saga is a sequence of local transactions:

T1 -> T2 -> T3

with compensations:

C3 -> C2 -> C1

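If T3 fails, it never committed, so C3 has nothing to undo. The saga may then run C2 followed by C1, in reverse order of the original steps, to compensate for the earlier commits.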
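The forward and compensating sequences can be sketched as a small in-process runner. This is a minimal illustration, not a production coordinator: the `run_saga` function, the `Step` tuple, and the step names are hypothetical, and real steps would be calls to remote services rather than local functions.

```python
from typing import Callable, List, Tuple

# (name, action, compensation); all three are illustrative assumptions.
Step = Tuple[str, Callable[[], None], Callable[[], None]]


def run_saga(steps: List[Step]) -> List[str]:
    """Run steps in order; on failure, compensate committed steps in reverse."""
    log: List[str] = []
    committed: List[Step] = []
    for step in steps:
        name, action, _ = step
        try:
            action()
        except Exception as exc:
            log.append(f"{name} failed: {exc}")
            # The failed step never committed, so only earlier steps are undone.
            for done_name, _, compensate in reversed(committed):
                compensate()
                log.append(f"compensated {done_name}")
            return log
        committed.append(step)
        log.append(f"{name} committed")
    return log


def ok() -> None:
    pass


def fail() -> None:
    raise RuntimeError("order service unavailable")


demo_log = run_saga([
    ("T1 reserve inventory", ok, ok),
    ("T2 charge payment", ok, ok),
    ("T3 create order", fail, ok),
])
```

The sketch assumes compensations always succeed. A real saga also has to persist its progress and handle a compensation that fails.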

Two common styles:

Style | Description
Orchestration | A central coordinator tells each service what to do
Choreography | Services react to events and trigger the next step

Orchestration is easier to visualize because one workflow controller owns the state machine. Choreography can reduce central coordination but becomes harder to reason about as more services react to each other.
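To make the choreography style concrete, here is a rough sketch using an in-memory event bus. The `EventBus` class, the event names, and the `card_limit` field are all assumptions made for illustration; a real system would use a message broker, and each handler would live in a separate service.

```python
from collections import defaultdict, deque
from typing import Callable, Deque, Dict, List, Tuple

Handler = Callable[[dict], None]


class EventBus:
    """Hypothetical in-memory stand-in for a message broker."""

    def __init__(self) -> None:
        self.handlers: Dict[str, List[Handler]] = defaultdict(list)
        self.queue: Deque[Tuple[str, dict]] = deque()
        self.history: List[str] = []

    def subscribe(self, event_type: str, handler: Handler) -> None:
        self.handlers[event_type].append(handler)

    def publish(self, event_type: str, payload: dict) -> None:
        self.queue.append((event_type, payload))

    def drain(self) -> None:
        # Deliver queued events in order until no service emits anything new.
        while self.queue:
            event_type, payload = self.queue.popleft()
            self.history.append(event_type)
            for handler in self.handlers[event_type]:
                handler(payload)


bus = EventBus()


def reserve_inventory(order: dict) -> None:
    bus.publish("InventoryReserved", order)


def charge_payment(order: dict) -> None:
    if order["amount"] > order["card_limit"]:
        bus.publish("PaymentFailed", order)
    else:
        bus.publish("PaymentCharged", order)


def release_inventory(order: dict) -> None:
    # Compensation: triggered by the failure event, not by a coordinator.
    bus.publish("InventoryReleased", order)


bus.subscribe("OrderRequested", reserve_inventory)
bus.subscribe("InventoryReserved", charge_payment)
bus.subscribe("PaymentFailed", release_inventory)

bus.publish("OrderRequested", {"order_id": "o-1", "amount": 50, "card_limit": 20})
bus.drain()
```

No single function in this sketch describes the whole workflow. The sequence only emerges from the subscriptions, which is why choreography gets harder to follow as more services are added.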

4. When To Use It

Use sagas when:

  • a workflow spans multiple services
  • each service owns its own data
  • distributed transactions are unavailable or undesirable
  • eventual consistency is acceptable
  • compensating actions are meaningful
  • the workflow has clear business steps
  • operators need visibility into partial progress

Good examples:

  • order placement across inventory, payment, and shipping
  • account onboarding across several systems
  • travel booking across hotel, flight, and payment providers
  • long-running fulfillment workflows

5. When Not To Use It

Avoid sagas when:

  • the workflow can stay inside one database transaction
  • compensating actions are impossible
  • the business requires strict atomicity
  • the process is too simple to justify workflow machinery
  • intermediate states can be neither safely exposed to users nor safely hidden from them

Compensation is not always true undo. Refunding a payment is not the same as never charging it. Sending a cancellation email is not the same as never sending the original confirmation.
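The refund example can be sketched as an append-only ledger, where compensation adds an offsetting entry instead of deleting the original charge. The `LedgerEntry` type and the `charge`, `refund`, and `balance` helpers are hypothetical names used only for this illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class LedgerEntry:
    payment_id: str
    kind: str          # "charge" or "refund"
    amount_cents: int  # positive for charges, negative for refunds


def balance(ledger: List[LedgerEntry], payment_id: str) -> int:
    return sum(e.amount_cents for e in ledger if e.payment_id == payment_id)


def charge(ledger: List[LedgerEntry], payment_id: str, amount_cents: int) -> None:
    ledger.append(LedgerEntry(payment_id, "charge", amount_cents))


def refund(ledger: List[LedgerEntry], payment_id: str) -> None:
    """Offset whatever is still charged; do nothing if already refunded."""
    outstanding = balance(ledger, payment_id)
    if outstanding > 0:
        ledger.append(LedgerEntry(payment_id, "refund", -outstanding))


ledger: List[LedgerEntry] = []
charge(ledger, "p-1", 1999)
refund(ledger, "p-1")
refund(ledger, "p-1")  # a retried compensation adds no second refund
```

After compensation the balance is zero, but the ledger still holds two entries. The customer saw a charge and a refund, which differs from never having been charged.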

6. Data And Operational Model

Sagas need durable workflow state:

saga_id
current_step
status
attempt_count
last_error
created_at
updated_at
completed_steps
compensation_status
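As one possible shape for this record, the fields above can be expressed as a dataclass that serializes to JSON for storage. The field types, default values, and status strings are assumptions; the source lists only the field names.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


def _now() -> str:
    return datetime.now(timezone.utc).isoformat()


@dataclass
class SagaState:
    saga_id: str
    current_step: str
    status: str = "running"                   # assumed status value
    attempt_count: int = 0
    last_error: Optional[str] = None
    created_at: str = field(default_factory=_now)
    updated_at: str = field(default_factory=_now)
    completed_steps: List[str] = field(default_factory=list)
    compensation_status: str = "not_started"  # assumed status value

    def mark_step_done(self, next_step: str) -> None:
        """Record the current step as completed and advance to the next one."""
        self.completed_steps.append(self.current_step)
        self.current_step = next_step
        self.attempt_count = 0
        self.last_error = None
        self.updated_at = _now()

    def to_json(self) -> str:
        return json.dumps(asdict(self))

    @classmethod
    def from_json(cls, raw: str) -> "SagaState":
        return cls(**json.loads(raw))


state = SagaState(saga_id="s-1", current_step="reserve_inventory")
state.mark_step_done("charge_payment")
restored = SagaState.from_json(state.to_json())
```

In practice the serialized record would be written to a durable store in the same local transaction as the step's outcome, so a restarted coordinator can resume from `current_step`.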

Each step should be idempotent. The coordinator or event handler may retry a step after timeout, and the receiving service must not duplicate side effects.
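One common way to get this property is an idempotency key that the caller sends with every attempt. The sketch below assumes a hypothetical `PaymentService` that remembers results in memory; a real service would store the key and the result durably alongside the side effect.

```python
from typing import Dict


class PaymentService:
    """Hypothetical receiver that deduplicates retries by idempotency key."""

    def __init__(self) -> None:
        self._results: Dict[str, str] = {}
        self.charges_made = 0

    def charge(self, idempotency_key: str, amount_cents: int) -> str:
        # A retry with a known key returns the stored result, with no new charge.
        if idempotency_key in self._results:
            return self._results[idempotency_key]
        self.charges_made += 1
        receipt = f"receipt-{self.charges_made}"
        self._results[idempotency_key] = receipt
        return receipt


service = PaymentService()
key = "saga-42:charge_payment"  # assumed convention: saga id plus step name
first = service.charge(key, 1999)
retry = service.charge(key, 1999)  # e.g. the coordinator timed out and retried
```

Deriving the key from the saga id and the step name means every retry of the same step maps to the same key without extra bookkeeping.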

Operators should monitor:

  • stuck sagas
  • compensation failures
  • retry counts
  • workflow duration
  • step failure rate
  • sagas waiting on external systems
  • manual intervention count
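As an example of the first item, a stuck-saga check can compare `updated_at` against an idle threshold. The status names and the 15-minute threshold below are assumptions chosen for illustration, and the saga records are plain dictionaries, not rows from a real store.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List

TERMINAL_STATUSES = {"completed", "compensated"}  # assumed status names


def find_stuck_sagas(sagas: List[Dict], now: datetime, max_idle: timedelta) -> List[str]:
    """Return ids of unfinished sagas with no progress for longer than max_idle."""
    return [
        s["saga_id"]
        for s in sagas
        if s["status"] not in TERMINAL_STATUSES and now - s["updated_at"] > max_idle
    ]


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
sagas = [
    {"saga_id": "s-1", "status": "running", "updated_at": now - timedelta(minutes=45)},
    {"saga_id": "s-2", "status": "running", "updated_at": now - timedelta(minutes=2)},
    {"saga_id": "s-3", "status": "completed", "updated_at": now - timedelta(hours=3)},
]
stuck = find_stuck_sagas(sagas, now, timedelta(minutes=15))
```

Only `s-1` is reported here: `s-2` made progress recently, and `s-3` has finished.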

7. Failure Modes

  • Compensation fails.
  • Steps are not idempotent.
  • Choreography becomes hard to understand.
  • Orchestrator becomes a bottleneck.
  • Workflow state is lost or inconsistent.
  • Users see confusing intermediate states.
  • A compensation creates a new failure that needs its own recovery path.
  • Events are published without matching local state changes.
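For the last failure mode, one widely used mitigation, not described in this article, is to write the state change and the outgoing event in the same local transaction. This is often called a transactional outbox. The sketch below uses Python's built-in `sqlite3` module; the table names, columns, and the `fail_before_commit` flag are illustrative assumptions.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id TEXT PRIMARY KEY, status TEXT)")
conn.execute(
    "CREATE TABLE outbox ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, event_type TEXT, payload TEXT)"
)


def create_order(db: sqlite3.Connection, order_id: str, fail_before_commit: bool = False) -> None:
    # `with db` commits if the block succeeds and rolls back if it raises,
    # so the order row and its event are stored together or not at all.
    with db:
        db.execute("INSERT INTO orders VALUES (?, ?)", (order_id, "created"))
        db.execute(
            "INSERT INTO outbox (event_type, payload) VALUES (?, ?)",
            ("OrderCreated", json.dumps({"order_id": order_id})),
        )
        if fail_before_commit:
            raise RuntimeError("simulated crash before commit")


create_order(conn, "o-1")
try:
    create_order(conn, "o-2", fail_before_commit=True)
except RuntimeError:
    pass  # both inserts for o-2 were rolled back

order_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
event_count = conn.execute("SELECT COUNT(*) FROM outbox").fetchone()[0]
```

A separate relay process, not shown, would read rows from `outbox` and publish them to the broker. Because it may publish the same row more than once, consumers still need idempotent handling.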

8. Tradeoffs

Benefit | Cost
Avoids distributed transactions | Eventual consistency
Fits service-owned data | Complex failure handling
Makes long workflows explicit | Compensation can be hard
Supports retries and recovery | Requires workflow observability
Works across external systems | Intermediate states must be managed

Sagas are not simpler than transactions. They are a way to make distributed business workflows explicit when one transaction cannot safely own everything.
