Saga Pattern
Coordinate a multi-step distributed workflow using local transactions and compensating actions instead of one large distributed transaction.
Concepts Covered
- Distributed workflows
- Local transactions
- Compensation
- Orchestration
- Choreography
- Partial failure
- Workflow state
- Idempotent steps
1. Intent
The Saga Pattern coordinates a business workflow that spans multiple services without using one global distributed transaction.
Each step commits locally. If a later step fails, compensating actions try to undo or offset earlier steps.
The intent is to make long-running distributed workflows explicit and recoverable when one atomic transaction across every system is unavailable, undesirable, or too expensive.
2. The Problem Without This Pattern
Some workflows touch multiple systems:
- reserve inventory
- charge payment
- create order
- send confirmation
A single database transaction cannot easily cover all of these if they live in separate services.
If step three fails after steps one and two succeed, the system needs a recovery plan:
- inventory reserved
- payment charged
- order creation fails
Without a saga, the system can get stuck in a partial state. The user may be charged without an order, inventory may remain reserved forever, or operators may need manual repair.
3. How The Pattern Works
A saga is a sequence of local transactions:
T1 -> T2 -> T3
with matching compensations, where each Ci undoes or offsets Ti and compensations run in reverse order:
C3 -> C2 -> C1
If T3 fails, its local transaction does not commit, so the saga runs C2 and then C1 to compensate for the steps that did commit.
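The T/C sequence can be sketched as a small in-process orchestrator. This is a minimal illustration, not a production design: `Step`, `run_saga`, and `SagaFailed` are hypothetical names, the actions are in-memory stand-ins for service calls, and the sketch omits durable state, retries, and compensations that themselves fail.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Step:
    """One saga step: a local action (T_i) and the compensation (C_i) that offsets it."""
    name: str
    action: Callable[[], None]
    compensate: Callable[[], None]


class SagaFailed(Exception):
    """Raised once the steps before a failed step have been compensated."""


def run_saga(steps: List[Step], log: List[str]) -> None:
    completed: List[Step] = []
    for step in steps:
        try:
            step.action()  # T_i either commits locally or raises
        except Exception as exc:
            log.append(f"failed {step.name}: {exc}")
            # Compensate only the steps that committed, newest first.
            for done in reversed(completed):
                done.compensate()
                log.append(f"compensated {done.name}")
            raise SagaFailed(step.name) from exc
        completed.append(step)
        log.append(f"committed {step.name}")


def noop() -> None:
    """Stand-in for a real service call or compensation."""


def create_order() -> None:
    raise RuntimeError("order service unavailable")


log: List[str] = []
failed_step: Optional[str] = None
try:
    run_saga(
        [
            Step("reserve_inventory", noop, noop),
            Step("charge_payment", noop, noop),
            Step("create_order", create_order, noop),
        ],
        log,
    )
except SagaFailed as err:
    failed_step = str(err)
```

Here `create_order` fails, so the log ends with `charge_payment` and then `reserve_inventory` being compensated, matching C2 followed by C1.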
Two common styles:
| Style | Description |
|---|---|
| Orchestration | A central coordinator tells each service what to do |
| Choreography | Services react to events and trigger the next step |
Orchestration is easier to visualize because one workflow controller owns the state machine. Choreography can reduce central coordination but becomes harder to reason about as more services react to each other.
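For contrast with a central coordinator, here is a minimal sketch of choreography. It assumes a hypothetical in-memory `EventBus` in place of a real message broker, made-up event names, and an arbitrary rule that payments over 100 are declined. No component holds the whole workflow; the sequence, including the compensation that releases inventory when payment fails, emerges from which handler subscribes to which event.

```python
from collections import defaultdict, deque
from typing import Callable, Deque, Dict, List, Set, Tuple

Handler = Callable[[dict], None]


class EventBus:
    """Hypothetical in-memory bus standing in for a real message broker."""

    def __init__(self) -> None:
        self.handlers: Dict[str, List[Handler]] = defaultdict(list)
        self.queue: Deque[Tuple[str, dict]] = deque()
        self.history: List[str] = []

    def subscribe(self, event: str, handler: Handler) -> None:
        self.handlers[event].append(handler)

    def publish(self, event: str, payload: dict) -> None:
        self.queue.append((event, payload))

    def drain(self) -> None:
        # Deliver events one at a time; handlers may publish follow-up events.
        while self.queue:
            event, payload = self.queue.popleft()
            self.history.append(event)
            for handler in self.handlers[event]:
                handler(payload)


bus = EventBus()
reserved: Set[str] = set()  # the inventory service's own data


def inventory_on_order_placed(p: dict) -> None:
    reserved.add(p["order_id"])
    bus.publish("InventoryReserved", p)


def payment_on_inventory_reserved(p: dict) -> None:
    # Simulated decline rule so the example exercises compensation.
    if p["amount"] > 100:
        bus.publish("PaymentFailed", p)
    else:
        bus.publish("PaymentCharged", p)


def inventory_on_payment_failed(p: dict) -> None:
    reserved.discard(p["order_id"])  # compensation: release the reservation
    bus.publish("InventoryReleased", p)


bus.subscribe("OrderPlaced", inventory_on_order_placed)
bus.subscribe("InventoryReserved", payment_on_inventory_reserved)
bus.subscribe("PaymentFailed", inventory_on_payment_failed)

bus.publish("OrderPlaced", {"order_id": "o-1", "amount": 250})
bus.drain()
```

Even in this tiny example, reconstructing the flow means reading every `subscribe` call, which is why choreography gets harder to follow as more services join.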
4. When To Use It
Use sagas when:
- a workflow spans multiple services
- each service owns its own data
- distributed transactions are unavailable or undesirable
- eventual consistency is acceptable
- compensating actions are meaningful
- the workflow has clear business steps
- operators need visibility into partial progress
Good examples:
- order placement across inventory, payment, and shipping
- account onboarding across several systems
- travel booking across hotel, flight, and payment providers
- long-running fulfillment workflows
5. When Not To Use It
Avoid sagas when:
- the workflow can stay inside one database transaction
- compensating actions are impossible
- the business requires strict atomicity
- the process is too simple to justify workflow machinery
- intermediate states cannot be exposed or hidden safely
Compensation is not always true undo. Refunding a payment is not the same as never charging it. Sending a cancellation email is not the same as never sending the original confirmation.
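A small sketch of the refund case, assuming a hypothetical append-only `PaymentLedger`: the compensation adds an offsetting entry rather than deleting the charge, so the net balance returns to zero while the record that a charge happened remains.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class PaymentLedger:
    """Hypothetical append-only ledger: entries are never deleted."""
    entries: List[Tuple[str, str, int]] = field(default_factory=list)

    def charge(self, payment_id: str, cents: int) -> None:
        self.entries.append(("charge", payment_id, cents))

    def refund(self, payment_id: str, cents: int) -> None:
        # Compensation appends an offsetting entry instead of erasing the charge.
        self.entries.append(("refund", payment_id, -cents))

    def balance(self) -> int:
        return sum(amount for _, _, amount in self.entries)


ledger = PaymentLedger()
ledger.charge("p-1", 4999)
ledger.refund("p-1", 4999)  # balance is back to 0, but both entries remain
```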
6. Data And Operational Model
Sagas need durable workflow state:
- saga_id
- current_step
- status
- attempt_count
- last_error
- created_at
- updated_at
- completed_steps
- compensation_status
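One way to make that state durable is a row per saga, with each update committed as its own transaction so a restarted coordinator could read where it left off. This sketch uses Python's built-in `sqlite3` with an in-memory database. The column types, the `'running'` status value, the JSON encoding of `completed_steps`, treating `current_step` as the most recently completed step, and the helper names `start_saga` and `record_step_done` are all assumptions for illustration.

```python
import json
import sqlite3
from datetime import datetime, timezone

# Illustrative schema mirroring the fields listed above.
SCHEMA = """
CREATE TABLE IF NOT EXISTS saga_state (
    saga_id             TEXT PRIMARY KEY,
    current_step        TEXT,
    status              TEXT NOT NULL,
    attempt_count       INTEGER NOT NULL DEFAULT 0,
    last_error          TEXT,
    created_at          TEXT NOT NULL,
    updated_at          TEXT NOT NULL,
    completed_steps     TEXT NOT NULL DEFAULT '[]',
    compensation_status TEXT
)
"""


def utc_now() -> str:
    return datetime.now(timezone.utc).isoformat()


def start_saga(conn: sqlite3.Connection, saga_id: str) -> None:
    ts = utc_now()
    with conn:  # commits on success, rolls back if an exception escapes
        conn.execute(
            "INSERT INTO saga_state (saga_id, status, created_at, updated_at) "
            "VALUES (?, 'running', ?, ?)",
            (saga_id, ts, ts),
        )


def record_step_done(conn: sqlite3.Connection, saga_id: str, step: str) -> None:
    with conn:  # commits the UPDATE on success, rolls back on error
        (raw,) = conn.execute(
            "SELECT completed_steps FROM saga_state WHERE saga_id = ?", (saga_id,)
        ).fetchone()
        steps = json.loads(raw) + [step]
        conn.execute(
            "UPDATE saga_state SET current_step = ?, completed_steps = ?, updated_at = ? "
            "WHERE saga_id = ?",
            (step, json.dumps(steps), utc_now(), saga_id),
        )


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
start_saga(conn, "s-1")
record_step_done(conn, "s-1", "reserve_inventory")
record_step_done(conn, "s-1", "charge_payment")
row = conn.execute(
    "SELECT status, current_step, completed_steps FROM saga_state WHERE saga_id = 's-1'"
).fetchone()
```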
Each step should be idempotent. The coordinator or event handler may retry a step after a timeout even when the first attempt actually succeeded, so the receiving service must not duplicate side effects.
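A minimal sketch of an idempotent step, assuming the coordinator sends a stable idempotency key (here the saga id plus the step name) with every attempt. `PaymentService` is hypothetical and keeps its deduplication table in memory; a real service would need to store the key and perform the side effect atomically, which this sketch does not address.

```python
from typing import Dict


class PaymentService:
    """Hypothetical payment service that deduplicates retries by request key."""

    def __init__(self) -> None:
        self.results: Dict[str, str] = {}  # idempotency key -> charge id
        self.charges_made = 0

    def charge(self, idempotency_key: str, cents: int) -> str:
        # A retry with the same key returns the original result
        # instead of charging the customer a second time.
        if idempotency_key in self.results:
            return self.results[idempotency_key]
        self.charges_made += 1
        charge_id = f"ch-{self.charges_made}"
        self.results[idempotency_key] = charge_id
        return charge_id


svc = PaymentService()
key = "saga-42:charge_payment"  # saga_id plus step name
first = svc.charge(key, 4999)
retry = svc.charge(key, 4999)  # e.g. the coordinator timed out and retried
```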
Operators should monitor:
- stuck sagas
- compensation failures
- retry counts
- workflow duration
- step failure rate
- sagas waiting on external systems
- manual intervention count
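Stuck-saga detection can be as simple as flagging unfinished sagas whose `updated_at` has not moved for too long. In this sketch the `SagaRecord` type, the status names, and the 30-minute threshold are illustrative assumptions; a real monitor would query the workflow state store instead of an in-memory list.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List


@dataclass
class SagaRecord:
    saga_id: str
    status: str  # assumed values: "running", "compensating", "completed", "failed"
    updated_at: datetime


def find_stuck(records: List[SagaRecord], now: datetime, max_idle: timedelta) -> List[str]:
    """Return ids of unfinished sagas with no progress for longer than max_idle."""
    unfinished = {"running", "compensating"}
    return [
        r.saga_id
        for r in records
        if r.status in unfinished and now - r.updated_at > max_idle
    ]


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
records = [
    SagaRecord("s-1", "running", now - timedelta(minutes=2)),
    SagaRecord("s-2", "running", now - timedelta(hours=3)),
    SagaRecord("s-3", "completed", now - timedelta(days=1)),
    SagaRecord("s-4", "compensating", now - timedelta(hours=1)),
]
stuck = find_stuck(records, now, max_idle=timedelta(minutes=30))
```

Sagas stuck while compensating are included on purpose, since a compensation that never finishes is usually more urgent than a slow forward step.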
7. Failure Modes
- Compensation fails.
- Steps are not idempotent.
- Choreography becomes hard to understand.
- Orchestrator becomes a bottleneck.
- Workflow state is lost or inconsistent.
- Users see confusing intermediate states.
- A compensation creates a new failure that needs its own recovery path.
- Events are published without matching local state changes.
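For compensation failures, one reasonable approach is to retry a compensation a bounded number of times and then hand it to operators rather than dropping it. This is a sketch under assumptions: `compensate_with_retry` and the plain list used as a manual-intervention queue are hypothetical, and because compensations are retried here, they must be idempotent just like forward steps.

```python
import time
from typing import Callable, Dict, List


def compensate_with_retry(
    compensate: Callable[[], None],
    max_attempts: int,
    manual_queue: List[str],
    step_name: str,
    base_delay: float = 0.0,
) -> bool:
    """Retry a compensation; escalate to operators if it keeps failing."""
    for attempt in range(1, max_attempts + 1):
        try:
            compensate()
            return True
        except Exception:  # broad catch for the sketch; real code would be narrower
            if attempt < max_attempts:
                time.sleep(base_delay * 2 ** (attempt - 1))  # exponential backoff
    # Retries exhausted: record it for manual repair instead of failing silently.
    manual_queue.append(step_name)
    return False


calls: Dict[str, int] = {"n": 0}


def flaky_refund() -> None:
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("payment provider timeout")


def broken_release() -> None:
    raise RuntimeError("inventory record missing")


manual: List[str] = []
ok_refund = compensate_with_retry(flaky_refund, 5, manual, "refund_payment")
ok_release = compensate_with_retry(broken_release, 3, manual, "release_inventory")
```

The refund succeeds on its third attempt, while the release exhausts its retries and lands in the manual queue, which is what the manual intervention count in the monitoring list would track.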
8. Tradeoffs
| Benefit | Cost |
|---|---|
| Avoids distributed transactions | Eventual consistency |
| Fits service-owned data | Complex failure handling |
| Makes long workflows explicit | Compensation can be hard |
| Supports retries and recovery | Requires workflow observability |
| Works across external systems | Intermediate states must be managed |
Sagas are not simpler than transactions. They are a way to make distributed business workflows explicit when one transaction cannot safely own everything.