Circuit Breaker
Stop calling an unhealthy dependency for a period of time so failures do not cascade through the system.
Concepts Covered
- Cascading failures
- Open, closed, and half-open states
- Dependency protection
- Graceful degradation
- Failure thresholds
- Timeouts
- Fallbacks
- Recovery probes
1. Intent
The Circuit Breaker pattern prevents a service from repeatedly calling a dependency that appears unhealthy.
It gives the dependency time to recover and prevents callers from wasting resources on requests likely to fail.
The pattern is named after electrical circuit breakers. When the system detects danger, it opens the circuit and stops sending more load through the failing path.
2. The Problem Without This Pattern
If a dependency slows down, its callers accumulate blocked threads, open connections, buffered memory, and queued retries while they wait.
Example:
1. Abuse reputation service becomes slow.
2. URL creation API keeps calling it.
3. Requests wait longer.
4. Worker threads and connections fill up.
5. Clients retry.
6. The URL creation API becomes unhealthy too.
This is how failures cascade: one slow dependency causes otherwise healthy services to exhaust their own resources.
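The arithmetic behind steps 3 and 4 can be sketched with Little's law (average requests in flight = arrival rate * average latency). The pool size and request rate below are hypothetical figures chosen only for illustration.

```python
def in_flight_requests(arrival_rate_per_s: float, latency_ms: float) -> float:
    """Little's law: average concurrent requests = arrival rate * time in system."""
    return arrival_rate_per_s * latency_ms / 1000.0


# Hypothetical figures, not taken from any real system.
WORKER_POOL_SIZE = 50
ARRIVAL_RATE_PER_S = 200.0

healthy = in_flight_requests(ARRIVAL_RATE_PER_S, latency_ms=50)     # dependency answers in 50 ms
degraded = in_flight_requests(ARRIVAL_RATE_PER_S, latency_ms=2000)  # dependency answers in 2 s

print(f"healthy:  {healthy:.0f} in flight, pool of {WORKER_POOL_SIZE}")
print(f"degraded: {degraded:.0f} in flight, pool of {WORKER_POOL_SIZE}")
```

Under these assumed numbers the caller needs about 10 workers while the dependency is healthy and about 400 once it slows to two seconds per call, far beyond the pool, before any client retries are counted.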
A timeout helps limit each individual call. A circuit breaker goes further: it notices repeated failure and stops making the call for a while.
3. How The Pattern Works
A circuit breaker has three states:
| State | Behavior |
|---|---|
| Closed | Calls pass through normally |
| Open | Calls fail fast or use fallback |
| Half-open | A small number of trial calls test recovery |
Basic flow:
closed:
    call dependency
    record success/failure
    if failures cross threshold -> open
open:
    fail fast or return fallback
    wait cooldown period
    transition to half-open
half-open:
    allow a few trial calls
    if healthy -> close
    if failing -> open again
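The flow above can be sketched as a small state machine. This is a minimal illustration, not a production implementation: the class name, parameter names, and defaults are assumptions, it counts consecutive failures rather than a failure rate, it is not thread-safe, and it does not cap how many trial calls run concurrently in the half-open state.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class CircuitOpenError(Exception):
    """Raised when the breaker rejects a call without trying the dependency."""


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 5, cooldown_s: float = 30.0,
                 trial_successes: int = 2,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold  # consecutive failures before opening
        self.cooldown_s = cooldown_s                # time to stay open before probing
        self.trial_successes = trial_successes      # successes needed to close again
        self._clock = clock                         # injectable so tests can control time
        self.state = "closed"
        self._failures = 0
        self._successes = 0
        self._opened_at = 0.0

    def call(self, fn: Callable[[], T]) -> T:
        if self.state == "open":
            if self._clock() - self._opened_at < self.cooldown_s:
                raise CircuitOpenError("circuit is open; failing fast")
            # Cooldown elapsed: let trial calls through.
            self.state = "half_open"
            self._successes = 0
        try:
            result = fn()
        except Exception:
            self._on_failure()
            raise
        self._on_success()
        return result

    def _on_failure(self) -> None:
        if self.state == "half_open":
            self._trip()  # a failed probe reopens the circuit immediately
            return
        self._failures += 1
        if self._failures >= self.failure_threshold:
            self._trip()

    def _on_success(self) -> None:
        if self.state == "half_open":
            self._successes += 1
            if self._successes >= self.trial_successes:
                self.state = "closed"
                self._failures = 0
        else:
            self._failures = 0  # a success resets the consecutive-failure count

    def _trip(self) -> None:
        self.state = "open"
        self._opened_at = self._clock()
        self._failures = 0
```

A caller would wrap each dependency call, for example `breaker.call(lambda: client.lookup(url))` where `client.lookup` is a hypothetical dependency call, and handle `CircuitOpenError` by failing fast or returning a fallback.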
The breaker should be driven by real signals such as failure rate, timeout rate, latency, or connection errors.
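One way to turn those signals into an open/stay-closed decision is a rolling window over recent calls. The sketch below is an assumption about how such a window might look: the class name, the window size, the minimum call count, and the choice to count slow calls as failures are all illustrative.

```python
from collections import deque


class FailureRateWindow:
    """Tracks the last `size` call outcomes and reports whether to open."""

    def __init__(self, size: int = 20, min_calls: int = 10,
                 max_failure_rate: float = 0.5, slow_call_s: float = 1.0) -> None:
        self._outcomes: deque = deque(maxlen=size)  # True means the call counted as failed
        self.min_calls = min_calls                  # avoid deciding on too little data
        self.max_failure_rate = max_failure_rate
        self.slow_call_s = slow_call_s

    def record(self, ok: bool, latency_s: float) -> None:
        # Errors and calls slower than the latency threshold both count as failures.
        failed = (not ok) or latency_s > self.slow_call_s
        self._outcomes.append(failed)

    def failure_rate(self) -> float:
        if not self._outcomes:
            return 0.0
        return sum(self._outcomes) / len(self._outcomes)

    def should_open(self) -> bool:
        enough_data = len(self._outcomes) >= self.min_calls
        return enough_data and self.failure_rate() >= self.max_failure_rate
```

The minimum call count keeps a single early failure from tripping the breaker on a quiet dependency.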
4. When To Use It
Use circuit breakers around:
- remote service calls
- third-party APIs
- overloaded dependencies
- expensive synchronous calls
- optional product features
- push notification providers
- abuse, recommendation, or enrichment services
They are especially useful when the caller can provide a fallback, degrade behavior, or fail fast without corrupting state.
5. When Not To Use It
Do not use a circuit breaker as a replacement for:
- timeouts
- retries with backoff
- capacity planning
- dependency observability
- idempotency
It may be inappropriate for calls where failing fast is worse than waiting, such as a critical consistency check with no safe fallback.
Also be careful around write operations. If the breaker trips or a call times out while a write is in flight, the caller cannot tell whether the write succeeded, so it still needs idempotency and reconciliation.
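The write case can be illustrated with an idempotency key. The in-memory service below is a hypothetical stand-in for a real dependency; it shows only that repeating a write with the same key after an ambiguous failure does not apply it twice.

```python
import uuid


class FakeWriteService:
    """Hypothetical in-memory dependency that deduplicates by idempotency key."""

    def __init__(self) -> None:
        self._results = {}       # idempotency key -> record id already produced
        self.writes_applied = 0

    def create(self, idempotency_key: str, payload: str) -> str:
        # A repeated key returns the earlier result instead of writing again.
        if idempotency_key in self._results:
            return self._results[idempotency_key]
        self.writes_applied += 1
        record_id = f"rec-{self.writes_applied}"
        self._results[idempotency_key] = record_id
        return record_id


service = FakeWriteService()
key = str(uuid.uuid4())  # generated once per logical write, reused on every retry

first = service.create(key, "https://example.com/long")   # applied, but suppose the reply was lost
retry = service.create(key, "https://example.com/long")   # safe to repeat with the same key
```

Here the retry returns the same record and the service applies the write once, so a caller that lost the first reply can repeat the call safely after the breaker closes.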
6. Data And Operational Model
Circuit breakers need:
- failure thresholds
- latency thresholds
- cooldown duration
- half-open trial count
- fallback behavior
- metrics by dependency
- alerting on state transitions
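The settings in this list could be grouped into one validated configuration object per dependency. The field names, defaults, and fallback labels below are illustrative assumptions, not values recommended by any particular library.

```python
from dataclasses import dataclass

ALLOWED_FALLBACKS = ("fail_fast", "stale_cache", "skip")  # hypothetical labels


@dataclass(frozen=True)
class BreakerConfig:
    dependency: str                        # metrics and alerts are keyed by this name
    failure_rate_threshold: float = 0.5    # open when this share of recent calls fail
    slow_call_threshold_s: float = 1.0     # calls slower than this count as failures
    cooldown_s: float = 30.0               # how long to stay open before probing
    half_open_trial_calls: int = 3         # probes allowed while half-open
    fallback: str = "fail_fast"            # explicit behavior while open

    def __post_init__(self) -> None:
        if not 0.0 < self.failure_rate_threshold <= 1.0:
            raise ValueError("failure_rate_threshold must be in (0, 1]")
        if self.slow_call_threshold_s <= 0 or self.cooldown_s <= 0:
            raise ValueError("thresholds and cooldown must be positive")
        if self.half_open_trial_calls < 1:
            raise ValueError("half_open_trial_calls must be at least 1")
        if self.fallback not in ALLOWED_FALLBACKS:
            raise ValueError(f"fallback must be one of {ALLOWED_FALLBACKS}")
```

Validating at construction time surfaces a mistyped threshold at startup rather than during an outage.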
Operators should monitor:
- breaker state changes
- fallback rate
- dependency error rate
- dependency latency
- half-open success rate
- user-facing impact during open state
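A minimal sketch of collecting two of these signals, state transitions and fallback rate, is shown below. The class and method names are assumptions; a real service would export the counters to its existing metrics system rather than keep them in memory.

```python
from collections import Counter


class BreakerMetrics:
    """In-memory counters keyed by dependency name."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()

    def record_transition(self, dependency: str, old: str, new: str) -> None:
        # Each state change is counted so it can drive an alert.
        self.counts[(dependency, f"{old}->{new}")] += 1

    def record_call(self, dependency: str, used_fallback: bool) -> None:
        self.counts[(dependency, "calls")] += 1
        if used_fallback:
            self.counts[(dependency, "fallbacks")] += 1

    def fallback_rate(self, dependency: str) -> float:
        calls = self.counts[(dependency, "calls")]
        if calls == 0:
            return 0.0
        return self.counts[(dependency, "fallbacks")] / calls
```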
Fallbacks should be explicit. Returning stale cached data is different from rejecting a request. Skipping analytics is different from skipping payment authorization.
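One way to keep fallbacks explicit is to tag every result with where it came from, and to let required calls fail instead of degrading. The names below (`Source`, `Outcome`, `call_with_fallback`) are hypothetical; `primary` stands for any callable, such as a dependency call wrapped in a breaker that raises while open.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class Source(Enum):
    LIVE = "live"                # fresh answer from the dependency
    STALE_CACHE = "stale_cache"  # older cached answer served instead
    SKIPPED = "skipped"          # optional work was not done


@dataclass(frozen=True)
class Outcome(Generic[T]):
    value: Optional[T]
    source: Source


def call_with_fallback(primary: Callable[[], T], cached: Optional[T] = None,
                       required: bool = False) -> Outcome:
    try:
        return Outcome(primary(), Source.LIVE)
    except Exception:
        if required:
            raise  # e.g. payment authorization: never degrade silently
        if cached is not None:
            return Outcome(cached, Source.STALE_CACHE)
        return Outcome(None, Source.SKIPPED)
```

Because the source travels with the value, downstream code and dashboards can tell a degraded response from a live one.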
7. Failure Modes
- Breaker opens too aggressively and blocks healthy traffic.
- Breaker opens too late and does not prevent cascading failure.
- Fallback responses hide serious outages.
- Half-open probes overload a recovering dependency.
- Missing timeouts let hung calls go unrecorded as failures, so the breaker reacts slowly.
- Breaker state is shared too broadly and blocks unrelated tenants.
- Breaker state is local only, so every instance probes recovery at once.
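The last two failure modes have common mitigations that can be sketched briefly: scope breaker state by a narrower key so one tenant's failures do not block others, and add random jitter to the cooldown so instances do not all probe at the same moment. Both helpers below are illustrative assumptions, including the 20% jitter fraction.

```python
import random
from typing import Optional, Tuple


def jittered_cooldown(base_s: float, jitter_fraction: float = 0.2,
                      rng: Optional[random.Random] = None) -> float:
    """Return the base cooldown shifted by up to +/- jitter_fraction of itself."""
    rng = rng or random.Random()
    spread = base_s * jitter_fraction
    return base_s + rng.uniform(-spread, spread)


def breaker_key(dependency: str, tenant: Optional[str] = None) -> Tuple[str, str]:
    """Key for looking up breaker state; per tenant when one is given."""
    return (dependency, tenant or "*")


# Each instance draws its own cooldown, spreading out recovery probes.
cooldowns = [jittered_cooldown(30.0) for _ in range(3)]
```

How narrowly to scope the key is a judgment call: per-tenant state isolates failures but needs more traffic per key before the breaker has enough data to decide.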
8. Tradeoffs
| Benefit | Cost |
|---|---|
| Reduces cascading failures | Adds stateful client behavior |
| Protects dependencies | Requires threshold tuning |
| Enables graceful degradation | Fallback quality matters |
| Fails fast under known outage | Can reject requests during recovery |
| Works well with backpressure | Can mask dependency problems if alerting is weak |
A circuit breaker is a reliability boundary. It should make failure explicit and contained, not invisible.