
Circuit Breaker

Stop calling an unhealthy dependency for a period of time so failures do not cascade through the system.

Level: intermediate, 4 min read
Topics: Reliability, Operations, Tradeoffs, Failure Isolation, Cascading Failures, Timeouts, Degradation

Concepts Covered

  • Cascading failures
  • Open, closed, and half-open states
  • Dependency protection
  • Graceful degradation
  • Failure thresholds
  • Timeouts
  • Fallbacks
  • Recovery probes

1. Intent

The Circuit Breaker pattern prevents a service from repeatedly calling a dependency that appears unhealthy.

It gives the dependency time to recover and prevents callers from wasting resources on requests likely to fail.

The pattern is named after electrical circuit breakers. When the system detects danger, it opens the circuit and stops sending more load through the failing path.

2. The Problem Without This Pattern

If a dependency slows down, callers accumulate waiting requests that tie up connections, threads, and memory, and retries add even more load.

Example:

1. The abuse reputation service becomes slow.
2. The URL creation API keeps calling it.
3. Requests wait longer.
4. Worker threads and connections fill up.
5. Clients retry.
6. The URL creation API becomes unhealthy too.

This is how failures cascade: one slow dependency causes otherwise healthy services to exhaust their own resources.

A timeout helps limit each individual call. A circuit breaker goes further: it notices repeated failure and stops making the call for a while.

3. How The Pattern Works

A circuit breaker has states:

| State | Behavior |
| --- | --- |
| Closed | Calls pass through normally |
| Open | Calls fail fast or use fallback |
| Half-open | A small number of trial calls test recovery |

Basic flow:

closed:
  call dependency
  record success/failure
  if failures cross threshold -> open

open:
  fail fast or return fallback
  wait cooldown period
  transition to half-open

half-open:
  allow a few trial calls
  if healthy -> close
  if failing -> open again
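
The flow above can be sketched in Python. This is a minimal, single-threaded illustration, not a production implementation. The names `CircuitBreaker`, `CircuitOpenError`, `failure_threshold`, `cooldown_s`, and `trial_calls` are assumptions made for the example, and it counts consecutive failures rather than a failure rate to stay short.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(Exception):
    """Raised when the breaker is open and no fallback was supplied."""


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 3, cooldown_s: float = 5.0,
                 trial_calls: int = 1,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.cooldown_s = cooldown_s
        self.trial_calls = trial_calls
        self.clock = clock            # injectable so tests can control time
        self.state = "closed"
        self.failures = 0             # consecutive failures while closed
        self.trial_successes = 0      # successful trial calls while half-open
        self.opened_at = 0.0

    def _open(self) -> None:
        self.state = "open"
        self.opened_at = self.clock()
        self.failures = 0
        self.trial_successes = 0

    def call(self, fn: Callable[[], T],
             fallback: Optional[Callable[[], T]] = None) -> T:
        if self.state == "open":
            if self.clock() - self.opened_at >= self.cooldown_s:
                self.state = "half-open"   # cooldown elapsed: allow trial calls
            elif fallback is not None:
                return fallback()          # degrade without touching the dependency
            else:
                raise CircuitOpenError("dependency circuit is open")
        try:
            result = fn()
        except Exception:
            self._record_failure()
            if fallback is not None:
                return fallback()
            raise
        self._record_success()
        return result

    def _record_failure(self) -> None:
        if self.state == "half-open":
            self._open()                   # trial failed: open again
            return
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self._open()

    def _record_success(self) -> None:
        if self.state == "half-open":
            self.trial_successes += 1
            if self.trial_successes >= self.trial_calls:
                self.state = "closed"      # enough healthy trials: close
                self.failures = 0
        else:
            self.failures = 0
```

A caller would wrap each dependency call, for example `breaker.call(lambda: fetch_score(url), fallback=lambda: None)`, where `fetch_score` is a hypothetical client function. A concurrent service would also need locking and a cap on simultaneous half-open trial calls, which this sketch omits.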

The breaker should be driven by real signals such as failure rate, timeout rate, latency, or connection errors.
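
One way to combine these signals is a rolling window over recent calls, where errors, timeouts, and slow responses all count against the dependency. The sketch below is illustrative only. `RollingFailureRate` and its parameters (`window`, `min_calls`, `rate_threshold`, `slow_call_s`) are assumed names and values, not settings from any particular library.

```python
from collections import deque


class RollingFailureRate:
    """Tracks outcomes of the most recent `window` calls."""

    def __init__(self, window: int = 20, min_calls: int = 10,
                 rate_threshold: float = 0.5, slow_call_s: float = 1.0) -> None:
        self.outcomes = deque(maxlen=window)  # True means the call counted as a failure
        self.min_calls = min_calls
        self.rate_threshold = rate_threshold
        self.slow_call_s = slow_call_s

    def record(self, ok: bool, duration_s: float) -> None:
        # An error, a timeout, or a slow success all count against the dependency.
        self.outcomes.append((not ok) or duration_s >= self.slow_call_s)

    def failure_rate(self) -> float:
        if not self.outcomes:
            return 0.0
        return sum(self.outcomes) / len(self.outcomes)

    def should_open(self) -> bool:
        # Require a minimum sample so a single early failure cannot trip the breaker.
        enough_data = len(self.outcomes) >= self.min_calls
        return enough_data and self.failure_rate() >= self.rate_threshold
```

The minimum-sample check matters at low traffic: without it, one failed call out of one would read as a 100% failure rate.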

4. When To Use It

Use circuit breakers around:

  • remote service calls
  • third-party APIs
  • overloaded dependencies
  • expensive synchronous calls
  • optional product features
  • push providers
  • abuse, recommendation, or enrichment services

They are especially useful when the caller can provide a fallback, degrade behavior, or fail fast without corrupting state.

5. When Not To Use It

Do not use a circuit breaker as a replacement for:

  • timeouts
  • retries with backoff
  • capacity planning
  • dependency observability
  • idempotency

A circuit breaker may be inappropriate for calls where failing fast is worse than waiting, such as a critical consistency check with no safe fallback.

Also be careful around write operations. If a call times out or the breaker trips while a write is in flight, the caller may not know whether the write succeeded, so callers still need idempotency and reconciliation.

6. Data And Operational Model

Circuit breakers need:

  • failure thresholds
  • latency thresholds
  • cooldown duration
  • half-open trial count
  • fallback behavior
  • metrics by dependency
  • alerting on state transitions
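
As an illustration, the tunable settings in this list could be grouped into one validated configuration object per dependency. The class name, field names, default values, and fallback labels below are all assumptions made for the example.

```python
from dataclasses import dataclass

# Hypothetical fallback labels used only in this sketch.
ALLOWED_FALLBACKS = {"reject", "stale_cache", "skip"}


@dataclass(frozen=True)
class BreakerConfig:
    dependency: str
    failure_rate_threshold: float = 0.5   # fraction of failed calls that opens the breaker
    slow_call_threshold_s: float = 1.0    # calls slower than this count as failures
    cooldown_s: float = 30.0              # time spent open before probing
    half_open_trial_calls: int = 3        # probes allowed while half-open
    fallback: str = "reject"

    def __post_init__(self) -> None:
        if not 0.0 < self.failure_rate_threshold <= 1.0:
            raise ValueError("failure_rate_threshold must be in (0, 1]")
        if self.slow_call_threshold_s <= 0 or self.cooldown_s <= 0:
            raise ValueError("durations must be positive")
        if self.half_open_trial_calls < 1:
            raise ValueError("half_open_trial_calls must be at least 1")
        if self.fallback not in ALLOWED_FALLBACKS:
            raise ValueError(f"unknown fallback: {self.fallback}")
```

Validating at construction time keeps a mistyped threshold from silently producing a breaker that never opens or never closes.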

Operators should monitor:

  • breaker state changes
  • fallback rate
  • dependency error rate
  • dependency latency
  • half-open success rate
  • user-facing impact during open state
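
A rough sketch of the counters behind several of these signals is shown below. It keeps everything in memory to stay self-contained. `BreakerMetrics` and its method names are hypothetical, and a real service would export these values to its metrics system instead.

```python
from collections import Counter


class BreakerMetrics:
    """In-memory counters for breaker transitions and fallback usage."""

    def __init__(self) -> None:
        self.transitions: Counter = Counter()  # (dependency, old_state, new_state) -> count
        self.calls: Counter = Counter()        # dependency -> total calls
        self.fallbacks: Counter = Counter()    # dependency -> calls served by a fallback

    def record_transition(self, dependency: str, old: str, new: str) -> None:
        self.transitions[(dependency, old, new)] += 1

    def record_call(self, dependency: str, used_fallback: bool) -> None:
        self.calls[dependency] += 1
        if used_fallback:
            self.fallbacks[dependency] += 1

    def fallback_rate(self, dependency: str) -> float:
        total = self.calls[dependency]
        return self.fallbacks[dependency] / total if total else 0.0
```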

Fallbacks should be explicit. Returning stale cached data is different from rejecting a request. Skipping analytics is different from skipping payment authorization.
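
One way to keep fallbacks explicit is to name the policy at the call site and label every result with where it came from. The sketch below assumes three hypothetical policies (`"stale_cache"`, `"skip"`, and `"reject"`). `Outcome` and `call_with_policy` are illustrative names.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass(frozen=True)
class Outcome:
    value: Any
    source: str  # "live", "stale_cache", or "skipped"


def call_with_policy(fn: Callable[[], Any], policy: str,
                     cached: Optional[Any] = None) -> Outcome:
    """Run fn and apply a named fallback policy if it fails."""
    try:
        return Outcome(fn(), "live")
    except Exception:
        if policy == "stale_cache" and cached is not None:
            return Outcome(cached, "stale_cache")  # degraded but still useful
        if policy == "skip":
            return Outcome(None, "skipped")        # optional work, safe to drop
        raise  # "reject" (or no cached value): surface the failure, never guess
```

Under this scheme an analytics call might use `"skip"`, while a payment authorization would use `"reject"` so that a failure is never mistaken for approval.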

7. Failure Modes

  • Breaker opens too aggressively and blocks healthy traffic.
  • Breaker opens too late and does not prevent cascading failure.
  • Fallback responses hide serious outages.
  • Half-open probes overload a recovering dependency.
  • Missing timeouts delay breaker decisions, because a hung call is not counted as a failure until it finally returns.
  • Breaker state is shared too broadly and blocks unrelated tenants.
  • Breaker state is local only, so every instance probes recovery at once.
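
The last two failure modes can be softened by scoping breaker state narrowly and by adding jitter to the cooldown so instances do not all probe at the same moment. The sketch below is one possible approach under stated assumptions: keying by `(dependency, tenant)` and a 20% jitter are illustrative choices, and `BreakerRegistry` and `jittered_cooldown` are hypothetical names.

```python
import random
from typing import Dict, Optional, Tuple


def jittered_cooldown(base_s: float, jitter_fraction: float = 0.2,
                      rng: Optional[random.Random] = None) -> float:
    """Return base_s adjusted up or down by at most jitter_fraction."""
    source = rng if rng is not None else random
    return base_s * (1.0 + source.uniform(-jitter_fraction, jitter_fraction))


class BreakerRegistry:
    """Tracks open circuits per (dependency, tenant) instead of globally."""

    def __init__(self, base_cooldown_s: float = 30.0) -> None:
        self.base_cooldown_s = base_cooldown_s
        # key -> cooldown chosen when that circuit was opened
        self._open_for: Dict[Tuple[str, str], float] = {}

    def open(self, dependency: str, tenant: str) -> float:
        cooldown = jittered_cooldown(self.base_cooldown_s)
        self._open_for[(dependency, tenant)] = cooldown
        return cooldown

    def close(self, dependency: str, tenant: str) -> None:
        self._open_for.pop((dependency, tenant), None)

    def is_open(self, dependency: str, tenant: str) -> bool:
        return (dependency, tenant) in self._open_for
```

Finer keys isolate tenants from each other but need more traffic per key to reach a reliable failure rate, so the right scope depends on the workload.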

8. Tradeoffs

| Benefit | Cost |
| --- | --- |
| Reduces cascading failures | Adds stateful client behavior |
| Protects dependencies | Requires threshold tuning |
| Enables graceful degradation | Fallback quality matters |
| Fails fast under known outage | Can reject requests during recovery |
| Works well with backpressure | Can hide dependency pain if alerts are weak |

A circuit breaker is a reliability boundary. It should make failure explicit and contained, not invisible.
