Concepts

Backpressure

A system's ability to respond safely when downstream components cannot keep up with incoming work.

Concepts Covered

  • Queue lag
  • Consumer throughput
  • Load shedding
  • Batching
  • Retry storms
  • Priority and degradation
  • Overload signals
  • Recovery behavior

Definition

Backpressure is how a system responds when work arrives faster than downstream components can safely process it.

The key idea is not merely "slow things down." The key idea is to prevent overload from spreading until the system collapses.

A healthy system has a deliberate response when queues grow, workers fall behind, databases saturate, or external dependencies slow down.

The Pain That Forces Backpressure

Systems rarely fail because one request is too hard. They fail because too much work arrives at once and every layer keeps accepting more.

Example:

1. Traffic spike creates more events.
2. Queue depth grows.
3. Workers fall behind.
4. Producers keep publishing at full speed.
5. Responses become slower.
6. Clients and workers retry.
7. Retry traffic adds even more load.
8. Memory, database connections, or broker storage run out.

Without backpressure, overload becomes a feedback loop.
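The amplification in the steps above can be sketched numerically. The capacity and traffic figures below are invented for illustration, and each failed request is assumed to be retried exactly once:

```python
# Hypothetical numbers: workers can process 100 req/s, traffic arrives
# at 120 req/s, and every failed request comes back as one retry.
capacity = 100.0
incoming = 120.0

load = incoming
for tick in range(5):
    processed = min(load, capacity)
    failed = load - processed
    # Failed requests return as retries on top of fresh traffic.
    load = incoming + failed
    print(f"tick {tick}: offered load {load:.0f} req/s")
```

Even with a single retry per failure, offered load climbs every tick (140, 160, 180, ...) while capacity stays flat: the retries themselves become the feedback loop.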

The Naive Version

A naive system accepts unlimited work:

API receives request
API writes event to queue
API returns success
Workers process whenever they can

This looks resilient because the API stays fast. But if workers cannot catch up, the queue becomes an invisible debt pile.

Eventually the debt becomes user-visible:

  • stale counters
  • delayed notifications
  • missing analytics
  • exhausted broker storage
  • giant recovery backlogs
  • old work processed after it no longer matters

The queue did not remove the problem. It moved the problem.
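A bounded queue makes the debt visible at enqueue time instead of letting it pile up silently. A minimal sketch using Python's standard library, with a deliberately tiny capacity:

```python
import queue

# Capacity of 3 is illustrative; real limits come from memory,
# broker storage, or acceptable staleness.
events = queue.Queue(maxsize=3)

accepted, rejected = 0, 0
for event_id in range(5):
    try:
        events.put_nowait(event_id)  # raises queue.Full at capacity
        accepted += 1
    except queue.Full:
        rejected += 1  # the producer now knows to slow down or shed

print(accepted, rejected)  # → 3 2
```

The rejection is the backpressure signal: the producer learns it is outpacing consumers at write time, not hours later from a backlog graph.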

Mental Model

Backpressure is the system saying:

"I cannot safely absorb this much work at this speed."

The response might be to slow producers, scale consumers, drop optional work, batch more efficiently, prioritize critical flows, or degrade non-critical features.

The correct response depends on the product promise.

For messaging, accepted messages should not be dropped casually. For analytics, sampling or delayed processing may be acceptable. For typing indicators, dropping stale events is often fine.

Concrete Example: Likes

In an Instagram-style like system, the Like API may continue accepting likes while analytics workers fall behind.

The system needs to decide:

  • Should likes still be accepted?
  • Should analytics lag be allowed?
  • Should analytics events be sampled?
  • Should counter updates remain higher priority than analytics?
  • Should worker pools scale up?
  • Should very hot posts get isolated?

There is no universal answer. The right decision depends on what users expect and what the business can tolerate.
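One of the decisions above, sampling analytics under lag, can be sketched as a policy function. The threshold, sample rate, and helper names here are all invented for illustration:

```python
import random

applied, emitted = [], []

def apply_like(post_id):            # stand-in for the real counter update
    applied.append(post_id)

def emit_analytics_event(post_id):  # stand-in for the analytics publish
    emitted.append(post_id)

LAG_THRESHOLD = 10_000  # queued analytics events (illustrative)
SAMPLE_RATE = 0.1       # keep 1 in 10 analytics events under pressure

def handle_like(post_id, current_lag):
    apply_like(post_id)  # critical path: always runs, never shed
    if current_lag < LAG_THRESHOLD or random.random() < SAMPLE_RATE:
        emit_analytics_event(post_id)
        return True
    return False  # analytics event shed under backpressure
```

The shape matters more than the numbers: the like itself is unconditional, while the optional work degrades gracefully as lag grows.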

Common Techniques

Technique           | What it does
--------------------|--------------------------------------------------------
Queue limits        | Prevents unbounded storage or memory growth
Rate limiting       | Slows producers before overload spreads
Worker autoscaling  | Adds processing capacity when downstreams can handle it
Batching            | Processes more work per unit of overhead
Load shedding       | Drops lower-priority work
Prioritization      | Keeps critical workflows moving first
Bulkheads           | Isolates overloaded workloads from healthy ones
Circuit breakers    | Stops calling a failing dependency temporarily

Backpressure is strongest when these techniques work together. A queue limit without prioritization can drop important work, and autoscaling without downstream awareness can push an already saturated database even harder.
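One way two of these techniques combine is a queue limit plus prioritization: a bounded buffer that, when full, sheds the least important item rather than refusing the newest. A minimal sketch; the priority values and workload names are invented:

```python
import heapq

class PriorityBuffer:
    """Bounded buffer that sheds the lowest-priority item when full."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.heap = []  # min-heap: lowest-priority entry at the root

    def offer(self, priority, item):
        if len(self.heap) < self.capacity:
            heapq.heappush(self.heap, (priority, item))
            return True
        if self.heap[0][0] < priority:
            heapq.heapreplace(self.heap, (priority, item))  # shed lowest
            return True
        return False  # new item is the least important: shed it instead

buf = PriorityBuffer(2)
buf.offer(1, "analytics")
buf.offer(9, "counter-update")
buf.offer(5, "notification")  # evicts the priority-1 analytics event
print(sorted(p for p, _ in buf.heap))  # → [5, 9]
```

Under sustained overload the buffer stays bounded, and what survives is decided by priority rather than by arrival order.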

What Backpressure Guarantees

Backpressure can help preserve system stability under overload.

It can:

  • protect critical paths
  • make overload visible
  • prevent unbounded queue growth
  • reduce retry amplification
  • preserve partial service during incidents

It does not guarantee:

  • no dropped work
  • instant recovery
  • no user-visible delays
  • correct prioritization without product decisions
  • infinite capacity

Operational Reality

Operators should monitor:

  • queue depth
  • age of oldest queued item
  • consumer lag
  • worker throughput
  • retry rate
  • drop or shed rate
  • saturation of databases, brokers, and external APIs
  • latency by priority class
  • recovery time after a spike
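Two of the signals above, queue depth and age of the oldest queued item, can be derived from timestamped queue entries. A sketch; the field name `enqueued_at` is an assumption:

```python
import time

def queue_signals(items, now=None):
    """Return (depth, oldest_age_seconds) for a list of queued items."""
    now = time.time() if now is None else now
    depth = len(items)
    oldest_age = now - min(i["enqueued_at"] for i in items) if items else 0.0
    return depth, oldest_age

# Fixed timestamps so the example is deterministic.
depth, age = queue_signals(
    [{"enqueued_at": 90.0}, {"enqueued_at": 95.0}], now=100.0
)
print(depth, age)  # → 2 10.0
```

Age of the oldest item often matters more than depth: a shallow queue whose head keeps getting older means consumers are stuck, not merely busy.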

Failure modes:

  • Retrying too aggressively and making overload worse.
  • Letting optional work block critical work.
  • Having no limit on queue growth.
  • Monitoring request success but ignoring consumer lag.
  • Treating all events as equally important.
  • Scaling workers into a downstream database that is already saturated.
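The first failure mode above has a standard mitigation: capped exponential backoff with full jitter, so retries spread out instead of arriving in synchronized waves. A sketch with illustrative constants:

```python
import random

BASE = 0.1   # seconds; first-retry scale (illustrative)
CAP = 10.0   # seconds; upper bound on any single delay (illustrative)

def backoff(attempt):
    """Delay before retry `attempt`: capped exponential, full jitter."""
    return random.uniform(0, min(CAP, BASE * (2 ** attempt)))
```

The jitter is the important part: without it, every client that failed at the same moment retries at the same moment, recreating the spike the backoff was meant to absorb.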
