Concepts

Backpressure

A system's ability to respond safely when downstream components cannot keep up with incoming work.

Concepts Covered

  • Queue lag
  • Consumer throughput
  • Load shedding
  • Batching
  • Retry storms
  • Priority and degradation
  • Overload signals
  • Recovery behavior

Definition

Backpressure is how a system responds when work arrives faster than downstream components can safely process it.

The key idea is not merely "slow things down." The key idea is to prevent overload from spreading until the system collapses.

A healthy system has a deliberate response when queues grow, workers fall behind, databases saturate, or external dependencies slow down.

The Pain That Forces Backpressure

Systems rarely fail because one request is too hard. They fail because too much work arrives at once and every layer keeps accepting more.

Example:

1. Traffic spike creates more events.
2. Queue depth grows.
3. Workers fall behind.
4. Producers keep publishing at full speed.
5. Responses become slower.
6. Clients and workers retry.
7. Retry traffic adds even more load.
8. Memory, database connections, or broker storage run out.

Without backpressure, overload becomes a feedback loop.
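The amplification in the steps above can be sketched numerically. The capacity and traffic figures below are invented for illustration, and each failed request is assumed to be retried exactly once:

```python
# Hypothetical numbers: workers can process 100 req/s, traffic arrives
# at 120 req/s, and every failed request comes back as one retry.
capacity = 100.0
incoming = 120.0

load = incoming
for tick in range(5):
    processed = min(load, capacity)
    failed = load - processed
    # Failed requests return as retries on top of fresh traffic.
    load = incoming + failed
    print(f"tick {tick}: offered load {load:.0f} req/s")
```

Even with a single retry per failure, offered load climbs every tick (140, 160, 180, ...) while capacity stays flat: the retries themselves become the feedback loop.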

The Naive Version

A naive system accepts unlimited work:

API receives request
API writes event to queue
API returns success
Workers process whenever they can

This looks resilient because the API stays fast. But if workers cannot catch up, the queue becomes an invisible debt pile.

Eventually the debt becomes user-visible:

  • stale counters
  • delayed notifications
  • missing analytics
  • exhausted broker storage
  • giant recovery backlogs
  • old work processed after it no longer matters

The queue did not remove the problem. It moved the problem.
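A bounded queue makes the debt visible at enqueue time instead of letting it pile up silently. A minimal sketch using Python's standard library, with a deliberately tiny capacity:

```python
import queue

# Capacity of 3 is illustrative; real limits come from memory,
# broker storage, or acceptable staleness.
events = queue.Queue(maxsize=3)

accepted, rejected = 0, 0
for event_id in range(5):
    try:
        events.put_nowait(event_id)  # raises queue.Full at capacity
        accepted += 1
    except queue.Full:
        rejected += 1  # the producer now knows to slow down or shed

print(accepted, rejected)  # → 3 2
```

The rejection is the backpressure signal: the producer learns it is outpacing consumers at write time, not hours later from a backlog graph.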

Mental Model

Backpressure is the system saying:

"I cannot safely absorb this much work at this speed."

The response might be to slow producers, scale consumers, drop optional work, batch more efficiently, prioritize critical flows, or degrade non-critical features.

The correct response depends on the product promise.

For messaging, accepted messages should not be dropped casually. For analytics, sampling or delayed processing may be acceptable. For typing indicators, dropping stale events is often fine.

Concrete Example: Likes

In an Instagram-style like system, the Like API may continue accepting likes while analytics workers fall behind.

The system needs to decide:

  • Should likes still be accepted?
  • Should analytics lag be allowed?
  • Should analytics events be sampled?
  • Should counter updates remain higher priority than analytics?
  • Should worker pools scale up?
  • Should very hot posts get isolated?

There is no universal answer. The right decision depends on what users expect and what the business can tolerate.
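One of the decisions above, sampling analytics under lag, can be sketched as a policy function. The threshold, sample rate, and helper names here are all invented for illustration:

```python
import random

applied, emitted = [], []

def apply_like(post_id):            # stand-in for the real counter update
    applied.append(post_id)

def emit_analytics_event(post_id):  # stand-in for the analytics publish
    emitted.append(post_id)

LAG_THRESHOLD = 10_000  # queued analytics events (illustrative)
SAMPLE_RATE = 0.1       # keep 1 in 10 analytics events under pressure

def handle_like(post_id, current_lag):
    apply_like(post_id)  # critical path: always runs, never shed
    if current_lag < LAG_THRESHOLD or random.random() < SAMPLE_RATE:
        emit_analytics_event(post_id)
        return True
    return False  # analytics event shed under backpressure
```

The shape matters more than the numbers: the like itself is unconditional, while the optional work degrades gracefully as lag grows.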

Common Techniques

Technique           | What it does
--------------------|--------------------------------------------------------
Queue limits        | Prevents unbounded storage or memory growth
Rate limiting       | Slows producers before overload spreads
Worker autoscaling  | Adds processing capacity when downstreams can handle it
Batching            | Processes more work per unit of overhead
Load shedding       | Drops lower-priority work
Prioritization      | Keeps critical workflows moving first
Bulkheads           | Isolates overloaded workloads from healthy ones
Circuit breakers    | Stops calling a failing dependency temporarily

Backpressure is strongest when these techniques work together. A queue limit without prioritization can drop important work, and autoscaling without downstream awareness can push an already saturated database even harder.
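One way two of these techniques combine is a queue limit plus prioritization: a bounded buffer that, when full, sheds the least important item rather than refusing the newest. A minimal sketch; the priority values and workload names are invented:

```python
import heapq

class PriorityBuffer:
    """Bounded buffer that sheds the lowest-priority item when full."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.heap = []  # min-heap: lowest-priority entry at the root

    def offer(self, priority, item):
        if len(self.heap) < self.capacity:
            heapq.heappush(self.heap, (priority, item))
            return True
        if self.heap[0][0] < priority:
            heapq.heapreplace(self.heap, (priority, item))  # shed lowest
            return True
        return False  # new item is the least important: shed it instead

buf = PriorityBuffer(2)
buf.offer(1, "analytics")
buf.offer(9, "counter-update")
buf.offer(5, "notification")  # evicts the priority-1 analytics event
print(sorted(p for p, _ in buf.heap))  # → [5, 9]
```

Under sustained overload the buffer stays bounded, and what survives is decided by priority rather than by arrival order.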

What Backpressure Guarantees

Backpressure can help preserve system stability under overload.

It can:

  • protect critical paths
  • make overload visible
  • prevent unbounded queue growth
  • reduce retry amplification
  • preserve partial service during incidents

It does not guarantee:

  • no dropped work
  • instant recovery
  • no user-visible delays
  • correct prioritization without product decisions
  • infinite capacity

Operational Reality

Operators should monitor:

  • queue depth
  • age of oldest queued item
  • consumer lag
  • worker throughput
  • retry rate
  • drop or shed rate
  • saturation of databases, brokers, and external APIs
  • latency by priority class
  • recovery time after a spike
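Two of the signals above, queue depth and age of the oldest queued item, can be derived from timestamped queue entries. A sketch; the field name `enqueued_at` is an assumption:

```python
import time

def queue_signals(items, now=None):
    """Return (depth, oldest_age_seconds) for a list of queued items."""
    now = time.time() if now is None else now
    depth = len(items)
    oldest_age = now - min(i["enqueued_at"] for i in items) if items else 0.0
    return depth, oldest_age

# Fixed timestamps so the example is deterministic.
depth, age = queue_signals(
    [{"enqueued_at": 90.0}, {"enqueued_at": 95.0}], now=100.0
)
print(depth, age)  # → 2 10.0
```

Age of the oldest item often matters more than depth: a shallow queue whose head keeps getting older means consumers are stuck, not merely busy.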

Failure modes:

  • Retrying too aggressively and making overload worse.
  • Letting optional work block critical work.
  • Having no limit on queue growth.
  • Monitoring request success but ignoring consumer lag.
  • Treating all events as equally important.
  • Scaling workers into a downstream database that is already saturated.
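The first failure mode above has a standard mitigation: capped exponential backoff with full jitter, so retries spread out instead of arriving in synchronized waves. A sketch with illustrative constants:

```python
import random

BASE = 0.1   # seconds; first-retry scale (illustrative)
CAP = 10.0   # seconds; upper bound on any single delay (illustrative)

def backoff(attempt):
    """Delay before retry `attempt`: capped exponential, full jitter."""
    return random.uniform(0, min(CAP, BASE * (2 ** attempt)))
```

The jitter is the important part: without it, every client that failed at the same moment retries at the same moment, recreating the spike the backoff was meant to absorb.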
