Backpressure
A system's ability to respond safely when downstream components cannot keep up with incoming work.
Concepts Covered
- Queue lag
- Consumer throughput
- Load shedding
- Batching
- Retry storms
- Priority and degradation
- Overload signals
- Recovery behavior
Definition
Backpressure is how a system responds when work arrives faster than downstream components can safely process it.
The key idea is not merely to slow things down; it is to stop overload from cascading through the system and ending in collapse.
A healthy system has a deliberate response when queues grow, workers fall behind, databases saturate, or external dependencies slow down.
The Pain That Forces Backpressure
Systems rarely fail because one request is too hard. They fail because too much work arrives at once and every layer keeps accepting more.
Example:
1. Traffic spike creates more events.
2. Queue depth grows.
3. Workers fall behind.
4. Producers keep publishing at full speed.
5. Responses become slower.
6. Clients and workers retry.
7. Retry traffic adds even more load.
8. Memory, database connections, or broker storage run out.
Without backpressure, overload becomes a feedback loop.
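The classic damper for the retry steps in this loop is exponential backoff with jitter, so retries spread out instead of arriving in synchronized waves. A minimal sketch, assuming a caller-supplied `call_downstream` function (hypothetical) that raises on failure:

```python
import random
import time

def call_with_backoff(call_downstream, max_attempts=5, base=0.1, cap=10.0):
    """Retry with exponential backoff and full jitter to damp retry storms."""
    for attempt in range(max_attempts):
        try:
            return call_downstream()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # give up: surface the overload instead of hammering it
            # full jitter: sleep a random time up to the exponential ceiling
            time.sleep(random.uniform(0, min(cap, base * 2 ** attempt)))
```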
The Naive Version
A naive system accepts unlimited work. Rendered as a minimal Python sketch:

```python
import queue

events = queue.Queue()          # unbounded: the API never pushes back

def handle_request(event):      # API receives request
    events.put(event)           # write event to queue; always succeeds
    return {"status": "ok"}     # return success

def worker_loop(process):
    while True:
        process(events.get())   # workers process whenever they can
```
This looks resilient because the API stays fast. But if workers cannot catch up, the queue becomes an invisible debt pile.
Eventually the debt becomes user-visible:
- stale counters
- delayed notifications
- missing analytics
- exhausted broker storage
- giant recovery backlogs
- old work processed after it no longer matters
The queue did not remove the problem. It moved the problem.
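A bounded queue turns that hidden debt into an explicit admission decision. A minimal sketch, assuming the product can tolerate rejecting work with a retryable error; the 10,000 limit and the retry hint are illustrative placeholders:

```python
import queue

events = queue.Queue(maxsize=10_000)   # explicit limit: debt cannot grow unseen

def handle_request(event):
    try:
        events.put_nowait(event)        # accept only while there is room
        return {"status": "accepted"}
    except queue.Full:
        # overload made visible: the caller learns the system is behind
        return {"status": "rejected", "retry_after_seconds": 5}
```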
Mental Model
Backpressure is the system saying:
"I cannot safely absorb this much work at this speed."
The response might be to slow producers, scale consumers, drop optional work, batch more efficiently, prioritize critical flows, or degrade non-critical features.
The correct response depends on the product promise.
For messaging, accepted messages should not be dropped casually. For analytics, sampling or delayed processing may be acceptable. For typing indicators, dropping stale events is often fine.
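One way to encode such a promise is an admission rule that always keeps critical flows and sheds optional work as load rises. A sketch under assumptions: the event types, the `load_factor` signal, and the 0.8 threshold are all illustrative, not fixed rules:

```python
CRITICAL = {"message_send", "counter_update"}   # assumed product-critical flows

def admit(event_type: str, load_factor: float) -> bool:
    """Admit critical work unconditionally; shed optional work under load.

    load_factor is an assumed 0.0-1.0 saturation signal from monitoring.
    """
    if event_type in CRITICAL:
        return True                  # critical flows are always admitted
    return load_factor < 0.8         # degrade non-critical features first
```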
Concrete Example: Likes
In an Instagram-style like system, the Like API may continue accepting likes while analytics workers fall behind.
The system needs to decide:
- Should likes still be accepted?
- Should analytics lag be allowed?
- Should analytics events be sampled?
- Should counter updates remain higher priority than analytics?
- Should worker pools scale up?
- Should very hot posts get isolated?
There is no universal answer. The right decision depends on what users expect and what the business can tolerate.
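If the decision is to sample analytics while keeping counter updates intact, the change can live at publish time. A hedged sketch; the `publish` function, topic names, and 10% rate are assumptions for illustration:

```python
import random

ANALYTICS_SAMPLE_RATE = 0.10   # assumed: keep 10% of analytics events under lag

def on_like(like_event, publish, analytics_lagging: bool):
    publish("counter-updates", like_event)        # always: user-visible count
    if not analytics_lagging or random.random() < ANALYTICS_SAMPLE_RATE:
        publish("analytics", like_event)          # sampled when workers lag
```

Note that downstream aggregation must scale sampled counts back up, which is why sampling is a product decision as much as an operational one.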
Common Techniques
| Technique | What it does |
|---|---|
| Queue limits | Prevents unbounded storage or memory growth |
| Rate limiting | Slows producers before overload spreads |
| Worker autoscaling | Adds processing capacity when downstreams can handle it |
| Batching | Processes more work per unit of overhead |
| Load shedding | Drops lower-priority work |
| Prioritization | Keeps critical workflows moving first |
| Bulkheads | Isolates overloaded workloads from healthy ones |
| Circuit breakers | Stops calling a failing dependency temporarily |
Backpressure is strongest when these techniques work together. A queue limit without prioritization can drop important work. Autoscaling without downstream awareness can push an already saturated database even deeper into overload.
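As one illustration from the table, batching trades a little latency for far less per-item overhead. A sketch of a drain loop, assuming a thread-safe `queue.Queue` and a downstream that accepts whole batches; the size and wait limits are placeholders:

```python
import queue
import time

def drain_batch(q, max_items=100, max_wait=0.05):
    """Collect up to max_items, waiting at most max_wait seconds in total."""
    batch, deadline = [], time.monotonic() + max_wait
    while len(batch) < max_items:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break
        try:
            batch.append(q.get(timeout=remaining))
        except queue.Empty:
            break
    return batch   # the caller writes the whole batch in one downstream call
```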
What Backpressure Guarantees
Backpressure can help preserve system stability under overload.
It can:
- protect critical paths
- make overload visible
- prevent unbounded queue growth
- reduce retry amplification
- preserve partial service during incidents
It does not guarantee:
- no dropped work
- instant recovery
- no user-visible delays
- correct prioritization without product decisions
- infinite capacity
Operational Reality
Operators should monitor:
- queue depth
- age of oldest queued item
- consumer lag
- worker throughput
- retry rate
- drop or shed rate
- saturation of databases, brokers, and external APIs
- latency by priority class
- recovery time after a spike
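Most of these signals fall out of timestamps recorded at enqueue time. A sketch of the derivation; the structure names and snapshot inputs are assumptions:

```python
import time
from dataclasses import dataclass

@dataclass
class QueuedEvent:
    payload: dict
    enqueued_at: float   # producer stamps time.time() when enqueuing

def overload_signals(pending: list, processed_last_minute: int) -> dict:
    """Derive queue depth, oldest-item age, and throughput from a snapshot."""
    now = time.time()
    return {
        "queue_depth": len(pending),
        "oldest_age_seconds": (now - pending[0].enqueued_at) if pending else 0.0,
        "consumer_throughput_per_second": processed_last_minute / 60.0,
    }
```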
Failure modes:
- Retrying too aggressively and making overload worse.
- Letting optional work block critical work.
- Having no limit on queue growth.
- Monitoring request success but ignoring consumer lag.
- Treating all events as equally important.
- Scaling workers into a downstream database that is already saturated.