Caching
Store frequently needed data closer to the request path so systems can reduce latency, absorb read pressure, and protect slower backing stores.
Concepts Covered
- Cache hits and misses
- Read pressure
- Latency reduction
- Time to live
- Cache invalidation
- Stale reads
- Hot keys
- Cache stampedes
Definition
Caching means storing data in a faster or closer layer so future requests do not always need to recompute the answer or read from the original source of truth.
The source of truth might be a database, object store, external API, or expensive computation. The cache is a performance layer in front of that source. It is usually faster because it keeps data in memory, keeps data geographically closer to users, or avoids repeated work.
Caching is not just an optimization. At scale, caching often becomes the difference between a system that survives normal traffic and a system that collapses under repeated reads.
The Pain That Forces Caching
Imagine a URL shortener redirect path:
GET /abc123
-> database lookup short_code = abc123
-> return destination URL
This is simple and correct. But popular links can receive thousands or millions of reads. If every redirect hits the database, the database becomes responsible for serving the same answer again and again.
The database is now doing repetitive work:
- parsing the query
- using an index
- reading pages from memory or disk
- returning the same destination URL
- handling connection pressure
The problem is not that the database is bad. The problem is that the system is asking the database to serve hot, repetitive reads on the critical path.
Caching moves that repeated answer closer to the redirect service:
GET /abc123
-> cache lookup
-> if hit, redirect immediately
-> if miss, read database and fill cache
Now the database only handles misses, writes, and less common reads.
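This read path is often called cache-aside. A minimal sketch in Python, where a plain dict stands in for an in-memory cache such as Redis, and `DB` and `fetch_from_db` are hypothetical stand-ins for the real database lookup:

```python
cache = {}

# Stand-in source of truth; in production this is the database.
DB = {"abc123": "https://example.com/article"}

def fetch_from_db(short_code):
    return DB.get(short_code)

def resolve(short_code):
    key = f"short_code:{short_code}"
    url = cache.get(key)
    if url is not None:
        # Cache hit: redirect immediately, no database work.
        return url
    # Cache miss: read the source of truth.
    url = fetch_from_db(short_code)
    if url is not None:
        # Fill the cache so the next request is a hit.
        cache[key] = url
    return url
```

The second call to `resolve("abc123")` never touches `fetch_from_db`; only misses, writes, and uncommon reads reach the backing store.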
Mental Model
A cache is a shortcut, not an authority.
The cache answers quickly, but the original data still belongs somewhere else. This is why caching always creates two questions:
- How do we get data into the cache?
- How do we keep cached data from becoming dangerously wrong?
If the data rarely changes, caching is relatively easy. A short URL mapping from short_code to destination may be immutable after creation, so stale reads are not a major concern.
If the data changes often, caching becomes harder. A user's profile, a product inventory count, or a post's like count can change while old values are still cached.
The more a cache improves performance, the more carefully you must reason about freshness.
Cache Hit And Cache Miss
A cache hit means the requested value is already in the cache.
short_code:abc123 -> https://example.com/article
A cache miss means the value is not present, so the service must go to the backing store.
cache miss
-> query database
-> return value
-> store value in cache for next time
The hit ratio is the percentage of requests served from the cache. A high hit ratio usually means the cache is protecting the database well. A low hit ratio may mean the cache is too small, entries expire too quickly, keys are poorly chosen, or traffic is too random to benefit.
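Hit ratio is simple to track with two counters. A minimal sketch; the `CacheStats` name and shape are illustrative, not a real library API:

```python
class CacheStats:
    """Tracks hits and misses and derives the hit ratio."""

    def __init__(self):
        self.hits = 0
        self.misses = 0

    def record(self, hit: bool):
        if hit:
            self.hits += 1
        else:
            self.misses += 1

    @property
    def hit_ratio(self) -> float:
        # hits / (hits + misses); 0.0 when no requests recorded yet.
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```

A ratio of 0.9 means nine out of ten requests never reached the database; watching this number over time is one of the cheapest ways to tell whether the cache is doing its job.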
Time To Live
Time to live, or TTL, controls how long a cache entry can remain valid before it expires.
cache key: short_code:abc123
value: https://example.com/article
ttl: 24 hours
TTL gives the system a simple freshness policy: use the cached value for a while, then refresh it.
Long TTLs improve cache efficiency but increase the risk of stale data. Short TTLs reduce staleness but increase database traffic because entries expire more often.
There is no universal correct TTL. The correct value depends on the product promise. A redirect destination may tolerate a long TTL. A stock count, payment state, or user permission may need a very short TTL or no cache at all.
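A TTL cache with lazy expiry on read can be sketched in a few lines. Real caches such as Redis handle expiry server-side, but the policy is the same; `TTLCache` here is an illustrative name, not a library class:

```python
import time

class TTLCache:
    """Maps key -> (value, expires_at); entries expire lazily on read."""

    def __init__(self):
        self._store = {}

    def set(self, key, value, ttl_seconds):
        self._store[key] = (value, time.monotonic() + ttl_seconds)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            # Entry outlived its TTL: drop it and report a miss.
            del self._store[key]
            return None
        return value
```

Using `time.monotonic()` rather than wall-clock time keeps expiry correct even if the system clock jumps.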
Cache Invalidation
Invalidation means removing or updating cached data when the underlying data changes.
For example:
user updates profile name
-> write database
-> delete cache key user:42
-> next read reloads fresh data
Invalidation is difficult because writes and cache updates can fail independently. The database write might succeed while the cache delete fails. Or two requests might race: one reads the old value and fills the cache just after another has written the new value, leaving the cache stale until the next invalidation or expiry.
This is why many systems combine invalidation with TTL. Even if invalidation fails, the stale value eventually expires.
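The write-then-invalidate flow, with TTL as the backstop, might look like this sketch. Dicts stand in for the database and cache, and `update_profile_name` and `read_profile_name` are hypothetical helpers; a production cache entry would also carry a TTL:

```python
db = {}      # stand-in source of truth
cache = {}   # stand-in cache; entries would also have a TTL in production

def update_profile_name(user_id, new_name):
    # 1. Write the source of truth first.
    db[user_id] = new_name
    # 2. Delete the cached copy. If this delete fails, the TTL
    #    bounds how long the stale value can survive.
    cache.pop(f"user:{user_id}", None)

def read_profile_name(user_id):
    key = f"user:{user_id}"
    if key in cache:
        return cache[key]
    # Miss: reload fresh data and refill the cache.
    value = db.get(user_id)
    cache[key] = value
    return value
```

Deleting rather than updating the cache on write keeps the write path simple: the next read repopulates the entry from the database, so the cache never holds a value the database did not serve.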
Cache Stampede
A cache stampede happens when many requests miss the cache at the same time and all rush to the backing store.
Example:
1. Viral key expires from cache.
2. 20,000 requests arrive for the same key.
3. All see a cache miss.
4. All query the database.
5. The database overloads.
The cache was supposed to protect the database, but synchronized expiration turned it into a burst amplifier.
Common mitigations include:
- request coalescing so only one request rebuilds a missing value
- adding jitter to TTLs so many keys do not expire at once
- serving slightly stale values while refreshing in the background
- prewarming known hot keys
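Two of these mitigations are small enough to sketch: TTL jitter and request coalescing. This is an illustrative single-process version; production coalescing typically uses one lock per key, often across processes:

```python
import random
import threading

def ttl_with_jitter(base_ttl_seconds, jitter_fraction=0.1):
    """Spread expirations so many keys do not expire at once."""
    jitter = base_ttl_seconds * jitter_fraction
    return base_ttl_seconds + random.uniform(-jitter, jitter)

cache = {}
_rebuild_lock = threading.Lock()

def get_or_rebuild(key, rebuild):
    """Only one caller rebuilds a missing value; others reuse its result."""
    value = cache.get(key)
    if value is not None:
        return value
    with _rebuild_lock:
        # Re-check: another caller may have filled the entry
        # while we were waiting for the lock.
        value = cache.get(key)
        if value is None:
            value = rebuild()  # single trip to the backing store
            cache[key] = value
        return value
```

With coalescing, 20,000 simultaneous misses for a viral key become one database query plus 19,999 waits on a lock, which is exactly the burst protection the cache was supposed to provide.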
What Caching Does Not Solve
Caching does not remove the need for good data modeling, indexing, capacity planning, or backpressure.
It can hide pressure for a while, but if the cache becomes cold, invalid, overloaded, or unavailable, the backing store may suddenly receive traffic it has not been sized to handle.
Caching also does not make incorrect data correct. If the cached value is wrong, fast reads only return the wrong answer faster.
Operational Reality
Production caching is mostly about watching pressure and freshness.
Important signals:
- cache hit ratio
- cache miss rate
- p95 and p99 cache latency
- evictions
- memory usage
- hot keys
- backend database traffic during cache misses
- stale data incidents
- stampede events
The central tradeoff is simple: the cache buys speed and protection, but it introduces freshness, invalidation, and operational complexity.