Caching
Store frequently needed data closer to the request path so systems can reduce latency, absorb read pressure, and protect slower backing stores.
Concepts Covered
- Cache hits and misses
- Read pressure
- Latency reduction
- Time to live
- Cache invalidation
- Stale reads
- Hot keys
- Cache stampedes
Definition
Caching means storing data in a faster or closer layer so future requests do not always need to recompute the answer or read from the original source of truth.
The source of truth might be a database, object store, external API, or expensive computation. The cache is a performance layer in front of that source. It is usually faster because it keeps data in memory, keeps data geographically closer to users, or avoids repeated work.
Caching is not just an optimization. At scale, caching often becomes the difference between a system that survives normal traffic and a system that collapses under repeated reads.
The Pain That Forces Caching
Imagine a URL shortener redirect path:
GET /abc123
-> database lookup short_code = abc123
-> return destination URL
This is simple and correct. But popular links can receive thousands or millions of reads. If every redirect hits the database, the database becomes responsible for serving the same answer again and again.
The database is now doing repetitive work:
- parsing the query
- using an index
- reading pages from memory or disk
- returning the same destination URL
- handling connection pressure
The problem is not that the database is bad. The problem is that the system is asking the database to serve hot, repetitive reads on the critical path.
Caching moves that repeated answer closer to the redirect service:
GET /abc123
-> cache lookup
-> if hit, redirect immediately
-> if miss, read database and fill cache
Now the database only handles misses, writes, and less common reads.
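This read path is often called cache-aside. A minimal sketch in Python, where a plain dict stands in for an in-memory cache such as Redis, and `DB` and `fetch_from_db` are hypothetical stand-ins for the real database lookup:

```python
cache = {}

# Stand-in source of truth; in production this is the database.
DB = {"abc123": "https://example.com/article"}

def fetch_from_db(short_code):
    return DB.get(short_code)

def resolve(short_code):
    key = f"short_code:{short_code}"
    url = cache.get(key)
    if url is not None:
        # Cache hit: redirect immediately, no database work.
        return url
    # Cache miss: read the source of truth.
    url = fetch_from_db(short_code)
    if url is not None:
        # Fill the cache so the next request is a hit.
        cache[key] = url
    return url
```

The second call to `resolve("abc123")` never touches `fetch_from_db`; only misses, writes, and uncommon reads reach the backing store.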
Mental Model
A cache is a shortcut, not an authority.
The cache answers quickly, but the original data still belongs somewhere else. This is why caching always creates two questions:
- How do we get data into the cache?
- How do we keep cached data from becoming dangerously wrong?
If the data rarely changes, caching is relatively easy. A short URL mapping from short_code to destination may be immutable after creation, so stale reads are not a major concern.
If the data changes often, caching becomes harder. A user's profile, a product inventory count, or a post's like count can change while old values are still cached.
The more a cache improves performance, the more carefully you must reason about freshness.
Cache Hit And Cache Miss
A cache hit means the requested value is already in the cache.
short_code:abc123 -> https://example.com/article
A cache miss means the value is not present, so the service must go to the backing store.
cache miss
-> query database
-> return value
-> store value in cache for next time
The hit ratio is the percentage of requests served from the cache. A high hit ratio usually means the cache is protecting the database well. A low hit ratio may mean the cache is too small, entries expire too quickly, keys are poorly chosen, or traffic is too random to benefit.
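Hit ratio is simple to track with two counters. A minimal sketch; the `CacheStats` name and shape are illustrative, not a real library API:

```python
class CacheStats:
    """Tracks hits and misses and derives the hit ratio."""

    def __init__(self):
        self.hits = 0
        self.misses = 0

    def record(self, hit: bool):
        if hit:
            self.hits += 1
        else:
            self.misses += 1

    @property
    def hit_ratio(self) -> float:
        # hits / (hits + misses); 0.0 when no requests recorded yet.
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```

A ratio of 0.9 means nine out of ten requests never reached the database; watching this number over time is one of the cheapest ways to tell whether the cache is doing its job.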
Time To Live
Time to live, or TTL, controls how long a cache entry can remain valid before it expires.
cache key: short_code:abc123
value: https://example.com/article
ttl: 24 hours
TTL gives the system a simple freshness policy: use the cached value for a while, then refresh it.
Long TTLs improve cache efficiency but increase the risk of stale data. Short TTLs reduce staleness but increase database traffic because entries expire more often.
There is no universal correct TTL. The correct value depends on the product promise. A redirect destination may tolerate a long TTL. A stock count, payment state, or user permission may need a very short TTL or no cache at all.
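A TTL cache with lazy expiry on read can be sketched in a few lines. Real caches such as Redis handle expiry server-side, but the policy is the same; `TTLCache` here is an illustrative name, not a library class:

```python
import time

class TTLCache:
    """Maps key -> (value, expires_at); entries expire lazily on read."""

    def __init__(self):
        self._store = {}

    def set(self, key, value, ttl_seconds):
        self._store[key] = (value, time.monotonic() + ttl_seconds)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            # Entry outlived its TTL: drop it and report a miss.
            del self._store[key]
            return None
        return value
```

Using `time.monotonic()` rather than wall-clock time keeps expiry correct even if the system clock jumps.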
Cache Invalidation
Invalidation means removing or updating cached data when the underlying data changes.
For example:
user updates profile name
-> write database
-> delete cache key user:42
-> next read reloads fresh data
Invalidation is difficult because writes and cache updates can fail independently. The database write might succeed while the cache delete fails. Or two requests might race: one reads the old value and fills the cache just after another has written the new value, leaving the cache stale until the next invalidation or expiry.
This is why many systems combine invalidation with TTL. Even if invalidation fails, the stale value eventually expires.
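The write-then-invalidate flow, with TTL as the backstop, might look like this sketch. Dicts stand in for the database and cache, and `update_profile_name` and `read_profile_name` are hypothetical helpers; a production cache entry would also carry a TTL:

```python
db = {}      # stand-in source of truth
cache = {}   # stand-in cache; entries would also have a TTL in production

def update_profile_name(user_id, new_name):
    # 1. Write the source of truth first.
    db[user_id] = new_name
    # 2. Delete the cached copy. If this delete fails, the TTL
    #    bounds how long the stale value can survive.
    cache.pop(f"user:{user_id}", None)

def read_profile_name(user_id):
    key = f"user:{user_id}"
    if key in cache:
        return cache[key]
    # Miss: reload fresh data and refill the cache.
    value = db.get(user_id)
    cache[key] = value
    return value
```

Deleting rather than updating the cache on write keeps the write path simple: the next read repopulates the entry from the database, so the cache never holds a value the database did not serve.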
Cache Stampede
A cache stampede happens when many requests miss the cache at the same time and all rush to the backing store.
Example:
1. Viral key expires from cache.
2. 20,000 requests arrive for the same key.
3. All see a cache miss.
4. All query the database.
5. The database overloads.
The cache was supposed to protect the database, but synchronized expiration turned it into a burst amplifier.
Common mitigations include:
- request coalescing so only one request rebuilds a missing value
- adding jitter to TTLs so many keys do not expire at once
- serving slightly stale values while refreshing in the background
- prewarming known hot keys
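Two of these mitigations are small enough to sketch: TTL jitter and request coalescing. This is an illustrative single-process version; production coalescing typically uses one lock per key, often across processes:

```python
import random
import threading

def ttl_with_jitter(base_ttl_seconds, jitter_fraction=0.1):
    """Spread expirations so many keys do not expire at once."""
    jitter = base_ttl_seconds * jitter_fraction
    return base_ttl_seconds + random.uniform(-jitter, jitter)

cache = {}
_rebuild_lock = threading.Lock()

def get_or_rebuild(key, rebuild):
    """Only one caller rebuilds a missing value; others reuse its result."""
    value = cache.get(key)
    if value is not None:
        return value
    with _rebuild_lock:
        # Re-check: another caller may have filled the entry
        # while we were waiting for the lock.
        value = cache.get(key)
        if value is None:
            value = rebuild()  # single trip to the backing store
            cache[key] = value
        return value
```

With coalescing, 20,000 simultaneous misses for a viral key become one database query plus 19,999 waits on a lock, which is exactly the burst protection the cache was supposed to provide.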
What Caching Does Not Solve
Caching does not remove the need for good data modeling, indexing, capacity planning, or backpressure.
It can hide pressure for a while, but if the cache becomes cold, invalid, overloaded, or unavailable, the backing store may suddenly receive traffic it has not been sized to handle.
Caching also does not make incorrect data correct. If the cached value is wrong, fast reads only return the wrong answer faster.
Operational Reality
Production caching is mostly about watching pressure and freshness.
Important signals:
- cache hit ratio
- cache miss rate
- p95 and p99 cache latency
- evictions
- memory usage
- hot keys
- backend database traffic during cache misses
- stale data incidents
- stampede events
The central tradeoff is simple: the cache buys speed and protection, but it introduces freshness, invalidation, and operational complexity.