Concepts

Database Indexing

Organize database data structures so common reads can find the right rows quickly without scanning the entire table.

foundation4 min readUpdated 2026-05-14DataCapacityOperationsTradeoffs

B-TreesQuery PlanningRead AmplificationWrite AmplificationSelectivity

After this, you will understand

How Database Indexing helps you see where this idea appears in production systems, what problem forces it, and how to reason about the tradeoffs.

Naive mental model

Treat the idea as a definition to memorize.

Production pressure

Real systems force the idea to handle B-Trees, Query Planning, and Read Amplification.

Better reasoning

Use the concept to decide what the system guarantees, what it risks, and what it costs to operate.

Think before readingWhere would Database Indexing appear in a real production system, and what failure or bottleneck would it help you reason about?

As you read, look for the pressure that creates the idea first. The mechanics matter more once the reason is clear.

Reading in progress

This page is saved in your local study history so you can continue later.

Next: Delivery Guarantees

Concepts Covered

Table scans
Index lookups
B-tree indexes
Selectivity
Composite indexes
Query planning
Read and write tradeoffs
Operational signals

Definition

A database index is a data structure that helps the database find rows without scanning the entire table.

Without an index, the database may need to inspect many rows to answer a query. With a useful index, the database can jump closer to the rows it needs.

For example, a URL shortener redirect query usually looks like this:

SELECT destination_url
FROM short_links
WHERE short_code = 'abc123';

If short_code is indexed, the database can find the matching row quickly. If it is not indexed, the database may scan the table looking for the one row with that code.

The Pain That Forces Indexing

A table scan is acceptable when a table is tiny. It becomes dangerous when the table grows and the query sits on a hot path.

Imagine a redirect table with 500 million rows. Every redirect request needs one row by short_code.

Without an index:

request arrives
  -> database checks many rows
  -> CPU and disk work increase
  -> latency rises
  -> connections stay open longer
  -> more requests pile up

The application code did not change. The query still looks innocent. But the amount of work behind the query grew with the table.

Indexing exists because production systems cannot afford to rediscover the location of common data by scanning everything.

Mental Model

An index is like a sorted lookup structure maintained beside the table.

The table stores the full row:

id | short_code | destination_url | created_at | user_id

The index stores a smaller structure optimized for search:

short_code -> row location

When a query filters by short_code, the database uses the index to locate the row and then fetches the needed data.

Many relational databases use B-tree indexes for general-purpose lookups because B-trees keep keys ordered and make equality and range queries efficient.

Selectivity

Selectivity describes how much an index narrows the result set.

An index on short_code is highly selective because one short code should usually point to one row.

An index on is_active may be less useful if almost every row has is_active = true. The database still has to read a large part of the table, so the index may not help much.

Useful indexes usually match:

common query filters
joins
sorting requirements
uniqueness constraints
high-selectivity fields

Indexes should come from access patterns, not guesswork.

Composite Indexes

A composite index contains multiple columns.

Example:

CREATE INDEX idx_user_created_at
ON short_links(user_id, created_at);

This helps queries like:

SELECT *
FROM short_links
WHERE user_id = 'user_42'
ORDER BY created_at DESC;

The order of fields matters. An index on (user_id, created_at) is not the same as an index on (created_at, user_id). The database can use the left side of a composite index most naturally, so the index should match how the query filters and sorts.

The Tradeoff

Indexes speed up reads, but they make writes more expensive.

Every insert, update, or delete may need to update one or more indexes. This creates write amplification:

insert row
  -> write table data
  -> update index on short_code
  -> update index on user_id
  -> update index on created_at

Too few indexes cause slow reads. Too many indexes slow writes, consume storage, and make migrations heavier.

Query Planning

The database query planner decides whether to use an index, which index to use, and in what order to execute the query.

Sometimes an index exists but the planner does not use it because:

the query is not selective enough
statistics are outdated
the query shape does not match the index
the database expects a scan to be cheaper
functions or transformations prevent index usage

This is why production engineers inspect query plans. The existence of an index is not the same as the database using it effectively.

Operational Reality

Important signals:

slow query logs
query plans
rows scanned versus rows returned
index hit ratio
write latency after adding indexes
index storage growth
lock time during index creation
p95 and p99 query latency

Indexes are one of the first scaling tools because they preserve the simple database model longer. But they are not infinite. Once data volume, write throughput, or hot partitions exceed what one database can handle, the system may need caching, read replicas, or sharding.

What to study next

These links keep the session moving: read prerequisites first, then open the systems, concepts, and patterns that deepen this page.

Used In Systems

System studies where this idea appears in context.

URL Shortener SystemSee the idea under full production pressure.

Related Concepts

Core ideas that connect to this topic.

CachingUnderstand the concept behind the design decision.ShardingUnderstand the concept behind the design decision.

What to study next

Used In Systems

Related Concepts

More Links