Concepts
Database Indexing
Organize database data structures so common reads can find the right rows quickly without scanning the entire table.
Concepts Covered
- Table scans
- Index lookups
- B-tree indexes
- Selectivity
- Composite indexes
- Query planning
- Read and write tradeoffs
- Operational signals
Definition
A database index is a data structure that helps the database find rows without scanning the entire table.
Without an index, the database may need to inspect many rows to answer a query. With a useful index, the database can jump closer to the rows it needs.
For example, a URL shortener redirect query usually looks like this:
SELECT destination_url
FROM short_links
WHERE short_code = 'abc123';
If short_code is indexed, the database can find the matching row quickly. If it is not indexed, the database may scan the table looking for the one row with that code.
The Pain That Forces Indexing
A table scan is acceptable when a table is tiny. It becomes dangerous when the table grows and the query sits on a hot path.
Imagine a redirect table with 500 million rows. Every redirect request needs one row by short_code.
Without an index:
request arrives
-> database checks many rows
-> CPU and disk work increase
-> latency rises
-> connections stay open longer
-> more requests pile up
The application code did not change. The query still looks innocent. But the amount of work behind the query grew with the table.
Indexing exists because production systems cannot afford to rediscover the location of common data by scanning everything.
Mental Model
An index is like a sorted lookup structure maintained beside the table.
The table stores the full row:
id | short_code | destination_url | created_at | user_id
The index stores a smaller structure optimized for search:
short_code -> row location
When a query filters by short_code, the database uses the index to locate the row and then fetches the needed data.
Many relational databases use B-tree indexes for general-purpose lookups because B-trees keep keys ordered and make equality and range queries efficient.
Selectivity
Selectivity describes how much an index narrows the result set.
An index on short_code is highly selective because one short code should usually point to one row.
An index on is_active may be less useful if almost every row has is_active = true. The database still has to read a large part of the table, so the index may not help much.
Useful indexes usually match:
- common query filters
- joins
- sorting requirements
- uniqueness constraints
- high-selectivity fields
Indexes should come from access patterns, not guesswork.
Composite Indexes
A composite index contains multiple columns.
Example:
CREATE INDEX idx_user_created_at
ON short_links(user_id, created_at);
This helps queries like:
SELECT *
FROM short_links
WHERE user_id = 'user_42'
ORDER BY created_at DESC;
The order of fields matters. An index on (user_id, created_at) is not the same as an index on (created_at, user_id). The database can use the left side of a composite index most naturally, so the index should match how the query filters and sorts.
The Tradeoff
Indexes speed up reads, but they make writes more expensive.
Every insert, update, or delete may need to update one or more indexes. This creates write amplification:
insert row
-> write table data
-> update index on short_code
-> update index on user_id
-> update index on created_at
Too few indexes cause slow reads. Too many indexes slow writes, consume storage, and make migrations heavier.
Query Planning
The database query planner decides whether to use an index, which index to use, and in what order to execute the query.
Sometimes an index exists but the planner does not use it because:
- the query is not selective enough
- statistics are outdated
- the query shape does not match the index
- the database expects a scan to be cheaper
- functions or transformations prevent index usage
This is why production engineers inspect query plans. The existence of an index is not the same as the database using it effectively.
Operational Reality
Important signals:
- slow query logs
- query plans
- rows scanned versus rows returned
- index hit ratio
- write latency after adding indexes
- index storage growth
- lock time during index creation
- p95 and p99 query latency
Indexes are one of the first scaling tools because they preserve the simple database model longer. But they are not infinite. Once data volume, write throughput, or hot partitions exceed what one database can handle, the system may need caching, read replicas, or sharding.
Related Topics
Knowledge links
Use these links to understand what to know first, where this idea appears, and what to study next.
Used In Systems
System studies where this idea appears in context.
More Links
Additional references connected to this page.