Concepts

Geospatial Indexing

Partition physical space into searchable cells so nearby entities can be found without scanning every location record.

intermediate3 min readUpdated unknownDataCapacityOperationsTradeoffs
Spatial PartitioningProximity SearchCell-Based IndexesRadius QueriesHot Regions

Concepts Covered

  • Spatial partitioning
  • Cell-based indexes
  • Radius queries
  • Nearby-driver lookup
  • Precision tradeoffs
  • Hot geographic regions
  • Rebalancing under city-level demand

Definition

Geospatial indexing is the practice of organizing location data so the system can quickly find entities near a point on a map.

Instead of scanning every driver location to find nearby drivers, the system maps locations into spatial cells and searches only the relevant cells.

Example:

driver_7  -> cell: city_a/zone_12/block_44
driver_19 -> cell: city_a/zone_12/block_45
driver_88 -> cell: city_a/zone_17/block_02

A rider pickup request can then search the pickup cell and neighboring cells instead of scanning every online driver in the city.

The Pain That Forces Geospatial Indexing

A naive ride matching system stores driver locations in a table:

driver_id | lat | lng | status | updated_at

Then it tries to find nearby drivers with:

SELECT *
FROM driver_locations
WHERE status = 'available'
ORDER BY distance(lat, lng, pickup_lat, pickup_lng)
LIMIT 20;

This is understandable, but it becomes expensive quickly. Every ride request asks the database to compute distance against many drivers. During rush hour, concerts, airports, or bad weather, that query can become one of the hottest paths in the system.

Geospatial indexing exists because physical proximity queries must avoid global scans.

Mental Model

Think of the map as a grid.

Each location falls into a cell:

pickup point -> cell A
nearby search -> cell A + neighboring cells

The system trades perfect geometric precision for fast candidate narrowing. It first finds plausible nearby drivers by cell, then computes exact distances on that smaller candidate set.

This two-step model matters:

coarse spatial lookup -> exact distance/ranking

The index finds candidates. The matcher decides which candidate is best.

How It Works

Common approaches include:

  • fixed grids
  • geohashes
  • S2 cells
  • quadtrees
  • database-native spatial indexes

The exact structure varies, but the workload shape is stable:

1. Driver reports location.
2. System maps lat/lng to a spatial cell.
3. Driver availability index stores driver under that cell.
4. Rider requests pickup.
5. Matcher searches pickup cell and nearby cells.
6. Matcher ranks candidates by ETA, distance, status, and business rules.

Tradeoffs

Small cells improve precision but increase the number of neighboring cells the system may need to inspect.

Large cells reduce index overhead but return more irrelevant candidates.

Dense areas behave differently from sparse areas. A downtown cell may contain thousands of drivers. A rural cell may contain none. Production systems often adapt search radius, cell resolution, and fallback behavior by region and demand.

Operational Reality

Important signals:

  • candidate lookup latency
  • average candidate count per request
  • empty-cell rate
  • hot-cell write rate
  • stale driver location rate
  • fallback radius expansion rate
  • city or region imbalance

Knowledge links

Use these links to understand what to know first, where this idea appears, and what to study next.

Used In Systems

System studies where this idea appears in context.

Related Concepts

Core ideas that connect to this topic.