AI Concepts

Indexing Techniques For Vector Search

Compare exact, partitioned, graph-based, and compressed vector index techniques by the retrieval work they save and the tradeoffs they introduce.

intermediate3 min readUpdated 2026-05-22RetrievalMechanicsCapacityTradeoffs
Flat SearchIVFHNSWQuantizationProduct QuantizationIndex Selection

After this, you will understand

How Indexing Techniques For Vector Search helps you see what mechanism is doing the work, what tradeoff it introduces, and where it appears in AI systems.

Beginner version

Start with the word in plain English before adding machinery.

Confusion point

The idea becomes unclear when it is mixed with Flat Search, IVF, and HNSW too early.

Better mental model

Connect the word to inputs, outputs, model behavior, product boundaries, and evaluation.

Think before readingBefore learning the mechanics, what should a beginner understand about Flat Search and IVF?
As you read, separate the vocabulary from the implementation details. The word should feel clear before the system design gets complex.

Reading in progress

This page is saved in your local study history so you can continue later.

Study path

Read these in order

Start with the mechanics, then move into the patterns that explain why the system is shaped this way.

  1. 1Search Execution Flowai-concepts

Concepts Covered

  • Exact vector scans
  • ANN indexing families
  • Partitioned indexes
  • Graph indexes
  • IVF
  • HNSW
  • Quantization
  • Product quantization
  • Index selection tradeoffs

Definition

Vector-search indexing techniques are the structures and compression choices that decide how a search system narrows work before it returns nearby vector candidates.

They answer a practical question:

How do we avoid comparing a live query against every full vector every time?

Different techniques save different kinds of work. That is why vector stores expose several index families instead of one universal index.

Start With The Baseline

The baseline is flat search.

query vector -> compare with every candidate vector -> top-k

Flat or brute-force search is useful:

  • as a correctness baseline
  • for small datasets
  • for small filtered subsets
  • when exactness matters more than scale savings

Indexing only makes sense when you can name the pressure it relieves.

Partitioned Indexes

Partitioned techniques group vectors into regions or buckets.

An inverted-file style index, often called IVF in vector-search tooling, uses a coarse partitioning step so search can probe selected regions instead of scanning the entire collection.

The tradeoff is visible:

  • probe too few partitions and recall may fall
  • probe more partitions and latency rises

Partitioned search is easier to understand once you stop thinking "the index finds the answer" and start thinking "the index chooses which neighborhoods deserve expensive comparison."

Graph Indexes

Graph-based techniques link vectors to nearby vectors and navigate those links at query time.

HNSW is the most recognizable concept in this family. Its graph hierarchy gives the search path ways to move through broad neighborhoods and then refine locally.

Graph indexes can produce strong recall-latency tradeoffs for many workloads. They can also be memory-heavy, and their build/update characteristics matter for write-heavy or rapidly changing collections.

Compression And Quantization

Another pressure is memory and distance-computation cost.

Quantization stores lower-cost approximations of vector values or vector subspaces so search can operate more cheaply.

Product quantization, often shortened to PQ, is a well-known compression direction. The vector is represented in compressed form so many comparisons use less memory and bandwidth than full-precision vectors.

Compression can change the quality curve. Some systems retrieve a broader approximate candidate set and refine distances on a smaller subset with higher-precision data.

Technique Combinations

Index families are not always isolated.

A search stack may combine:

  • partitioning
  • compression
  • graph navigation
  • reranking or refinement

That is why an index name alone is not the entire retrieval plan. The plan also includes candidate count, metric, filtering, reranking, hardware fit, update path, and evaluation.

How To Choose

Start with workload questions:

  1. How many vectors are searchable?
  2. How tight is live latency?
  3. How much memory can the service afford?
  4. How often do inserts, updates, deletes, or re-embedding jobs happen?
  5. How strict are filters and tenant boundaries?
  6. What recall target matters for the product task?

A low-latency recommendation candidate stage and a small compliance corpus do not need the same operating point.

Failure Modes

Index selection goes wrong when teams:

  • copy a benchmark winner without matching corpus and query shape
  • compress away quality before measuring task recall
  • ignore rebuild cost and freshness
  • treat metadata filtering as an afterthought
  • use approximate search where the filtered candidate set is already small
  • skip a reranking stage when approximate candidates need stronger ordering

The right index is the one that serves the retrieval contract the product actually needs.

What to study next

These links keep the session moving: read prerequisites first, then open the systems, concepts, and patterns that deepen this page.

Prerequisites

Read these first if the mechanics feel unfamiliar.