AI Concepts
Indexing Techniques For Vector Search
Compare exact, partitioned, graph-based, and compressed vector index techniques by the retrieval work they save and the tradeoffs they introduce.
After this, you will understand
How Indexing Techniques For Vector Search helps you see what mechanism is doing the work, what tradeoff it introduces, and where it appears in AI systems.
Start with the word in plain English before adding machinery.
The idea becomes unclear when it is mixed with Flat Search, IVF, and HNSW too early.
Connect the word to inputs, outputs, model behavior, product boundaries, and evaluation.
Think before readingBefore learning the mechanics, what should a beginner understand about Flat Search and IVF?
Reading in progress
This page is saved in your local study history so you can continue later.
Study path
Read these in order
Start with the mechanics, then move into the patterns that explain why the system is shaped this way.
Concepts Covered
- Exact vector scans
- ANN indexing families
- Partitioned indexes
- Graph indexes
- IVF
- HNSW
- Quantization
- Product quantization
- Index selection tradeoffs
Definition
Vector-search indexing techniques are the structures and compression choices that decide how a search system narrows work before it returns nearby vector candidates.
They answer a practical question:
How do we avoid comparing a live query against every full vector every time?
Different techniques save different kinds of work. That is why vector stores expose several index families instead of one universal index.
Start With The Baseline
The baseline is flat search.
query vector -> compare with every candidate vector -> top-k
Flat or brute-force search is useful:
- as a correctness baseline
- for small datasets
- for small filtered subsets
- when exactness matters more than scale savings
Indexing only makes sense when you can name the pressure it relieves.
Partitioned Indexes
Partitioned techniques group vectors into regions or buckets.
An inverted-file style index, often called IVF in vector-search tooling, uses a coarse partitioning step so search can probe selected regions instead of scanning the entire collection.
The tradeoff is visible:
- probe too few partitions and recall may fall
- probe more partitions and latency rises
Partitioned search is easier to understand once you stop thinking "the index finds the answer" and start thinking "the index chooses which neighborhoods deserve expensive comparison."
Graph Indexes
Graph-based techniques link vectors to nearby vectors and navigate those links at query time.
HNSW is the most recognizable concept in this family. Its graph hierarchy gives the search path ways to move through broad neighborhoods and then refine locally.
Graph indexes can produce strong recall-latency tradeoffs for many workloads. They can also be memory-heavy, and their build/update characteristics matter for write-heavy or rapidly changing collections.
Compression And Quantization
Another pressure is memory and distance-computation cost.
Quantization stores lower-cost approximations of vector values or vector subspaces so search can operate more cheaply.
Product quantization, often shortened to PQ, is a well-known compression direction. The vector is represented in compressed form so many comparisons use less memory and bandwidth than full-precision vectors.
Compression can change the quality curve. Some systems retrieve a broader approximate candidate set and refine distances on a smaller subset with higher-precision data.
Technique Combinations
Index families are not always isolated.
A search stack may combine:
- partitioning
- compression
- graph navigation
- reranking or refinement
That is why an index name alone is not the entire retrieval plan. The plan also includes candidate count, metric, filtering, reranking, hardware fit, update path, and evaluation.
How To Choose
Start with workload questions:
- How many vectors are searchable?
- How tight is live latency?
- How much memory can the service afford?
- How often do inserts, updates, deletes, or re-embedding jobs happen?
- How strict are filters and tenant boundaries?
- What recall target matters for the product task?
A low-latency recommendation candidate stage and a small compliance corpus do not need the same operating point.
Failure Modes
Index selection goes wrong when teams:
- copy a benchmark winner without matching corpus and query shape
- compress away quality before measuring task recall
- ignore rebuild cost and freshness
- treat metadata filtering as an afterthought
- use approximate search where the filtered candidate set is already small
- skip a reranking stage when approximate candidates need stronger ordering
The right index is the one that serves the retrieval contract the product actually needs.
Related Topics
What to study next
These links keep the session moving: read prerequisites first, then open the systems, concepts, and patterns that deepen this page.
Prerequisites
Read these first if the mechanics feel unfamiliar.
More Links
Additional references connected to this page.