AI Engineering
AI Concepts
Core AI engineering mechanisms behind models, embeddings, retrieval, agents, inference, and evaluation.
intermediate
Vector Embeddings
Turn text, images, code, users, or products into learned vectors that preserve useful relationships for search, ranking, retrieval, and modeling.
intermediate
Semantic Space
Reason about the learned representation space where embeddings are compared, clustered, ranked, and searched by relative position.
intermediate
Vector Search
Retrieve the vectors nearest to a query representation so AI systems can find semantically related candidates within latency and scale constraints.
intermediate
Vector Databases
Store, index, filter, and retrieve vector embeddings as a production data service for semantic search and AI retrieval workloads.
intermediate
ANN Indexes
Trade exact nearest-neighbor scans for approximate vector indexes that keep similarity retrieval fast enough at larger corpus sizes.
intermediate
Indexing Techniques For Vector Search
Compare exact, partitioned, graph-based, and compressed vector index techniques by the retrieval work they save and the tradeoffs they introduce.
intermediate
Search Execution Flow
Follow a vector retrieval request from query embedding through filters, ANN candidates, payload hydration, reranking, and downstream context use.
intermediate
Supervised vs Unsupervised vs Self-Supervised Learning
Compare the learning signals behind supervised, unsupervised, and self-supervised training before moving into modern model objectives.
intermediate
Loss, Optimization, And Gradient Descent
Connect the training objective, loss signal, parameter updates, and gradient descent loop that make model learning concrete.
intermediate
Transformer Architecture
See the transformer as the architecture that combines token representations, attention, feed-forward layers, and repeated blocks into modern language-model computation.
intermediate
Attention
Understand attention as the mechanism that lets token positions choose which context signals matter when their representations are updated.
intermediate
Multi-Head Attention
Learn why transformers run several attention heads in parallel so token representations can mix different learned context signals.
intermediate
Masked Attention
Understand how attention masks control which token positions are allowed to influence each other, especially during next-token generation.
intermediate
Positional Embeddings
Learn why transformers need position information so token order can influence attention and language-model behavior.
intermediate
KV Cache
Understand how key-value caching makes autoregressive LLM inference faster by reusing the attention keys and values already computed for previous tokens.
intermediate
Quantization
Learn how quantization reduces model memory and serving cost by representing weights or activations with lower precision.
intermediate
Distillation
Understand how knowledge distillation trains a smaller or cheaper model to imitate useful behavior from a larger teacher model.
intermediate
Mixture Of Experts
Learn how mixture-of-experts models increase capacity by routing inputs through selected expert subnetworks instead of activating every parameter.