Explainers

Short animated walkthroughs of system-design ideas — the kind that are easier to understand when you can see them move.

Visual explainers

See the moving part before memorizing the term.

Each short walkthrough turns one distributed-systems or AI concept into a visible mechanism, so the tradeoff has somewhere to land.

Start with The Forward Pass: Tokens to Next Token (AI Engineering · ~95s): a concrete engineering pattern, stripped down to the decision that matters.

Learning paths

Start with a sequence

Curated paths through related explainers, ordered by dependency.

5 paths

All explainers

Browse by lane

Every explainer grouped by the kind of system pressure it clarifies.

38 explainers

AI and LLM Systems

Model internals, inference speed, retrieval quality, and agent control loops.

13 explainers
~95s AI Engineering

The Forward Pass: Tokens to Next Token

Walk through the inference path from token ids to logits before sampling chooses the next token.

What Is A Model · Activation Functions
~90s AI Engineering

Inside One Transformer Block

Open one transformer block and see residual stream, attention, normalization, feed-forward layers, and repetition.

Activation Functions · Normalization
~95s AI Engineering

Multi-Head Attention: Many Patterns at Once

See how multiple attention heads can track different relationships before their outputs are concatenated and projected.

Activation Functions · Normalization
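A minimal pure-Python sketch of multi-head self-attention. It assumes no causal mask, a single sequence, and a model width that splits evenly across heads. The weight matrices and token vectors are hypothetical toy values. Each head scores tokens using only its own slice of the query and key vectors, so different heads can form different attention patterns over the same sequence. The output projection `Wo` then mixes the concatenated head outputs.

```python
import math

def softmax(xs):
    m = max(xs)                          # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def matvec(W, x):
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def multi_head_attention(xs, Wq, Wk, Wv, Wo, num_heads):
    """Unmasked self-attention over token vectors `xs` with `num_heads` heads."""
    d_model = len(xs[0])
    assert d_model % num_heads == 0, "d_model must split evenly across heads"
    d_head = d_model // num_heads
    Q = [matvec(Wq, x) for x in xs]
    K = [matvec(Wk, x) for x in xs]
    V = [matvec(Wv, x) for x in xs]
    outputs = []
    for q in Q:
        concat = []
        for h in range(num_heads):
            lo, hi = h * d_head, (h + 1) * d_head   # this head's slice of every vector
            scores = [
                sum(a * b for a, b in zip(q[lo:hi], k[lo:hi])) / math.sqrt(d_head)
                for k in K
            ]
            weights = softmax(scores)               # one attention pattern per head
            concat.extend(
                sum(w * v[j] for w, v in zip(weights, V)) for j in range(lo, hi)
            )
        outputs.append(matvec(Wo, concat))          # mix the concatenated heads
    return outputs

identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
tokens = [[1.0, 0.0, 0.5, -0.5], [0.0, 1.0, -0.5, 0.5], [1.0, 1.0, 0.0, 0.0]]
out = multi_head_attention(tokens, identity, identity, identity, identity, num_heads=2)
print(len(out), len(out[0]))  # 3 4
```

Slicing one wide projection per head is equivalent to giving each head its own smaller Q/K/V matrices. That is why the per-head width here is `d_model // num_heads`.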
~95s AI Engineering

Positional Encoding: Order via Rotation

See why attention needs position information and how RoPE turns token position into relative rotation.

Normalization · Math Intuitions
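A minimal sketch of the RoPE idea. It assumes the common frequency schedule `theta_i = 10000 ** (-2i / d)` and rotates adjacent dimension pairs; some libraries pair the first half of the vector with the second half instead. The query and key values are hypothetical. Every pair is rotated by `position * theta_i`, so the dot product between a rotated query and a rotated key depends only on how far apart the two positions are, not on where they sit absolutely.

```python
import math

def rope(vec, pos, base=10000.0):
    """Rotate consecutive pairs (x0, x1), (x2, x3), ... of `vec` by pos * theta_i."""
    d = len(vec)
    assert d % 2 == 0, "RoPE needs an even dimension"
    out = []
    for i in range(d // 2):
        theta = base ** (-2 * i / d)        # lower frequency for later pairs
        angle = pos * theta
        c, s = math.cos(angle), math.sin(angle)
        x0, x1 = vec[2 * i], vec[2 * i + 1]
        out.extend([x0 * c - x1 * s, x0 * s + x1 * c])  # 2-D rotation of this pair
    return out

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

q = [0.3, -1.2, 0.8, 0.5]
k = [1.0, 0.4, -0.7, 0.2]

# Query 3 positions after the key, at two different absolute locations.
s1 = dot(rope(q, 5), rope(k, 2))
s2 = dot(rope(q, 105), rope(k, 102))
print(abs(s1 - s2) < 1e-9)  # True: only the relative offset matters
```

Rotation also preserves vector length, so RoPE changes attention scores only through the angle between query and key.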
~95s AI Engineering

Vector Space: Meaning as Direction

See embeddings as points, neighborhoods, directions, analogy-style offsets, and retrieval by cosine similarity.

What Is A Model · Memory Systems
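A minimal sketch of retrieval by cosine similarity. The 3-D "embeddings" are invented for illustration, and real embeddings have hundreds or thousands of dimensions. Cosine similarity compares direction only: dividing by both vector norms means a long vector and a short vector pointing the same way score 1.0.

```python
import math

def cosine(a, b):
    """Dot product divided by the product of the two vector norms."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def top_k(query, corpus, k=2):
    """Return the k (doc_id, score) pairs whose vectors point most nearly along `query`."""
    scored = [(doc_id, cosine(query, vec)) for doc_id, vec in corpus.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Toy vectors (hypothetical): animals lean on axis 0, vehicles on axis 1.
corpus = {
    "cat":    [0.9, 0.1, 0.0],
    "kitten": [0.8, 0.2, 0.1],
    "car":    [0.1, 0.9, 0.2],
    "truck":  [0.0, 0.8, 0.4],
}
print(top_k([1.0, 0.15, 0.05], corpus))  # "cat" then "kitten": the same neighborhood
```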
~95s AI Engineering

Mixture of Experts: Router Picks Two

See sparse expert routing: a token is scored, top experts activate, and their weighted outputs are combined.

What Is A Model · Optimization
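A minimal sketch of top-2 expert routing for a single token. It assumes softmax gating followed by renormalizing the two selected gate weights. This is one common convention; some routers keep the raw softmax weights. The experts and router logits are hypothetical. The `calls` list records which experts actually ran, which makes the sparsity visible: only two of the four experts are evaluated.

```python
import math

def softmax(xs):
    m = max(xs)                          # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def moe_forward(token, router_logits, experts, top_k=2):
    """Route one token through the top_k experts and blend their outputs."""
    probs = softmax(router_logits)
    chosen = sorted(range(len(experts)), key=lambda i: probs[i], reverse=True)[:top_k]
    gate_sum = sum(probs[i] for i in chosen)   # renormalize selected gates to sum to 1
    out = [0.0] * len(token)
    for i in chosen:                           # unselected experts are never evaluated
        weight = probs[i] / gate_sum
        y = experts[i](token)
        out = [o + weight * v for o, v in zip(out, y)]
    return out, chosen

calls = []

def make_expert(idx, scale):
    """Hypothetical expert: scales its input and records that it ran."""
    def expert(x):
        calls.append(idx)
        return [scale * v for v in x]
    return expert

experts = [make_expert(i, s) for i, s in enumerate([1.0, 2.0, 3.0, 4.0])]
router_logits = [0.1, 2.0, 0.5, 1.5]       # normally computed as token @ W_router
out, chosen = moe_forward([1.0, -1.0], router_logits, experts)
print(chosen, calls)  # [1, 3] [1, 3]
```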
~95s AI Engineering

KV-Cache: Why the 2nd Token Is Faster

See how autoregressive decoding reuses previous keys and values instead of recomputing the whole prefix every token.

Memory Compute · Optimization
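A minimal sketch of the KV-cache saving. It assumes one causal attention layer with one head and hypothetical 2-D toy weights. The same four positions are decoded twice. The first loop re-projects keys and values for the whole prefix at every step. The second loop appends only the newest token's K and V to a cache. Both loops produce identical attention outputs, but the uncached K/V projection count grows quadratically with length while the cached count grows linearly. Attention still reads every cached key at each step.

```python
import math

def matvec(W, x):
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attend(q, keys, values):
    """Scaled dot-product attention of one query over all visible positions."""
    d = len(q)
    scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
    w = softmax(scores)
    return [sum(wi * v[j] for wi, v in zip(w, values)) for j in range(len(values[0]))]

# Hypothetical projection weights and per-token hidden states.
Wq = [[1.0, 0.2], [0.0, 1.0]]
Wk = [[0.5, -0.3], [0.4, 0.9]]
Wv = [[1.0, 0.0], [0.3, 0.7]]
xs = [[1.0, 0.0], [0.2, 0.8], [-0.5, 0.4], [0.9, -0.1]]

# No cache: recompute K and V for the entire prefix at every step.
no_cache_out, no_cache_proj = [], 0
for t in range(len(xs)):
    keys = [matvec(Wk, x) for x in xs[: t + 1]]
    values = [matvec(Wv, x) for x in xs[: t + 1]]
    no_cache_proj += 2 * (t + 1)
    no_cache_out.append(attend(matvec(Wq, xs[t]), keys, values))

# KV-cache: project only the newest token, append, and attend over the cache.
k_cache, v_cache, cache_out, cache_proj = [], [], [], 0
for x in xs:
    k_cache.append(matvec(Wk, x))
    v_cache.append(matvec(Wv, x))
    cache_proj += 2
    cache_out.append(attend(matvec(Wq, x), k_cache, v_cache))

print(no_cache_proj, cache_proj)  # 20 8
```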
~95s AI Engineering

Speculative Decoding: Draft, Then Verify

See how a fast draft model proposes several tokens and the target model verifies them in parallel.

Optimization · Probability Basics
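A minimal sketch of the draft-then-verify loop in its greedy (temperature 0) form. The published sampling variant replaces exact matching with a probabilistic accept/reject test that preserves the target model's distribution. Both "models" here are hypothetical deterministic rules. The draft proposes `k` tokens, and the target checks all `k + 1` positions, which a real system does in one batched pass. Matching tokens are kept, and the first mismatch is replaced by the target's own choice. The final sequence therefore equals plain greedy decoding with the target model.

```python
def greedy_speculative_step(prefix, draft_next, target_next, k=4):
    """One round of greedy speculative decoding; returns the tokens it appends."""
    # 1. The cheap draft model proposes k tokens, one after another.
    draft = []
    for _ in range(k):
        draft.append(draft_next(prefix + draft))
    # 2. The target scores all k + 1 positions (one batched pass in practice).
    target = [target_next(prefix + draft[:i]) for i in range(k + 1)]
    # 3. Keep draft tokens while they match the target's choice.
    accepted = []
    for i in range(k):
        if draft[i] != target[i]:
            accepted.append(target[i])  # first mismatch: take the target's token, stop
            return accepted
        accepted.append(draft[i])
    accepted.append(target[k])          # every draft token accepted: one bonus token
    return accepted

def target_next(seq):
    """Hypothetical 'large' model: a deterministic rule over the last token."""
    return (seq[-1] * 3 + 1) % 7

def draft_next(seq):
    """Hypothetical draft model: agrees with the target except at every fifth position."""
    guess = (seq[-1] * 3 + 1) % 7
    return (guess + 1) % 7 if len(seq) % 5 == 0 else guess

seq = [2]
while len(seq) < 12:
    seq += greedy_speculative_step(seq, draft_next, target_next)
seq = seq[:12]                          # the last round may overshoot the length

ref = [2]
while len(ref) < 12:
    ref.append(target_next(ref))        # plain one-token-at-a-time greedy decoding
print(seq == ref)  # True
```

The speedup comes from step 2. When the draft agrees often, one target pass yields several tokens instead of one.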
~95s AI Engineering

The Token Sampling Pipeline

Follow one next-token decision from raw logits through temperature, top-k, top-p, softmax, and the final sample.

What Is A Model · Probability Basics
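A minimal sketch of one sampling decision. The nucleus (top-p) cut is defined over probabilities, so this sketch applies the softmax before the top-p step and renormalizes afterwards. Libraries differ in the exact ordering and edge cases. The function name and defaults are hypothetical, and `temperature` must be greater than 0.

```python
import math
import random

def sample_next(logits, temperature=1.0, top_k=None, top_p=None, rng=random):
    """Pick one token id from raw logits via temperature, top-k, softmax, top-p."""
    # 1. Temperature: divide logits; values below 1 sharpen, above 1 flatten.
    scaled = [(i, l / temperature) for i, l in enumerate(logits)]
    # 2. Top-k: keep only the k highest-scoring candidates.
    scaled.sort(key=lambda p: p[1], reverse=True)
    if top_k is not None:
        scaled = scaled[:top_k]
    # 3. Softmax over the survivors (max-subtracted for numerical stability).
    m = scaled[0][1]
    exps = [(i, math.exp(s - m)) for i, s in scaled]
    total = sum(e for _, e in exps)
    probs = [(i, e / total) for i, e in exps]
    # 4. Top-p: keep the smallest high-probability prefix whose mass reaches top_p.
    if top_p is not None:
        kept, mass = [], 0.0
        for i, p in probs:
            kept.append((i, p))
            mass += p
            if mass >= top_p:
                break
        z = sum(p for _, p in kept)
        probs = [(i, p / z) for i, p in kept]
    # 5. Draw one id in proportion to the remaining probabilities.
    ids = [i for i, _ in probs]
    weights = [p for _, p in probs]
    return rng.choices(ids, weights=weights, k=1)[0]

logits = [2.0, 1.0, 0.5, -1.0, -3.0]
rng = random.Random(0)
print(sample_next(logits, temperature=0.7, top_k=3, top_p=0.9, rng=rng))  # 0 or 1
```

With these settings, top-k keeps ids 0 to 2. The first two already carry about 91% of the mass, so top-p drops id 2 before the draw. Setting `top_k=1` reduces the pipeline to greedy decoding.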
~95s AI Engineering

Dense + Sparse + Reciprocal Rank Fusion

Watch dense retrieval and BM25 produce two ranked lists, then merge them with RRF instead of hand-tuning scores.

Metrics · Probability Basics
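A minimal sketch of Reciprocal Rank Fusion. Each document's fused score is the sum over lists of `1 / (k + rank)`, with ranks starting at 1 and `k = 60`, the constant used in the original RRF paper. Only ranks enter the formula, so cosine scores and BM25 scores never need to be normalized against each other. The two input lists are hypothetical.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of doc ids; returns ids sorted by fused score, best first."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

dense = ["d3", "d1", "d7", "d2"]  # hypothetical dense-retrieval order
bm25 = ["d1", "d9", "d3", "d4"]   # hypothetical BM25 order
print(reciprocal_rank_fusion([dense, bm25]))
```

Documents found by both retrievers (`d1`, `d3`) rise to the top even though neither list ranks both of them first. A larger `k` flattens the advantage of the very top positions.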
~95s AI Engineering

Bi-Encoder Recall, Cross-Encoder Rerank

See why production retrieval often uses a fast recall stage followed by a slower, more accurate reranker.

Metrics · Optimization
~95s AI Engineering

The RAG Debugging Decision Tree

Diagnose wrong RAG answers by asking one question before tuning prompts or rebuilding indexes.

Metrics · Probability Basics
~100s AI Engineering

The ReAct Loop and Its 3 Stop Conditions

See the Thought, Action, Observation cycle and the three exits that keep an agent from running forever.

What Is A Model · Failure Detection

Harness Engineering

The runtime patterns that keep long agent sessions cheap, resumable, and inspectable.

5 explainers

Distributed Systems

Coordination, time, quorum, failure detection, and consistency under partitions.

9 explainers

Streaming and Kafka

Partition routing, delivery guarantees, rebalances, and log-shaped storage.

5 explainers

ML Fundamentals

The older foundations behind the newer LLM-specific walkthroughs.

3 explainers

System Design

Operational guardrails that protect services under uneven demand and partial failure.

2 explainers

Databases

Durability and recovery mechanics that shape larger systems.

1 explainer