Visual Explainers

See the moving part before memorizing the term.

Each short walkthrough turns one distributed-systems or AI concept into a visible mechanism, so the tradeoff has somewhere to land.

A concrete engineering pattern, stripped down to the decision that matters.

Start with: The Forward Pass: Tokens to Next Token · AI Engineering · ~95s

Learning paths

Start with a sequence

Curated paths through related explainers, ordered by dependency.

5 paths

Path 1 · 8 stops

LLM internals

From tokens to logits, cache, and decoding

1 Tokenization & Embeddings
2 Vector Space: Meaning as Direction
3 The Forward Pass: Tokens to Next Token
4 Inside One Transformer Block
5 Multi-Head Attention: Many Patterns at Once
+ 3 more in this path

Path 2 · 4 stops

Retrieval to agents

How answers become grounded, diagnosed, and looped

1 Dense + Sparse + Reciprocal Rank Fusion
2 Bi-Encoder Recall, Cross-Encoder Rerank
3 The RAG Debugging Decision Tree
4 The ReAct Loop and Its 3 Stop Conditions

Path 3 · 5 stops

Harness runtime

The control plane around a long-running agent

1 Skills: Index in Context, Body on Demand
2 Prompt Cache: Stable Prefix vs Volatile Tail
3 The Reasoning Sandwich
4 Replay Safety: 3 Tool Classes on Resume
5 Generator -> Reflector -> Curator

Path 4 · 7 stops

Distributed reliability

Coordination under partitions, crashes, and load

1 CAP Theorem
2 Majority Quorum
3 Raft Consensus Algorithm
4 Two-Phase Commit (2PC)
5 Eventual Consistency
+ 2 more in this path

Path 5 · 5 stops

Kafka mechanics

Ordering, durability, rebalances, and log compaction

1 Kafka Topic Partitioning
2 Producer Acknowledgments
3 Consumer Group Rebalancing
4 Exactly-Once Semantics
5 Log Compaction

All explainers

Browse by lane

Every explainer grouped by the kind of system pressure it clarifies.

38 explainers

AI and LLM Systems

Model internals, inference speed, retrieval quality, and agent control loops.

~95s AI Engineering

The Forward Pass: Tokens to Next Token

Walk through the inference path from token ids to logits before sampling chooses the next token.

What Is A Model · Activation Functions

~90s AI Engineering

Inside One Transformer Block

Open one transformer block and see residual stream, attention, normalization, feed-forward layers, and repetition.

Activation Functions · Normalization

~95s AI Engineering

Multi-Head Attention: Many Patterns at Once

See how multiple attention heads can track different relationships before their outputs are concatenated and projected.

Activation Functions · Normalization

~95s AI Engineering

Positional Encoding: Order via Rotation

See why attention needs position information and how RoPE turns token position into relative rotation.

Normalization · Math Intuitions

~95s AI Engineering

Vector Space: Meaning as Direction

See embeddings as points, neighborhoods, directions, analogy-style offsets, and retrieval by cosine similarity.

What Is A Model · Memory Systems

~95s AI Engineering

Mixture of Experts: Router Picks Two

See sparse expert routing: a token is scored, top experts activate, and their weighted outputs are combined.

What Is A Model · Optimization

~95s AI Engineering

KV-Cache: Why the 2nd Token Is Faster

See how autoregressive decoding reuses previous keys and values instead of recomputing the whole prefix every token.

Memory Compute · Optimization

~95s AI Engineering

Speculative Decoding: Draft, Then Verify

See how a fast draft model proposes several tokens and the target model verifies them in parallel.

Optimization · Probability Basics

~95s AI Engineering

The Token Sampling Pipeline

Follow one next-token decision from raw logits through temperature, top-k, top-p, softmax, and the final sample.

What Is A Model · Probability Basics

~95s AI Engineering

Dense + Sparse + Reciprocal Rank Fusion

Watch dense retrieval and BM25 produce two ranked lists, then merge them with RRF instead of hand-tuning scores.

Metrics · Probability Basics
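
As a rough illustration of the fusion step, here is a minimal sketch of Reciprocal Rank Fusion. It assumes each retriever returns a ranked list of document ids; the function name, the sample ids, and the constant k=60 (a commonly used default) are illustrative choices, not taken from the explainer.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse several ranked lists of document ids into one ranking.

    Each document earns 1 / (k + rank) per list it appears in, so only
    rank positions matter and raw retriever scores never need calibrating.
    k=60 is an assumed default, not a value from the explainer.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from a dense retriever and a BM25 retriever.
dense = ["d3", "d1", "d7"]
sparse = ["d1", "d9", "d3"]
fused = reciprocal_rank_fusion([dense, sparse])
print(fused)
```

Documents that both retrievers rank highly rise to the top, while a document seen by only one list still keeps a place in the fused ranking.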

~95s AI Engineering

Bi-Encoder Recall, Cross-Encoder Rerank

See why production retrieval often uses a fast recall stage followed by a slower, more accurate reranker.

Metrics · Optimization

~95s AI Engineering

The RAG Debugging Decision Tree

Diagnose wrong RAG answers by asking one question before tuning prompts or rebuilding indexes.

Metrics · Probability Basics

~100s AI Engineering

The ReAct Loop and Its 3 Stop Conditions

See the Thought, Action, Observation cycle and the three exits that keep an agent from running forever.

What Is A Model · Failure Detection

Harness Engineering

The runtime patterns that keep long agent sessions cheap, resumable, and inspectable.

~100s Harness Engineering

Skills: Index in Context, Body on Demand

See why skill descriptions stay in context while full bodies load only when the model selects one.

Memory Systems · Metrics

~100s Harness Engineering

Prompt Cache: Stable Prefix vs Volatile Tail

See why prompt-cache stability is a source-level boundary, not a billing toggle.

Memory Systems · Optimization

~100s Harness Engineering

The Reasoning Sandwich

See why max reasoning belongs at plan and verify, while implementation often works better at a mid tier.

Optimization · Metrics

~100s Harness Engineering

Replay Safety: 3 Tool Classes on Resume

See how a crashed workflow routes pure, idempotent, and unsafe tools differently when it resumes.

Idempotence · Checkpointing

~100s Harness Engineering

Generator -> Reflector -> Curator

See session memory as a feedback loop: work produces traces, reflection extracts candidates, curation promotes durable memory.

Memory Systems · Metrics

Distributed Systems

Coordination, time, quorum, failure detection, and consistency under partitions.

~100s Distributed Systems

CAP Theorem

Why a distributed system must give up either consistency or availability when a network partition occurs

CAP Theorem · Eventual Consistency

~100s Distributed Systems

Majority Quorum

Understanding R + W > N: how overlapping read and write quorums guarantee that a read reaches at least one replica holding the latest write

Quorum · Consensus
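
The overlap condition can be checked by brute force. The sketch below assumes N replicas numbered from 0, read quorums of size R, and write quorums of size W; the function name is hypothetical. It enumerates every possible pair of quorums and confirms that they always share a replica exactly when R + W > N.

```python
from itertools import combinations


def quorums_always_overlap(n, r, w):
    """Return True if every read quorum of size r shares at least one
    replica with every write quorum of size w, out of n replicas."""
    replicas = range(n)
    return all(
        set(read_set) & set(write_set)
        for read_set in combinations(replicas, r)
        for write_set in combinations(replicas, w)
    )


# The brute-force check agrees with the R + W > N rule for small clusters.
for n in range(1, 6):
    for r in range(1, n + 1):
        for w in range(1, n + 1):
            assert quorums_always_overlap(n, r, w) == (r + w > n)

print(quorums_always_overlap(3, 2, 2))  # majority reads and writes on 3 replicas
print(quorums_always_overlap(3, 1, 1))  # single-replica reads and writes
```

With N = 3, R = 2, W = 2 every read touches at least one replica that accepted the last write. With R = 1, W = 1 a read can land on a replica the write never reached.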

~100s Distributed Systems

Lamport Clock

Understanding logical time — how distributed systems order events without synchronized clocks

Lamport Clock · Logical Clocks

~90s Distributed Systems

Raft Consensus Algorithm

How distributed systems elect leaders and achieve agreement across multiple nodes

Consensus · Leader Follower Replication

~90s Distributed Systems

Two-Phase Commit (2PC)

How distributed systems achieve atomic transactions across multiple databases using the 2PC protocol

ACID Transactions · Distributed Systems

~80s Distributed Systems

Eventual Consistency

How distributed databases synchronize without blocking writes

Eventual Consistency · Replication

~100s Distributed Systems

Gossip Protocol

How distributed systems spread information efficiently using epidemic-style protocols

Gossip Protocol · Eventual Consistency

~95s Distributed Systems

Consistent Hashing

How distributed systems minimize data movement when nodes change using a hash ring

Sharding · Load Balancing

~100s Distributed Systems

Heartbeat & Failure Detection

How distributed systems detect node failures using periodic heartbeat signals and timeouts

Heartbeat · Failure Detection

Streaming and Kafka

Partition routing, delivery guarantees, rebalances, and log-shaped storage.

~100s Streaming

Kafka Topic Partitioning

How Kafka distributes messages across partitions for parallelism and ordering

Topic Partitioning · Consumer Groups

~75s Streaming

Producer Acknowledgments

Understanding Kafka's acks=0, acks=1, and acks=all for durability vs latency trade-offs

Producer Acknowledgments · Replication

~100s Streaming

Consumer Group Rebalancing

How Kafka redistributes partitions among consumers when members join, leave, or fail

Consumer Groups · Topic Partitioning

~110s Streaming

Exactly-Once Semantics

How Kafka achieves exactly-once processing using idempotent producers, transactions, and consumer isolation

Exactly Once Semantics · Idempotence

~85s Streaming

Log Compaction

How Kafka's log compaction turns a topic into a key-value table by keeping only the latest value per key

Log Based Storage · Immutability
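
To make the "latest value per key" idea concrete, here is a toy model of compaction over an in-memory list. It is a sketch of the concept only, not Kafka's log cleaner: the function name and sample records are hypothetical, and treating a None value as a delete marker (tombstone) is a simplifying assumption.

```python
def compact(log, drop_tombstones=False):
    """Keep only the most recent record for each key.

    `log` is a list of (key, value) pairs in append order; the list index
    stands in for the offset. A value of None is treated as a tombstone
    (an assumption of this sketch).
    """
    latest = {}
    for offset, (key, value) in enumerate(log):
        latest[key] = (offset, value)  # later records overwrite earlier ones
    survivors = [(offset, key, value) for key, (offset, value) in latest.items()]
    survivors.sort()  # surviving records keep their original offset order
    if drop_tombstones:
        survivors = [rec for rec in survivors if rec[2] is not None]
    return survivors


# Hypothetical changelog: "a" is updated once, "b" is deleted at the end.
log = [("a", 1), ("b", 2), ("a", 3), ("c", 4), ("b", None)]
compacted = compact(log)
table = {key: value for _, key, value in compact(log, drop_tombstones=True)}
print(compacted)
print(table)
```

Surviving records keep their original offsets, which is why the compacted list has gaps, and reading it front to back rebuilds the key-value table.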

ML Fundamentals

The older foundations behind the newer LLM-specific walkthroughs.

~90s ML Fundamentals

Tokenization & Embeddings

How text becomes vectors — the pipeline from raw characters to dense numerical representations

What Is A Model · Probability Basics

~110s ML Fundamentals

Attention Mechanism

How self-attention enables transformers to understand context by letting each token attend to all others

Activation Functions · Normalization

~120s ML Fundamentals

Backpropagation

How neural networks learn by propagating errors backward through layers

Backpropagation · Loss Functions

System Design

Operational guardrails that protect services under uneven demand and partial failure.

~100s System Design

Rate Limiting Algorithms

Token Bucket vs Sliding Window — understand burst handling, accuracy trade-offs, and when to use each

Rate Limiting · Token Bucket
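
For a feel of how a token bucket handles bursts, here is a minimal sketch. The class name, parameters, and the choice to pass the current time in explicitly (so the behavior is deterministic) are assumptions of this example, not details from the explainer.

```python
class TokenBucket:
    """Allow bursts up to `capacity`, then refill at `refill_per_sec`."""

    def __init__(self, capacity, refill_per_sec, now=0.0):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.tokens = float(capacity)  # start full so an initial burst passes
        self.last = now

    def allow(self, now, cost=1.0):
        # Refill lazily, based on the time elapsed since the last call.
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


bucket = TokenBucket(capacity=3, refill_per_sec=1.0)
burst = [bucket.allow(now=0.0) for _ in range(4)]  # 3 pass, the 4th is rejected
after_one_second = bucket.allow(now=1.0)           # one token has refilled
print(burst, after_one_second)
```

The bucket absorbs a burst equal to its capacity and then settles to the refill rate. A sliding window instead counts requests over a trailing interval.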

~100s System Design

Circuit Breaker

How to prevent cascade failures in microservices using the circuit breaker pattern

Circuit Breaker · Failover

Databases

Durability and recovery mechanics that shape larger systems.

~85s Databases

Write-Ahead Log (WAL)

How databases ensure durability and crash recovery using write-ahead logging

Write Ahead Log · Checkpointing