Visual explainers
See the moving part before memorizing the term.
Each short walkthrough turns one distributed-systems or AI concept into a visible mechanism, so the tradeoff has somewhere to land.
A concrete engineering pattern, stripped down to the decision that matters.
Start with The Forward Pass: Tokens to Next Token
AI Engineering · ~95s
Learning paths
Start with a sequence
Curated paths through related explainers, ordered by dependency.
LLM internals
From tokens to logits, cache, and decoding
- 1 Tokenization & Embeddings
- 2 Vector Space: Meaning as Direction
- 3 The Forward Pass: Tokens to Next Token
- 4 Inside One Transformer Block
- 5 Multi-Head Attention: Many Patterns at Once
- + 3 more in this path
Retrieval to agents
How answers become grounded, diagnosed, and looped
- 1 Dense + Sparse + Reciprocal Rank Fusion
- 2 Bi-Encoder Recall, Cross-Encoder Rerank
- 3 The RAG Debugging Decision Tree
- 4 The ReAct Loop and Its 3 Stop Conditions
Harness runtime
The control plane around a long-running agent
- 1 Skills: Index in Context, Body on Demand
- 2 Prompt Cache: Stable Prefix vs Volatile Tail
- 3 The Reasoning Sandwich
- 4 Replay Safety: 3 Tool Classes on Resume
- 5 Generator -> Reflector -> Curator
Distributed reliability
Coordination under partitions, crashes, and load
- 1 CAP Theorem
- 2 Majority Quorum
- 3 Raft Consensus Algorithm
- 4 Two-Phase Commit (2PC)
- 5 Eventual Consistency
- + 2 more in this path
Kafka mechanics
Ordering, durability, rebalances, and log compaction
- 1 Kafka Topic Partitioning
- 2 Producer Acknowledgments
- 3 Consumer Group Rebalancing
- 4 Exactly-Once Semantics
- 5 Log Compaction
All explainers
Browse by lane
Every explainer grouped by the kind of system pressure it clarifies.
AI and LLM Systems
Model internals, inference speed, retrieval quality, and agent control loops.
The Forward Pass: Tokens to Next Token
Walk through the inference path from token ids to logits before sampling chooses the next token.
Inside One Transformer Block
Open one transformer block and see residual stream, attention, normalization, feed-forward layers, and repetition.
Multi-Head Attention: Many Patterns at Once
See how multiple attention heads can track different relationships before their outputs are concatenated and projected.
Positional Encoding: Order via Rotation
See why attention needs position information and how RoPE turns token position into relative rotation.
Vector Space: Meaning as Direction
See embeddings as points, neighborhoods, directions, analogy-style offsets, and retrieval by cosine similarity.
Mixture of Experts: Router Picks Two
See sparse expert routing: a token is scored, top experts activate, and their weighted outputs are combined.
KV-Cache: Why the 2nd Token Is Faster
See how autoregressive decoding reuses previous keys and values instead of recomputing the whole prefix every token.
Speculative Decoding: Draft, Then Verify
See how a fast draft model proposes several tokens and the target model verifies them in parallel.
The Token Sampling Pipeline
Follow one next-token decision from raw logits through temperature, top-k, top-p, softmax, and the final sample.
Dense + Sparse + Reciprocal Rank Fusion
Watch dense retrieval and BM25 produce two ranked lists, then merge them with RRF instead of hand-tuning scores.
Bi-Encoder Recall, Cross-Encoder Rerank
See why production retrieval often uses a fast recall stage followed by a slower, more accurate reranker.
The RAG Debugging Decision Tree
Diagnose wrong RAG answers by asking one question before tuning prompts or rebuilding indexes.
The ReAct Loop and Its 3 Stop Conditions
See the Thought, Action, Observation cycle and the three exits that keep an agent from running forever.
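To make one mechanism from this lane concrete, here is a minimal sketch of Reciprocal Rank Fusion as described in the Dense + Sparse entry: each document earns 1 / (k + rank) from every ranked list it appears in, and the sums decide the merged order. The function name `rrf_fuse`, the document ids, and the constant k = 60 (a commonly used default) are assumptions for illustration, not details taken from the explainer itself.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Merge several ranked lists of doc ids with Reciprocal Rank Fusion.

    Each list contributes 1 / (k + rank) per document, with rank starting at 1.
    k=60 is an assumed, commonly used smoothing constant.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by doc id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical result lists from a dense retriever and from BM25.
dense = ["doc_a", "doc_b", "doc_c"]
sparse = ["doc_b", "doc_d", "doc_a"]
fused = rrf_fuse([dense, sparse])
print(fused)
```

Because only ranks are used, the dense similarity scores and BM25 scores never need to be put on a common scale, which is the point of fusing by rank instead of hand-tuning score weights.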
Harness Engineering
The runtime patterns that keep long agent sessions cheap, resumable, and inspectable.
Skills: Index in Context, Body on Demand
See why skill descriptions stay in context while full bodies load only when the model selects one.
Prompt Cache: Stable Prefix vs Volatile Tail
See why prompt-cache stability is a source-level boundary, not a billing toggle.
The Reasoning Sandwich
See why max reasoning belongs at plan and verify, while implementation often works better at a mid tier.
Replay Safety: 3 Tool Classes on Resume
See how a crashed workflow routes pure, idempotent, and unsafe tools differently when it resumes.
Generator -> Reflector -> Curator
See session memory as a feedback loop: work produces traces, reflection extracts candidates, curation promotes durable memory.
Distributed Systems
Coordination, time, quorum, failure detection, and consistency under partitions.
CAP Theorem
Why a distributed system can't guarantee Consistency, Availability, and Partition Tolerance all at once, and must give up consistency or availability when a partition occurs
Majority Quorum
Understanding R+W>N — how distributed systems guarantee reads see writes using quorum overlap
Lamport Clock
Understanding logical time — how distributed systems order events without synchronized clocks
Raft Consensus Algorithm
How distributed systems elect leaders and achieve agreement across multiple nodes
Two-Phase Commit (2PC)
How distributed systems achieve atomic transactions across multiple databases using the 2PC protocol
Eventual Consistency
How distributed databases synchronize without blocking writes
Gossip Protocol
How distributed systems spread information efficiently using epidemic-style protocols
Consistent Hashing
How distributed systems minimize data movement when nodes change using a hash ring
Heartbeat & Failure Detection
How distributed systems detect node failures using periodic heartbeat signals and timeouts
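The R+W>N rule from the Majority Quorum entry can be checked by brute force on small clusters. The sketch below is an illustration under simple assumptions (N identical replicas, any R of them may serve a read, any W may accept a write); the function name `quorums_always_overlap` is hypothetical.

```python
from itertools import combinations


def quorums_always_overlap(n, r, w):
    """Return True if every read set of size r shares at least one
    replica with every write set of size w, among n replicas."""
    nodes = range(n)
    return all(
        set(read_set) & set(write_set)  # empty intersection is falsy
        for read_set in combinations(nodes, r)
        for write_set in combinations(nodes, w)
    )


# With N=3, R=2 and W=2 always intersect; R=1 and W=2 can miss each other.
print(quorums_always_overlap(3, 2, 2))
print(quorums_always_overlap(3, 1, 2))
```

When the check holds, any read quorum contains at least one replica that took part in the latest successful write, which is what lets a read observe that write.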
Streaming and Kafka
Partition routing, delivery guarantees, rebalances, and log-shaped storage.
Kafka Topic Partitioning
How Kafka distributes messages across partitions for parallelism and ordering
Producer Acknowledgments
Understanding Kafka's acks=0, acks=1, and acks=all for durability vs latency trade-offs
Consumer Group Rebalancing
How Kafka redistributes partitions among consumers when members join, leave, or fail
Exactly-Once Semantics
How Kafka achieves exactly-once processing using idempotent producers, transactions, and consumer isolation
Log Compaction
How Kafka's log compaction turns a topic into a key-value table by keeping only the latest value per key
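The "latest value per key" idea behind the Log Compaction entry can be sketched on an in-memory list. This is a simplified model, not Kafka's implementation: it ignores segments, tombstones, and retention settings, and the function name `compact` and the sample records are assumptions for illustration.

```python
def compact(log):
    """Keep only the most recent record for each key.

    `log` is a list of (key, value) pairs in append order; the list index
    stands in for the offset. Surviving records keep their original
    offsets and relative order.
    """
    latest_offset = {}
    for offset, (key, _value) in enumerate(log):
        latest_offset[key] = offset  # later records overwrite earlier ones
    return [
        (offset, key, value)
        for offset, (key, value) in enumerate(log)
        if latest_offset[key] == offset
    ]


# Hypothetical topic contents: user1 and user2 are each updated once.
log = [
    ("user1", "a"),
    ("user2", "b"),
    ("user1", "c"),
    ("user3", "d"),
    ("user2", "e"),
]
compacted = compact(log)
print(compacted)
```

Replaying the compacted records from the start rebuilds the same key-to-value table as replaying the full log, which is why a compacted topic can serve as a changelog for a table.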
ML Fundamentals
The older foundations behind the newer LLM-specific walkthroughs.
Tokenization & Embeddings
How text becomes vectors — the pipeline from raw characters to dense numerical representations
Attention Mechanism
How self-attention enables transformers to understand context by letting each token attend to all others
Backpropagation
How neural networks learn by propagating errors backward through layers
System Design
Operational guardrails that protect services under uneven demand and partial failure.
Rate Limiting Algorithms
Token Bucket vs Sliding Window — understand burst handling, accuracy trade-offs, and when to use each
Circuit Breaker
How to prevent cascade failures in microservices using the circuit breaker pattern
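For the rate-limiting entry in this lane, a token bucket can be sketched in a few lines: tokens refill at a steady rate up to a fixed capacity, and a request passes only if enough tokens are available, which is what permits short bursts. The class name, parameter names, and the injectable `clock` argument are assumptions made for this example, not part of the explainer.

```python
import time


class TokenBucket:
    """Minimal single-threaded token bucket (illustrative sketch)."""

    def __init__(self, capacity, refill_rate, clock=time.monotonic):
        self.capacity = capacity        # maximum burst size, in tokens
        self.refill_rate = refill_rate  # tokens added per second
        self.clock = clock              # injectable so tests can control time
        self.tokens = float(capacity)   # start with a full bucket
        self.last = clock()

    def allow(self, cost=1.0):
        """Refill for the elapsed time, then try to spend `cost` tokens."""
        now = self.clock()
        elapsed = now - self.last
        self.last = now
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


# Usage with a fake clock: a burst of 2 is allowed, the third call is rejected,
# and one second later a single token has been refilled.
fake_time = [0.0]
bucket = TokenBucket(capacity=2, refill_rate=1.0, clock=lambda: fake_time[0])
burst = [bucket.allow(), bucket.allow(), bucket.allow()]
fake_time[0] = 1.0
after_refill = [bucket.allow(), bucket.allow()]
print(burst, after_refill)
```

A sliding window limiter would instead count requests inside a moving time interval, trading this burst tolerance for a stricter bound on the request count per window.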
Databases
Durability and recovery mechanics that shape larger systems.