AI Engineering Fundamentals: From Tokens to Agents
I’ve been building AI systems for a while now, and one pattern keeps emerging: most tutorials either oversimplify or assume too much.
You get either “just call the API” or dense academic papers. The messy middle—where production systems actually live—is poorly documented.
So I wrote the series I wished existed when I started.
What This Series Covers
8 progressions, each building on the last:
| Part | Topic | What You’ll Learn |
|---|---|---|
| 0 | Text → Tokens | Why tokenization breaks arithmetic, costs vary by language, and token boundaries affect generation |
| 1 | Tokens → Embeddings | Why one-hot encoding fails, how meaning emerges from training, measuring similarity |
| 2 | Embeddings → Attention | Q/K/V mechanics, multi-head attention, why “lost in the middle” happens |
| 3 | Attention → Generation | Temperature, sampling strategies, why deterministic generation doesn’t exist |
| 4 | Generation → Retrieval | Vector search, chunking strategies, dense vs sparse retrieval |
| 5 | Retrieval → RAG | Prompt construction, reranking, the debugging decision tree |
| 6 | RAG → Agents | The agent loop, tools, ReAct pattern, memory systems |
| 7 | Agents → Evaluation | Task completion, trajectory quality, safety metrics, production monitoring |
Why This Structure?
Each progression follows a pattern:
- What the previous step enabled
- What problem it creates
- How this step solves it
- What can go wrong (production failure modes)
- How to verify understanding
No isolated concepts. Everything connects.
Who This Is For
You should read this if:
- You’re building with LLMs but treating them as black boxes
- You’ve done the tutorials but don’t understand why things work
- You’re debugging RAG systems and don’t know where to look
- You’re evaluating AI tools and need to ask the right questions
You probably don’t need this if:
- You’re doing ML research (this is engineering, not theory)
- You just need to call an API once (keep it simple)
The Debugging Payoff
Here’s why the mechanics matter. When something breaks:
Symptom: Non-English users report higher costs
- Without understanding: “Must be a billing bug”
- With understanding: the tokenizer was trained mostly on English, so other languages need 3-5x more tokens per concept
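As a rough stdlib-only proxy for this effect: byte-level BPE tokenizers operate on UTF-8 bytes, so byte length loosely tracks token count (real counts require the actual tokenizer library, e.g. tiktoken). The text samples below are just illustrative:

```python
# Rough proxy: byte-level BPE tokenizers split UTF-8 bytes, so a string's
# byte length loosely tracks its token count. This is an approximation;
# real counts need the model's actual tokenizer.
def utf8_bytes(text: str) -> int:
    return len(text.encode("utf-8"))

english = "hello"        # 5 characters -> 5 bytes
japanese = "こんにちは"   # 5 characters -> 15 bytes (3 bytes per character)

print(utf8_bytes(english), utf8_bytes(japanese))  # 5 15
```

Same number of characters, three times the bytes, and the tokenizer sees every one of them.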
Symptom: RAG returns relevant docs but wrong answers
- Without understanding: “The model is hallucinating”
- With understanding: check prompt construction, context ordering, and position effects like “lost in the middle”
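One context-ordering tactic, sketched below with a hypothetical `edge_order` helper: since models attend most reliably to the start and end of the context, interleave ranked chunks so the highest-ranked ones land at the edges rather than buried in the middle:

```python
# Hypothetical "lost in the middle" mitigation: given chunks ranked
# best-first, place the top-ranked chunks at the start and end of the
# context, pushing the weakest chunks toward the middle.
def edge_order(chunks_best_first: list[str]) -> list[str]:
    front, back = [], []
    for i, chunk in enumerate(chunks_best_first):
        (front if i % 2 == 0 else back).append(chunk)
    return front + back[::-1]  # back is reversed so rank 2 ends up last

print(edge_order(["c1", "c2", "c3", "c4", "c5"]))
# ['c1', 'c3', 'c5', 'c4', 'c2']
```

The best chunk (c1) opens the context and the second-best (c2) closes it; the weakest sit in the middle, where attention is least reliable.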
Symptom: Agent loops forever
- Without understanding: “Need better prompts”
- With understanding: tool descriptions are unclear, observation parsing is failing, or the termination condition is ambiguous
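The termination half of that diagnosis comes down to two guards in the agent loop. A minimal sketch (the `llm` callable and `tools` dict are hypothetical stubs, not a real API): the model can emit an explicit stop signal, and a hard step cap catches the case where it never does:

```python
# Minimal agent-loop sketch. Two guards prevent infinite loops:
#  1. an explicit "final" action from the model terminates the loop
#  2. max_steps caps iterations even if the model never terminates
def run_agent(llm, tools, task, max_steps=8):
    history = [task]
    for _ in range(max_steps):
        action = llm(history)              # model decides the next step
        if action["type"] == "final":      # guard 1: explicit termination
            return action["answer"]
        observation = tools[action["tool"]](action["input"])
        history.append(observation)        # feed the observation back
    return "stopped: step budget exhausted"  # guard 2: hard cap
```

If an agent "loops forever," one of these guards is missing, or the model never reaches the `final` branch because its tool descriptions or observation format give it no clear way to decide it is done.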
Understanding the stack turns mysterious failures into debuggable problems.
Start Here
If you’re new to AI engineering: Start from the beginning
If you’re already building RAG systems: Jump to RAG debugging
If you’re evaluating or building agents: Agent patterns
This is part of my “learning in public” approach. The series will evolve as I learn more. Feedback welcome.