AI Engineering Fundamentals: From Tokens to Agents

I’ve been building AI systems for a while now, and one pattern keeps emerging: most tutorials either oversimplify or assume too much.

You get either “just call the API” or dense academic papers. The messy middle—where production systems actually live—is poorly documented.

So I wrote the series I wished existed when I started.

What This Series Covers

8 progressions, each building on the last:

Part | Topic | What You'll Learn
0 | Text → Tokens | Why tokenization breaks arithmetic, costs vary by language, and token boundaries affect generation
1 | Tokens → Embeddings | Why one-hot encoding fails, how meaning emerges from training, measuring similarity
2 | Embeddings → Attention | Q/K/V mechanics, multi-head attention, why "lost in the middle" happens
3 | Attention → Generation | Temperature, sampling strategies, why deterministic generation doesn't exist
4 | Generation → Retrieval | Vector search, chunking strategies, dense vs sparse retrieval
5 | Retrieval → RAG | Prompt construction, reranking, the debugging decision tree
6 | RAG → Agents | The agent loop, tools, ReAct pattern, memory systems
7 | Agents → Evaluation | Task completion, trajectory quality, safety metrics, production monitoring

Why This Structure?

Each progression follows a pattern:

  1. What the previous step enabled
  2. What problem it creates
  3. How this step solves it
  4. What can go wrong (production failure modes)
  5. How to verify understanding

No isolated concepts. Everything connects.

Who This Is For

You should read this if:

  • You’re building with LLMs but treating them as black boxes
  • You’ve done the tutorials but don’t understand why things work
  • You’re debugging RAG systems and don’t know where to look
  • You’re evaluating AI tools and need to ask the right questions

You probably don’t need this if:

  • You’re doing ML research (this is engineering, not theory)
  • You just need to call an API once (keep it simple)

The Debugging Payoff

Here’s why the mechanics matter. When something breaks:

Symptom: Non-English users report higher costs
Without understanding: "Must be a billing bug"
With understanding: Tokenizer trained on English → 3-5x more tokens per concept
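To make the cost effect concrete, here's a minimal sketch. The token counts are made up for illustration (real counts come from running the model's actual tokenizer), and the price is a hypothetical placeholder; the point is only that billing is linear in tokens, so a 3-5x token blow-up is a 3-5x cost blow-up.

```python
PRICE_PER_1K_TOKENS = 0.01  # hypothetical input price in USD, not any real model's rate


def estimate_cost(token_count: int, price_per_1k: float = PRICE_PER_1K_TOKENS) -> float:
    """Linear cost model: you pay per token, not per character or word."""
    return token_count / 1000 * price_per_1k


# A short English sentence might tokenize to ~5 tokens under a typical BPE
# tokenizer; the same sentence in a language underrepresented in the
# tokenizer's training data can fragment into several tokens per word.
english_tokens = 5
other_language_tokens = 20  # assumed 4x blow-up, within the 3-5x range above

ratio = other_language_tokens / english_tokens
print(f"Cost ratio: {ratio:.1f}x")  # 4.0x
print(f"English: ${estimate_cost(english_tokens):.5f}")
print(f"Other:   ${estimate_cost(other_language_tokens):.5f}")
```

Same sentence, same meaning, 4x the bill; no billing bug involved.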

Symptom: RAG returns relevant docs but wrong answers
Without understanding: "The model is hallucinating"
With understanding: Check prompt construction, context ordering, or attention loss
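A minimal sketch of the "check context ordering" step, not taken from the series: because attention tends to favor the start and end of the context ("lost in the middle"), one common mitigation is to place the highest-scoring retrieved chunks at the edges and the weakest in the middle. The function names and prompt template here are my own illustrative choices.

```python
def order_for_context(ranked_texts: list[str]) -> list[str]:
    """Given texts sorted best-first, alternate them between the front and
    back of the context so the strongest chunks sit at the edges."""
    front, back = [], []
    for i, text in enumerate(ranked_texts):
        (front if i % 2 == 0 else back).append(text)
    return front + back[::-1]


def build_prompt(question: str, chunks: list[tuple[float, str]]) -> str:
    """chunks: (relevance_score, text) pairs from the retriever."""
    ranked = [t for _, t in sorted(chunks, key=lambda c: c[0], reverse=True)]
    context = "\n\n".join(order_for_context(ranked))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )
```

The broader debugging habit is the same either way: print the final prompt and read it; "relevant docs retrieved" tells you nothing about what the model actually saw, in what order.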

Symptom: Agent loops forever
Without understanding: "Need better prompts"
With understanding: Tool descriptions unclear, observation parsing failing, or termination condition ambiguous
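The three failure causes above all show up in the shape of the agent loop itself. Here's a hedged sketch of a ReAct-style loop (function names and the `FINISH` convention are my own, not from any particular framework) with the guards that make "loops forever" debuggable: an explicit tool registry, loud handling of unparseable actions, an unambiguous termination signal, and a hard iteration cap.

```python
def run_agent(llm_step, tools: dict, max_steps: int = 10):
    """llm_step(history) -> (action, arg). The action 'FINISH' ends the loop
    and returns arg as the final answer."""
    history = []
    for _ in range(max_steps):            # hard cap: never loop forever
        action, arg = llm_step(history)
        if action == "FINISH":            # explicit termination condition
            return arg
        if action not in tools:           # unknown tool name: fail loudly,
            history.append(("error", f"unknown tool: {action}"))
            continue                      # and let the model see the error
        observation = tools[action](arg)  # parsed observation goes back in
        history.append((action, observation))
    raise RuntimeError(f"no termination after {max_steps} steps")


# Usage with a stubbed model: call the 'add' tool once, then finish.
def fake_llm_step(history):
    if not history:
        return ("add", (2, 3))
    return ("FINISH", history[-1][1])


print(run_agent(fake_llm_step, {"add": lambda p: p[0] + p[1]}))  # 5
```

With this structure, an endless loop points at a specific line: either the model never emits `FINISH` (termination condition ambiguous in the prompt), keeps naming tools that aren't registered (descriptions unclear), or keeps acting on observations it can't parse.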

Understanding the stack turns mysterious failures into debuggable problems.

Start Here

If you’re new to AI engineering: Start from the beginning

If you’re already building RAG systems: Jump to RAG debugging

If you’re evaluating or building agents: Agent patterns


This is part of my “learning in public” approach. The series will evolve as I learn more. Feedback welcome.

→ Browse the full AI Engineering series