Layer 3: Fine-tuning — only when you have thousands of examples and clear improvement
Layer 4: Training — almost never needed for product teams
Chip's rule
If you're fine-tuning before you've exhausted prompt engineering and RAG, you're solving the wrong problem.
Evals: The Critical Discipline
Measure What You Build
Evals = tests for AI systems: the unit tests of model behavior
What to evaluate: Correctness, safety, latency, cost, user satisfaction
How to build evals: Start with 20-50 examples your product gets wrong or right
The eval loop: Run evals before each model upgrade or prompt change
The eval trap
Most teams skip evals until things go wrong in production. By then, you have no baseline to fix from.
The human eval
Automated evals scale; human evals set the ground truth by defining what good looks like. You need both.
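The eval loop described above can be sketched as a small harness. This is a minimal illustration, not a framework from the book; names like `EvalCase`, `toy_model`, and `golden_set` are hypothetical, and a real model call would replace the stand-in function:

```python
from dataclasses import dataclass

@dataclass
class EvalCase:
    prompt: str
    expected: str  # ground truth, set by a human reviewer

def run_evals(model_fn, cases):
    """Run every golden case through the model and return the pass rate."""
    results = []
    for case in cases:
        output = model_fn(case.prompt)
        # Simple correctness check; real evals would also track
        # safety, latency, and cost per case.
        results.append(case.expected.lower() in output.lower())
    return sum(results) / len(results)

# Stand-in for a real model API call, so the sketch is runnable.
def toy_model(prompt):
    return "Paris is the capital of France."

golden_set = [
    EvalCase("What is the capital of France?", "Paris"),
    EvalCase("What is the capital of Germany?", "Berlin"),  # an edge case this toy model fails
]

pass_rate = run_evals(toy_model, golden_set)
print(f"pass rate: {pass_rate:.0%}")  # prints: pass rate: 50%
```

Run this before each model upgrade or prompt change; a drop in `pass_rate` against your recorded baseline is a regression caught before production.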
Playbook
Ship Better AI Products
Build your eval framework before your 3rd AI feature
Start with 20 golden examples: 10 that should work, 10 edge cases
Use LLMs to help write evals — it's the AI engineering meta-loop
Publish your eval results internally — it creates accountability and surfaces regressions
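"Use LLMs to help write evals" often takes the form of an LLM-as-judge: one model grades another's output against a rubric. A hedged sketch of the pattern, where `call_judge` is a stand-in for a real LLM API call (not any specific provider's SDK):

```python
def call_judge(prompt):
    # Stand-in for a real LLM call. This fake grader just checks
    # whether the prompt mentions refunds, so the sketch is runnable.
    return "PASS" if "refund" in prompt.lower() else "FAIL"

def llm_grade(question, answer, rubric):
    """Ask a judge model to grade an answer against a human-written rubric."""
    prompt = (
        f"Rubric: {rubric}\n"
        f"Question: {question}\n"
        f"Answer: {answer}\n"
        "Reply with exactly PASS or FAIL."
    )
    return call_judge(prompt) == "PASS"

ok = llm_grade(
    "How do I get a refund?",
    "Email support within 30 days.",
    "Answer must give a concrete next step.",
)
```

The meta-loop: humans write the rubric (the ground truth), the LLM applies it at scale, and human spot-checks keep the judge honest.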
The book insight
Chip's "AI Engineering" book is the first to bridge the gap between ML research and product-level AI engineering. Required reading for AI PMs.
Contrarian
AI Engineering Myths
✗ Better model = better product
✓ Instead: Better evals = better product. Model quality matters; eval quality determines whether you can tell the difference.
✗ Prompt engineering is not real engineering
✓ Instead: Prompt engineering is the highest-leverage skill in AI product development. Dismiss it and get outperformed.
✗ Fine-tune on your data early
✓ Instead: Fine-tune after you've fully exploited prompt + RAG. Early fine-tuning is premature optimization.
✗ AI products don't need tests
✓ Instead: AI products need MORE tests than traditional software, because the failure modes are probabilistic.