Based on Lenny's Podcast data
Lenny's Knowledge Sketch

AI Engineering 101:
The Definitive Primer

Chip Huyen
Author of "AI Engineering"; Stanford, Nvidia, Netflix alum
OCT 23 2025
The Discipline

AI Engineering Is a
New Discipline, Not Just ML

[Diagram: Research → Engineering → ML Ops → AI Engineering]
"AI engineering is not machine learning. It's the discipline of building reliable, scalable, evaluatable AI systems in production."
  • ML research: finding new algorithms. AI engineering: making algorithms work reliably.
  • The stack: prompting → RAG → fine-tuning → training (most products never need the last 2)
  • Evals are the engineering discipline of AI — measure before you optimize
  • Most AI product failures are engineering failures, not model failures
Framework

The AI Engineering Stack

[Stats: Prompt → Eval → Deploy]
  • 80% of AI products need only prompting + RAG
  • 10× improvement from good evals vs. better models
  • Faster iteration with an eval-first approach
  • Layer 1: Prompt engineering — most product work lives here
  • Layer 2: RAG (Retrieval-Augmented Generation) — inject relevant context
  • Layer 3: Fine-tuning — only when you have thousands of examples and clear improvement
  • Layer 4: Training — almost never needed for product teams
Chip's rule: If you're fine-tuning before you've exhausted prompt engineering and RAG, you're solving the wrong problem.
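The first two layers of the stack can be sketched as a simple retrieve-then-prompt loop. Everything below is a hypothetical illustration, not any specific library: the toy `embed` function stands in for a real embedding model, and the corpus for a real vector store.

```python
# Minimal prompting + RAG sketch (layers 1-2 of the stack).
# `embed` is a toy stand-in for a real embedding model.
from math import sqrt

def embed(text: str) -> list[float]:
    # Toy "embedding": normalized character-frequency vector over a-z.
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    norm = sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a: list[float], b: list[float]) -> float:
    # Vectors are already normalized, so the dot product is cosine similarity.
    return sum(x * y for x, y in zip(a, b))

def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    # Layer 2 (RAG): rank documents by similarity to the query.
    q = embed(query)
    ranked = sorted(corpus, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

def build_prompt(query: str, corpus: list[str]) -> str:
    # Layer 1 (prompting): inject the retrieved context into the prompt.
    context = "\n".join(f"- {doc}" for doc in retrieve(query, corpus))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"
```

In production you would swap the toy embedding for a real embedding model and a vector store, but the shape of the system — retrieve, then prompt — is the whole of layers 1 and 2.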
Evals: The Critical Discipline

Measure What You Build

  • Evals = tests for AI systems: Unit tests for model behavior
  • What to evaluate: Correctness, safety, latency, cost, user satisfaction
  • How to build evals: Start with 20-50 real examples your product gets right or wrong
  • The eval loop: Run evals before each model upgrade or prompt change
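The eval loop above — a small golden set, run before every model upgrade or prompt change — can be sketched as a tiny harness. The `golden_set` data, the `baseline_model` stub, and the substring-match grading rule are all hypothetical placeholders:

```python
# Minimal eval-harness sketch: run a golden set through the system and
# report a pass rate you can compare across prompt or model changes.
golden_set = [
    {"input": "2 + 2", "expected": "4"},
    {"input": "capital of France", "expected": "paris"},
]

def run_evals(answer_fn, cases) -> float:
    # Returns the fraction of golden cases the system passes.
    passed = 0
    for case in cases:
        out = answer_fn(case["input"])
        # Simple substring match; real evals mix rule-based checks,
        # model-graded checks, and latency/cost thresholds.
        if case["expected"].lower() in out.lower():
            passed += 1
    return passed / len(cases)

def baseline_model(prompt: str) -> str:
    # Stand-in for an LLM call, so the sketch runs without an API key.
    return {"2 + 2": "4", "capital of France": "Paris"}.get(prompt, "")

score = run_evals(baseline_model, golden_set)  # 1.0 -- the baseline to beat
```

The pass rate is the baseline the eval trap warns about: without it, a regression after a model upgrade has nothing to be measured against.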
The eval trap

Most teams skip evals until things go wrong in production. By then, you have no baseline to fix from.

The human eval

Automated evals scale; human evals set the ground truth and define what good looks like. You need both.
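One way to combine the two — an assumption of mine, not something prescribed in the episode — is to periodically measure how often the automated grader agrees with human labels, and trust the automation only while agreement stays high. The labels below are made-up illustrative data:

```python
# Sketch: calibrate an automated grader against human ground truth.
def agreement(auto_labels: list[bool], human_labels: list[bool]) -> float:
    # Fraction of cases where the automated verdict matches the human one.
    matches = sum(a == h for a, h in zip(auto_labels, human_labels))
    return matches / len(human_labels)

auto = [True, True, False, True, False]    # automated grader's verdicts
human = [True, False, False, True, False]  # human ground-truth verdicts
rate = agreement(auto, human)  # 0.8 -- if this drifts low, re-tune the grader
```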

Playbook

Ship Better AI Products

  • Build your eval framework before your 3rd AI feature
  • Start with 20 golden examples: 10 that should work, 10 edge cases
  • Use LLMs to help write evals — it's the AI engineering meta-loop
  • Publish your eval results internally — it creates accountability and surfaces regressions
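The "use LLMs to help write evals" point can be sketched as a model-graded (LLM-as-judge) check. Here `fake_llm` is a hypothetical, deterministic stand-in for any real chat-completion API call:

```python
# Sketch of a model-graded eval: ask one LLM to grade another's answer.
JUDGE_PROMPT = (
    "You are grading an AI answer.\n"
    "Question: {question}\nAnswer: {answer}\n"
    "Reply with exactly PASS or FAIL."
)

def judge(llm, question: str, answer: str) -> bool:
    # `llm` is any callable taking a prompt string and returning text.
    verdict = llm(JUDGE_PROMPT.format(question=question, answer=answer))
    return verdict.strip().upper().startswith("PASS")

def fake_llm(prompt: str) -> str:
    # Deterministic stub so the sketch runs without an API key.
    return "PASS" if "Answer: 4" in prompt else "FAIL"

result = judge(fake_llm, "2 + 2", "4")  # True
```

This is the meta-loop the playbook describes: the judge prompt is itself a prompt you iterate on, and its verdicts should be spot-checked against human labels.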
The book insight: Chip's "AI Engineering" is the first book to bridge the gap between ML research and product-level AI engineering. Required reading for AI PMs.
Contrarian

AI Engineering Myths

  • Myth: Better model = better product. Instead: Better evals = better product. Model quality matters; eval quality determines whether you can tell the difference.
  • Myth: Prompt engineering is not real engineering. Instead: Prompt engineering is the highest-leverage skill in AI product development. Dismiss it and get outperformed.
  • Myth: Fine-tune on your data early. Instead: Fine-tune after you've fully exploited prompting + RAG. Early fine-tuning is premature optimization.
  • Myth: AI products don't need tests. Instead: AI products need MORE tests than traditional software, because the failure modes are probabilistic.