Based on Lenny's Podcast data
Lenny's Knowledge Sketch

Building the Bleeding Edge:
How AI Models Learn

Karina Nguyen
AI Researcher, OpenAI
(formerly Anthropic)
EARLY 2025
The Insight

Model Training is an Art, Not a Science

RAW DATA → DEBUG → MODEL → CONTINUOUS ITERATION
  • Data quality matters far more than data quantity
  • The debugging process is like debugging software: find contradictions in training data, resolve them
  • Models get confused when given conflicting signals (e.g., "you have no body" + "set an alarm")
  • Balancing helpfulness vs. safety requires constant tradeoffs
Key tension: Every model learns from contradictory data. Making the model robust across diverse scenarios is the real art.
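The "find contradictions in training data" step above can be pictured with a small sketch. This is only an illustration under stated assumptions, not the process any lab actually uses: the `(prompt, label)` format, the label names, and the `find_contradictions` helper are all hypothetical. It groups examples by normalized prompt and flags any prompt that was given conflicting target behaviors.

```python
from collections import defaultdict


def find_contradictions(examples):
    """Return prompts that appear with more than one target behavior.

    `examples` is a list of (prompt, label) pairs; this format is an
    assumption made for illustration.
    """
    labels_by_prompt = defaultdict(set)
    for prompt, label in examples:
        # Normalize case and whitespace so near-identical prompts collide.
        key = " ".join(prompt.lower().split())
        labels_by_prompt[key].add(label)
    return {
        prompt: sorted(labels)
        for prompt, labels in labels_by_prompt.items()
        if len(labels) > 1
    }


# Hypothetical data echoing the "no body" vs. "set an alarm" conflict.
examples = [
    ("Set an alarm for 7am", "refuse_no_body"),
    ("set an alarm for 7am", "call_alarm_tool"),
    ("What is the capital of France?", "answer"),
]

conflicts = find_contradictions(examples)
for prompt, labels in conflicts.items():
    print(f"conflict on {prompt!r}: {labels}")
```

Real contradictions are rarely exact prompt matches, so a practical version would likely need semantic grouping; exact matching is used here only to keep the idea visible.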
The Paradigm Shift

Post-Training Scales Infinitely — No Data Wall

PRE-TRAIN → POST-TRAIN → RL TASKS (infinite)
  • Pre-training: The "data wall" is real here — models trained on the entire internet saturate
  • Post-training: Scale comes from reinforcement learning, not raw data
  • The infinite loop: You can create infinite tasks via RL (search, coding, writing, reasoning)
  • Proof: o1 model saturates existing benchmarks (GPQA at 60-70%, PhD-level reasoning)
RL tasks available: infinite
PhD benchmarks (o1): 60%+
The bottleneck today: Evaluation, not data or models. We need better frontier evals to measure progress beyond saturated benchmarks.
Synthetic data role: A mix of synthetically generated tasks, product-derived user feedback, and expert human data, not just one or the other.
Building Products With AI

Canvas: Teaching a Model Through Behaviors

  • Behavior 1: Trigger logic — When to show Canvas (long document edits) vs. not (info requests)
  • Behavior 2: Edit autonomy — Can the model find specific paragraphs and rewrite them, or only full rewrites?
  • Behavior 3: Commenting — How does the model critique its own output intelligently?
  • The method: Use o1 to generate quality examples, inject critique prompts, measure via robust evals
The workflow

Sit with the model, debug why it's behaving unexpectedly, design synthetic examples that teach the desired behavior, measure with evals, repeat.
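The "measure with evals, repeat" loop can be sketched for Behavior 1 (trigger logic). This is a minimal sketch under assumptions: `should_open_canvas` is a hypothetical keyword stub standing in for the model's decision, and the eval prompts are invented for illustration. The point is the shape of the loop: a labeled set, a pass rate, and a list of failures to debug.

```python
def should_open_canvas(prompt: str) -> bool:
    """Hypothetical stand-in for the model's trigger decision."""
    keywords = ("draft", "write", "edit", "rewrite")
    return any(word in prompt.lower() for word in keywords)


# Invented eval set: (prompt, whether Canvas should open).
EVAL_SET = [
    ("Draft a two-page project proposal", True),
    ("Rewrite the second paragraph of my essay", True),
    ("What year did the Berlin Wall fall?", False),
    ("How do I write a for loop in Python?", False),  # info request
]


def run_eval(decide, eval_set):
    """Return accuracy and the cases where the decision was wrong."""
    failures = [
        (prompt, expected)
        for prompt, expected in eval_set
        if decide(prompt) != expected
    ]
    accuracy = 1 - len(failures) / len(eval_set)
    return accuracy, failures


accuracy, failures = run_eval(should_open_canvas, EVAL_SET)
print(f"accuracy: {accuracy:.2f}")
for prompt, expected in failures:
    print(f"wrong on {prompt!r} (expected open={expected})")
```

Here the stub wrongly opens Canvas for the for-loop question. In the workflow described above, a failure like this is the cue to design new synthetic examples for that case and rerun the eval.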

From beta to GA

Launch, observe real user behavior patterns, shift training distribution to match how users actually use it (not how you predicted).
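One simple way to picture "shifting the training distribution" is importance reweighting: sample each task category in proportion to how often users actually do it, relative to how often it was planned. The sketch below is an assumption-laden illustration, not the method used for Canvas; the category names and percentages are made up.

```python
def sampling_weights(planned, observed):
    """Per-category weight = observed share / planned share.

    A weight above 1 means the category should be sampled more often
    than in the original training mix; below 1 means less often.
    """
    return {
        category: observed[category] / planned[category]
        for category in planned
    }


# Made-up shares: the mix predicted before launch vs. the mix seen in usage.
planned = {"writing": 0.5, "coding": 0.25, "editing": 0.25}
observed = {"writing": 0.25, "coding": 0.5, "editing": 0.25}

weights = sampling_weights(planned, observed)
for category, weight in weights.items():
    print(f"{category}: x{weight}")
```

With these made-up numbers, coding examples would be sampled twice as often and writing examples half as often as originally planned.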

"Form follows function. The file upload feature wasn't just UX—it enabled people to upload books, reports, anything and ask any task. That's the killer use case."
The Shift

Cost of Intelligence Drops Drastically

  • Small > Large: Claude 3 Haiku is smarter than Claude 2 while being smaller and faster
  • Implication: Builders and developers get AI access; work that was bottlenecked by intelligence gets unblocked
  • Healthcare: AI diagnostics matching or beating human doctors
  • Education: Every student gets a personalized tutor
  • Automation: Redundant tasks disappear; creative work stays
What survives: Emotional intelligence, creativity, and people management, the hardest human skills. Write well if you want safety.
Contrarian

The Future of Work & AI Skills

Better models = better products automatically INSTEAD → The product, interaction, and UX matter as much as the model. Interface is everything.
Build the best product for today INSTEAD → Build for the future. By the time models are ready, your product design will be perfect for them.
Technical skills keep you safe from AI INSTEAD → Creative thinking, deep listening, and rapid iteration are what AI can't replace. Build things AI couldn't dream of.
AI will replace creative work INSTEAD → AI gets to 80% easily. The last 20% (aesthetic taste, emotional resonance) is pure human craft. That's where the moat lives.