Based on Lenny's Podcast data
Lenny's Knowledge Sketch

Scale AI's $14B Meta Deal
& What's Actually Next

Jason Droege
CEO, Scale AI
OCT 9 2025
The Company

Scale AI: The Data Layer
Under Every Frontier Model

TRAINING DATA MARKET
"Every major AI model you use was trained with Scale's help. We're the infrastructure that makes AI reliable."
  • Scale provides data labeling, RLHF, and AI evaluation services
  • The $14B Meta deal: landmark enterprise AI contract
  • Training data quality determines model quality more than architecture
  • The government bet: defense AI is Scale's fastest-growing vertical
Framework

The Scale AI Model

RAW DATA → SCALE LABEL → TRAINED MODEL
  • $14B: Meta multi-year contract
  • $13.8B: Scale valuation
  • 10+: frontier model customers
  • Scale's core service: human-in-the-loop data annotation at massive scale
  • RLHF platform: the feedback loop that aligns models to human values
  • Eval-as-a-service: ongoing model evaluation for enterprise customers
  • Government/defense: separate unit building AI for US national security
Jason's thesis: Scale is not a services company. It's an AI infrastructure company that happens to need humans to deliver the infrastructure.
Enterprise AI at Scale

What Scale Learned

  • Insight 1: Data quality compounds — better training data produces dramatically better models
  • Insight 2: Enterprise AI needs continuous evaluation, not just pre-deployment testing
  • Insight 3: Government AI is 5 years behind commercial AI but catching up fast
  • Insight 4: The data moat is real — it's harder to replicate than the model architecture
The data moat

Scale's most defensible asset is not its technology — it's its relationships with expert annotators in specialized domains.

The government opportunity

US defense AI is a multi-decade market. Scale is positioning as the trusted data partner for national security AI.

Playbook

Think About AI Data

  • Your training data IS your model — invest in its quality before model selection
  • Evals are not a one-time task — build continuous evaluation into your AI roadmap
  • Domain-specific data is worth 10× general data for domain-specific tasks
  • The data flywheel: more users → more data → better model → more users
The Scale prediction: Jason believes the AI training data market will be larger than the cloud market within 5 years. The data that trains AI is the new oil.
Contrarian

AI Data Myths

More data is always better
INSTEAD → More high-quality data is always better. Scale's value is quality, not just volume.

Open source data eliminates the need for Scale
INSTEAD → Open source data produces open source quality. Frontier models require frontier data quality.

AI will soon generate its own training data
INSTEAD → AI-generated training data causes model collapse. Human feedback remains essential for alignment.

Data labeling is a commodity
INSTEAD → Expert data labeling is never a commodity. Medical, legal, and defense labeling require rare expertise.