Based on Lenny's Podcast data
The Thesis: Human Feedback Is the AI Moat
"Anthropic and Google's models are as good as they are because of the quality of human feedback they were trained on. That quality is Surge."
- RLHF quality determines model quality more than architecture choices
- The labelers are not a commodity — domain expertise + good judgment = rare combination
- Surge built the expert network that taught AI what's good and bad
- The invisible layer: most people don't know this work exists, but it determines what AI does
Framework: How RLHF Actually Works
- $500M+ RLHF market in 2025
- 100K+ expert labelers globally
- 10× output quality gap between good and bad RLHF
- Not just "is this good?" — expert raters understand nuance, context, and edge cases
- Domain expertise matters: medical RLHF requires doctors, legal requires lawyers
- Scale + quality: Surge's bet is expert-quality feedback at production scale
- The eval loop: better feedback → better model → harder evals → need better feedback
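The preference-learning core of the loop above can be sketched with the standard Bradley-Terry objective used in RLHF reward modeling: labelers pick the better of two responses, and the reward model is trained to score the chosen one higher. A minimal sketch in plain Python; the function names and toy scores are illustrative, not Surge's or any lab's actual pipeline.

```python
import math

def preference_prob(reward_chosen: float, reward_rejected: float) -> float:
    """Bradley-Terry model: probability that the reward model agrees with
    the human labeler's preference (chosen over rejected)."""
    return 1.0 / (1.0 + math.exp(-(reward_chosen - reward_rejected)))

def preference_loss(pairs: list[tuple[float, float]]) -> float:
    """Reward-model training objective: mean negative log-likelihood of the
    human preference labels over (chosen_reward, rejected_reward) pairs."""
    return -sum(math.log(preference_prob(c, r)) for c, r in pairs) / len(pairs)

# Toy reward scores: the model mostly rates the labeler-chosen response higher.
pairs = [(2.0, 0.5), (1.2, 1.0), (3.1, -0.4)]
loss = preference_loss(pairs)
```

This is where label quality becomes model quality: if experts and crowdworkers disagree on which response is "chosen," the same objective optimizes toward two different models.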
The hidden variable: The most important people in AI development are often not the researchers or the engineers but the expert labelers.
The RLHF Industry: What Most People Don't Know
- Volume: Frontier model training requires millions of human preference judgments
- Quality: Bad RLHF produces confidently wrong, helpful-sounding AI
- Specialization: Code RLHF, reasoning RLHF, safety RLHF require different experts
- The problem: Most AI companies underinvest in RLHF quality — and it shows
Why Claude feels different
Claude's helpfulness and safety balance comes from deliberate, expert human feedback — not just prompting.
The eval arms race
As models get better, the humans rating them need to be smarter. The bar keeps rising.
Playbook: How to Think About AI Quality
- Ask: where does the AI's "judgment" come from? Trace it to the training data.
- Build eval datasets with real domain experts, not just crowdsourced workers
- The best product insight: what does the AI confidently get wrong in your domain?
- Invest in evals before you invest in prompts — you can't improve what you can't measure
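The "evals before prompts" point above can be made concrete with a minimal eval harness: each case pairs a prompt with an expert-authored grading function, so regressions show up as failing cases rather than vibes. All names and the toy model here are illustrative assumptions, not a real framework.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class EvalCase:
    prompt: str
    grade: Callable[[str], bool]  # expert-authored pass/fail judgment

def run_eval(model: Callable[[str], str], cases: list[EvalCase]) -> float:
    """Return the fraction of cases the model's output passes."""
    return sum(case.grade(model(case.prompt)) for case in cases) / len(cases)

# Toy cases with expert checks, for illustration only.
cases = [
    EvalCase("What is 2 + 2?", lambda out: "4" in out),
    EvalCase("Name a prime greater than 10.",
             lambda out: any(p in out for p in ("11", "13", "17", "19"))),
]
fake_model = lambda prompt: "4" if "2 + 2" in prompt else "11"
score = run_eval(fake_model, cases)
```

The design point: the `grade` functions are where domain expertise lives. Swapping crowdsourced string checks for expert-written ones changes the score, which is exactly the "what does the AI confidently get wrong in your domain" insight.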
Edwin's mission: Surge exists because good AI requires good humans. The quality of the human feedback is the quality of the AI.
Contrarian: AI Quality Myths
✗ Myth: Bigger models need less human feedback.
✓ Instead: Bigger models need better human feedback; the bar rises with model capability.
✗ Myth: Crowdsourced labeling is fine.
✓ Instead: Crowdsourced labeling is fine for simple tasks; expert labeling is required for nuanced judgment.
✗ Myth: RLHF is just annotation.
✓ Instead: RLHF is judgment at scale; the best labelers are the world's most underpaid AI researchers.
✗ Myth: Models will soon self-improve without humans.
✓ Instead: Models self-improve with AI feedback loops, but the root signal is still human values.