Codex: Building the World's First Software Engineering Teammate
Alexander Embiricos
Product Lead, Codex, OpenAI · Former founder · Ex-Dropbox PM
LENNY'S PODCAST
The Vision
From Tool to Teammate: The Codex Arc
"It's a bit like this really smart intern that refuses to read Slack and doesn't check Datadog unless you ask it to."
Today: pair with Codex to write, test, and run code
Near: async tasks delegated with minimal supervision
Goal: proactive teammate that surfaces work unprompted
The unlock: models that use computers by writing code
The Numbers
20x Growth Since GPT-5 Launch — What Unlocked It
20x
growth since the August GPT-5 launch
28
days from zero to public for the Sora Android app
Trillions
of tokens per week served by Codex models
Why integrated product+research wins
Codex is built by a tightly coupled product and research team that iterates on the model and its harness together. That coupling lets them run far more experiments than separate teams could, and it's why Codex became the #1 coding model both in OpenAI's API and internally.
Sora Android: the acceleration proof point
A brand-new Android app reached internal launch in 18 days and went public 10 days later, with Codex used throughout. It's what the Codex team internally calls "the new bar" for how fast software ships.
The Reddit loop as product signal
The Codex PM team monitors Reddit constantly — tracking praise and complaints as a real-time product signal. When evals and benchmarks don't capture user pain, Reddit does.
Bottoms-up org as speed engine
OpenAI is "truly, truly bottoms up" — individual drive and autonomy at every level. Alexander's warning: you can't copy this without the underlying talent caliber. It's a system, not a policy.
The self-improving loop
Codex writes code that manages its own training infrastructure. A Codex code review process is now catching real configuration mistakes in its own training run.
The Bottleneck Shift
Writing Code Is Solved — Reviewing It Is the New Hard Problem
"The real bottleneck right now is validating that the code worked and writing code review. If we wanted to get to the proactive agent world, we need to figure out how to get people to configure their coding agents to be much more autonomous on those later stages."
Writing is fun, review is a burden — developers who loved coding now face 100 PRs to review per week
Slack + Codex integration works brilliantly for instant Q&A ("Why did this metric move?") but still sends devs back to the IDE to validate code
Throwaway code is exploding — people now build interactive data viewers and animation editors just for a single task, because spinning one up is trivially cheap
Designers are becoming engineers — the Codex team's designers vibe-code full side-prototypes, sometimes landing their own PRs
PM scope is expanding — Alexander uses Codex as a prototyping tool himself; writing a spec is now slower than building a prototype
Scott Belsky's idea, validated
"Compressing the talent stack" — AI is collapsing the boundaries between roles. Each boundary removed is one communication overhead eliminated, making small teams dramatically more effective.
The Big Unlock
Proactivity: The Missing Ingredient in Every Agent
"If you think of how many times the average user is prompting AI today, it's probably tens of times. But if you think of how many times people could get benefit from a really intelligent entity, it's thousands of times per day."
The human bottleneck is typing speed and multitasking — not model quality
Any agent that waits to be prompted is leaving 99% of its value on the table
The Tinder+TikTok interface idea: agent surfaces proposals, you swipe to approve — lowest-effort human oversight possible
Browser launch (Atlas) creates ambient context: agent sees what you see, surfaces help in real time
The coding → any-agent insight
"If you want to build any agent, maybe you should be building a coding agent." Code is how models most effectively use computers — making a coding agent the universal substrate for all agentic work.
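The "universal substrate" claim can be pictured as a simple loop: the model proposes code, a sandbox executes it, and the output becomes the next observation. A minimal sketch of that loop, assuming a stubbed `propose_command` in place of a real model call (all names here are hypothetical, not the Codex implementation):

```python
import subprocess

def propose_command(goal: str, history: list[str]) -> str:
    """Stub for a model call: a real agent would have an LLM emit the
    next shell command given the goal and prior observations."""
    # Hard-coded for illustration: count the files in the current directory.
    return "ls | wc -l"

def run_in_sandbox(command: str) -> str:
    """Execute the proposed command and capture its output.
    A real harness would isolate this (containers, seccomp, etc.)."""
    result = subprocess.run(command, shell=True, capture_output=True, text=True)
    return result.stdout.strip()

def agent_step(goal: str, history: list[str]) -> str:
    """One iteration: propose code, execute it, record the observation."""
    command = propose_command(goal, history)
    observation = run_in_sandbox(command)
    history.append(observation)
    return observation

history: list[str] = []
print(agent_step("How many files are here?", history))
```

The point of the sketch is that nothing in the loop is coding-specific: any task a computer can do (browse, query data, move files) reduces to code the agent can write and run, which is why a coding agent generalizes.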
Contrarian
Alexander's Counterintuitive Takes on AI Agents & Product
✗ AI agents should wait to be asked before doing anything. INSTEAD → ✓ Proactivity is the goal. An agent that only responds to prompts captures maybe 1% of its potential value. Real teammates act without being asked.
✗ The hard part of AI coding tools is writing the code. INSTEAD → ✓ Writing is solved. Reviewing and validating AI-generated code is now the actual bottleneck, and the next frontier for coding agents to tackle.
✗ You need a specialized agent for each domain (browser, code, data…). INSTEAD → ✓ Build a coding agent. Code is how models most effectively use any computer. A coding agent is the universal substrate; every other agent collapses into it.
✗ Move fast by planning carefully before you build: aim, then fire. INSTEAD → ✓ At OpenAI the org is radically bottoms-up: fire quickly, learn empirically. Strategic fuzziness far out, rapid experimentation close in. Plans decay faster than results.