"1,000 people. The size of the Android team. Google+ was the epitome of opinion-based development — it taught me everything about why we need evidence."
Opinion-based: idea → conviction → build → discover failure too late
Evidence-guided: idea → cheap test → grow confidence → build only what works
Every great company — Amazon, Apple, Airbnb — balanced judgment with evidence in their best periods
The GIST Model
Goals · Ideas · Steps · Tasks
1.8B Gmail active users
85% of users love Tabbed Inbox
0 lines of code for first test
Goals layer — the most broken
Most "goals" are output plans: what we build by when. Real goals define where you want to end up. Set one North Star metric (value delivered to users) + one top KPI (value captured by business). Build a metrics tree from these two roots; the metrics where the two trees overlap are your most important ones.
Ideas layer — use ICE scoring
Impact × Confidence × Ease. Confidence is the critical lever — it forces you to ask: is this a gut feeling (0.1/10) or a tested prototype (7/10)? This one question kills most bad ideas before a line of code is written. The Confidence Meter makes this score explicit and honest.
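A minimal sketch of the ICE calculation described above, assuming each factor is scored on a 0 to 10 scale. The Idea class and the two example ideas are hypothetical and only illustrate how a gut-feeling Confidence of 0.1 drags a high-Impact idea below a modest idea that has been tested.

```python
from dataclasses import dataclass


@dataclass
class Idea:
    """Hypothetical container for one idea; all scores assumed to be 0-10."""
    name: str
    impact: float
    confidence: float
    ease: float

    @property
    def ice(self) -> float:
        # ICE = Impact x Confidence x Ease
        return self.impact * self.confidence * self.ease


# Illustrative ideas only, not taken from the source.
ideas = [
    Idea("Big bet backed only by conviction", impact=8, confidence=0.1, ease=5),
    Idea("Modest fix with a tested prototype", impact=5, confidence=7, ease=6),
]

# Highest ICE first.
ranked = sorted(ideas, key=lambda idea: idea.ice, reverse=True)

for idea in ranked:
    print(f"{idea.ice:6.1f}  {idea.name}")
```

Because the three factors are multiplied, a near-zero Confidence cannot be offset by a large Impact estimate, which is why the score pushes teams to gather evidence before building.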
Steps layer — cheapest test first
Assessment → data → fake test → rough build → AB experiment → staged release. You don't need to start at the expensive end. The Gmail Tabbed Inbox was first "tested" with zero code: researchers manually rearranged users' emails into tabs while the users were distracted, then watched their reaction.
Deep Dive
The Confidence Meter & Metrics Tree
The Confidence Meter (0 to 10)
Most teams self-rate at 7–8 based on gut. That destroys the system. Confidence must be earned:
0.1 Your conviction + a pitch deck
1 Stakeholder review or estimates
2–3 Market data + competitive analysis
5 Usability test / Wizard of Oz
8 Real AB experiment
10 Full launch + measured outcome
Itamar's rule
If your competitor has that feature, that is NOT validation. They're guessing too. Using it to inflate Confidence is how thousands of bad ideas ship every year.
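The scale above can be sketched as a lookup table in which confidence is earned only by the strongest evidence actually collected. The key names, the choice of 2.0 for the "2–3" level, the rule of taking the maximum, and the 0.0 default for no evidence are all assumptions made for illustration; the source defines only the levels themselves.

```python
# Evidence levels from the Confidence Meter; key names are hypothetical.
CONFIDENCE_METER = {
    "conviction_or_pitch_deck": 0.1,
    "stakeholder_review_or_estimates": 1.0,
    # The source gives 2-3 for this level; the lower bound is an assumption.
    "market_data_or_competitive_analysis": 2.0,
    "usability_test_or_wizard_of_oz": 5.0,
    "ab_experiment": 8.0,
    "launch_with_measured_outcome": 10.0,
}


def earned_confidence(evidence):
    """Return the score of the strongest evidence collected (assumed rule).

    Returns 0.0 when no evidence is listed, and raises KeyError for an
    evidence type that is not on the meter.
    """
    scores = [CONFIDENCE_METER[item] for item in evidence]
    return max(scores, default=0.0)


print(earned_confidence(["conviction_or_pitch_deck"]))
print(earned_confidence(["conviction_or_pitch_deck",
                         "usability_test_or_wizard_of_oz"]))
```

Scoring by evidence type rather than by feel is what stops a team from self-rating at 7 or 8 on conviction alone.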
The Metrics Tree — how to build it
Every company needs exactly two root metrics:
→ North Star: value delivered to users (WhatsApp: messages sent; Airbnb: nights booked)
→ Top KPI: value captured by business (revenue, profit, market share)
Break each into a sub-metric tree. Where the two trees overlap = the highest-leverage metrics. Moving those moves everything. Assign teams ownership of sub-metrics — this also tells you which teams to have.
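One way to picture the overlap is to represent each tree as nested dictionaries and intersect the metric names. This is only a sketch: the root metrics echo the examples above, but every sub-metric name below is hypothetical, as is the collect_metrics helper.

```python
def collect_metrics(tree: dict) -> set:
    """Return every metric name in a nested {name: {child: {...}}} tree."""
    names = set()
    for name, children in tree.items():
        names.add(name)
        names |= collect_metrics(children)
    return names


# Hypothetical sub-metrics, for illustration only.
north_star_tree = {
    "nights booked": {
        "searches": {},
        "booking conversion": {},
        "repeat guests": {},
    }
}
top_kpi_tree = {
    "revenue": {
        "booking conversion": {},
        "average booking value": {},
        "repeat guests": {},
    }
}

# Metrics that appear in both trees are the highest-leverage candidates.
overlap = collect_metrics(north_star_tree) & collect_metrics(top_kpi_tree)
print(sorted(overlap))
```

In this toy example the shared sub-metrics are the ones that move both user value and business value, so they are natural candidates for team ownership.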
The GIST Board — close the gap
Planning world (managers, roadmaps) and delivery world (developers, Jira) rarely talk. The GIST Board connects them: per-team goals (max 4 KRs) → active ideas with ICE scores → next validation steps. Review every two weeks. This middle-layer discussion is the one most teams are not having.
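The board's three columns can be sketched as a small data structure. All class and field names here are hypothetical, and the code assumes the "max 4 KRs" limit applies to a team's board as a whole; the source does not say whether the limit is per team or per goal.

```python
from dataclasses import dataclass, field

MAX_KEY_RESULTS = 4  # assumed to be a per-team limit


@dataclass
class BoardIdea:
    name: str
    ice_score: float
    next_step: str  # the next validation step, e.g. a fake-door test


@dataclass
class Goal:
    objective: str
    key_results: list
    ideas: list = field(default_factory=list)


@dataclass
class GistBoard:
    team: str
    goals: list = field(default_factory=list)

    def key_result_count(self) -> int:
        return sum(len(goal.key_results) for goal in self.goals)

    def add_goal(self, goal: Goal) -> None:
        # Reject goals that would push the board past the KR limit.
        if self.key_result_count() + len(goal.key_results) > MAX_KEY_RESULTS:
            raise ValueError(f"board is limited to {MAX_KEY_RESULTS} key results")
        self.goals.append(goal)


# Illustrative content only.
board = GistBoard(team="Growth")
goal = Goal("Improve activation", ["Raise week-1 retention", "Cut setup time"])
goal.ideas.append(BoardIdea("Guided setup", ice_score=210, next_step="Wizard of Oz test"))
board.add_goal(goal)
print(board.key_result_count())
```

Keeping goals, scored ideas, and the next validation step in one structure mirrors the point of the board: a single artifact that both the planning side and the delivery side review every two weeks.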
Tactics
Apply This Week
Run a Confidence Audit: score every item on your roadmap using the Confidence Meter. Most will be 0.1
Define one North Star metric that measures user value — not revenue. Make it visible to the whole team
Before the next feature, run the cheapest possible test: a fake-door, 5 user interviews, or a Wizard of Oz prototype
Flip your roadmap to an outcome roadmap: what problem to solve by Q3, not what feature to ship
Build a step backlog instead of a product backlog: validate then build, not build then discover
Don't transform all at once
Start where the pain is biggest. Goals unclear? Fix goals. Too many debates? Add ICE. Building too much? Add steps. Disengaged devs? Build the GIST Board.
Contrarian
Evidence-Guided Myths Busted
✗ Steve Jobs just had a vision and built the iPhone
INSTEAD → ✓ The iPhone was years of trial and error: multitouch experiments, failed prototypes. Jobs connected the dots late. Even the greatest visionaries follow evidence.
✗ Evidence-guided is slow: move fast or die
INSTEAD → ✓ The right metric is time-to-outcomes, not time-to-production. Opinion-based teams build the wrong things fast. Evidence-guided teams build the right things faster overall.
✗ If your competitor built it, it's validated
INSTEAD → ✓ Competitors are guessing too. Copying their roadmap means you're both building on low-confidence opinions. Thousands of terrible features ship every day this way.
✗ You need to build an MVP to learn anything real
INSTEAD → ✓ Gmail Tabs was first "tested" with zero code: researchers manually sorted users' emails while the users were distracted, then watched their reaction. Fake-door and Wizard of Oz tests can earn you 5/10 confidence for nearly free.