Move 37: Why We're Measuring AI Wrong
In March 2016, AlphaGo played Move 37.
Game 2 against Lee Sedol, one of the greatest Go players in history. Move 37 was a shoulder hit on the fifth line—a play so bizarre that commentators assumed it was a mistake. The estimated probability of a human professional playing that move? One in ten thousand.
Lee Sedol left the room. He needed 15 minutes to recover.
AlphaGo won the game. Move 37 wasn’t a bug. It was a breakthrough that humans had never discovered in 3,000 years of playing Go.
That moment changed how I think about everything.
What Made Move 37 Different
Move 37 wasn’t impressive because it was smart. It was impressive because it was wrong—until it wasn’t.
Every human expert watching thought the same thing: that’s a bad move. Their pattern recognition, trained on thousands of games, said this wasn’t how you play Go. The move violated the grammar of professional play.
But AlphaGo hadn’t learned Go by studying human games. It learned by playing millions of games against itself—massive parallel exploration of possibilities that humans had never considered.
Move 37 came from a place humans couldn’t reach through sequential thinking. It emerged from processing game states at a scale no human mind could match.
Here’s what matters: You can’t get there by expert imitation. You can only get there by exploring the space so thoroughly that you find stable configurations humans never discovered.
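As a toy illustration of that difference (my sketch, not anything from the AlphaGo paper): on a rugged objective, greedy hill-climbing from the "expert" starting point converges on the conventional peak, while broad sampling of the whole space finds a narrow, higher peak that local search never reaches. The fitness function and its two peaks are invented for the example.

```python
import random

def fitness(x):
    """Toy 'rugged' objective: a broad conventional hill near x=2 (max 5),
    and a narrow, higher spike near x=9 (max 8) that greedy search misses."""
    conventional = 5.0 - (x - 2.0) ** 2
    hidden = 8.0 - 40.0 * (x - 9.0) ** 2
    return max(conventional, hidden)

def hill_climb(x, step=0.1, iters=500):
    """Expert-imitation analogue: only accept moves that look better right now."""
    for _ in range(iters):
        for nxt in (x - step, x + step):
            if fitness(nxt) > fitness(x):
                x = nxt
    return x

def random_explore(samples=100_000, lo=0.0, hi=10.0, seed=0):
    """Exploration analogue: sample the whole space, keep whatever scores best."""
    rng = random.Random(seed)
    return max((rng.uniform(lo, hi) for _ in range(samples)), key=fitness)

local = hill_climb(2.5)       # starts where 'pattern recognition' says to look
explored = random_explore()   # searches regions that look obviously wrong
print(fitness(local) < fitness(explored))  # prints True
```

The point of the toy: the hidden peak is a stable configuration that exists in the space all along; local refinement of the expert answer can never reach it, but sufficiently broad search stumbles into it.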
We’re Benchmarking AI Wrong
Right now, the AI industry is obsessed with comparing models to human education levels.
“GPT-4 performs at the level of a top undergraduate.”
“Claude scores in the 90th percentile on the bar exam.”
“Gemini achieves PhD-level performance on research tasks.”
This completely misses the point.
Even a PhD operates sequentially. One brain, one thread of thought, limited by the speed of human cognition. Comparing AI to human education levels is like benchmarking a jet engine by how fast a horse can run.
The right question isn’t “Is this AI as smart as a PhD?”
The right question is “Can this AI find Move 37 in this domain?”
Can it discover valid but unexpected solutions through massive parallel exploration? Can it find stable configurations that sequential human thinking would never reach? Can it produce insights that seem wrong at first but prove brilliant in retrospect?
That’s the capability that matters. And we’re not measuring it.
Where Move 37 Hasn’t Happened Yet
Go is a closed system. Fixed rules, clear win conditions, perfect information. The ideal environment for finding Move 37 moments through self-play.
But what about open-ended domains?
Urban planning: Feed AI massive data about a city—traffic patterns, population movements, economic flows, infrastructure usage, environmental data. Challenge it to find development configurations no human planner has considered. The Move 37 insight might be something like: “Build the transit hub here, where no current population exists, because six converging patterns suggest density will shift in 15 years.”
Climate response: Input data on global weather patterns, economic activities, resource consumption, technology adoption rates. Look for intervention points that create cascading positive effects. The Move 37 insight might identify a leverage point that seems insignificant but ripples across systems in ways linear analysis would miss.
Drug discovery: Protein folding already showed glimmers of this with AlphaFold. But we haven’t seen the full Move 37 moment—the drug target that seems biologically implausible but works for reasons we don’t yet understand.
Business strategy: What if you could run millions of simulated market scenarios and find positioning that violates conventional wisdom but proves optimal? The Move 37 of business might look like “enter the market here, at this price point, with this message”—and every MBA would say you’re wrong until you win.
We’re still early. The thinking models (o1, DeepSeek-R1, Gemini Flash Thinking) are showing the first glimmers—discovering, through training, cognitive strategies that resemble a human internal monologue: approaching problems from different angles, trying ideas, backtracking, finding analogies.
But we haven’t seen Move 37 in open-ended domains yet. Not really.
What Move 37 Thinking Requires
If you want to find Move 37 moments, you need:
1. Massive parallel exploration
Not iterating on best practices. Not A/B testing variations. Exploring the entire solution space, including regions that look obviously wrong.
2. A way to evaluate outcomes
Go has clear win conditions. Open-ended domains need proxy metrics that actually correlate with success. This is harder than it sounds—optimizing for most business metrics drives you toward local maxima.
3. Willingness to trust counterintuitive results
When Move 37 appears, your pattern recognition will scream “that’s wrong.” The discipline to test it anyway separates people who find breakthroughs from people who filter them out.
4. Tolerance for inscrutability
AlphaGo couldn’t explain Move 37. It emerged from weights and activations that don’t map to human reasoning. If you need to understand why before you’ll try something, you’ll never find Move 37.
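The four requirements above can be sketched as a single loop (a hypothetical illustration; `proxy_score` and the incumbent position are stand-ins for domain-specific pieces): generate many candidates in parallel, score them with an outcome proxy, and—crucially—refuse to discard candidates just because they sit far from conventional wisdom.

```python
import random

INCUMBENT = 20.0   # the 'best practice' position everyone already uses

def proxy_score(x):
    """Requirement 2: a measurable outcome proxy. In this toy, the true
    optimum (x near 85) is far from the incumbent, so pattern-matching
    against convention is actively misleading."""
    return 100.0 - abs(x - 85.0)

def search(samples=10_000, trust_counterintuitive=True, seed=1):
    """Requirement 1: broad exploration of the whole space.
    Requirement 3: optionally keep candidates that look 'obviously wrong'."""
    rng = random.Random(seed)
    candidates = [rng.uniform(0.0, 100.0) for _ in range(samples)]
    if not trust_counterintuitive:
        # The failure mode: filter out anything far from convention.
        candidates = [c for c in candidates if abs(c - INCUMBENT) <= 10.0]
    return max(candidates, key=proxy_score)

open_minded = search(trust_counterintuitive=True)    # lands near 85
filtered = search(trust_counterintuitive=False)      # stuck near 30
print(proxy_score(open_minded) > proxy_score(filtered))  # prints True
```

Requirement 4 shows up in what the loop returns: a bare number with a good score and no explanation attached. The filtered searcher isn't less capable—it simply threw the answer away before evaluating it.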
The Uncomfortable Implication
Here’s what keeps me up at night:
The weirdness of reinforcement learning is, in principle, unbounded.
It’s plausible—even likely—that optimization at sufficient scale invents its own representations. Internal languages that are inscrutable to us but more effective at problem-solving. Strategies that don’t map to any human framework.
Move 37 was weird but ultimately explicable. Commentators eventually understood why it worked. Future Move 37 moments might not be explicable at all. They might just… work. And we’ll have to decide whether to trust results we can’t understand.
That’s the trade-off. If you want breakthroughs that humans couldn’t find through sequential thinking, you might get breakthroughs that humans can’t understand through sequential thinking either.
How to Apply This
You don’t need AlphaGo’s compute budget to think in Move 37 terms.
When solving hard problems, ask:
- What solutions have I filtered out because they “obviously” won’t work?
- What would massive parallel exploration of this space reveal?
- Where are the stable configurations that nobody has tried because they violate conventional wisdom?
When evaluating AI tools, ask:
- Is this tool helping me iterate faster on known approaches?
- Or is it exploring solution spaces I couldn’t reach myself?
When building products, ask:
- Am I optimizing within the existing grammar?
- Or am I searching for moves that violate the grammar but might be secretly brilliant?
Most of the time, you should follow best practices. Proven patterns exist because they work.
But once in a while, in domains that matter, it’s worth asking: Where’s the Move 37?
We’ve spent 3,000 years playing Go. Millions of games. Thousands of professionals dedicating their lives to mastery. And we never found Move 37.
What else haven’t we found?
What Move 37 moments are sitting in your domain—in your market, your product, your strategy—waiting for someone willing to explore the space that “obviously” doesn’t work?
The answer is probably more than you think.