AlphaEvolve and the Coding Agent That Cracks Math Open
The word “agent” has been beaten into meaninglessness this year. Every startup has one. Most of them are wrappers. Then Google DeepMind dropped AlphaEvolve, which casually broke an algorithmic record that had stood since 1969. This is what happens when a coding agent stops trying to build your todo app and starts doing actual research.
So what is AlphaEvolve, exactly?
AlphaEvolve is DeepMind’s evolutionary coding agent. Gemini does the thinking, but the architecture is the interesting part. It doesn’t generate code once and call it done. It generates thousands of candidate algorithms, evaluates them, mutates the survivors, and runs the loop again.
It’s literal natural selection for code. A human researcher might test a few dozen hypotheses in a day. AlphaEvolve runs tens of thousands in parallel. The bet is simple: at sufficient scale and speed, quantity becomes quality.
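Here is a minimal sketch of that generate, evaluate, select loop in Python. It is a stand-in, not DeepMind's code:

- The candidate "programs" are plain lists of points.
- The hypothetical `mutate` function replaces the Gemini step. In AlphaEvolve, the model proposes code changes instead.
- The toy objective is my choice of problem: spread 8 points in a unit square so the closest pair is as far apart as possible.

What matters is that `score` judges every candidate automatically.

```python
import math
import random


def score(points):
    """Automatic evaluator: distance between the closest pair of points (higher is better)."""
    return min(
        math.dist(p, q) for i, p in enumerate(points) for q in points[i + 1:]
    )


def mutate(points, rng, step=0.05):
    """Hypothetical stand-in for the LLM proposal step: nudge one point, clipped to [0, 1]^2."""
    child = list(points)
    i = rng.randrange(len(child))
    x, y = child[i]
    child[i] = (
        min(1.0, max(0.0, x + rng.uniform(-step, step))),
        min(1.0, max(0.0, y + rng.uniform(-step, step))),
    )
    return child


def evolve(n_points=8, pop_size=20, survivors=5, generations=300, seed=0):
    rng = random.Random(seed)
    population = [
        [(rng.random(), rng.random()) for _ in range(n_points)]
        for _ in range(pop_size)
    ]
    initial_best = max(score(c) for c in population)
    for _ in range(generations):
        # Evaluate every candidate and keep the top few unchanged (elitism).
        population.sort(key=score, reverse=True)
        parents = population[:survivors]
        # Refill the population with mutated copies of the survivors.
        children = [mutate(rng.choice(parents), rng)
                    for _ in range(pop_size - survivors)]
        population = parents + children
    return initial_best, max(population, key=score)


initial, best = evolve()
print(f"closest-pair distance: {initial:.3f} -> {score(best):.3f}")
```

The survivors carry over unchanged each generation, so the best score can never get worse. Replacing `mutate` with a model that rewrites actual code is the conceptual leap AlphaEvolve makes.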
The contrast with earlier DeepMind work is telling. AlphaCode, AlphaGo, and AlphaStar took on defined challenges with fixed rules: competitive programming, Go, StarCraft. AlphaEvolve is pointed at open research problems and production infrastructure. There's no leaderboard. The judge is whether the code is faster, smaller, or provably correct.
A 56-year-old matrix multiplication record, broken
The headline result: AlphaEvolve found a way to multiply two 4x4 complex-valued matrices using 48 scalar multiplications instead of 49. Strassen's 1969 algorithm, applied recursively, needs 49. This is the first improvement on it in that setting in over half a century.
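Where does 49 come from? Strassen's trick multiplies 2x2 (block) matrices with 7 multiplications instead of 8. A 4x4 matrix is a 2x2 matrix of 2x2 blocks, so applying the trick twice costs 7 × 7 = 49.

The sketch below is plain textbook Strassen with a multiplication counter added. It does not reproduce AlphaEvolve's 48-multiplication scheme. The helper names (`add`, `quadrants`, `strassen`, `naive`) are my own.

```python
def add(X, Y):
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def sub(X, Y):
    return [[x - y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def quadrants(M):
    """Split a square matrix (even size) into its four equal blocks."""
    h = len(M) // 2
    return ([r[:h] for r in M[:h]], [r[h:] for r in M[:h]],
            [r[:h] for r in M[h:]], [r[h:] for r in M[h:]])


def strassen(A, B):
    """Multiply square matrices whose size is a power of 2.

    Returns (product, number of scalar multiplications used).
    """
    if len(A) == 1:
        return [[A[0][0] * B[0][0]]], 1
    a11, a12, a21, a22 = quadrants(A)
    b11, b12, b21, b22 = quadrants(B)
    # Strassen's seven block products instead of the naive eight.
    pairs = [
        (add(a11, a22), add(b11, b22)),
        (add(a21, a22), b11),
        (a11, sub(b12, b22)),
        (a22, sub(b21, b11)),
        (add(a11, a12), b22),
        (sub(a21, a11), add(b11, b12)),
        (sub(a12, a22), add(b21, b22)),
    ]
    results = [strassen(X, Y) for X, Y in pairs]
    m1, m2, m3, m4, m5, m6, m7 = (r[0] for r in results)
    count = sum(r[1] for r in results)
    c11 = add(sub(add(m1, m4), m5), m7)
    c12 = add(m3, m5)
    c21 = add(m2, m4)
    c22 = add(add(sub(m1, m2), m3), m6)
    top = [left + right for left, right in zip(c11, c12)]
    bottom = [left + right for left, right in zip(c21, c22)]
    return top + bottom, count


def naive(A, B):
    """Schoolbook product: n**3 scalar multiplications."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


A = [[complex(i + j, i - j) for j in range(4)] for i in range(4)]
B = [[complex(2 * i - j, j + 1) for j in range(4)] for i in range(4)]
C, mults = strassen(A, B)
print(mults)  # 49 = 7 * 7; the schoolbook method uses 4**3 = 64
```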
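One multiplication doesn't sound like much. Then you remember matrix multiplication is the atom of deep learning. These schemes are applied recursively to ever-larger blocks, so a saved multiplication compounds with matrix size. Whether that becomes saved GPU time in practice is a separate question. Extra additions, numerical stability, and hardware all matter, and a multiplication count captures none of them. Still, mathematicians and computer scientists had been hammering on this wall for 56 years. An AI walked up and pushed it over.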
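The compounding is easiest to see in the exponent. This assumes the 48-multiplication scheme is applied recursively to 4x4 blocks the way Strassen's is. It counts only multiplications, not additions. With T(1) = 1, the recurrences work out as:

```latex
\begin{aligned}
T_{49}(n) &= 49\,T_{49}(n/4) \;\Rightarrow\; T_{49}(n) = n^{\log_4 49} = n^{\log_2 7} \approx n^{2.807},\\
T_{48}(n) &= 48\,T_{48}(n/4) \;\Rightarrow\; T_{48}(n) = n^{\log_4 48} \approx n^{2.792},\\
\frac{T_{48}(4^k)}{T_{49}(4^k)} &= \left(\tfrac{48}{49}\right)^{k},
\qquad k = 5 \;(n = 1024):\; \left(\tfrac{48}{49}\right)^{5} \approx 0.902.
\end{aligned}
```

So at n = 1024, the scheme would need roughly 10% fewer scalar multiplications than recursive Strassen. That is before accounting for the additions it costs.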
DeepMind says AlphaEvolve attacked more than 50 open problems in mathematics and improved the state of the art on roughly 20% of them. One of them is the kissing number problem, which geometers have wrestled with for more than three centuries. In 11 dimensions, AlphaEvolve found a configuration of 593 spheres, raising the known lower bound. Hacker News went predictably feral.
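The kissing problem is a good example of "the judge is math." A standard sufficient condition makes it checkable. Suppose a finite set C of nonzero points has minimum pairwise distance at least its maximum norm. Then any two points are at least 60° apart as seen from the origin. Unit spheres centered at 2x/‖x‖, for each x in C, therefore all touch a central unit sphere without overlapping each other.

Checking that condition takes a few lines. The sketch verifies the classic 12-sphere arrangement in 3D, using the cuboctahedron's vertices. It illustrates the kind of evaluator involved, not DeepMind's code, and 3D is a long way from 11 dimensions.

```python
import itertools
import math


def is_kissing_configuration(points, tol=1e-9):
    """Sufficient test: no point at the origin, and min pairwise distance >= max norm."""
    norms = [math.hypot(*p) for p in points]
    if min(norms) <= tol:
        return False
    min_gap = min(math.dist(p, q) for p, q in itertools.combinations(points, 2))
    return min_gap >= max(norms) - tol


def cuboctahedron():
    """The 12 points with two coordinates equal to +/-1 and the third equal to 0."""
    points = []
    for i, j in itertools.combinations(range(3), 2):
        for si, sj in itertools.product((1, -1), repeat=2):
            v = [0, 0, 0]
            v[i], v[j] = si, sj
            points.append(tuple(v))
    return points


config = cuboctahedron()
print(len(config), is_kissing_configuration(config))  # 12 True
print(is_kissing_configuration(config + [(1, 0, 0)]))  # False: (1, 0, 0) sits too close
```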
It’s already inside Google’s data centers
This isn't a research demo gathering dust. AlphaEvolve is already in production at Google. It found a scheduling heuristic for Borg, Google's cluster manager, that has recovered on average 0.7% of Google's worldwide compute resources.
Sounds like a rounding error. At Google's scale, 0.7% is the kind of number that justifies entire engineering orgs. Then there's the work AlphaEvolve did on Gemini training itself. It sped up a key matrix multiplication kernel by 23%, which cut Gemini's overall training time by about 1%. When a single training run costs millions, 1% is real money.
TPU hardware has picked up one of its suggestions too. AlphaEvolve proposed a Verilog rewrite of an arithmetic circuit used for matrix multiplication, and it was integrated into an upcoming TPU. Hardware design, training kernels, data center scheduling: AI is now touching nearly every layer of Google's AI infrastructure stack. The model is helping build the model.
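Here is a way to make the scheduler result less abstract. What gets evolved there is a heuristic: a small function that ranks candidate machines for a job. It is evolvable because a simulator can score any such function.

The toy below is entirely hypothetical and is not Google's heuristic. It uses two identical machines, made-up CPU and memory units, and a four-job trace. The "keep leftover CPU and memory balanced" rule is my own pick. The example only shows how a better ranking function avoids stranding capacity, and the evaluator simply counts jobs placed.

```python
def simulate(jobs, machines, rank):
    """Greedy placement: each job goes to the feasible machine that `rank` scores highest.

    Returns the number of jobs placed; jobs that fit nowhere are dropped.
    """
    free = [list(m) for m in machines]  # [free_cpu, free_mem] per machine
    placed = 0
    for cpu, mem in jobs:
        feasible = [i for i, (c, m) in enumerate(free) if c >= cpu and m >= mem]
        if not feasible:
            continue
        # max() returns the first machine on ties, so a constant rank is first-fit.
        best = max(feasible, key=lambda i: rank(free[i], cpu, mem))
        free[best][0] -= cpu
        free[best][1] -= mem
        placed += 1
    return placed


def first_fit(free, cpu, mem):
    return 0


def balanced(free, cpu, mem):
    # Prefer the machine whose leftover CPU and memory stay most even, so one
    # resource isn't stranded without the other. Assumes comparable units.
    return -abs((free[0] - cpu) - (free[1] - mem))


machines = [(4, 4), (4, 4)]
jobs = [(2, 1), (2, 1), (2, 3), (2, 3)]  # (cpu, mem) demands, arbitrary units
print(simulate(jobs, machines, first_fit))  # 3
print(simulate(jobs, machines, balanced))   # 4
```

First-fit packs both memory-light jobs onto one machine. That leaves neither machine with enough of both resources for the last job.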
“AI generates new knowledge” — does it?
The philosophical fight is already underway on X and HN. Is what AlphaEvolve produces actually new knowledge, or just clever recombination of training data?
Skeptics argue LLMs only remix patterns they've seen. Defenders point out that humans spent more than 50 years failing to find this solution, and AlphaEvolve found it. There's also a clean answer to the hallucination problem. Every candidate is scored by an automated evaluator, so every reported result is a checkable mathematical construction or a measurable performance gain. The judge is math, not vibes.
Honestly, I think the debate becomes irrelevant fast. The moment a human verifies AlphaEvolve’s proof and publishes it, it’s just human knowledge. Who proposed it first stops mattering. Whether it’s correct is the only question that survives.
What this means for coding agents
The lesson here isn’t “AI got smarter.” It’s that the frontier for coding agents is shifting from shipping web apps to doing science.
The coding agents you’ve been using — Cursor, Devin, Copilot — automate the grind of everyday software work. AlphaEvolve aims at problems where nobody knows the answer, including the humans who designed the problem. This might be the first real glimpse of a research agent rather than a coding assistant.
The limits are sharp, though. AlphaEvolve only works where there’s a clean evaluation function. Math proofs, algorithmic efficiency, chip area — anything where good and bad can be machine-judged. Drug discovery, policy design, anything fuzzy and contested? Still firmly out of reach.
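In practice, a "clean evaluation function" rejects wrong candidates outright and ranks correct ones by a number. Sorting networks make a compact toy example. They are my choice of illustration; AlphaEvolve isn't reported to have worked on them.

The 0-1 principle does the heavy lifting: a comparator network sorts every input if and only if it sorts every sequence of 0s and 1s. So correctness for n inputs can be checked exhaustively with 2^n cases. The score is then just the comparator count.

```python
from itertools import product


def run_network(network, values):
    """Apply comparators (i, j) with i < j, swapping so the smaller value lands at i."""
    v = list(values)
    for i, j in network:
        if v[i] > v[j]:
            v[i], v[j] = v[j], v[i]
    return v


def evaluate(network, n):
    """Return None for an incorrect network, else -comparator_count (higher is better)."""
    for bits in product((0, 1), repeat=n):
        out = run_network(network, bits)
        if any(out[k] > out[k + 1] for k in range(n - 1)):
            return None  # wrong answers get no partial credit
    return -len(network)


good = [(0, 1), (2, 3), (0, 2), (1, 3), (1, 2)]  # a known 5-comparator 4-sorter
bad = good[:-1]                                   # drop the final comparator
print(evaluate(good, 4), evaluate(bad, 4))  # -5 None
```

The hard correctness gate keeps an evolutionary search honest: a candidate can't score well by being almost right.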
Closing thought
The era of coding agents that only write code is over. AlphaEvolve is simultaneously elbowing into the work of mathematicians, chip designers, and SREs. If your field has problems with clean, automatic evaluation criteria, congratulations — you’re on the shortlist. The question isn’t whether your work gets automated. It’s which part goes first.