Mathematically Proven Bug-Free — Then the Bugs Showed Up

“This program has been mathematically proven correct. It has no bugs.”

Sounds reassuring, right? Except programs that pass formal verification do, now and then, turn out to have bugs. Math didn’t fail. We just asked it the wrong question.

What Formal Verification Actually Does

Formal verification proves that a program satisfies a given specification — a precise mathematical description of what the program should do. Testing checks “it works for this input.” Formal verification guarantees “this property holds for every possible input.”
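The difference between “it works for this input” and “it holds for every input” can be sketched in a few lines of Python. This is only an illustration: `max3` is a made-up function with a deliberate bug, and the exhaustive loop over a tiny domain is a stand-in for what a prover does. A real prover reasons symbolically about all inputs rather than enumerating them.

```python
from itertools import product

def max3(a, b, c):
    """Hypothetical buggy implementation: the fall-through branch never looks at c."""
    if a > b:
        return a if a > c else c
    return b  # bug: b is returned without being compared to c

def spot_tests_pass():
    """Testing: a few hand-picked inputs, all of which happen to pass."""
    cases = [(3, 2, 1), (1, 3, 2), (2, 1, 3)]
    return all(max3(*args) == max(args) for args in cases)

def find_counterexample(domain):
    """Check the property 'max3 agrees with max' on every triple in a finite domain."""
    for args in product(domain, repeat=3):
        if max3(*args) != max(args):
            return args
    return None

print(spot_tests_pass())              # True
print(find_counterexample(range(4)))  # (0, 0, 1)
```

Three reasonable-looking tests pass, yet checking every triple turns up a failing input almost immediately.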

Lean is one of the most prominent theorem provers in this space. It started out as a tool for mechanically checking mathematical proofs; with Lean 4 it also became a general-purpose programming language, which considerably widens its potential as a software verification tool. Google, Microsoft, and Amazon have all applied formal verification to parts of their most critical infrastructure.

So in theory, we’re covered. In practice, bugs still slip through. Why?

The Spec Is the Weak Link

Here’s the crux: formal verification proves “the code satisfies the spec.” It does not prove “the program has no bugs.”

That distinction sounds subtle. It is enormous.

Consider verifying a sorting algorithm. If your spec says “the output list is in ascending order” and nothing else, a program that ignores the input and returns an empty list passes with flying colors. Mathematically flawless. You just forgot to specify that the output must be a permutation of the input: every input element present, the same number of times.

This is the specification bug problem. Translating real-world requirements into mathematical specs is hard, and things get lost or mangled in translation. Math answers exactly what you ask — nothing more. It won’t flag the questions you forgot to pose.
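The sorting example is easy to reproduce. The Python sketch below is an illustration rather than a formal proof: the names `weak_spec`, `strong_spec`, `bad_sort`, and `satisfies` are invented for this post, and the check runs over every small list instead of proving the property for lists of any length.

```python
from collections import Counter
from itertools import product

def is_ascending(xs):
    return all(x <= y for x, y in zip(xs, xs[1:]))

def weak_spec(inp, out):
    """Only what the incomplete spec says: the output is in ascending order."""
    return is_ascending(out)

def strong_spec(inp, out):
    """Ascending AND a permutation of the input (same elements, same counts)."""
    return is_ascending(out) and Counter(out) == Counter(inp)

def bad_sort(xs):
    """Ignores its input entirely."""
    return []

def satisfies(spec, impl, max_len=3, values=range(3)):
    """Check the spec on every list of length 0..max_len over a small value set."""
    return all(
        spec(list(inp), impl(list(inp)))
        for n in range(max_len + 1)
        for inp in product(values, repeat=n)
    )

print(satisfies(weak_spec, bad_sort))    # True
print(satisfies(strong_spec, bad_sort))  # False
print(satisfies(strong_spec, sorted))    # True
```

The useless implementation satisfies the weak spec on every input. Only the added permutation clause rules it out, and nothing in the tooling would have asked for that clause on your behalf.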

The Gap Between Proof and Reality

Even with a perfect spec, problems remain. Formal verification relies on a Trusted Computing Base (TCB) — the components the proof depends on but doesn’t cover.

The compiler. Code verified in Lean still needs to be compiled to machine code. Bugs can creep in during that translation. Even CompCert, a formally verified C compiler, has had bugs found in its unverified components.

The OS and hardware. seL4, the formally verified microkernel, has a functional-correctness proof for its C code, extended on some platforms to the compiled binary. But hand-written assembly, boot code, and the hardware’s own behavior are assumed correct rather than proven. A subtle memory controller timing issue? The proof has no idea.

The theorem prover itself. Lean 4’s type checker has had soundness issues reported. If the tool verifying your proof has a flaw, the entire foundation cracks. It’s like discovering the locksmith who guaranteed your lock is unpickable forged his own credentials.
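A toy sketch shows how a property can hold in the world the proof describes and fail in the world the code runs in. It is an analogy only, not a description of how any CompCert, seL4, or Lean issue actually arose. The 32-bit two’s-complement target and the function names are assumptions made for the illustration.

```python
def add_model(a, b):
    """Addition as an idealized proof might see it: unbounded integers."""
    return a + b

def add_machine_i32(a, b):
    """Addition on an assumed 32-bit two's-complement machine (wraps around)."""
    s = (a + b) & 0xFFFFFFFF
    return s - 0x100000000 if s >= 0x80000000 else s

def sum_not_smaller(add, a, b):
    """Property under test: for non-negative a and b, a + b is at least a."""
    return add(a, b) >= a

INT32_MAX = 2**31 - 1
print(sum_not_smaller(add_model, INT32_MAX, 1))        # True
print(sum_not_smaller(add_machine_i32, INT32_MAX, 1))  # False
```

The property is true of the model and false of the machine. A proof about the first says nothing about the second unless the machine’s behavior is part of what was modeled.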

AI-Generated Code Makes This Harder, Not Easier

Here’s where things get interesting.

AI coding assistants are producing code at an unprecedented scale. The natural follow-up idea: “Just formally verify what the AI writes.” Researchers are already building pipelines where one AI generates code, another generates Lean proofs, and the theorem prover mechanically checks them.

The weakest link in that pipeline is still writing the spec. No matter how sophisticated the proof, deciding what to prove remains a human job. And AI-generated code is often harder to understand than code a human designed from scratch. Writing a complete spec for code you don’t fully understand is close to impossible.

There’s a darker scenario too: false confidence. Once code carries a “mathematically proven correct” label, the pressure to do thorough code review, integration testing, and manual inspection drops. Treat formal verification as a silver bullet and you end up with a system that’s less safe overall — the opposite of what you intended.

So Is Formal Verification Useless?

Not remotely. It remains one of the most powerful tools in software quality assurance. The key is understanding exactly what it can and cannot do.

Formal verification excels at eliminating entire classes of bugs — buffer overflows, integer overflows, certain logic errors. But it cannot guarantee spec completeness, safe interaction with external systems, or end-to-end system correctness. It’s a wall. A very strong wall. But it only covers the sides where you built it.

The right approach is defense in depth: formal verification as one layer alongside testing, code review, and static analysis — not a replacement for any of them.

Math doesn’t lie. But it only answers the question you actually asked. Formalizing “is this program correct?” might be harder than building the program itself. As AI writes more and more of our code, the skill we need most isn’t a better prover. It’s the ability to ask better questions.
