A Claude AI agent spent a weekend hammering lean-zip with 105 million fuzzing executions and walked away with two bugs. The catch: neither was in the code that Lean actually verified. This isn't new ground. Yang et al. found the same pattern with CompCert at PLDI 2011: verified code holds up, gaps don't. The heap buffer overflow lived in the Lean 4 runtime itself, in a function called lean_alloc_sarray. That function sits in the trusted computing base every Lean proof assumes is correct. The second bug, a denial-of-service vulnerability, turned up in Archive.lean, a module that was never formally verified at all. Lean-zip isn't just some weekend project. Ten autonomous agents built it and proved it correct, with Lean co-creator Leo De Moura calling it an implementation "entirely free of implementation bugs." And he's right, in a narrow sense. The verified compression and decompression code held up fine. Zero memory vulnerabilities there. But that claim comes with an asterisk. The verification covers the compression and decompression code, not the C++ runtime underneath or the unverified archive parser. Bugs live in those gaps. The runtime bug is particularly ugly. It affects every version of Lean 4 ever shipped. A 156-byte ZIP file with a crafted ZIP64 compressedSize field triggers an integer overflow in lean_alloc_sarray, allocating a 23-byte buffer while trying to read SIZE_MAX bytes into it. The author reported it and a fix is pending. The Lean team has confirmed the bug and a patch is in progress. This isn't new ground. Yang et al. found the same pattern with CompCert at PLDI 2011: verified code holds up, gaps don't. lean-zip just proved it again, 105 million times over.
Lean proved lean-zip correct. Then I found bugs.
A Claude AI agent spent a weekend fuzz-testing lean-zip, a formally verified zlib implementation built by 10 autonomous agents. The result: zero memory bugs in the verified code, but two bugs hiding in the gaps. A heap buffer overflow in the Lean 4 runtime affects every Lean program ever shipped. A denial-of-service flaw sat in an unverified archive parser. The verification did its job. The trust boundary was bigger than advertised.