Csaba Okrona, Head of Engineering at EGYM, has named something a lot of engineering leaders are feeling but can't quite articulate: verification debt. It's the gap between how fast AI can generate code and how fast humans can actually validate that code. "Every time someone approves a diff they haven't fully understood, they're borrowing against the future," Okrona writes, citing a concept from Lars Janssen. The problem is sneaky because everything looks fine. Tests pass. PRs look clean. Six months later, you realize you built exactly what the spec said, not what users needed.
One team shipped a payment flow rewrite that passed every test. Three weeks in, they found it was silently failing for customers with non-Latin characters in their addresses. The AI did exactly what the prompt asked. The prompt was wrong. Nobody caught it because the reviewer was twelve PRs deep and skimming.
The math is uncomfortable. If AI makes every engineer 50% more productive, you get more pull requests, more documentation, more design proposals. Someone still has to review all of it. When everyone on the team is generating more output, review becomes the constraint. The bottleneck moved upstream to the parts of the job that are irreducibly human: deciding what to build and making judgment calls about risk. This is the essence of the debate raised in "Bram Cohen on Vibe Coding: You're Just Abdicating."
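The arithmetic is worth making explicit. A back-of-envelope sketch, with illustrative numbers that are assumptions rather than figures from the article:

```python
# If generation throughput rises 50% while review throughput stays flat,
# the review queue grows without bound. All numbers here are illustrative.
def weekly_backlog_growth(prs_before: float, speedup: float,
                          review_capacity: float) -> float:
    """PRs added to the unreviewed queue per week after the speedup."""
    generated = prs_before * (1 + speedup)
    return max(0.0, generated - review_capacity)

# A team opening 40 PRs/week gets 50% "more productive"; reviewers
# still clear 40/week. The queue grows by 20 PRs every week.
assert weekly_backlog_growth(40, 0.5, 40) == 20.0
```

The only ways out of that inequality are the ones Okrona names: grow review capacity deliberately, or accept that reviewers start skimming.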
Okrona's advice is blunt: stop measuring AI impact through output metrics. PRs per engineer and velocity will go up. They should. Watch review latency and defect escape rate instead. Look for incidents that trace back to shipping something that looked fine but wasn't. Build review capacity explicitly, because it doesn't scale with generation capacity. And create space for engineers to admit they don't fully understand what a piece of code does. That admission is worth more than a fast approval.
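The two metrics Okrona suggests watching can be computed from data most teams already have. A minimal sketch; the record shape (`opened`, `merged`, `escaped_defect`) is an assumed structure for your own PR and incident data, not a real API:

```python
from datetime import datetime, timedelta

# Field names below are illustrative assumptions about your PR data.
def review_latency_hours(prs: list[dict]) -> float:
    """Mean time from PR opened to merged, in hours."""
    waits = [(p["merged"] - p["opened"]).total_seconds() / 3600 for p in prs]
    return sum(waits) / len(waits)

def defect_escape_rate(prs: list[dict]) -> float:
    """Share of merged PRs later traced back to a production defect."""
    return sum(p["escaped_defect"] for p in prs) / len(prs)

t0 = datetime(2024, 1, 1)
prs = [
    {"opened": t0, "merged": t0 + timedelta(hours=4),  "escaped_defect": False},
    {"opened": t0, "merged": t0 + timedelta(hours=20), "escaped_defect": True},
]
assert review_latency_hours(prs) == 12.0
assert defect_escape_rate(prs) == 0.5
```

Trending these two numbers over time is the check Okrona is asking for: if PRs per engineer climb while latency and escapes climb with them, the "productivity" gain is being paid for in verification debt.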
The cognitive load hasn't shrunk. It shifted toward synthesis and judgment. As Okrona puts it, "Most of my day is asking agents questions and then validating their answers. It sounds efficient. In practice it's a different kind of exhausting." The skill isn't writing code anymore. It's understanding what code should do and verifying that it actually does it. And right now most teams are running that verification on fumes. Messy code costs more when AI agents do the reading.