Evan Klem released EvanFlow, an orchestration framework for Claude Code that treats AI-assisted coding as a "conductor, not autopilot" process. The framework enforces a structured loop: brainstorm, plan, execute, test, iterate, then stop. At every checkpoint, the agent waits for human approval before continuing. No auto-commits, no forced PRs. You can't skip steps.
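To make the checkpoint model concrete, here is a minimal Python sketch of an approval-gated loop. It is not EvanFlow's code; the phase names come from the article, but the `human_approves` helper and its prompt wording are hypothetical.

```python
# Illustrative sketch only, not EvanFlow's implementation.
PHASES = ["brainstorm", "plan", "execute", "test", "iterate", "stop"]

def human_approves(phase: str) -> bool:
    # The human is the gate: nothing advances without an explicit "y".
    answer = input(f"Approve output of '{phase}' phase and continue? [y/N] ")
    return answer.strip().lower() == "y"

def run_gated_loop() -> None:
    for phase in PHASES:
        print(f"--- {phase} ---")
        # ... the agent does its work for this phase here ...
        if phase != "stop" and not human_approves(phase):
            print("Halted by reviewer; no auto-commit, no PR.")
            return
    print("Loop complete.")

if __name__ == "__main__":
    run_gated_loop()
```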
The framework ships with 16 skills and 2 custom subagents. For larger projects with 3 or more independent units, EvanFlow forks into a parallel pattern with separate coder and overseer agents. Integration tests serve as executable contracts between components. The iteration loop has a hard cap of 5 rounds and runs against a "Five Failure Modes" checklist that catches hallucinated actions, scope creep, cascading errors, context loss, and tool misuse. That strict enforcement also mitigates the urgency problem, where agents prioritize visible progress over correctness.
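A rough sketch of what a capped iteration loop with that checklist could look like follows. The `run_checklist` stub and its naive string matching are stand-ins, not EvanFlow's actual detection logic.

```python
# Illustrative sketch only: the shape of a 5-round cap plus a failure-mode checklist.
MAX_ROUNDS = 5

FAILURE_MODES = [
    "hallucinated actions",
    "scope creep",
    "cascading errors",
    "context loss",
    "tool misuse",
]

def run_checklist(round_output: str) -> list[str]:
    # Return the failure modes detected in this round's output.
    # Real detection logic is EvanFlow's; this is a placeholder.
    return [mode for mode in FAILURE_MODES if mode in round_output.lower()]

def iterate(agent_step) -> bool:
    for round_no in range(1, MAX_ROUNDS + 1):
        output = agent_step(round_no)
        flagged = run_checklist(output)
        if not flagged:
            print(f"Round {round_no}: clean, stopping.")
            return True
        print(f"Round {round_no}: flagged {flagged}, iterating.")
    print("Hard cap of 5 rounds reached; escalating to the human.")
    return False

if __name__ == "__main__":
    # Dummy agent that stops creeping scope after two rounds.
    iterate(lambda r: "scope creep" if r < 3 else "all tests pass")
```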
The guardrails cite real data. EvanFlow references HumanEval data showing that 62% of LLM-generated test assertions are wrong, and it explicitly checks whether the generated assertions would still pass if a one-character bug were introduced. It also watches for context drift, which industry data suggests causes roughly 65% of enterprise AI coding failures. These failure patterns show up in real production use, not just in testing labs.
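To see why that check matters, here is a toy Python example (the function names are hypothetical, not EvanFlow's): a type-only assertion of the kind an LLM might generate still passes when a single character flips, while a value-level assertion catches the bug.

```python
# Toy illustration of the "one-character bug" check, not EvanFlow code.
def add_discount(price: float, pct: float) -> float:
    """Reference implementation."""
    return price * (1 - pct)

def add_discount_mutant(price: float, pct: float) -> float:
    """Same code with a one-character bug: '-' flipped to '+'."""
    return price * (1 + pct)

def weak_assertion(fn) -> bool:
    # Only checks the return type, so it passes for the mutant too.
    return isinstance(fn(100.0, 0.2), float)

def strong_assertion(fn) -> bool:
    # Checks the actual value, so the mutant fails.
    return fn(100.0, 0.2) == 80.0

if __name__ == "__main__":
    assert weak_assertion(add_discount) and weak_assertion(add_discount_mutant)
    assert strong_assertion(add_discount) and not strong_assertion(add_discount_mutant)
    print("The value-level assertion catches the one-character bug; the type check does not.")
```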
In practice, you describe what you want to build. EvanFlow brainstorms approaches and waits for your pick. Then it writes a plan and pauses again. Only after you approve does it start coding, writing tests first, then the implementation. The framework installs via Claude Code's plugin marketplace and bundles a hook that blocks destructive git operations like force pushes and hard resets (a rough sketch of that check appears below). The hook depends on jq; without it installed, the guardrails fail silently, which defeats the purpose. Whether this level of structure appeals to you probably depends on how much you trust autonomous coding agents. EvanFlow clearly comes down on the side of human judgment and taste.
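As a closing aside on that bundled hook, here is a minimal Python sketch of the blocking idea. The jq dependency suggests the shipped hook is a shell script, so this is only an illustration of the check; the stdin JSON fields and the exit-code convention are assumptions about Claude Code's hook interface, not EvanFlow's implementation.

```python
#!/usr/bin/env python3
# Illustrative pre-tool-use hook that refuses destructive git commands.
# Field names and the "exit 2 to block" convention are assumptions.
import json
import re
import sys

DESTRUCTIVE = [
    r"git\s+push\s+.*(--force|-f)\b",
    r"git\s+reset\s+--hard\b",
]

def main() -> int:
    payload = json.load(sys.stdin)
    command = payload.get("tool_input", {}).get("command", "")
    for pattern in DESTRUCTIVE:
        if re.search(pattern, command):
            print(f"Blocked destructive git operation: {command}", file=sys.stderr)
            return 2  # non-zero exit: refuse the tool call
    return 0

if __name__ == "__main__":
    sys.exit(main())
```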