Someone finally did the obvious thing with D&D's notoriously messy combat rules: they formalized the whole thing. The author built a Quint specification covering all 12 character classes, 14 conditions, legendary creatures, and the nested interrupt chains that make mid-combat reactions so confusing at the table. The spec uses a literal stack data structure to handle Counterspell chains, Shield interrupts, and all the other moments where a single attack branches into multiple reaction windows before damage resolves. Model-based testing caught argument swaps, state-sync bugs, and design flaws that simpler tests would miss, much as JUXT used Claude to find a 57-year-old bug in the Apollo 11 guidance computer code.
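The core move of model-based testing is easy to sketch: run a trusted reference model and the implementation under test on the same inputs and flag the first divergence. Here is a minimal, hypothetical illustration (these functions are mine, not the author's Quint spec) with a planted argument-swap bug of the kind the article says MBT surfaced:

```python
import random

def model_attack(hp: int, damage: int) -> int:
    # Reference model: damage reduces hit points, floored at zero.
    return max(0, hp - damage)

def impl_attack(hp: int, damage: int) -> int:
    # Implementation with a planted argument-swap bug, the class of
    # defect model-based testing is good at catching.
    return max(0, damage - hp)  # bug: operands swapped

def find_divergence(trials: int = 1000, seed: int = 1):
    # Drive both with the same random inputs; return the first
    # counterexample where their outputs disagree.
    rng = random.Random(seed)
    for _ in range(trials):
        hp, dmg = rng.randint(1, 50), rng.randint(1, 20)
        if model_attack(hp, dmg) != impl_attack(hp, dmg):
            return hp, dmg
    return None

print(find_divergence())  # prints the first (hp, damage) counterexample
```

The point is that the test never hard-codes expected values; the model is the oracle, so any state drift between the two shows up as soon as a random trace reaches it.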
The interrupt problem is genuinely hard. When a Rogue moves away from an enemy, that triggers an Opportunity Attack, which opens a window for Shield, then Uncanny Dodge, then concentration checks, all before the Rogue finishes moving. Every action can trigger reactions, which trigger counter-reactions. The spec tracks return addresses as tagged unions so the state machine knows where to resume after each chain resolves; an XState implementation mirrors the same structure, and the MBT framework steps both implementations in lockstep to compare them.
Here's the part that grabbed me: the author used an LLM to translate 12,700 community Q&A entries into Quint assertions. D&D rulings live scattered across forums, tweets, and Sage Advice columns. Feeding all that informal text into a formal specification language is exactly the kind of task where LLMs shine: they're good at structuring ambiguous natural language into formal notation. That output becomes a test suite that catches hallucinations in the AI's own translation. The formal spec acts as a verification layer for the AI itself, standing in contrast to the cognitive surrender to AI output observed in recent research.
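To make the informal-ruling-to-assertion step concrete, here is a hypothetical example of the shape such a translation might take, in Python rather than Quint and simplified from the actual rule. The informal ruling: "If you cast a spell as a bonus action, the only other spell you can cast that turn is a cantrip with a casting time of one action."

```python
from dataclasses import dataclass, field

@dataclass
class Turn:
    # (spell name, spell level, action type) triples cast on one turn.
    spells_cast: list = field(default_factory=list)

def is_legal_turn(turn: Turn) -> bool:
    # Translation of the informal ruling into an executable predicate.
    cast_bonus_spell = any(kind == "bonus" for _, _, kind in turn.spells_cast)
    if cast_bonus_spell:
        # Every other spell that turn must be an action cantrip (level 0).
        return all(level == 0 and kind == "action"
                   for _, level, kind in turn.spells_cast
                   if kind != "bonus")
    return True

# Healing Word (bonus, level 1) plus Fireball (action, level 3) is illegal...
assert not is_legal_turn(Turn([("Healing Word", 1, "bonus"),
                               ("Fireball", 3, "action")]))
# ...but Healing Word plus Fire Bolt (an action cantrip) is fine.
assert is_legal_turn(Turn([("Healing Word", 1, "bonus"),
                           ("Fire Bolt", 0, "action")]))
print("ruling assertions hold")
```

Each translated ruling becomes a check the spec must satisfy, so a mistranslation either fails against the spec or contradicts another ruling; that cross-pressure is what catches the LLM's own hallucinations.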
Hacker News commenters called this an obvious next step for D&D's rules. They're right. Thousands of pages of errata and edge cases, all governed by text, all begging for formal verification. The project proves the concept works beyond toy examples.