Security researcher Davi Ottenheimer has published a detailed takedown of Anthropic's cybersecurity claims around Claude Mythos Preview, and the details aren't pretty. The company's 244-page system card, which is supposed to justify why the model is too dangerous to release, dedicates just seven pages to actual security content. Those seven pages contain zero mentions of fuzzing, no CVE identifiers, no CVSS severity scores, and no disclosure timelines. The word 'thousands,' which Anthropic's marketing team used liberally in press materials about zero-day vulnerabilities, appears exactly once in the entire document, referring to transcript reviews during alignment testing.

The flagship Firefox 147 demonstration collapses under scrutiny. It wasn't actually Firefox: Anthropic ran Mythos against a SpiderMonkey JavaScript shell in a container, stripped of the browser's sandbox and other security mitigations. The 50 crash categories weren't discovered by Mythos either; they had already been found by Claude Opus 4.6, and Mozilla had shipped patches in Firefox 148 before the evaluation started. The headline 72.4% full code execution rate? Remove the two most exploitable bugs and it drops to 4.4%, which Ottenheimer notes is indistinguishable from Claude Sonnet 4.6's performance. Independent analysis from AISLE reportedly found that smaller open-weight models could replicate the same findings.
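The collapse from 72.4% to 4.4% is a familiar statistical artifact: an attempt-weighted aggregate can be carried almost entirely by a couple of easy wins. A minimal sketch, with entirely invented per-bug numbers (the system card publishes none), shows the shape of the effect:

```python
# Hypothetical illustration only: all per-bug figures below are invented
# to show how two highly exploitable bugs can dominate an aggregate rate.

def success_rate(bugs):
    """Attempt-weighted success rate over a list of (successes, attempts)."""
    successes = sum(s for s, _ in bugs)
    attempts = sum(a for _, a in bugs)
    return successes / attempts

# Two easy bugs drawing many attempts and near-certain exploitation,
# plus 48 hard bugs that almost never yield full code execution.
easy = [(1900, 2000), (1880, 2000)]
hard = [(1, 25)] * 48

overall = success_rate(easy + hard)   # headline number, easy bugs dominate
without_top_two = success_rate(hard)  # residual rate for the other 48 bugs

print(f"{overall:.1%}")          # 73.6%
print(f"{without_top_two:.1%}")  # 4.0%
```

The point is not the specific values but the structure: when success is reported per attempt rather than per bug, a handful of reliably exploitable cases can manufacture an impressive-looking headline figure.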

Then there's Project Glasswing. Anthropic calls it a '$100 million defensive initiative'. The actual cash commitment is $4 million; the remaining $96 million is credits for Anthropic's own products, the very system being evaluated. No Glasswing partner has publicly confirmed any specific finding. The promised 90-day public report hasn't materialized. Partners are bound by NDAs that prevent them from verifying or challenging Anthropic's claims, creating a convenient accountability gap. Ottenheimer calls the whole initiative regulatory capture dressed up as responsible AI.

Some of the pushback from the security community has been reasonable. Several researchers pointed out that building working exploits is genuinely hard even when the bugs are handed to you, so dismissing the exploitation step as trivial isn't fair. Others noted that NDAs alone could explain partner silence without implying deception. Fair points. But the core problem holds: if AI companies want us to believe their agents represent a step change in offensive capability, the technical documentation needs to back up the press releases. For Anthropic, right now, it doesn't.