Anthropic's new Mythos model generates working sandbox escapes for Firefox's JavaScript engine 72.4% of the time. Opus 4.6, evaluated on the same benchmark, managed under 1%. The target is the standalone shell of SpiderMonkey, Firefox's JavaScript engine. That is the innermost layer of browser sandboxing, not the full browser exploit chain. But the leap from negligible to reliable in one model generation, on real software, is stark.
Martin Alderson laid out the implications in detail. Sandboxes underpin nearly every security boundary we rely on daily: browsers, cloud computing, mobile apps, even banner ads. The whole model assumes escaping these boundaries is too difficult and expensive to attempt at scale. Automate that process with a model that works three-quarters of the time, and the assumption gets shaky.
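To put rough numbers on that shift, here is a back-of-the-envelope sketch. It assumes each attempt succeeds independently at the reported rate, which is an idealization, and it treats Opus 4.6's "under 1%" as exactly 1%, an upper bound. The function name and the 99% confidence target are illustrative choices, not anything taken from the benchmark.

```python
import math


def attempts_for_confidence(success_rate: float, confidence: float) -> int:
    """Smallest number of independent attempts whose combined chance of
    at least one success reaches `confidence`.

    Solves 1 - (1 - p)^n >= confidence for the smallest integer n.
    Assumes attempts are independent, which real exploit runs may not be.
    """
    if not 0.0 < success_rate < 1.0:
        raise ValueError("success_rate must be strictly between 0 and 1")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - success_rate))


# Reported rate for Mythos, and an assumed upper bound for Opus 4.6.
mythos_attempts = attempts_for_confidence(0.724, 0.99)
opus_attempts = attempts_for_confidence(0.01, 0.99)

print(f"Mythos:   {mythos_attempts} attempts for 99% odds of one working escape")
print(f"Opus 4.6: {opus_attempts} attempts (at most 1% per-attempt success)")
```

Under those assumptions, four attempts are enough to reach 99% odds at the 72.4% rate, against 459 at a 1% rate. Since the older model's real rate is below 1%, its true figure would be higher still.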
Missing from Anthropic's release: any mention of Coordinated Vulnerability Disclosure with Mozilla. Standard practice gives vendors 90 days to patch before findings go public. Anthropic released a model that automates exploit creation for live, production software without that courtesy. Firefox users are exposed while Mozilla scrambles.
The Hacker News crowd pushed back on the alarm. Some argued automated vulnerability discovery could accelerate patching, the way Dependabot surfaces dependency flaws. Fair point, but it assumes organizations will fix what AI finds rather than hoping nobody else looks. For now, cost limits who can look: Mythos costs $125 per million output tokens and requires serious compute. Those constraints won't last. Smaller models like Gemma 4 already punch well above their weight class.