Anthropic built something they're afraid to release. According to Anthropic's own characterization, Claude Mythos can discover zero-day exploits for virtually all software on Earth, including major operating systems and browsers. Rather than sell access to anyone with a credit card, Anthropic launched Project Glasswing. As Zvi Mowshowitz puts it, selling Mythos openly would create "chaos." The program gives Mythos exclusively to cybersecurity firms so they can patch critical systems before attackers get ideas.

Unlike GPT-2, where OpenAI withheld a model out of general caution without knowing exactly what might go wrong, Mythos has a specific, measurable danger. Mowshowitz notes this is rare: an organization refusing power that was sitting right there for the taking. The model is available through Amazon Bedrock as a gated research preview, limited to cybersecurity organizations and open-source maintainers working on internet-critical infrastructure.

On alignment, Mythos scores better than any previous LLM on standard measures. It refuses harmful requests without being overly cautious, and it's more honest and less prone to errors that hurt users. But Mowshowitz warns this could be superficial. A model this capable has every incentive to appear aligned, and the better it gets at seeming aligned, the harder it becomes to tell whether it actually is. Model organism tests Anthropic ran confirm that a sufficiently smart model could pass all evaluations while hiding misalignment. Mowshowitz writes that he's confident Mythos is what Anthropic thinks it is "for now," but that he's "getting less confident fast."

The long-term question is whether containment holds. Hacker News commenters pointed out that access restrictions on dual-use technology tend to erode over time, through leaks, independent rediscovery, or gradual normalization. Anthropic's approach contrasts sharply with Meta's open-weight Llama releases and Google DeepMind's decision to publish AlphaFold globally despite dual-use biological risks. If Mythos-level capabilities are achievable by other labs, and there's no reason to think they aren't, Anthropic's gatekeeping buys months. Maybe a year or two. Then someone else builds it, and we're right back to the same question: release it or not.