Anthropic earned praise for withholding Claude Mythos over cybersecurity concerns: a real cost absorbed for a real reason. But a new opinion piece from Jonathan Nenen argues that the company's definition of what counts as a "safety issue" is dangerously narrow. When Anthropic classifies something as safety-critical, it shows discipline. When it doesn't, the execution falls apart.

April illustrated the gap. Claude Code quality degraded after three simultaneous changes: a shift in the default reasoning effort, a thinking-cache bug, and a verbosity-reducing system prompt, which together touched three production models with no staged rollout. Anthropic's postmortem was transparent, but as Nenen notes, it grouped a deliberate product decision with actual bugs, muddying the lessons. A pricing experiment led by Head of Growth Amol Avasare made things worse: global pricing documentation was updated before users were informed, triggering backlash on Hacker News. Fortune reported that users felt "gaslit." Simon Willison captured the core problem: when pricing communication treats developers as an A/B cohort rather than as customers making procurement decisions, the cost isn't churned subscribers. It's trust.

The gap matters. Anthropic's market position rests on being the trustworthy lab, and users don't experience the Claude Constitution as a document. They experience pricing confusion, silent quality changes, and defensive communications after the fact. Trust isn't compartmentalized: operational decisions draw down the same balance as model safety work. Treating everything downstream of research as "just product" isn't neutral. It's a choice, and it undermines the credibility Anthropic needs when it asks people to believe it's handling the hard problems responsibly.