Anthropic silently cut cache TTL from 1h to 5min on March 6th

An analysis of 119,866 Claude Code API calls found that Anthropic quietly cut the prompt cache TTL from 1 hour to 5 minutes around March 6. Published as a GitHub issue by user seanGSISG, the analysis shows the shift happening almost overnight. From February 1 through March 5, cache writes consistently used the 1-hour tier. On March 6, 5-minute tokens started showing up again, and by March 8 they accounted for 83% of cache-creation tokens. The data comes from two machines on different accounts, and both show the same pattern on the same dates.
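To make the "83% of cache creation" figure concrete, here is a minimal sketch of how a per-day share like that could be computed. It is not the issue author's script: the record layout (a day label plus token counts for each tier) is a hypothetical stand-in for whatever the real logs contain, and the sample numbers are made up for illustration.

```python
from collections import defaultdict

def five_minute_share_by_day(records):
    """Return {day: fraction of cache-creation tokens written at the 5-minute tier}.

    Each record is a hypothetical (day, tokens_5m, tokens_1h) tuple; the log
    format used in the original analysis is not described in this article.
    """
    totals = defaultdict(lambda: [0, 0])  # day -> [5-minute tokens, 1-hour tokens]
    for day, tokens_5m, tokens_1h in records:
        totals[day][0] += tokens_5m
        totals[day][1] += tokens_1h

    shares = {}
    for day, (t5, t1) in sorted(totals.items()):
        written = t5 + t1
        if written:  # skip days with no cache writes at all
            shares[day] = t5 / written
    return shares

# Made-up sample data, not taken from the issue
sample = [
    ("03-05", 0, 6_000),
    ("03-05", 0, 4_000),
    ("03-08", 8_300, 1_700),
]
shares = five_minute_share_by_day(sample)
```

With this sample, `shares["03-05"]` is 0.0 and `shares["03-08"]` is 0.83, which is the shape of the jump the issue describes.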

The financial hit is real. Applying Anthropic's own pricing, the analysis calculates roughly $949 in overpayment on Sonnet calls alone, or about 17% waste across the full January-to-April window. March alone saw 26% waste. The math is straightforward: a shorter TTL means caches expire sooner and have to be re-created more often, so more tokens are billed at the higher cache-write rate instead of the cheaper cache-read rate. February, when the 1-hour default was active, showed just 1.1% waste. That's your control group.
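The write-versus-read gap can be sketched in a few lines. The rates below are assumptions, not figures from the issue: a Sonnet base input price of $3 per million tokens, cache reads at 0.1x that, and 5-minute cache writes at 1.25x. Check them against the current pricing page before relying on them. This is also not necessarily how the issue computes its waste percentages; it only shows why a re-created cache costs more than a cache hit.

```python
# Assumed Sonnet rates in USD per million tokens (verify against current pricing)
BASE_INPUT = 3.00
CACHE_READ = 0.10 * BASE_INPUT       # assumed: reads billed at 0.1x base input
CACHE_WRITE_5M = 1.25 * BASE_INPUT   # assumed: 5-minute writes billed at 1.25x base input

def rewrite_overpayment(rewritten_tokens,
                        write_rate=CACHE_WRITE_5M,
                        read_rate=CACHE_READ):
    """Extra USD paid when tokens are re-written to the cache instead of read from it."""
    return rewritten_tokens / 1_000_000 * (write_rate - read_rate)

# Hypothetical volume: 100 million tokens re-created after an early expiry
extra_usd = rewrite_overpayment(100_000_000)
```

Under these assumed rates, each million tokens that gets re-written instead of read costs about $3.45 extra, so the hypothetical 100 million tokens come to roughly $345.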

What makes this frustrating for users is the silence. No announcement, no changelog entry, no heads-up. People paying for Pro and Max subscriptions started hitting quota limits far faster than expected, and some reported burning through their 5x allocation in under 90 minutes of moderate use. The issue thread ties this directly to the TTL regression: when caches expire after 5 minutes instead of an hour, you're re-sending the same context constantly, and those re-created tokens appear to count against rate limits at full weight. Anthropic hasn't responded publicly to the issue as of publication.
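The effect of a shorter TTL on a working session can be illustrated with a small simulation. This is a sketch under stated assumptions: every request reuses the same cached prefix, and the cache lifetime restarts on every write or hit. The request timings are hypothetical.

```python
def count_cache_writes(request_times_s, ttl_s):
    """Count requests that must re-create the cache, given request times in seconds.

    Assumes one shared cached prefix and a TTL that restarts on every
    write or hit.
    """
    writes = 0
    expires_at = None
    for t in sorted(request_times_s):
        if expires_at is None or t >= expires_at:
            writes += 1  # cache missing or expired: this request pays the write rate
        expires_at = t + ttl_s  # a write and a hit both restart the TTL
    return writes

# Hypothetical session: one request every 10 minutes for an hour
times = [i * 600 for i in range(7)]
writes_5m = count_cache_writes(times, ttl_s=300)
writes_1h = count_cache_writes(times, ttl_s=3600)
```

In this made-up session, a 5-minute TTL forces all 7 requests to re-create the cache, while a 1-hour TTL needs only the first write and serves the other 6 from cache.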