GitHub Copilot is killing its flat-rate pricing. Starting June 2026, Microsoft will charge users based on actual token consumption instead of a fixed monthly fee.

Microsoft has been bleeding money. The Wall Street Journal reported back in 2023 that the company was losing over $20 per user per month on average, with some heavy users costing up to $80 monthly, all while charging just $10 to $19. In effect, a promotion that ran for three years.

Ed Zitron, who reported on the transition at Where's Your Ed At, frames this as the inevitable end of subsidized AI growth. Companies like OpenAI, Anthropic, and Perplexity have all been letting users burn far more in compute costs than their subscriptions cover. Anthropic reportedly let users consume up to $8 in compute for every dollar paid.

Token costs vary wildly. A quick chat question costs pennies. A multi-hour autonomous coding session costs serious money, sometimes more than the human labor it replaces. Charging the same flat fee for both doesn't work, especially as agentic usage becomes the default.
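The gap is easy to see with back-of-envelope numbers. The sketch below uses an assumed blended rate of $4 per million tokens and illustrative token counts for each workload; the specific figures are assumptions, not Copilot's actual rates.

```python
# Illustrative cost gap between a chat question and an agentic session.
# PRICE_PER_M_TOKENS and the token counts are assumptions for the sketch.

PRICE_PER_M_TOKENS = 4.00  # assumed rate, USD per million tokens


def cost(tokens: int) -> float:
    """Dollar cost of processing `tokens` at the assumed rate."""
    return tokens / 1_000_000 * PRICE_PER_M_TOKENS


chat = cost(2_000)        # quick Q&A: roughly 2k tokens
agent = cost(5_000_000)   # multi-hour agentic session: roughly 5M tokens

print(f"chat:  ${chat:.4f}")   # under a penny
print(f"agent: ${agent:.2f}")  # more than a month of the old flat fee
```

Under these assumptions the heavy session costs thousands of times more than the quick question, which is exactly why one flat fee cannot price both.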

Not everyone agrees with Zitron's doom framing. Hacker News commenters pointed out that frontier AI labs like OpenAI and Anthropic actually operate with healthy margins on raw token sales, estimated at 80% or more. The real squeeze hits companies like Cursor and Perplexity that don't own their models and pay full API rates to those frontier providers. At $4 per million tokens, the math works for model builders. Everyone downstream pays retail.
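The commenters' point reduces to simple arithmetic. Taking the article's $4-per-million retail price and the estimated 80% gross margin at face value, the implied inference cost and the wrapper's cost basis look like this (both inputs are the estimates cited above, not audited figures):

```python
# Margin math for a frontier lab vs. a downstream wrapper.
# Both inputs are the estimates cited in the text, not audited numbers.

retail_price = 4.00   # USD per million tokens, what the lab charges
lab_margin = 0.80     # estimated gross margin on raw token sales

lab_cost = retail_price * (1 - lab_margin)  # implied inference cost
wrapper_cost = retail_price                 # wrappers pay full retail

print(f"lab's cost:     ${lab_cost:.2f}/M tokens")
print(f"wrapper's cost: ${wrapper_cost:.2f}/M tokens")
print(f"cost ratio:     {wrapper_cost / lab_cost:.0f}x")
```

On these estimates the wrapper's cost basis is five times the lab's, so any price the wrapper can profitably charge starts above the lab's own retail rate.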

There's an escape hatch. Open-source models like Meta's Llama 3 and Mistral's Mixtral have reached rough parity with proprietary options on many benchmarks. Self-hosting through tools like Ollama or vLLM lets organizations pay fixed hardware costs instead of variable token fees. It requires technical overhead, GPU management, and upfront capital. But at sufficient volume, amortizing hardware costs across millions of tokens can undercut usage-based API pricing.
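The trade-off is a break-even calculation. A minimal sketch, assuming a $250,000 GPU server amortized over three years against the $4-per-million API rate mentioned earlier (hardware price, amortization window, and rate are all illustrative assumptions):

```python
# Break-even point for self-hosting vs. usage-based API pricing.
# All inputs are illustrative assumptions, not quoted prices.

server_cost = 250_000        # assumed: multi-GPU inference node, USD
amortization_months = 36     # assumed: 3-year hardware lifetime
api_price_per_m = 4.00       # assumed API rate, USD per million tokens

monthly_fixed = server_cost / amortization_months  # fixed monthly cost

# Monthly volume (in millions of tokens) where self-hosting breaks even:
breakeven_m_tokens = monthly_fixed / api_price_per_m

print(f"fixed cost: ${monthly_fixed:,.0f}/month")
print(f"break-even: {breakeven_m_tokens:,.0f}M tokens/month")
```

Under these assumptions the crossover sits around 1.7 billion tokens a month; below that volume the API is cheaper, above it the fixed hardware wins, which is why the escape hatch mostly appeals to heavy users.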