Nvidia's Bryan Catanzaro told Axios that compute costs for his team now sit "far beyond the costs of the employees." Uber's CTO already burned through the company's entire 2026 AI budget on token costs, according to The Information. We've reached the strange point where AI can cost more than the people it's supposed to replace.

Swan AI CEO Amos Bar-Joseph makes a bold claim: scale with "intelligence, not headcount." But the economics keep shifting under that pitch. Anthropic keeps adjusting its prices, and tokens drain budgets faster than anyone projected.

An OpenAI investor told Axios that Codex uses tokens more efficiently than Claude Code. When investors compare token efficiency like it's gas mileage, the market has shifted from "build whatever" to "show us returns." The response from cost-conscious teams? Self-hosted open models. Meta's Llama 3 and Mistral's Mixtral offer near-parity with GPT-4 on many tasks without per-token billing.

Companies deploy them using inference engines like vLLM on their own GPU instances. Databricks and Snowflake have built open model hosting into their platforms so you can run inference where your data already sits.

Brad Owens from Asymbl told Axios the real question now is "what is the true value of a worker... human or digital?" Right now, the digital one keeps getting pricier.
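For readers wondering what the self-hosting path actually looks like, here's a minimal sketch using vLLM's OpenAI-compatible server. The model choice and flags are illustrative assumptions, not a recommendation; adjust for your hardware and workload.

```shell
# Install vLLM (requires a CUDA-capable GPU; the project moves fast,
# so check current install instructions for your setup)
pip install vllm

# Serve an open model behind an OpenAI-compatible HTTP API on port 8000.
# Model name and parallelism flag are example choices, not prescriptive.
vllm serve mistralai/Mixtral-8x7B-Instruct-v0.1 \
  --tensor-parallel-size 2 \
  --port 8000
```

Because the server speaks the OpenAI API format, existing client code can usually point its base URL at the local endpoint instead of a metered API, which is what turns per-token billing into a fixed GPU cost.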