A GitHub tutorial from Coherence Daddy went viral this week claiming to cut your Claude Code bill by roughly 90%. The idea is straightforward: keep strategic thinking on Claude Pro, but route heavy lifting like lints, refactors, and batch file operations to free open-source models running locally via Ollama. Local models take the grunt work. Claude handles the brain work.

The setup relies on a copy-paste prompt that auto-detects your OS and configures a router between Claude Code and Ollama. It's a genuinely useful pattern for anyone burning through their Claude Pro quota faster than they'd like. Terminal-based coding assistants eat context tokens fast, and offloading repetitive tasks to local models makes practical sense. Say you're renaming a variable across 30 files. That's grunt work, perfect for a local model like Qwen. But deciding which architecture pattern fits a new feature? That's where you want Claude's reasoning, and where you should spend your API budget.
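The split described above can be sketched as a simple classifier that routes a task either to a local model or to the paid API. This is a hypothetical illustration of the pattern, not the actual claude-code-router implementation; the keyword list, model names, and backend labels are all assumptions for the sketch.

```python
# Hypothetical sketch of the grunt-work/brain-work split.
# Keywords, model names, and backend labels are illustrative only,
# not the logic of any real router.

GRUNT_KEYWORDS = {"rename", "lint", "format", "batch", "grep"}

def pick_backend(task: str) -> str:
    """Send mechanical tasks to a local Ollama model, reasoning to Claude."""
    lowered = task.lower()
    if any(kw in lowered for kw in GRUNT_KEYWORDS):
        return "ollama/qwen2.5-coder"  # free, local, good enough for grunt work
    return "anthropic/claude"          # spend API budget on real reasoning

print(pick_backend("rename userId across 30 files"))
print(pick_backend("decide an architecture for the new feature"))
```

The first call routes to the local model, the second to Claude. A real router makes this decision from richer signals (context length, background vs. interactive task), but the budget logic is the same.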

But Hacker News commenters spotted problems fast. The core complaint: routing Claude Code through Ollama promised 90% savings but gave zero attribution. The actual routing tool underneath is @musistudio/claude-code-router, an npm package created by a different developer and discussed on HN nine months ago, yet Coherence Daddy's README didn't credit the original creator. Commenters also pointed out that the "cost math" claim contains no actual math, and questioned whether tasks like grep-and-replace need an LLM at all when sed exists. The tutorial's prompt also directs users to the Coherence Daddy homepage, suggesting the whole project doubles as brand promotion.
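The sed objection is worth making concrete: a mechanical rename across many files needs no model at all. This is a hypothetical illustration using a scratch directory; it assumes GNU sed (the `-i` flag and `\b` word boundaries behave differently on BSD/macOS sed).

```shell
# Illustration only: rename a symbol across a tree with no LLM involved.
# Build a throwaway file to operate on (assumes GNU sed).
mkdir -p /tmp/sed_demo/src
printf 'int oldName = 1;\nreturn oldName;\n' > /tmp/sed_demo/src/a.c

# grep -rl lists files containing the symbol; sed -i rewrites them in place.
grep -rl 'oldName' /tmp/sed_demo/src | xargs sed -i 's/\boldName\b/newName/g'

grep -c 'newName' /tmp/sed_demo/src/a.c  # prints 2: both lines were rewritten
```

For a 30-file rename, this runs in milliseconds, costs nothing, and is deterministic, which no local model can promise.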

The routing pattern itself is sound. Using local models for mechanical work while reserving expensive API calls for tasks that actually need reasoning is smart resource management. Just maybe get the router from the person who built it.