Liquid AI's LocalCowork reaches 80% tool-calling accuracy at 390ms latency on an Apple M4 Max. Google's Gemma 3 27B hits 94% — but takes 24,088ms to get there. That 60x speed gap is the central argument behind LocalCowork, an open-source desktop AI agent Liquid AI published this week that runs entirely on-device, no cloud APIs required.
Built on Tauri 2.0 with a React/TypeScript frontend, the agent ships with 75 tools distributed across 14 Model Context Protocol (MCP) servers: filesystem operations, document processing, OCR, security scanning, email drafting, and calendar management. The project lives in Liquid AI's public cookbook repository and has picked up around 1,500 GitHub stars since release.
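A tool registry spread across that many MCP servers is, at its core, a mapping from server to tool definitions, each carrying the name, description, and JSON Schema the model sees when choosing what to call. The sketch below is illustrative only: the `ToolDef` shape and `REGISTRY` names are assumptions, not LocalCowork's actual API, though the fields mirror what the MCP specification requires of a tool.

```typescript
// Hypothetical sketch of a multi-server tool registry. ToolDef and REGISTRY
// are illustrative names, not LocalCowork's code; the fields (name,
// description, inputSchema) follow the general MCP tool-definition shape.
type ToolDef = {
  server: string;      // which MCP server hosts the tool
  name: string;        // identifier the model emits to call the tool
  description: string; // what the model reads when selecting a tool
  inputSchema: object; // JSON Schema for the tool's arguments
};

const REGISTRY: ToolDef[] = [
  {
    server: "filesystem",
    name: "read_file",
    description: "Read a file from local disk",
    inputSchema: {
      type: "object",
      properties: { path: { type: "string" } },
      required: ["path"],
    },
  },
  {
    server: "calendar",
    name: "create_event",
    description: "Create a calendar event",
    inputSchema: {
      type: "object",
      properties: { title: { type: "string" }, start: { type: "string" } },
      required: ["title", "start"],
    },
  },
];

// Group tools by their hosting server, mirroring the
// 75-tools-across-14-servers layout described above.
function toolsByServer(registry: ToolDef[]): Map<string, ToolDef[]> {
  const byServer = new Map<string, ToolDef[]>();
  for (const tool of registry) {
    const list = byServer.get(tool.server) ?? [];
    list.push(tool);
    byServer.set(tool.server, list);
  }
  return byServer;
}
```

The grouping matters because every tool description consumes context-window budget: the more servers and tools an agent registers, the more pressure there is to filter the registry before the model ever sees it.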
The technical core is LFM2-24B-A2B, a sparse mixture-of-experts model that activates only 2 billion of its 24 billion parameters per forward pass. Liquid AI's case is blunt: an agent with a 24-second tool dispatch latency is a demo, not a product. The working demo suite is deliberately narrower than the full tool set — 20 tools across 6 servers, all pre-qualified at 80% or higher single-step accuracy — to ensure multi-step workflows don't collapse.
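The mechanism behind that 2B-of-24B activation is top-k gating: a router scores every expert for each token, but only the k highest-scoring experts actually run, so inference cost tracks k rather than the total expert count. A minimal sketch of that routing step, with illustrative names and no claim to match LFM2's internals:

```typescript
// Minimal sketch of sparse MoE top-k routing. Only the top-k experts by
// gate score execute per token, which is why a 24B-parameter model can
// pay roughly 2B parameters' worth of compute per forward pass.
function softmax(logits: number[]): number[] {
  const max = Math.max(...logits);
  const exps = logits.map((x) => Math.exp(x - max)); // subtract max for stability
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

// Select the k highest-probability experts and renormalize their
// weights so the chosen experts' contributions sum to 1.
function topKExperts(
  gateLogits: number[],
  k: number
): { index: number; weight: number }[] {
  const ranked = softmax(gateLogits)
    .map((p, index) => ({ index, weight: p }))
    .sort((a, b) => b.weight - a.weight)
    .slice(0, k);
  const total = ranked.reduce((acc, e) => acc + e.weight, 0);
  return ranked.map((e) => ({ index: e.index, weight: e.weight / total }));
}
```

Every expert that isn't selected contributes zero compute for that token, which is the structural source of the latency advantage the benchmark numbers show.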
For the full 75-tool set, LocalCowork uses a dual-model orchestrator: a fine-tuned 1.2B-parameter router handles tool selection and dispatch while LFM2-24B-A2B handles planning. Splitting those two roles is a practical response to the context-window and latency pressure that large tool registries create, and it's a design choice showing up in production agent systems beyond Liquid AI's as well. Every execution gets written to a local audit trail, which reinforces the privacy guarantees that justify running on-device in the first place.
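The orchestration pattern can be sketched in a few lines: a cheap router call picks the tool, execution happens, and an audit record is appended before anything returns. Everything here is a stand-in: both models are stubbed as plain functions, and the audit-entry shape is an assumption, not LocalCowork's format.

```typescript
// Hedged sketch of the dual-model split. routeTool stands in for the
// fast 1.2B router; planning by the larger model is out of scope here.
// The AuditEntry shape is hypothetical, not LocalCowork's actual schema.
type AuditEntry = {
  timestamp: string;
  tool: string;
  args: unknown;
  ok: boolean;
};

const auditTrail: AuditEntry[] = [];

// Stub for the small router model: maps an intent to a tool name.
function routeTool(intent: string): string {
  return intent.includes("file") ? "read_file" : "create_event";
}

// Dispatch one step: select the tool, execute it (stubbed), and append
// a local audit record regardless of outcome.
function dispatch(intent: string, args: unknown): boolean {
  const tool = routeTool(intent); // small model: fast tool selection
  const ok = true;                // stand-in for calling the MCP server
  auditTrail.push({
    timestamp: new Date().toISOString(),
    tool,
    args,
    ok,
  });
  return ok;
}
```

Because the audit write sits inside the dispatch path rather than being bolted on afterward, every tool call leaves a record on local disk, which is what makes the on-device privacy claim auditable rather than just asserted.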
Liquid AI also documented 12 failure modes during development. The most consistent one — cross-server tool transitions — tripped up every model they tested, not just LFM2. That kind of honest accounting is rare in agent launch posts, and it makes LocalCowork more useful as a reference architecture for developers building their own systems.
MCP has become the default standard for agent-tool integration over the past year, with <a href="/news/2026-03-14-iris-open-source-mcp-native-eval-observability-tool-for-ai-agents">evaluation frameworks designed natively for MCP</a> helping teams ensure quality. The competition to run capable models on consumer hardware is real — <a href="/news/2026-03-14-runanywhere-launches-rcli-on-device-voice-ai-with-proprietary-metalrt-inference">several teams are hitting sub-200ms latency</a> through different architectural approaches. Sparse MoE architectures have a structural advantage in that environment: lower activation cost per inference means faster responses without sacrificing model scale. Whether LFM2-24B-A2B's accuracy holds up on longer, messier real-world task chains is something the 1,500 developers who've already starred the repo will find out before Liquid AI does.