Google shipped something genuinely useful: a way to run Gemini Nano directly in Chrome. The Prompt API, which entered an origin trial in Chrome 138, lets developers send natural language requests to an LLM that lives entirely in the browser. No API keys. No cloud calls. No data leaves the device after the initial model download. The use cases are what you'd expect: AI-powered search over page content, personalized news feeds, content filtering. One developer suggested building a "de-snarkifier" that rewrites aggressive social media posts into neutral or absurd language. That's the kind of thing that only works when inference is free and local.
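The basic shape of the API is a session you create and then prompt. A minimal sketch, assuming the `LanguageModel` global exposed during the origin trial (method names have shifted between Chrome versions, so treat this as illustrative rather than canonical):

```javascript
// Sketch of calling the on-device model via the Prompt API.
// Assumes the origin-trial `LanguageModel` global; not available
// outside Chrome builds with the trial enabled.
async function deSnarkify(text) {
  // Feature-detect first: the API simply doesn't exist elsewhere.
  if (typeof LanguageModel === "undefined") {
    throw new Error("Prompt API not available in this browser");
  }
  const session = await LanguageModel.create();
  // Inference runs on-device; no network request happens here.
  const result = await session.prompt(
    `Rewrite the following post in a neutral, friendly tone:\n\n${text}`
  );
  session.destroy(); // release the on-device session when done
  return result;
}
```

The feature-detection guard matters in practice: the same extension or page will run in browsers where the API is absent, and you want a clean fallback rather than a `ReferenceError`.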
But the model download is enormous. Developers report it's several times larger than Chrome itself, which creates a terrible first experience. You can't do anything until that download finishes. The hardware requirements are steep: at least 22GB of free storage, a GPU with more than 4GB VRAM, or a CPU with 16GB RAM and 4+ cores. If your free storage drops below 10GB after the download, Chrome deletes the model entirely. This isn't running on your average laptop.
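Because the download can gate everything, the API exposes an availability check and a download-progress monitor so you can tell users what's happening instead of blocking silently. A hedged sketch, assuming the `availability()` method and `downloadprogress` event described in the origin-trial docs (exact names and status strings may differ across Chrome versions):

```javascript
// Check whether the on-device model is usable before creating a session,
// and surface download progress if one is triggered.
// Assumes the origin-trial `LanguageModel` global.
async function getModelSession() {
  if (typeof LanguageModel === "undefined") return null; // not Chrome / no trial
  const status = await LanguageModel.availability();
  // Documented statuses include "unavailable", "downloadable",
  // "downloading", and "available".
  if (status === "unavailable") return null; // hardware doesn't qualify
  // create() kicks off the multi-gigabyte download when needed;
  // report progress rather than leaving the user staring at nothing.
  return LanguageModel.create({
    monitor(m) {
      m.addEventListener("downloadprogress", (e) => {
        console.log(`model download: ${Math.round(e.loaded * 100)}%`);
      });
    },
  });
}
```

Returning `null` for both "wrong browser" and "hardware doesn't qualify" lets calling code branch to a cloud fallback with a single check.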
Apple keeps its AI capabilities locked to OS-level features through Apple Intelligence and the Neural Engine, not exposing them as a public web API. Microsoft is pushing WebAI and ONNX Runtime Web, using WebGPU to run models like Phi-3 in the browser, while also leaning on Windows NPUs through the Copilot Runtime. None of these approaches are standardized. Mozilla has publicly criticized Google's standardization efforts for the writing assistance APIs, questioning whether this is the right path for the web platform.
For anyone building AI agents, the Prompt API matters because it removes the dependency on external API calls for basic LLM tasks. If you're building a browser extension that needs to classify text or extract structured data, you can do that locally without a backend. But the practical limitations are real. Until operating systems ship with pre-installed models that browsers can tap into, the download problem persists. And the hardware ceiling means this won't reach most users anytime soon.
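For the classification case, one pattern that keeps the code testable is to build on anything with a `prompt(text)` method, so the same function works against a real Prompt API session or a stub. A sketch with hypothetical labels (the constrained prompt and the guard against extra prose are the point, not the specific categories):

```javascript
// Classify text into one of a fixed set of labels using any session-like
// object exposing `prompt(text)`. Labels here are arbitrary examples.
const LABELS = ["bug report", "feature request", "question"];

async function classify(session, text) {
  const reply = await session.prompt(
    `Classify the following text as exactly one of: ${LABELS.join(", ")}.\n` +
    `Answer with the label only.\n\n${text}`
  );
  const normalized = reply.trim().toLowerCase();
  // Small local models often add extra prose; match against the
  // known labels instead of trusting the raw reply.
  return LABELS.find((l) => normalized.includes(l)) ?? null;
}
```

In an extension you'd pass in the session from `LanguageModel.create()`; in tests, a stub like `{ prompt: async () => "feature request" }` exercises the same path with no model download at all.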