Gemma Gem is a Chrome extension that runs Google's Gemma 3n model entirely in your browser. No API keys. No cloud. Your data stays local. Built by developer 'kessler', it uses WebGPU for inference and gives you an AI agent that can read pages, click elements, fill forms, scroll, and execute arbitrary JavaScript. The smaller E2B variant weighs in at around 500MB cached after the first run; the larger E4B is about 1.5GB. Both use q4f16 quantization with a 128k context window.
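Loading a model like this might look roughly as follows — a minimal sketch assuming transformers.js v3 conventions; the model ID, helper name, and option shape are illustrative, not taken from the extension's source:

```typescript
// Illustrative sketch only. The commented pipeline call mirrors the shape of
// transformers.js v3's API; MODEL_ID is a placeholder, not the extension's
// real model identifier.
// import { pipeline } from "@huggingface/transformers";

type Variant = "E2B" | "E4B";

// Pure helper: pipeline options shared by both variants. Per the article,
// both use 4-bit quantized weights with fp16 activations (q4f16) on WebGPU.
function optionsFor(variant: Variant) {
  return {
    device: "webgpu" as const, // on-device GPU inference, no server round-trip
    dtype: "q4f16" as const,   // 4-bit weights, fp16 activations
    variant,                   // E2B caches ~500MB, E4B ~1.5GB after first run
  };
}

// In the offscreen document (needs a WebGPU-capable browser):
// const generator = await pipeline("text-generation", MODEL_ID, optionsFor("E2B"));
```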

The architecture is clean. An offscreen document hosts the model via transformers.js and runs the agent loop. A service worker routes messages between components and handles screenshots. A content script injects a chat UI and executes DOM tools. The project is built on Hugging Face's transformers.js library and the WXT framework. The core agent logic lives in a separate `agent/` directory with zero external dependencies. That means developers can extract it as a standalone SDK for embedding local automation into their own applications.
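The tool side of that loop can be sketched as a plain dispatch table. This is a hypothetical reconstruction, not the actual `agent/` code: tool names and handler signatures are assumptions, and the handlers are injected rather than touching the DOM, which is what keeps the logic dependency-free:

```typescript
// Hypothetical sketch of an agent tool dispatcher. Names and types are
// illustrative; in the real extension the handlers would be the content
// script's DOM operations (click, fill, scroll, execute JS).
type ToolCall = { tool: string; args: Record<string, string | number> };
type ToolHandler = (args: ToolCall["args"]) => string;

// Handlers are passed in, so the dispatcher itself has zero dependencies on
// browser APIs -- mirroring the standalone-SDK design the article describes.
function makeDispatcher(handlers: Record<string, ToolHandler>) {
  return (call: ToolCall): string => {
    const handler = handlers[call.tool];
    if (!handler) return `error: unknown tool "${call.tool}"`;
    return handler(call.args);
  };
}

// Stub handlers standing in for real DOM operations.
const dispatch = makeDispatcher({
  click: (a) => `clicked ${String(a.selector)}`,
  scroll: (a) => `scrolled to y=${String(a.y)}`,
});
```

In the extension, the agent loop in the offscreen document would emit `ToolCall`-shaped messages, the service worker would route them, and the content script would run a dispatcher like this against the live page.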

Chrome itself is experimenting with a Prompt API for on-device inference via Gemini Nano. Hacker News commenters noted that that approach requires over 4GB of storage; Gemma Gem does the same job at a quarter of the size with offline Gemma 3n inference. For teams handling sensitive data who don't want the complexity of self-hosting LLMs, running everything in-browser is a legitimate option. The code is open source on GitHub. Finally, an AI agent that doesn't phone home.