AMD has released Lemonade, an open-source local AI server that runs text, image, and speech models on both GPU and NPU hardware. The project targets a real pain point for AMD hardware owners: the historically messy driver and dependency setup required to run local models on Radeon cards. With a 2MB native C++ backend and one-minute installer, it aims to make local AI actually usable on AMD systems rather than leaving NVIDIA as the only practical option.

Lemonade sits somewhere between Ollama and LM Studio, but with stronger multi-modal orchestration built in. A single unified API endpoint handles chat, vision, image generation, transcription, and speech synthesis. It supports multiple inference engines, including llama.cpp, Ryzen AI SW, and FastFlowLM, giving users flexibility depending on their hardware. Its OpenAI-compatible API means it works with hundreds of existing apps out of the box, from Open WebUI to n8n workflows.
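Because the server speaks the standard OpenAI chat-completions protocol, any generic client can talk to it. Below is a minimal sketch using only the Python standard library; the base URL `http://localhost:8000/api/v1` and the placeholder model name are assumptions to verify against your local install:

```python
import json
import urllib.request

# Assumed default base URL for a local Lemonade server; adjust for your setup.
BASE_URL = "http://localhost:8000/api/v1"

def build_chat_request(model: str, prompt: str) -> dict:
    # Standard OpenAI-style chat-completion payload; an OpenAI-compatible
    # server accepts this same shape.
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

def chat(model: str, prompt: str, base_url: str = BASE_URL) -> str:
    payload = json.dumps(build_chat_request(model, prompt)).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    # OpenAI-style response: first choice's message content.
    return body["choices"][0]["message"]["content"]

if __name__ == "__main__":
    # Requires a running server with a loaded model; the model name
    # here is a placeholder, not a real Lemonade model identifier.
    print(chat("your-model-name", "Say hello in one sentence."))
```

The same pattern is why existing OpenAI-based tools work unmodified: they only need the base URL swapped to point at the local server.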

The more interesting capability is Lemonade's experimental distributed inference, which spreads a model's workload across multiple systems to run models larger than a single GPU's VRAM would allow. This pairs naturally with AMD's unified memory architecture, where systems with up to 128GB of shared RAM can load substantial models like gpt-oss-120b. Whether NPU usage delivers meaningful throughput gains beyond small models remains an open question. But the project gives AMD owners something they've lacked: an officially supported, opinionated solution for local AI that just works.