Liquid AI just dropped LFM2-24B-A2B, a Mixture of Experts model that packs 24 billion total parameters but only activates 2.3 billion per token. It fits in 32GB of RAM and runs on consumer laptops with integrated GPUs or NPUs. The model has day-one support for llama.cpp, vLLM, and SGLang, and it's available as open weights on Hugging Face. According to Liquid AI's benchmarks, it hits competitive throughput against Qwen3-30B-A3B and gpt-oss-20b, reaching roughly 26,800 total tokens per second on a single H100 with vLLM at 1,024 concurrent requests.
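With vLLM support landing on day one, the standard offline-inference flow should apply. Here's a minimal sketch, assuming the Hugging Face repo ID is LiquidAI/LFM2-24B-A2B (inferred from the model name, not confirmed here):

```python
# Hedged sketch: load the open weights from Hugging Face and run one
# generation through vLLM. The repo ID below is an assumption.
from vllm import LLM, SamplingParams

llm = LLM(model="LiquidAI/LFM2-24B-A2B")
params = SamplingParams(temperature=0.7, max_tokens=256)

outputs = llm.generate(
    ["Summarize mixture-of-experts inference in two sentences."], params
)
print(outputs[0].outputs[0].text)
```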
The trick is in how LFM2 scales. Compared to the earlier LFM2-8B-A1B, Liquid AI went from 24 to 40 layers and doubled the experts per MoE block from 32 to 64, while making each expert slightly narrower to stay within the roughly 2B active-parameter budget. Total parameters nearly tripled, from 8.3B to 24B, but the active path only grew about 1.5x.
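The arithmetic checks out on a napkin. A quick sanity check in Python, taking the 1.5B active figure for LFM2-8B-A1B from its name (an assumption) and the rest from the numbers above:

```python
# Back-of-envelope scaling from LFM2-8B-A1B to LFM2-24B-A2B.
old_total, new_total = 8.3e9, 24e9      # total parameters
old_active, new_active = 1.5e9, 2.3e9   # active per token (1.5B assumed from "A1B")

print(f"total growth:  {new_total / old_total:.2f}x")    # ~2.89x, "nearly tripled"
print(f"active growth: {new_active / old_active:.2f}x")  # ~1.53x
print(f"active share:  {new_active / new_total:.1%}")    # ~9.6% of weights touched per token
```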
That ratio is the whole point. You get a bigger model's knowledge without paying for it at inference time. This is the MoE pitch everyone's been making, but Liquid AI is actually shipping it at a scale that fits in a laptop. For developers who can't or won't rent H100s, this changes the math on what's possible locally.
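To see why the active path stays small, here's what generic top-k expert routing looks like. This is an illustrative sketch of the standard technique with made-up dimensions, not Liquid AI's actual implementation:

```python
# Generic top-k MoE layer: the router scores all experts per token, but only
# the top k are run, so most FFN weights sit idle on each forward pass.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    def __init__(self, dim=512, num_experts=64, k=2, hidden=1024):
        super().__init__()
        self.k = k
        self.router = nn.Linear(dim, num_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, hidden), nn.GELU(), nn.Linear(hidden, dim))
            for _ in range(num_experts)
        )

    def forward(self, x):                                # x: (tokens, dim)
        weights, idx = self.router(x).topk(self.k, -1)   # pick k experts per token
        weights = F.softmax(weights, dim=-1)             # renormalize over the k picks
        out = torch.zeros_like(x)
        for slot in range(self.k):                       # run only the selected experts
            for e in idx[:, slot].unique():
                mask = idx[:, slot] == e
                out[mask] += weights[mask, slot].unsqueeze(-1) * self.experts[e](x[mask])
        return out
```

With 64 experts per block but only a handful routed per token, per-token compute tracks the 2.3B active figure rather than the 24B total.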
This is still an early checkpoint. The model has trained on 17 trillion tokens so far, and pre-training is ongoing. Liquid AI says to expect an LFM2.5-24B-A2B with additional post-training and reinforcement learning once pre-training completes. Community feedback on Hacker News reports 20-30 tokens per second on CPU-only setups with DDR4 RAM, though users with dedicated GPUs exceeding 4GB of VRAM might prefer alternatives like Gemma4 or Qwen3.6. The LFM2 family has crossed 10 million downloads on Hugging Face, suggesting real appetite for models that decouple total parameter count from inference cost.