DeepSeek just shipped its V4 model series, and the pricing is aggressive enough to make OpenAI and Anthropic squirm. The two new models, V4-Pro and V4-Flash, are Mixture-of-Experts architectures with 1-million-token context windows, both released under the MIT license. V4-Pro packs 1.6 trillion total parameters with 49 billion active, making it the largest open-weights model released to date. V4-Flash is the smaller sibling at 284 billion total and 13 billion active. The number that matters: V4-Flash costs $0.14 per million input tokens, undercutting even GPT-5.4 Nano. V4-Pro runs $1.74 per million input tokens, cheaper than Gemini 3.1 Pro at $2, GPT-5.4 at $2.50, and Claude Sonnet 4.6 at $3.
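For a feel of what those prices mean per request, here's a back-of-envelope sketch using only the input rates quoted above; output-token pricing, caching discounts, and batch rates are all ignored, and the numbers are the figures from this piece, not live API prices:

```python
# Input-token prices quoted above, in dollars per million tokens.
PRICES = {
    "DeepSeek V4-Flash": 0.14,
    "DeepSeek V4-Pro": 1.74,
    "Gemini 3.1 Pro": 2.00,
    "GPT-5.4": 2.50,
    "Claude Sonnet 4.6": 3.00,
}

def input_cost(model: str, tokens: int) -> float:
    """Dollar cost of sending `tokens` input tokens to `model`."""
    return PRICES[model] * tokens / 1_000_000

# One request that fills the entire 1M-token context window:
for model in sorted(PRICES, key=PRICES.get):
    print(f"{model:<20} ${input_cost(model, 1_000_000):.2f}")
```

Filling V4-Flash's full context costs 14 cents; the same maxed-out prompt on Claude Sonnet 4.6 costs $3.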

So how did they get costs this low? Efficiency. DeepSeek says V4-Pro uses only 27% of the compute and 10% of the KV cache that its own V3.2 needs when processing 1 million tokens; Flash pushes that to 10% and 7%, respectively. That's a steep drop in a single generation. DeepSeek's own benchmarks put V4-Pro slightly behind GPT-5.4 and Gemini 3.1 Pro, trailing the frontier by roughly three to six months. But early testers report that V4-Pro feels on par with GPT-5.4 and Claude Opus 4.6 for real work like frontend development, according to community feedback reviewed by Simon Willison.
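Expressed as multipliers rather than percentages, the claims look like this (trivial arithmetic; only the percentages come from DeepSeek's announcement):

```python
# DeepSeek's claimed resource usage at a 1M-token context, as fractions
# of what V3.2 needs. Only these fractions come from the announcement;
# the conversion to an "x-fold reduction" is plain arithmetic.
claims = {
    "V4-Pro":   {"compute": 0.27, "KV cache": 0.10},
    "V4-Flash": {"compute": 0.10, "KV cache": 0.07},
}

for model, resources in claims.items():
    for resource, fraction in resources.items():
        print(f"{model}: {resource} at {fraction:.0%} of V3.2 "
              f"= a {1 / fraction:.1f}x reduction")
```

The KV cache reduction matters most here, since the cache grows with every token held in context and dominates serving memory at 1 million tokens.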

There's also the hardware angle. DeepSeek trained these models despite US export controls blocking access to NVIDIA's H100 and A100 GPUs. The lab has been assembling training infrastructure from older chip variants and domestic alternatives since the V3 generation. Producing frontier-adjacent models at a fraction of the cost under those constraints raises real questions about what export controls are actually accomplishing. The open-weights release means quantized versions from Unsloth should land soon, potentially compressing the 160GB Flash model far enough to run on a 128GB consumer MacBook.
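As a rough feasibility check on that claim, the memory arithmetic works out as follows. This is a sketch: the 284-billion parameter count is from the release, the bit-widths are hypothetical quantization levels, sizes are in decimal gigabytes, and a real deployment also needs headroom for the KV cache and the OS:

```python
# Approximate weight footprint of V4-Flash at various quantization levels.
TOTAL_PARAMS = 284e9   # V4-Flash total parameters, from the release
BUDGET_GB = 128        # unified memory on a high-end consumer MacBook

def weights_gb(params: float, bits_per_param: float) -> float:
    """Approximate weight size in decimal GB at a given bit-width."""
    return params * bits_per_param / 8 / 1e9

for bits in (8, 4.5, 4, 3, 2):
    size = weights_gb(TOTAL_PARAMS, bits)
    verdict = "fits" if size < BUDGET_GB else "too big"
    print(f"{bits:>3} bits/param: {size:5.0f} GB ({verdict} in {BUDGET_GB} GB)")
```

The quoted 160GB corresponds to roughly 4.5 bits per parameter; to leave any room in 128GB of unified memory, a quant would need to land around 3 bits per parameter or below.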