Qwen3.6-35B on my laptop drew a better pelican than Claude Opus 4.7

Simon Willison compares the SVG generation capabilities of two newly released models: Qwen3.6-35B-A3B (running locally via LM Studio) and Claude Opus 4.7 (Anthropic's proprietary model). Using his 'pelican riding a bicycle' benchmark and a backup 'flamingo riding a unicycle' test, he finds the locally-running Qwen model produces better illustrations. However, HN comments note that Opus still significantly outperforms Qwen on coding tasks (95/98 vs 11/98 on Power Ranking), suggesting the comparison is task-specific rather than indicative of overall model capability.

Simon Willison's 'pelican riding a bicycle' SVG benchmark just produced a result nobody expected. Alibaba's Qwen3.6-35B-A3B, running locally on his MacBook Pro M5 through LM Studio, drew a better pelican than Anthropic's Claude Opus 4.7, which couldn't even get the bicycle frame right. Willison tested a backup prompt too, 'flamingo riding a unicycle,' and gave that one to Qwen as well, partly because the model slipped a sunglasses comment into its SVG code.

Willison is the first to admit his pelican benchmark has always been a joke. A funny one, and historically it correlated with overall model quality. That correlation is now broken. He doesn't think a 21GB quantized model running on a laptop is more useful than Opus 4.7 for general tasks. The numbers agree. On Power Ranking, the coding benchmark cited in the HN comments, Qwen3.6-35B-A3B solves 11 out of 98 tasks (about 11%). Opus 4.7 nails 95 out of 98 (about 97%). Not close.