Sebastian Raschka, independent LLM researcher and author of the widely cited book "Build a Large Language Model (From Scratch)", has published the LLM Architecture Gallery, a standardized visual reference cataloging architecture diagrams and fact sheets for over 40 major open-weight language models. Last updated March 14, 2026, the gallery covers models released from mid-2024 through early 2026, spanning organizations from Meta, Google DeepMind, and Mistral AI to Chinese labs including DeepSeek, Alibaba's Qwen team, Zhipu AI, MiniMax, and Moonshot AI, as well as newer entrants such as India's Sarvam AI. Each entry documents parameter scale, decoder type, attention mechanism, and key design decisions, and links directly to the model's config.json and technical report.
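Because each entry links to a config.json, the headline facts are mechanically checkable. As a rough illustration (not the gallery's actual tooling), a few Hugging Face-style config fields are enough to reconstruct a fact-sheet summary; key names such as `num_key_value_heads` follow common convention but vary from model to model:

```python
import json

def fact_sheet(config: dict) -> dict:
    # Illustrative sketch: derive fact-sheet entries from a Hugging
    # Face-style config.json. Field names follow common convention
    # (hidden_size, num_hidden_layers, ...); individual models differ.
    heads = config["num_attention_heads"]
    kv_heads = config.get("num_key_value_heads", heads)
    return {
        "layers": config["num_hidden_layers"],
        "hidden_size": config["hidden_size"],
        # Fewer KV heads than query heads signals grouped-query attention.
        "attention": "GQA" if kv_heads < heads else "MHA",
        "vocab_size": config["vocab_size"],
    }

# Example with made-up numbers for a small dense model:
cfg = json.loads('{"num_hidden_layers": 32, "hidden_size": 4096, '
                 '"num_attention_heads": 32, "num_key_value_heads": 8, '
                 '"vocab_size": 128256}')
print(fact_sheet(cfg))
```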

The gallery surfaces several clear architectural trends reshaping the open-weight landscape. Sparse Mixture-of-Experts (MoE) decoders have become the dominant pattern at frontier scale: DeepSeek V3 (671B total, 37B active), Llama 4 Maverick (400B total, 17B active), and Qwen3 235B-A22B (235B total, 22B active) all follow this playbook, keeping inference costs tractable despite massive parameter counts. Attention mechanisms have diversified considerably: Multi-head Latent Attention (MLA) appears in DeepSeek's V3 and R1 lines and Moonshot's Kimi K2; QK-Norm has been adopted broadly across OLMo 2, Gemma 3, Qwen3, and MiniMax M2; and hybrid linear-attention designs have emerged in models such as Qwen3 Next, which mixes Gated DeltaNet and Gated Attention layers in a 3:1 ratio. Coverage ranges from 3B-parameter edge models to trillion-parameter designs like Moonshot's Kimi K2 and Ling 2.5 1T.
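The total-versus-active gap in those MoE figures comes from per-token routing: each token passes through only a handful of the model's experts. A toy single-token sketch, with illustrative shapes and a plain top-k softmax router (none of the named models' actual code), shows the mechanism:

```python
import numpy as np

def moe_layer(x, experts, router, k=2):
    """Toy sparse-MoE forward pass for one token.

    x: (d,) token activation; experts: (E, d, d), one weight matrix per
    expert; router: (d, E). Only the top-k experts run per token, which
    is why a model's active parameter count sits far below its total.
    """
    logits = x @ router                             # one score per expert
    top = np.argsort(logits)[-k:]                   # indices of chosen experts
    gates = np.exp(logits[top] - logits[top].max())
    gates /= gates.sum()                            # softmax over the chosen k
    return sum(g * (x @ experts[i]) for g, i in zip(gates, top))

rng = np.random.default_rng(0)
d, E = 8, 4
out = moe_layer(rng.normal(size=d),
                rng.normal(size=(E, d, d)),
                rng.normal(size=(d, E)))
```

Scaled up, the same idea yields figures like DeepSeek V3's: per-token compute and memory bandwidth track the active count, not the 671B total, since unrouted experts never touch a given token.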

The gallery is distilled from Raschka's longer-form analyses, "The Big LLM Architecture Comparison" (July 2025) and "A Dream of Spring for Open-Weight LLMs" (February 2026), and follows a pattern consistent across his work: deep-dive articles first, condensed reference artifact second. On Hacker News, it drew immediate comparisons to the Neural Network Zoo, created by Fjodor van Veen at the Asimov Institute in 2016, which served as a foundational visual cheat sheet for classical architectures. Commenters also proposed natural extensions, including a family-tree layout showing architectural lineage and a size-scaled view for comparing model footprints at a glance.

The Neural Network Zoo comparison is telling. Van Veen's one-page cheat sheet became a standard citation in deep learning courses for years after its publication, not because it was exhaustive but because it gave practitioners a shared vocabulary. Raschka's gallery, tied directly to config files and his own "From Scratch" implementations, is positioned to play a similar role for the current generation. Whether it holds that position depends on update cadence: at the rate new open-weight models are shipping, a reference that goes stale within months is more frustrating than no reference at all. The proposed family-tree and size-scaled extensions would make the gallery more durable, and no one has built them yet.