Rémi Louf, CEO of dottxt, has laid out a problem that anyone building with open-source LLMs will recognize: tool calling is a mess. The same function call gets encoded three different ways depending on which model family you're using: gpt-oss uses <|channel|> markers, DeepSeek wraps everything in special tokens with triple-backtick JSON, and GLM5 uses XML-style tags.
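To make the divergence concrete, here is an illustrative sketch of how one logical call, `get_weather(city="Paris")`, might be rendered in each family's style. The literal strings are approximations for illustration only, not the models' exact templates:

```python
import json

# One logical tool call.
arguments = {"city": "Paris"}

# Approximate renderings (illustrative, not token-exact).
gpt_oss_style = (
    "<|channel|>commentary to=functions.get_weather<|message|>"
    + json.dumps(arguments)
)
deepseek_style = (
    "<|tool_call_begin|>get_weather<|tool_call_argument_begin|>\n"
    "```json\n" + json.dumps(arguments) + "\n```\n"
    "<|tool_call_end|>"
)
glm_style = (
    "<tool_call>get_weather\n"
    "<arg_key>city</arg_key><arg_value>Paris</arg_value>\n"
    "</tool_call>"
)

# Same call, three incompatible encodings: each one needs its own parser.
for rendering in (gpt_oss_style, deepseek_style, glm_style):
    print(rendering, end="\n---\n")
```

Each string carries identical information, yet nothing about one format helps you parse the other two.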
The real cost is maintenance, multiplied across the ecosystem. If you're building an inference engine like vLLM, SGLang, or TensorRT-LLM, you need custom parsers for every model family you want to support. The same goes for grammar engines like Outlines and XGrammar, which need format knowledge during generation to apply constraints correctly. Louf calls this the M×N problem: M engines times N model families, all reverse-engineering the same format knowledge independently.
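The scaling is easy to see in numbers. With hypothetical counts of, say, 4 engines and 10 model families, the status quo demands a bespoke parser for every pairing, while a shared spec would need only one generic parser per engine plus one spec per model:

```python
engines = 4          # e.g. vLLM, SGLang, TensorRT-LLM, a grammar engine (illustrative count)
model_families = 10  # hypothetical number of supported formats

# Status quo: every engine reimplements every family's parser.
m_times_n = engines * model_families  # 40 bespoke parsers

# Shared declarative spec: one generic parser per engine,
# one spec shipped per model family.
m_plus_n = engines + model_families   # 14 artifacts total

print(m_times_n, m_plus_n)  # 40 vs. 14, and the gap widens as either count grows
```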
Gemma 4 illustrates how bad this gets in practice. Reasoning tokens leak into tool-call arguments. Decoders strip special tokens before parsers can see them. llama.cpp had to abandon its generic parser and build a dedicated implementation just for Gemma 4. These aren't edge cases. They're the predictable result of having no shared contract between model creators and the tools that serve them.
Louf's proposal is straightforward: extract model-specific format knowledge into declarative configuration, the same way the ecosystem converged on shared chat templates through Hugging Face. A model ships with a spec that says "here are my boundary tokens, here's how I serialize arguments, here's where reasoning tokens appear," and that works for any format and any model. Grammar engines and parsers consume the spec instead of reverse-engineering it from scratch. It's unglamorous infrastructure work that makes everything else possible, and it's overdue.
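A minimal sketch of the idea, assuming a hypothetical spec shape: the `ToolCallSpec` fields and the parser below are illustrative inventions, not Louf's actual proposal, but they show how all format knowledge can live in shipped data while the parsing code stays generic:

```python
import json
import re
from dataclasses import dataclass

@dataclass
class ToolCallSpec:
    """Hypothetical declarative description of one model family's format."""
    call_open: str     # token that opens a tool call
    call_close: str    # token that closes it
    name_pattern: str  # regex capturing the function name
    args_pattern: str  # regex capturing the JSON argument payload

def parse_tool_call(text: str, spec: ToolCallSpec) -> dict:
    """Generic parser: the format knowledge lives in the spec, not the code."""
    start = text.index(spec.call_open) + len(spec.call_open)
    end = text.index(spec.call_close, start)
    body = text[start:end]
    name = re.search(spec.name_pattern, body).group(1)
    args = json.loads(re.search(spec.args_pattern, body, re.S).group(1))
    return {"name": name, "arguments": args}

# A model family would ship a spec like this alongside its chat template.
example_spec = ToolCallSpec(
    call_open="<tool_call>",
    call_close="</tool_call>",
    name_pattern=r"^(\w+)",
    args_pattern=r"```json\s*(\{.*?\})\s*```",
)

output = '<tool_call>get_weather\n```json\n{"city": "Paris"}\n```</tool_call>'
print(parse_tool_call(output, example_spec))
```

Supporting a new model family then means writing a new `ToolCallSpec`, not a new parser, which is exactly the M×N-to-M+N collapse the proposal is after.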