Centurion: K8s-Style Resource Scheduler for AI Coding Agent Fleets

A new open-source framework called Centurion brings Kubernetes-style resource scheduling to AI coding agent fleets, filling a gap that Anthropic has explicitly declined to address in Claude Code. Released by GitHub user spacelobster88, Centurion operates at the OS and infrastructure layer, providing hardware-aware admission control, three-level memory pressure detection, and auto-scaling via a companion component called Optio. The project was created in direct response to Anthropic closing GitHub issue #15487, a community request for a maxParallelAgents setting, as NOT_PLANNED, a decision that placed resource orchestration outside Claude Code's application scope. At least three additional Claude Code issues document OOM kills caused by unmanaged parallel agents, which suggests the underlying problem is not an isolated one.
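To make the idea of hardware-aware admission control with three pressure levels concrete, here is a minimal Python sketch. It is not Centurion's implementation: the level names, the 25% and 10% thresholds, the per-agent memory estimate, and every identifier below are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Pressure(Enum):
    """Three hypothetical memory pressure levels."""
    NORMAL = "normal"
    WARNING = "warning"
    CRITICAL = "critical"


@dataclass(frozen=True)
class HostSnapshot:
    """Point-in-time view of the host (all values in MB)."""
    total_mb: int
    available_mb: int
    running_agents: int


# Assumed thresholds: fraction of total memory that is still available.
WARNING_BELOW = 0.25
CRITICAL_BELOW = 0.10


def classify_pressure(host: HostSnapshot) -> Pressure:
    """Map the free-memory ratio onto one of the three levels."""
    free_ratio = host.available_mb / host.total_mb
    if free_ratio < CRITICAL_BELOW:
        return Pressure.CRITICAL
    if free_ratio < WARNING_BELOW:
        return Pressure.WARNING
    return Pressure.NORMAL


def admit(host: HostSnapshot, agent_mb: int, max_agents: int) -> bool:
    """Decide whether one more agent may start on this host.

    Rejects when the agent cap is reached, when the host is already
    under pressure, or when the projected launch would push the host
    into the critical level.
    """
    if host.running_agents >= max_agents:
        return False
    if classify_pressure(host) is not Pressure.NORMAL:
        return False
    projected = HostSnapshot(
        total_mb=host.total_mb,
        available_mb=host.available_mb - agent_mb,
        running_agents=host.running_agents + 1,
    )
    return classify_pressure(projected) is not Pressure.CRITICAL
```

In a real scheduler the snapshot would come from the operating system (for example, psutil.virtual_memory() reports total and available bytes), and a rejected agent would typically be queued rather than dropped.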

Centurion's architecture closely mirrors the Kubernetes model but names its layers after Roman military units: Legion (deployment group), Century (agent squad), and Legionary (individual agent instance, analogous to a Kubernetes Pod). The framework ships with several interlocking components: Harness Loop handles DAG-based, project-level task orchestration, with cross-session resume backed by SQLite and file state; Aquilifer provides real-time WebSocket event streaming for live agent status; and Optio checks queue depth every 10 seconds and adjusts fleet size accordingly. The system exposes 21 REST endpoints and 19 MCP tools for Claude Code integration, supports Google's Agent-to-Agent (A2A) protocol, and carries an OpenClaw compatibility badge, signaling alignment with an emerging standards layer for agent fleet interfaces. A companion bootstrapper called Auspex sets up the full stack on Apple Silicon in a single command.
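The queue-depth-driven scaling attributed to Optio can be sketched as a single decision function that is evaluated once per polling tick. Only the 10-second interval comes from the description above; the tasks-per-agent ratio, the fleet bounds, the per-tick step limit, and all names are hypothetical, and the real component may use a different policy entirely.

```python
import math

# Polling interval stated in the article; everything else is assumed.
POLL_INTERVAL_S = 10


def desired_fleet_size(
    queue_depth: int,
    current: int,
    *,
    tasks_per_agent: int = 2,
    min_agents: int = 1,
    max_agents: int = 20,
    max_step: int = 4,
) -> int:
    """Return the fleet size to aim for after one polling tick.

    The raw target is the number of agents needed to drain the queue
    at `tasks_per_agent` each, clamped to [min_agents, max_agents].
    The change per tick is limited to `max_step` agents so the fleet
    grows and shrinks gradually instead of oscillating.
    """
    target = math.ceil(queue_depth / tasks_per_agent)
    target = max(min_agents, min(max_agents, target))
    delta = max(-max_step, min(max_step, target - current))
    return current + delta
```

A control loop would call this function every POLL_INTERVAL_S seconds with the current queue depth and then start or stop agents to reach the returned size. Limiting the step per tick is a common way to damp reactions to short bursts in the queue.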

Centurion works with Claude, GPT, Gemini, or plain shell scripts. It is model-agnostic in the same way that Kubernetes does not care which container runtime sits beneath it, because an infrastructure layer operates at the runtime level, not the model level. That runtime-level design is also the basis for the author's performance claims: running 20 or more simultaneous agents on a 16 GB Mac Mini with zero OOM kills; merging eight Rust pull requests in 30 minutes, each passing over 7,000 tests; and completing eight parallel research tasks in 34 minutes with zero retries. Those figures have not been independently verified across diverse workloads, but the architecture addresses a documented and reproducible class of failure.
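One way to read the model-agnostic claim is that the scheduler treats every agent as an opaque command line and observes only its process-level behavior. The sketch below illustrates that reading with Python's standard subprocess module; the AgentSpec type and the run_agent helper are hypothetical names, not part of Centurion's API.

```python
import subprocess
import sys
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class AgentSpec:
    """Hypothetical description of an agent: a label and a command line.

    The runner never inspects which model (if any) is behind `argv`.
    """
    name: str
    argv: Sequence[str]


def run_agent(spec: AgentSpec, timeout_s: float = 60.0) -> Tuple[int, str]:
    """Run the agent as a child process; return (exit code, stdout)."""
    proc = subprocess.run(
        list(spec.argv),
        capture_output=True,
        text=True,
        timeout=timeout_s,
    )
    return proc.returncode, proc.stdout


if __name__ == "__main__":
    # A plain script stands in for an agent here. A headless Claude Code
    # run would be described the same way, e.g. argv=["claude", "-p", "<prompt>"].
    spec = AgentSpec("shell-task", [sys.executable, "-c", "print('done')"])
    print(run_agent(spec))
```

Because the runner sees only a process, the same admission and scaling logic applies regardless of which model or script is behind the command.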

Anthropic's NOT_PLANNED designation on issue #15487 functions less as a product rejection and more as an architectural handoff, implicitly signaling that the infrastructure layer above Claude Code is open territory. The Docker/Kubernetes history makes the dynamic legible: Docker kept container runtimes stateless and composable, and that restraint created the surface area for Kubernetes to emerge. Claude Code's deliberate statelessness in headless mode (claude -p) creates an analogous surface area. Centurion is not the only project in this space, but its A2A protocol support, MCP integration, and one-command deployment story make it one of the more complete attempts so far to define what a managed agent runtime looks like in practice.