Sebastian Raschka's new technical deep dive strips coding agents down to their working parts, and the takeaway is clear: the model isn't the main event. Tools like Claude Code and Codex CLI feel capable not because their underlying LLMs are smarter, but because they wrap those models in what Raschka calls a "coding harness": the software layer handling repo context, tool execution, memory, permissions, and all the plumbing that turns a next-token predictor into something that can actually work on your codebase. His Mini Coding Agent, a lightweight Python implementation built on Ollama, demonstrates these ideas with workspace snapshotting, approval flows for risky actions, and session resumption.
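To make the "approval flows for risky actions" idea concrete, here is a minimal sketch of a permission gate a harness might sit between the model and tool execution. This is illustrative, not Raschka's actual code: the tool names (`run_shell`, `write_file`), the risky-command list, and the `run_tool`/`ask_user` callbacks are all assumptions.

```python
# Hypothetical approval gate for tool calls; names and policy are illustrative.
RISKY_PREFIXES = ("rm", "git push", "pip install")
MUTATING_TOOLS = {"write_file", "delete_file"}

def needs_approval(tool_name: str, args: dict) -> bool:
    """Flag tool calls that could mutate state outside the workspace."""
    if tool_name == "run_shell":
        cmd = args.get("command", "")
        return cmd.startswith(RISKY_PREFIXES)
    return tool_name in MUTATING_TOOLS

def execute_with_approval(tool_name, args, run_tool, ask_user):
    """Run a tool, pausing for human confirmation on risky actions."""
    if needs_approval(tool_name, args):
        if not ask_user(f"Allow {tool_name} with {args}?"):
            return {"status": "denied", "tool": tool_name}
    return run_tool(tool_name, args)
```

Real harnesses typically layer this with per-directory allowlists and a "remember this decision" cache, but the control flow is the same: the model proposes, the harness disposes.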

Raschka identifies six core components worth understanding. Live repo context keeps the agent oriented in your code. Prompt shaping and cache reuse keep costs down. Structured tools with validation and permissions stop the agent from doing something destructive. Context reduction keeps the context window from bloating into unusability, while transcripts and memory let sessions persist and resume. Finally, delegation to bounded subagents handles complex tasks without everything falling apart. If you've wondered why coding agents like Claude Code feel smarter than Claude in a browser window, this is why.
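Of those six, context reduction is the easiest to sketch in a few lines. The toy version below simply evicts the oldest non-system messages until the transcript fits a budget; this is an assumed simplification (production harnesses usually summarize evicted turns rather than drop them), and `count_tokens` is a caller-supplied stand-in for a real tokenizer.

```python
def compact_history(messages, max_tokens, count_tokens):
    """Toy context reduction: evict oldest non-system turns until the
    transcript fits the budget. Real harnesses summarize instead of dropping."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    while rest and sum(count_tokens(m["content"]) for m in system + rest) > max_tokens:
        rest.pop(0)  # oldest turn goes first; system prompt is never evicted
    return system + rest
```

Even this crude strategy illustrates the key invariant: the system prompt and the most recent turns survive, because that is where the agent's instructions and working state live.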

The community response adds an interesting counterpoint. A framework called Ossature takes a different approach entirely, prioritizing spec-driven generation over chat-based interaction. Instead of burning tokens on long conversational context, Ossature audits specifications for completeness and contradictions before generating code, then executes in constrained steps. The team behind it used this method to generate a CHIP-8 emulator purely from specs, no extended chat required. It's a reminder that "agent" describes a design space, not a single architecture.
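To give a flavor of what auditing a spec "for completeness and contradictions" could look like, here is a hypothetical sketch. It is not Ossature's actual implementation or API; the required sections, the constraint vocabulary, and the conflict pairs are all invented for illustration.

```python
# Hypothetical spec audit, loosely inspired by the spec-driven approach
# described above. Section names and conflict pairs are made up.
REQUIRED_SECTIONS = ("goal", "inputs", "outputs", "constraints")
CONFLICTS = [("no-network", "fetch-remote"), ("read-only", "writes-files")]

def audit_spec(spec: dict) -> list[str]:
    """Return a list of issues; an empty list means the spec passes the audit."""
    issues = []
    for section in REQUIRED_SECTIONS:
        if section not in spec:
            issues.append(f"missing section: {section}")
    constraints = set(spec.get("constraints", []))
    for a, b in CONFLICTS:
        if a in constraints and b in constraints:
            issues.append(f"contradiction: {a} vs {b}")
    return issues
```

The point of gating generation on an audit like this is that contradictions surface as cheap static checks before any tokens are spent on code, rather than as confused output halfway through a chat session.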