A developer known as tiansenxu has published v3 of a design pattern called the Self-Evolving Skill, built on top of Anthropic's Claude Code Skills platform. The pattern addresses a fundamental limitation of static AI coding tools: every new session discards the domain knowledge accumulated in previous ones. The solution is a living references/ directory alongside the skill that persists structured knowledge across sessions, governed by a Five-Gate protocol covering Value, Alignment, Redundancy, Freshness, and Placement. The pattern has now completed three validation rounds against a 29-table smart building management database, with v3 passing all six verification points. These included an integrity test in which Gate 2 rejected incorrect enum values asserted by a human user, because SQL-verified contradicting data already existed in the knowledge base.
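The Five-Gate sequence can be pictured as a simple accept/reject pipeline. The sketch below is an illustrative reconstruction, not code from the repository: the gate names follow the article, but the predicate fields (`contradicts_verified`, `already_known`, `topic`) are hypothetical stand-ins for what the article describes as prompt-layer judgments made by the model itself.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

@dataclass
class Candidate:
    """A piece of knowledge proposed for the references/ directory."""
    claim: str
    contradicts_verified: bool  # conflicts with SQL-verified data already stored
    already_known: bool         # duplicate of an existing entry
    topic: Optional[str]        # target topic file, if one exists

def run_gates(c: Candidate) -> Tuple[bool, str]:
    # Each gate either passes the candidate along or rejects it with a reason.
    gates: List[Tuple[str, Callable[[Candidate], bool]]] = [
        ("Value",      lambda c: bool(c.claim.strip())),
        ("Alignment",  lambda c: not c.contradicts_verified),
        ("Redundancy", lambda c: not c.already_known),
        ("Freshness",  lambda c: True),   # decay is handled by the Python toolchain
        ("Placement",  lambda c: c.topic is not None),
    ]
    for name, check in gates:
        if not check(c):
            return False, f"rejected at {name}"
    return True, "accepted"
```

In this framing, the integrity test from the validation rounds corresponds to a candidate whose `contradicts_verified` flag is set: it is stopped at the Alignment gate regardless of how confidently the user asserted it.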
The confidence decay model — C(t) = C₀ × e^(−λ_base × (β+1)/(α+1) × t) — is the mathematical engine behind the Freshness gate, using Bayesian feedback parameters to slow or accelerate how quickly stored knowledge degrades toward a revalidation threshold. A key architectural decision was moving all decay arithmetic out of the SKILL.md prompt layer and into a dedicated Python toolchain (formulas.py, models.py, decay_engine.py), with 143 pytest cases now passing. The author reports a 63.6% rejection rate in initial experiments, reflecting a deliberate design philosophy that treats rejection, not accumulation, as the protocol's primary function. The pattern is classified in the academic self-evolving agents literature as an instance of Inter-test-time Context Evolution with Text-Feedback Governance, following the taxonomy in Gao et al.'s 77-page survey published in Transactions on Machine Learning Research in January 2026.
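The decay model fits in a few lines of Python. This sketch mirrors the formula above, with `alpha` counting positive feedback and `beta` negative feedback; the parameter names and default values are illustrative, not necessarily those used in formulas.py:

```python
import math

def confidence(t: float, c0: float = 1.0, lam_base: float = 0.05,
               alpha: int = 0, beta: int = 0) -> float:
    """C(t) = C0 * exp(-lam_base * (beta+1)/(alpha+1) * t).

    Positive feedback (alpha) shrinks the effective decay rate, so endorsed
    knowledge stays above the revalidation threshold longer; negative
    feedback (beta) accelerates decay toward revalidation.
    """
    lam_eff = lam_base * (beta + 1) / (alpha + 1)
    return c0 * math.exp(-lam_eff * t)
```

For example, an entry with nine positive feedbacks (`alpha=9`) decays ten times more slowly than a fresh entry: `confidence(100, alpha=9)` equals `confidence(10)`.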
The most consequential open risk the author identifies is also the most structurally awkward one: <a href="/news/2026-03-15-validation-is-the-missing-layer-in-llm-agent-workflows">the Five-Gate governance protocol has no mechanical enforcement</a>. Every gate decision is a prompt-layer judgment executed by the LLM under instruction-following, not a programmatic invariant or schema validation. The Python toolchain enforces arithmetic correctness for the decay model, but whether Claude follows the gates at all is a matter of model discipline. The author explicitly lists this as an outstanding open question — a level of candor that, in similar projects reviewed for this article, is rare. A related structural issue is the α/β upper bound problem embedded in the decay formula: knowledge entries that accumulate many positive feedbacks see their effective decay rate approach zero, creating a ratchet where entrenched stale knowledge and entrenched correct knowledge become mathematically indistinguishable. λ calibration across domains remains pending, with the author noting that production data ethics clearance is required before real-world decay behavior can be measured at scale.
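The ratchet can be made concrete by solving the decay equation for the time at which confidence crosses the revalidation threshold: t* = ln(C₀/threshold) · (α+1) / (λ_base · (β+1)), which grows linearly with α and is therefore unbounded. The helper below is illustrative; neither the function name nor the λ-floor mitigation it demonstrates appears in the source.

```python
import math

def time_to_threshold(threshold: float, c0: float = 1.0,
                      lam_base: float = 0.05, alpha: int = 0, beta: int = 0,
                      lam_floor: float = 0.0) -> float:
    """Time until C(t) = c0 * exp(-lam_eff * t) falls to `threshold`.

    A lam_floor > 0 is one hypothetical mitigation for the ratchet: it
    bounds how slowly even heavily-endorsed knowledge may decay, so every
    entry eventually reaches revalidation.
    """
    lam_eff = max(lam_base * (beta + 1) / (alpha + 1), lam_floor)
    return math.log(c0 / threshold) / lam_eff

# Without a floor, time-to-revalidation scales with (alpha + 1):
# alpha = 0  -> ~13.9 time units (lam_base = 0.05, threshold = 0.5)
# alpha = 99 -> ~1386 time units, i.e. effectively never revalidated
```

Under this reading, the bound problem is not that confidence is wrong but that an entry with many endorsements stops generating revalidation events at all, which is exactly the condition under which stale and correct knowledge become indistinguishable.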
For practitioners tracking the agent tooling space, the pattern sits between fully static skills and the complexity of continual learning systems — deliberately so, scoped to domains where knowledge has a natural ceiling, such as database investigation, codebase analysis, and business system integration, rather than open-ended knowledge expansion. The routing table architecture, which loads only 1–2 relevant topic files per session rather than the full knowledge base, is a notable concession to context window constraints inherent to the Claude Code Skills prompt injection layer. The source repository is available at github.com/191341025/Self-Evolving-Skill under a CC BY-SA 4.0 license, and an interactive visualization of the decay formula is published alongside the documentation.
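The routing-table idea reduces, in the simplest case, to a keyword-to-file lookup capped at a couple of files per session. The sketch below assumes a flat keyword map and invented file names; the actual routing mechanism and directory layout in the repository may differ.

```python
from typing import Dict, List

# Hypothetical routing table: keywords -> topic files under references/.
# Only the first `max_files` matches are injected into the session context,
# keeping prompt size bounded regardless of knowledge-base growth.
ROUTING: Dict[str, str] = {
    "enum": "references/enums.md",
    "schema": "references/tables.md",
    "sensor": "references/sensors.md",
}

def route(query: str, max_files: int = 2) -> List[str]:
    """Return at most `max_files` topic files relevant to the query."""
    q = query.lower()
    hits = [path for keyword, path in ROUTING.items() if keyword in q]
    return hits[:max_files]
```

Even when a query matches every topic, the cap holds: `route("What schema holds the sensor enums?")` returns only two files, which is the concession to context-window limits the article describes.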