Debugging production LLM apps means scrolling through endless traces, looking for patterns. A team of ex-Kubernetes maintainers built Kelet to do that automatically.

The service ingests traces from your agents, clusters failure patterns across sessions, and identifies root causes with evidence attached. It then generates prompt patches with before/after reliability measurements so you can verify the fix actually worked.
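
Kelet's internals aren't public, but the clustering step is easy to picture. Here's a minimal sketch of grouping similar failures across sessions, using TF-IDF and DBSCAN from scikit-learn; the failure strings are invented and none of this reflects Kelet's actual pipeline:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import DBSCAN

# Invented failure messages, standing in for text extracted from traces.
failures = [
    "tool call returned malformed JSON",
    "tool call returned invalid JSON payload",
    "context window exceeded during retrieval step",
    "context length limit hit during retrieval",
    "model refused to answer a policy question",
]

# Vectorize the failure text, then group near-duplicates with
# density-based clustering; label -1 marks one-off outliers.
vectors = TfidfVectorizer().fit_transform(failures)
labels = DBSCAN(eps=0.8, min_samples=2, metric="cosine").fit_predict(vectors)

for label, text in sorted(zip(labels, failures)):
    print(label, text)
```

The interesting work is presumably in what happens after clustering, when a cluster gets traced back to a root cause, but grouping recurring failures is the part any team could replicate today.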

Integration takes about five minutes via pip or npm. Traces flow to Kelet's SOC 2-certified servers, and the company covers LLM token costs for analysis. The service works with OpenTelemetry, LangChain, CrewAI, AutoGen, LlamaIndex, and direct APIs from OpenAI, Anthropic, and Gemini. Kelet explicitly positions itself against observability tools like Langfuse and Datadog. Those show you traces. Kelet reads them for you.
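
Since OpenTelemetry is on the supported list, the pip path presumably amounts to standard OTel plumbing. A sketch of what that wiring usually looks like, with a made-up ingest endpoint standing in for whatever Kelet's onboarding actually specifies:

```python
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter

# "ingest.kelet.example" is a placeholder; the real endpoint and any
# auth headers would come from Kelet's docs.
exporter = OTLPSpanExporter(endpoint="https://ingest.kelet.example/v1/traces")
provider = TracerProvider()
provider.add_span_processor(BatchSpanProcessor(exporter))
trace.set_tracer_provider(provider)

tracer = trace.get_tracer("my-llm-app")
with tracer.start_as_current_span("agent.step") as span:
    span.set_attribute("llm.model", "gpt-4o")  # illustrative attribute
```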

Skepticism on Hacker News is warranted. An RCA agent lacks context about your proprietary code. Validating AI-generated hypotheses isn't trivial either; it's the kind of problem Imbue threw a 100-agent testing swarm at. True causal analysis in complex codebases is hard, and developers are right to question whether this works better than simpler "what changed?" approaches. Kelet claims teams see failure patterns with as few as 200 sessions, but that's based on their own design partners. Independent validation doesn't exist yet.
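
Whether 200 sessions is enough depends entirely on how common a failure mode is; this back-of-envelope check is ours, not Kelet's. A pattern firing in 2% of sessions will usually surface a handful of times in 200 sessions, while a 0.1% pattern will likely never appear at all:

```python
from scipy.stats import binom

n = 200  # sessions observed
for rate in (0.02, 0.005, 0.001):
    # P(at least 3 occurrences), a rough floor for anything
    # a clustering step could plausibly call a "pattern".
    p = binom.sf(2, n, rate)
    print(f"rate={rate:.1%}  P(>=3 hits in {n} sessions) = {p:.2f}")
```

A 2% failure mode clears that bar about three times out of four; a 0.1% mode almost never does. So the 200-session claim is plausible for common failures and implausible for rare ones.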

The founding team's background matters here. These are infrastructure veterans with 15 years of open-source systems work, not AI researchers chasing agent hype. They built Kelet because they got frustrated scrolling through traces manually. According to their data, 73% of pilot teams had failures nobody noticed until Kelet flagged them. The median time from trace ingestion to prompt patch was 14.3 minutes. That's a meaningful number if it holds up at scale.