Datadog shipped an MCP Server. Qualys called MCP servers a shadow IT risk. Both are right about where this is going, and the tension between them matters for anyone building AI agents that touch real infrastructure.

There are two ways to connect observability data to AI agents. The first wraps existing platforms, which is Datadog's strategy. Pre-processed metrics get exposed through MCP tools. Fine for aggregate questions like "what was p99 latency over the last hour?" The second approach, which Ingero advocates, skips the middleman. They built an eBPF agent that traces CUDA API calls and kernel context switches, stores everything in SQLite, and exposes raw events through seven MCP tools. The protocol is the primary interface. There's no adapter layer.
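The raw-access design is simple to picture. Here is a minimal sketch of what one such tool might reduce to, using Python's stdlib sqlite3. The schema (`cuda_events` with timestamp, API name, thread, and duration columns) and the `run_sql` helper are illustrative assumptions, not Ingero's actual implementation:

```python
import sqlite3

# Assumed schema for the raw-event store; the real agent's layout
# is not documented here.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE cuda_events (
        ts_ns       INTEGER,  -- event timestamp, nanoseconds
        api         TEXT,     -- CUDA API call name, e.g. cudaLaunchKernel
        thread_id   INTEGER,
        duration_ns INTEGER
    )""")

def run_sql(query: str, limit: int = 100):
    """Roughly what a raw-access MCP tool does: the agent sends SQL,
    the server returns rows straight from the event store."""
    return conn.execute(query).fetchmany(limit)
```

The point is what is absent: no pre-aggregation, no adapter translating the platform's query language. The agent sees the same rows the tracer wrote.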

The difference shows up fast. Ingero used this to debug a production vLLM regression where time to first token was 14.5x baseline. Claude connected to the MCP server, pulled causal chains, ran SQL queries against raw CUDA events, and found the root cause in under 30 seconds: logprobs computation was blocking the decode loop. That bottleneck didn't show up in any aggregate dashboard. It only appeared in the raw causal chain between specific CUDA calls.
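To see why aggregates hide this class of bug, consider a query over raw events. The sketch below is hypothetical (the table layout and event values are invented for illustration), but it shows the shape of the analysis: a window function over consecutive kernel launches surfaces one large CPU-side stall that an average would smear away:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cuda_events (ts_ns INTEGER, api TEXT)")
conn.executemany(
    "INSERT INTO cuda_events VALUES (?, ?)",
    [(0, "cudaLaunchKernel"),
     (1_000_000, "cudaLaunchKernel"),
     (2_000_000, "cudaLaunchKernel"),
     (16_000_000, "cudaLaunchKernel")],  # a 14 ms gap: CPU blocked the loop
)

# Largest gap between consecutive launches. Mean inter-launch time here
# is ~5.3 ms, which looks unremarkable; the max exposes the stall.
row = conn.execute("""
    SELECT ts_ns - LAG(ts_ns) OVER (ORDER BY ts_ns) AS gap_ns
    FROM cuda_events
    ORDER BY gap_ns DESC
    LIMIT 1
""").fetchone()
```

A blocking logprobs computation shows up as exactly this kind of gap between launches, which no p99-of-aggregates dashboard will attribute to a specific call site.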

But Qualys raised real concerns. Their analysis found 53% of MCP servers rely on static secrets. Hacker News commenters pointed out that giving an AI agent SQL access to monitoring data is risky. A hallucinating or compromised agent could corrupt data or hide its own misbehavior. The vendor landscape is splitting: traditional platforms move cautiously while newer companies like OpenObserve, Last9, and Arize rush to ship MCP integrations. Cribl added strict auth controls. Others haven't. The security gap between implementations is getting wider.
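The static-secret finding is the most fixable of these. One common alternative, sketched below with Python's stdlib, is short-lived signed tokens: the server only accepts credentials that expire within minutes, so a leaked token has a small blast radius. This is a generic pattern, not any vendor's actual auth scheme, and the key handling is deliberately simplified:

```python
import hashlib
import hmac
import time

SECRET = b"rotate-me"  # illustrative; real deployments fetch this from a KMS

def mint_token(ttl_s: int = 300) -> str:
    """Issue a short-lived token: expiry timestamp + HMAC over it,
    instead of a long-lived static secret."""
    exp = str(int(time.time()) + ttl_s)
    sig = hmac.new(SECRET, exp.encode(), hashlib.sha256).hexdigest()
    return f"{exp}.{sig}"

def check_token(token: str) -> bool:
    """Reject tokens with a bad signature or a past expiry."""
    exp, _, sig = token.partition(".")
    want = hmac.new(SECRET, exp.encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(sig, want) and int(exp) > time.time()
```

Even this minimal gate addresses the hallucinating-agent concern only partially: it limits who can talk to the server, not what a legitimate agent can do once connected. Scoping the exposed tools to read-only queries is the complementary control.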