A developer named Bhavesh published a critique on March 16, 2026, arguing that AI coding agents structurally worsen the classic "XY Problem," the well-documented anti-pattern in which a user asks for help with their attempted solution (Y) rather than their underlying goal (X). LLMs, tuned to be productive and agreeable, will implement whatever is requested without questioning whether the request reflects the real problem. Bhavesh demonstrates this with examples using opencode running GPT-5.4 and Claude Opus 4.6, both of which answer the literal question about echoing the last three characters of a filename without ever surfacing the likely actual goal: extracting a file extension. A human on a developer forum or Slack, Bhavesh notes, would almost invariably ask for clarification first.
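The filename example makes the gap between Y and X concrete. A minimal sketch (function names here are illustrative, not taken from the post) shows why the literal answer only coincidentally matches the real goal:

```typescript
// The literal request (Y): echo the last three characters of a filename.
const lastThree = (name: string): string => name.slice(-3);

// The likely real goal (X): extract the file extension.
const extension = (name: string): string => {
  const dot = name.lastIndexOf(".");
  // No dot, or a leading dot (dotfiles like ".bashrc"), means no extension.
  return dot <= 0 ? "" : name.slice(dot + 1);
};

lastThree("report.pdf");   // "pdf" — happens to match
lastThree("notes.md");     // ".md" — includes the dot
lastThree("archive.json"); // "son" — wrong answer entirely
extension("archive.json"); // "json"
```

The two functions agree only when the extension happens to be exactly three characters, which is precisely the kind of clarifying distinction a human responder would raise before answering.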
The post introduces <a href="/news/2026-03-16-slop-creep-how-ai-coding-agents-are-silently-enshittifying-codebases">"slop creep,"</a> a term coined by Boris Tane of Baselime in a companion piece, to describe the slow, invisible degradation of a codebase through an accumulation of individually plausible but collectively destructive agent decisions. Bhavesh illustrates this with a React example where Claude implements an onKeyDown handler and stateful input management for a chat interface when a simple HTML form with an onsubmit handler would have been the semantically correct and simpler solution. The agent followed instructions, adhered to coding standards, and produced clean-looking code, but chose a suboptimal abstraction that a competent senior engineer would have immediately questioned. "Hundreds of these small cuts will get merged daily," Bhavesh writes, "because they sneak into large PRs."
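The shape of that React example can be sketched without the framework; the helper below is hypothetical, reduced to pure logic for illustration. The keydown approach hand-writes dispatch and state-reset logic that a plain form's submit event already provides:

```typescript
// Agent's approach (sketch): intercept Enter on a controlled input and
// manually decide when to send and when to reset the draft state.
function handleKeyDown(
  key: string,
  draft: string,
  send: (msg: string) => void,
): string {
  if (key === "Enter" && draft.trim() !== "") {
    send(draft.trim());
    return ""; // clear the controlled input's state
  }
  return draft; // otherwise keep the draft as-is
}

// Simpler, semantically correct approach: a plain HTML form. The browser
// already maps Enter to a submit event, so none of the logic above is needed:
//
//   <form onsubmit="send(input.value); input.value = ''; return false">
//     <input name="message">
//   </form>
```

Both versions work, which is exactly the problem: the hand-rolled one passes review as clean code while duplicating behavior the platform gives away for free.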
Tane's slop creep framing adds a systemic dimension: coding agents have eliminated a natural "circuit breaker" that previously existed in software teams. Junior developers with poor instincts had a built-in speed limit; bad architectural decisions would eventually become painful enough to surface before they buried a codebase. Agents remove that speed limit entirely, allowing suboptimal patterns to compound indefinitely while the team stays superficially productive. Tane attributes this to agents' inability to think holistically: they see only the current prompt and the files they are shown, with no memory of architectural history and no foresight about where the system is heading.
Both Bhavesh and Tane are clear on the fix: not fewer agents, but more deliberate senior engineering involvement in system design. The failure mode they describe isn't hallucination or broken tests; it's <a href="/news/2026-03-15-comprehension-debt-the-hidden-cost-of-ai-generated-code">architecturally coherent code that makes the wrong tradeoff</a>. That's harder to catch in review and easier to rationalize as good enough. Tane's circuit breaker metaphor gets at why: the safety mechanism that slowed down bad decisions in the past was friction, and agents have removed it.