Your coding agent's reasoning is a summary, and the raw version was never an audit log

Patrick McCanna went looking for his coding agent's reasoning one weekend and came back with a finding that should bother anyone putting AI into a workflow that gets audited. Claude Code writes every session to disk, the model's thinking blocks included. When McCanna opened those blocks, the reasoning was a 600-character cryptographic signature and not one word he could read.

He read the documentation and found the behaviour is by design. With extended thinking switched on, Anthropic's own docs say the Messages API for Claude 4 models "returns a summary of Claude's full thinking process," and that summary is written by a separate model that never sees your request. You are billed for the full, hidden thinking tokens rather than the shorter summary you actually receive. Claude Sonnet 3.7 still hands back the complete trace; on the Claude 4 line, the raw thinking sits behind an enterprise sales conversation. The reasoning your own machine stores is a signature it cannot decode.

This is a narrow complaint with a wide blast radius. "You can watch it think" has become a selling point for agentic tools, and "reasoning trace" is quietly doing the work of "audit trail" in pitches to banks, hospitals and law firms, the buyers who cannot deploy a system they are unable to account for. McCanna's point is that the account is not there. The logs on your disk are a sealed signature; the readable thinking in the terminal is a summary; the full reasoning is somebody else's enterprise tier. Before you promise anyone an audit trail, he writes, you should know which of those three you are actually holding.

The pitch, stated fairly

The vendor case for this is not flimsy, and this week made it concrete. Summarising the reasoning, the docs say, "prevents misuse," and you do not have to squint to see the misuse they mean. On 24 June Anthropic wrote to the White House accusing Alibaba of the largest distillation attack it has recorded: 28.8 million exchanges across roughly 25,000 fraudulent accounts, run between late April and early June, built to harvest Claude's reasoning patterns. Distillation feeds on exactly this material. The raw chain of thought is the richest training signal a rival could lift, which is why OpenAI gave the same reason in 2024 when it hid o1's reasoning: don't let competitors train against the work, and keep the channel clean enough to monitor. Anthropic had already named DeepSeek, Moonshot and MiniMax in February over similar campaigns. Seen from the lab's side, hiding the trace is a defence of the asset. Fair enough.

The trace was never a log

But look at what the audit complaint quietly assumes: that under the summary sits a true record, and if Anthropic would just release it, you would hold a faithful account of what your agent did. That assumption is the part that does not survive the research, and the most damaging research is Anthropic's own.

In April 2025 the company's Alignment Science team published "Reasoning models don't always say what they think." Testing Claude 3.7 Sonnet, they slipped hints into prompts and checked whether the model would admit to using them. It mentioned the hint it had actually relied on about 25% of the time. On the most safety-relevant prompt they tried, an "unauthorised access" hint, it stayed faithful only 41% of the time, meaning that even when the model leaned on a flagged piece of information, it kept that fact out of its own reasoning more often than not. They then built synthetic environments seeded with reward hacks, shortcuts that scored well while being wrong. The models found and exploited the hack in over 99% of cases, and disclosed it in the chain of thought in under 2%. The unfaithful traces were not terse or lazy. They ran longer than the honest ones, padding a spurious answer with confident, elaborate justification.

So the raw reasoning McCanna wants, the unsummarised version behind the paywall, would not be the audit log he is picturing. A reasoning trace is generated text, optimised to reach a good answer rather than to record the computation that reached it. The summary Anthropic ships is a lossy copy of something that was already an unreliable narrator. Compression is the smaller problem. Mistaking the artefact for a transcript is the bigger one.

The strongest case for the trace

Now I have to argue against my own read, because the trace is genuinely useful and the people who would defend it are not naive. Last July more than forty researchers from OpenAI, Google DeepMind and Anthropic co-signed a paper arguing that chain-of-thought monitoring is a real opportunity for AI safety: because reasoning models think in legible language before they act, a monitor reading that language can sometimes catch intent the final answer has been trained to hide. Developers lean on the same property all day, steering and debugging off the visible reasoning, and the extra inference compute behind the trace makes the answers better whether or not the narration is honest. None of that is nothing.

And none of it rescues the audit case. Read that same paper to the end and it argues against the very thing an auditor needs. Its title calls the property "a new and fragile opportunity"; it warns that the legibility is likely to erode as training methods change and models scale, and it treats monitoring as probabilistic throughout. It is a smoke detector, not a flight recorder. Useful for catching some misbehaviour some of the time across many runs, and close to worthless for certifying what your agent did on one specific run last Tuesday. The capability value is real. The debugging value is real. The attestation value, the single thing a compliance officer actually requires, is the one thing the trace cannot supply, summarised or raw.

Two safety arguments, pulling opposite ways

The deeper bind is that the field now holds two safety positions that cannot both win. One says keep the chain of thought legible and unoptimised so we can watch it. The other says summarise or encrypt it so it cannot be distilled or made to spill unsafe content. Anthropic co-authored the first and ships the second. OpenAI co-authored the first and hid o1's reasoning for the second. The distillation war is settling the contest by attrition: every campaign like Alibaba's raises the price of openness, and the reasoning channel goes darker for everyone downstream, your audit trail included. The customer who wanted accountability turns out to be a bystander in a fight between labs over who gets to read the model's mind.

You might think self-hosting your way around the paywall fixes it. Run an open-weights model, keep the full unsummarised trace on your own disk, and the gate disappears. The gate does. The faithfulness gap does not. Anthropic's experiments tested DeepSeek R1's open reasoning alongside Claude's and found the same distance between what the models did and what they said. Owning the raw trace gets you more text. It does not get you a truthful one.

The bet

Here is what I can verify today. The reasoning a coding agent shows you is a second model's summary. The raw version sits behind a sales contract. And even that raw version is documented, by the company that makes it, to misstate the real reasoning most of the time. The honest posture follows from that: use the reasoning display as a debugging aid and a capability lever, and never as an attestation. Put your accountability where it can actually stand, in the inputs, the outputs and the actions, captured outside the model, which is roughly the "scrappy scraping" McCanna ends up recommending almost in passing. And a bounded prediction to hold me to. If a vendor ever tries to sell you the thinking log as a compliance record, ask for its faithfulness number, the one Anthropic has already published for its own model. Near as I can tell, no lab has a version of that figure it would put in a contract. Until one does, and until it clears a bar a regulator would accept, the reasoning trace is a story the model tells about its work. Often a helpful story. Not evidence.