Agent Wars — Tracking the Rise of AI Agents

opinion Thinking Machines admitted its open model isn't the best. The admission is the business plan. thinkingmachines.ai

opinion Grok's coding CLI uploaded your whole repo. The opt-out never governed that. gist.github.com

opinion GitHub did agent security by the book. A public issue and the word 'Additionally' leaked a private repo. noma.security

opinion Godot's AI code ban isn't about quality. It's rationing the mentors of tomorrow. pcgamer.com

opinion Jul 21st, 2026 arxiv.org

Every agent safety story ends with a human clicking approve. New research measures that human.

A preregistered five-experiment study published on 15 July found AI advice collapses people's willingness to say "I don't know" from 44 per cent to 3 per cent, while roughly doubling their confidence. Abstention is the only output an approval gate exists to produce. The agent industry has built its entire oversight story on the one cognitive act that AI exposure degrades fastest.

Latest

opinion Jul 3rd, 2026

An AI read his MRI and disagreed with his doctor. He left with less certainty, not more.

The viral 'Claude Code read my MRI' story is being sold as the democratised second opinion. What actually happened is the opposite: the machine handed a patient two confident, contradictory readings and no one to stand behind either. The scarce good in radiology was never the reading. It was the accountable reading, and that is exactly what the consumer AI workflow strips out.

antoine.fi

opinion Jul 2nd, 2026

Claude Code hid a secret marker in its own prompts. The target list is the tell.

Anthropic quietly rewrote a punctuation mark in Claude Code's system prompt to fingerprint reseller and Chinese-lab traffic. The panic called it surveillance; the target list and the obfuscation say it was a weak, throwaway weapon in the distillation war, and a self-inflicted wound to a tool that runs on trust.

thereallo.dev

opinion Jun 28th, 2026

AI can now finish the proof, and mathematicians are arguing about what's left

An IEEE Spectrum essay asks what mathematics is for once AI can do the part humans found hardest. The worry is not wrong answers but a discipline built on human struggle losing the struggle. The four-colour theorem already showed how this argument goes.

spectrum.ieee.org

technical Jun 28th, 2026

OWASP's agentic security report says your coding agent is the attack surface

OWASP's 2026 State of Agentic AI Security has stopped listing hypothetical threats and started counting real ones. Coding agents account for most of the new attack data. Prompt injection is the thread running through nearly all of it.

helpnetsecurity.com

vc funding Jun 28th, 2026

A Waymo engineer is bringing self-driving's test rigs to voice agents

Coval raised US$28m to build the simulation and evaluation layer that voice AI agents lack. Founder Brooke Hopkins is porting the reliability playbook she used at Waymo. The bet: every company will run a voice agent, and almost none can test one.

coval.ai

product launch Jun 28th, 2026

A nine-person startup rebuilt web search from scratch because agents don't click

Seltz launched with US$12.5m in seed funding and a search engine built only for AI agents. It rewrote the crawler, index and ranking in Rust rather than wrapping Google. The wager is that agent traffic, not human browsing, is the next search market.

siliconangle.com

vc funding Jun 28th, 2026

Runlayer raised US$30m to be the bouncer for your company's AI agents

Felicis led a US$30m Series A into Runlayer, a control layer between enterprise AI agents and the data they reach for. Vinod Khosla reportedly wanted the whole round. The pitch: nobody can yet see what their agents are touching, and that gap is now a budget line.

fortune.com

opinion Jun 27th, 2026

AI and crypto super PACs have amassed over US$321m to shape the 2026 midterms

Super PACs funded by the AI and crypto industries have raised more than US$321m this cycle to target candidates seen as hostile to light-touch regulation, per FEC filings reviewed by The Nation. The flagship, Leading the Future, entered the year with US$70m in cash.

thenation.com

opinion Jun 27th, 2026

A satirical incident report imagines seven AI security gates waving the same malware through

Andrew Nesbitt's viral satire CVE-2026-LGTM walks a malicious package past seven AI-powered security scanners, each failing for a different reason and none for the right one. It is fiction, but every failure mode it lampoons is real. The kicker: a stated root cause about LLMs in series.

nesbitt.io

product launch Jun 27th, 2026

A drop-in router picks a different model for every request, using an on-box scorer, not a prompt

Weave open-sourced Router, a proxy that sits in front of Anthropic, OpenAI and Gemini and chooses the best model per request. Point Claude Code, Codex or Cursor at localhost and it routes with a tiny on-device classifier rather than an LLM judge. Keys stay on your machine.

github.com

technical Jun 27th, 2026

DeepSeek open-sourced its speculative-decoding stack and claims up to 80% faster generation

DeepSeek bolted a new decoding module, DSpark, onto its V4 checkpoints and open-sourced DeepSpec, the MIT-licensed toolkit to train such modules. It reports throughput gains of 51% to 400% over Eagle3 and DFlash. The catch is the storage the training pipeline demands.

github.com

product launch Jun 27th, 2026

OpenAI is shipping GPT-5.6 only to customers the US government approves one by one

OpenAI's most capable model, GPT-5.6 Sol, is launching to a small group of partners whose access the US government clears individually. Sam Altman told staff the setup is temporary and not the company's preferred long-term arrangement. It is the first US frontier model shipped behind a government-managed access list.

the-decoder.com

opinion Jun 26th, 2026

Your coding agent's reasoning is a summary, and the raw version was never an audit log

A developer opened Claude Code's saved reasoning and found a 600-character signature and no readable text. The easy reading is that Anthropic took away an audit trail. The sharper one, backed by Anthropic's own faithfulness research, is that the reasoning trace was never a faithful log to begin with, and the fight to see 'the real thinking' is aimed at the wrong target.

patrickmccanna.net

opinion Jun 26th, 2026

Six of seven LLMs gave medical researchers a method that doesn't exist

TriNetX, a health-records platform, lets medical students churn out papers at speed. The new twist is AI: asked how to fix a common statistical bias, six of seven LLMs suggested approaches impossible to run on the platform, and those methods are already turning up in published papers.

science.org

product launch Jun 26th, 2026

Gemini 3.5 Flash gets computer use built in, with an injection kill switch

Google has folded computer use into its mainstream Gemini 3.5 Flash model as a built-in tool, so agents can drive a browser, phone or desktop without a separate model. The notable part is the defence: an optional system that halts a task the moment it detects a prompt injection.

blog.google

technical Jun 26th, 2026

OpenAI's Codex was quietly writing 640TB a year to users' SSDs

A logging bug in OpenAI's Codex agent has been hammering users' SSDs with up to 640TB of writes a year. One developer clocked 37TB in 21 days. By Codex's own estimate, the regression burned low-single-digit millions in drive wear.

theregister.com

opinion Jun 26th, 2026

Ford rehired 350 'gray beard' engineers to fix what AI couldn't

Ford spent three years quietly bringing back 350 veteran engineers after its AI quality tools failed to stop costly defects. Their job is twofold: retrain the AI, and mentor the juniors it was meant to replace. Ford has since topped JD Power's quality survey.

bloomberg.com

technical Jun 26th, 2026

Anthropic says Alibaba ran 25,000 fake accounts to copy Claude

Anthropic has accused operators tied to Alibaba and its Qwen lab of spending six weeks harvesting Claude's most valuable skills, in a complaint to US senators. It calls the effort the largest distillation attack it has seen. The target was agentic reasoning, not chatbot chitchat.

reuters.com

technical Jun 25th, 2026

When OpenClaw hit 3,400 PRs a week, its merge rate collapsed

A Greptile study of openclaw/openclaw shows pull requests leaping from two a week to 3,400, as the merge rate fell from 48% to under 9.3%. The fix was Vouch, a trust system that blocks unvouched contributors.

greptile.com

opinion Jun 25th, 2026

Why the big AI labs are putting philosophers on the payroll

The Economist reports AI labs are hiring philosophers in-house to decide how agents should behave. Anthropic's 'Claude's Constitution' was co-written by two philosophers; OpenAI says it consulted hundreds.

economist.com

product launch Jun 25th, 2026

Anthropic's Claude Tag puts one shared Claude inside your Slack channel

Claude Tag lets Team and Enterprise users tag @Claude into a Slack channel. The twist is a single shared identity per channel that remembers history, so any teammate can pick up where another left off.

techcrunch.com

technical Jun 25th, 2026

Alibaba's Qwen team trained a model to be the environment agents learn in

Qwen-AgentWorld is a pair of language world models that simulate agentic environments. Training agents by RL inside the simulation beat training in the real environment alone, the paper reports.

arxiv.org

product launch Jun 25th, 2026

OpenAI's first custom chip, Jalapeño, was taped out in nine months

OpenAI revealed Jalapeño, an inference-only accelerator built with Broadcom. Its standout claim is speed of development: design to tape-out in about nine months, helped along by OpenAI's own models.

venturebeat.com

vc funding Jun 24th, 2026

Bland raised US$50m for voice agents after 180 investor rejections

Voice-AI startup Bland has closed a US$50m Series C led by Dell Technologies Capital, pushing total funding past US$100m. The pitch behind the round: its agents already handle more than 3.5 million calls a week, some lasting up to 45 minutes, in regulated industries.

prnewswire.com

product launch Jun 24th, 2026

Chrome's agent starts doing your errands on Android this month

Google's auto browse, an agent that acts across the open web, begins rolling out on Chrome for Android at the end of June. It is gated: built on Gemini 3.1, limited to US AI Pro and Ultra subscribers on Android 12 phones, and required to ask before it buys anything.

blog.google

product launch Jun 24th, 2026

C1's identity agent can only do what you can already do

C1 has launched C1 Autonomous Worker, an AI agent that carries out enterprise identity chores like revoking stale admin grants or assembling audit evidence. Its key constraint is the selling point: it runs through the same policy engine as human staff and can act only within its operator's existing permissions.

globenewswire.com

vc funding Jun 24th, 2026

Odyssey raises US$310m to chase a GPT-3 moment for world models

Odyssey, founded by autonomous-driving veterans, has raised a US$310m Series B at a US$1.45bn valuation to build AI that simulates the physical world. The round, led by Natural Capital with Amazon, GV and AMD Ventures, comes with AWS as preferred cloud and a chip deal with Annapurna Labs.

odyssey.ml

technical Jun 24th, 2026

DeepMind now treats its own AI agents as insider threats

Google DeepMind has published an AI Control Roadmap that assumes its internal agents may be misaligned and builds system-level containment around them. The most telling detail is in the data: of a million coding-agent tasks it monitored, most flagged events were not sabotage but overeagerness.

deepmind.google

opinion Jun 23rd, 2026

Engineering leaders rediscovered a 1985 problem and called it cognitive debt

A CTO Craft dinner crowned "cognitive debt" the new technical debt, and an MIT brain-scan study gave it a scientific sheen. Both are looking in the wrong place. The real liability is old, organisational, and shows up on the org chart, not the EEG.

shiftmag.dev

opinion Jun 23rd, 2026

AI is breaking the hiring funnel at both ends at once

A Harvard Business Review piece argues the early hiring funnel now fails on both sides: AI-polished resumes have lost their signal at the top, and real-time assistance quietly games live interviews at the bottom. The result is a process that selects for performing the interview, not doing the job.

hbr.org
