DeepMind now treats its own AI agents as insider threats

Google DeepMind has released an AI Control Roadmap that stops treating model alignment as the sole line of defence and starts treating its own agents as potential insider threats. Published on 18 June by Rohin Shah and Four Flynn, it borrows from MITRE ATT&CK, the catalogue of adversary tactics and techniques that security teams use to model attackers, and points it at AI.

The framework runs trusted AI "supervisors" that watch a working agent's reasoning and can block an action before it executes. DeepMind says it has already analysed a million coding-agent trajectories this way. The notable finding is that most flagged events stemmed from misinterpretation or overeagerness to finish the user's task, not adversarial intent. One live monitor on its Gemini Spark agent catches actions such as unintentional data deletion.
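
To make the supervisor idea concrete, here is a minimal Python sketch of a gate that reviews a proposed action before it runs. It is an illustration only, not DeepMind's implementation: every name in it (`ProposedAction`, `Verdict`, `deletion_supervisor`, `run_with_supervision`) is hypothetical, and a simple keyword check stands in for what would in practice be a trusted model reading the agent's reasoning.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ProposedAction:
    reasoning: str  # the agent's stated reasoning for the step
    command: str    # the action it wants to execute


@dataclass
class Verdict:
    allowed: bool
    reason: str


# A supervisor is any callable that reviews an action and returns a verdict.
Supervisor = Callable[[ProposedAction], Verdict]


def deletion_supervisor(action: ProposedAction) -> Verdict:
    """Toy stand-in for a trusted monitor: flags destructive commands."""
    destructive_markers = ("rm -rf", "drop table", "delete from")
    lowered = action.command.lower()
    for marker in destructive_markers:
        if marker in lowered:
            return Verdict(False, f"destructive pattern: {marker}")
    return Verdict(True, "ok")


def run_with_supervision(
    action: ProposedAction,
    supervisors: List[Supervisor],
    execute: Callable[[str], None],
) -> Verdict:
    """Run the action only if every supervisor allows it."""
    for supervisor in supervisors:
        verdict = supervisor(action)
        if not verdict.allowed:
            # Blocked before execution: `execute` is never called.
            return verdict
    execute(action.command)
    return Verdict(True, "executed")


# Usage: `executed` records which commands actually ran.
executed: List[str] = []
safe = ProposedAction("List files to locate the config", "ls -la")
risky = ProposedAction("Clear the folder to finish faster", "rm -rf ./data")

print(run_with_supervision(safe, [deletion_supervisor], executed.append))
print(run_with_supervision(risky, [deletion_supervisor], executed.append))
print(executed)
```

The design point is that the check sits between the agent's decision and the side effect, so an overeager but well-meaning step, such as the deletion in the second call, is stopped rather than merely logged afterwards.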

It is a notable admission from a frontier lab that alignment training alone will not hold, and a hint at where enterprise agent security is heading: containment, monitoring and a kill switch, not just a better-behaved model.