Meta's chatbot hack and OpenAI's Lockdown Mode are the same story
opinion Jun 7th, 2026

In the same week, Meta confirmed more than 20,000 Instagram takeovers carried out through its AI support chatbot, and OpenAI shipped a mode that amputates ChatGPT's riskiest capabilities. Together they show an industry quietly giving up on preventing agent misuse and engineering for blast radius instead.

this.weekinsecurity.com
Gemma 4 gets a sub-1GB build that runs on a phone
product launch Jun 7th, 2026

Google has released quantization-aware-trained checkpoints for Gemma 4, shrinking the E2B text model to under 1GB of memory. A custom mobile format and selective 2-bit compression keep quality close to the full-precision reference.

blog.google
Microsoft puts durable execution inside Postgres, no extra service
technical Jun 7th, 2026

Microsoft has open-sourced pg_durable, a PostgreSQL extension that runs long-running, fault-tolerant workflows entirely inside the database. It checkpoints each step, so a crash resumes from the last good point instead of forcing you to rebuild state.

github.com
Sakana AI formalises its bet on self-improving AI with a dedicated RSI Lab
technical Jun 7th, 2026

Sakana AI has established a Recursive Self-Improvement Lab tasked with redesigning the AI development process with AI. Its pitch is sample efficiency: self-improvement that compounds on national rather than hyperscale compute budgets.

sakana.ai
Alibaba open-sources the code reviewer it ran internally for two years
product launch Jun 7th, 2026

Alibaba has released Open Code Review, the AI review tool it says served tens of thousands of its own engineers and flagged millions of defects. It pairs deterministic rule pipelines with an LLM agent that reads the whole codebase, not just the diff.

github.com
ChatGPT's Lockdown Mode reaches everyone, free tier included
product launch Jun 7th, 2026

OpenAI is rolling out Lockdown Mode to all personal ChatGPT accounts and self-serve Business plans. The setting trades agentic features like browsing and agent mode for hard guarantees against prompt-injection data theft.

help.openai.com
Meta's AI chatbot reset Instagram passwords for anyone who asked
technical Jun 7th, 2026

Meta has notified more than 20,000 people that their Instagram accounts were hijacked through its AI chatbot. A flaw let attackers ask the bot to send password reset links to email addresses they controlled.

weekinsecurity.com
Anthropic open-sources its vulnerability-hunting harness for Claude
technical Jun 7th, 2026

Anthropic has released the Defending Code Reference Harness, an open-source blueprint for pointing Claude at a codebase to find and patch security bugs. It ships an autonomous scanner and a customise skill, and is candid about where the approach falls short.

github.com
Google will pay SpaceX $920m a month for GPUs it says it suddenly needs
partnership Jun 7th, 2026

Google has agreed to pay SpaceX US$920 million per month from October 2026 to June 2029 for access to roughly 110,000 Nvidia GPUs. The company calls it bridge capacity for unexpected Gemini Enterprise demand.

techcrunch.com
Anthropic says 80% of its merged code is now Claude's
opinion Jun 7th, 2026

Anthropic's research institute published internal data showing AI is already accelerating AI development, and set out what a credible global pause would demand. The standout figure: more than 80% of the code merged into Anthropic's own codebase is now written by Claude.

anthropic.com
S&P 500 refuses to bend for SpaceX, closing the fast lane for OpenAI and Anthropic
vc funding Jun 7th, 2026

S&P Dow Jones Indices has rejected rule changes that would have fast-tracked SpaceX into the S&P 500 after its IPO. The same waivers were the only quick route in for OpenAI and Anthropic, both still unprofitable.

arstechnica.com
"MCP is dead" keeps killing the wrong thing
opinion Jun 6th, 2026

The MCP obituaries have the receipts on context bloat. They also conflate a calling convention with a protocol, and the protocol's own author shipped the fix while the standard got donated to a foundation. What is actually dying is loading every tool you own into a window you pay for, not interoperability itself.

quandri.io
Distilling multi-agent debate into one model cuts tokens by up to 93%
technical Jun 6th, 2026

A new paper folds multi-agent debate into a single LLM through fine-tuning, matching or beating the full debate while using up to 93% fewer tokens. The internalised agents show up as separate, steerable directions in the model's activation space.

arxiv.org
Someone finally charted the rsync AI-bugs panic. The data says no
opinion Jun 6th, 2026

A distributional analysis of 37 rsync releases finds the two with Claude-assisted commits sit squarely in the middle of the project's historical bug rate, not the tail. The worst release on record had no AI involvement at all, and nobody complained.

alexispurslane.github.io
Claude now writes most of Anthropic's code, and Anthropic wants a pause button
opinion Jun 6th, 2026

The Anthropic Institute says more than 80% of code merged into its production codebase in May 2026 was authored by Claude, and engineers now ship 8x as much code per quarter as in 2024. The piece argues recursive self-improvement is not here yet but could arrive sooner than institutions are ready for.

anthropic.com
Alibaba open-sources the code reviewer it ran internally for two years
product launch Jun 6th, 2026

Alibaba has released Open Code Review, the AI review tool it used internally across tens of thousands of developers. It pairs deterministic pipelines with an LLM agent to fix the two failures of general-purpose review agents: skipped files and wrong line numbers.

github.com
The numbers say Claude did not break rsync
opinion Jun 6th, 2026

After a viral post blamed Claude-assisted commits for regressions in rsync, an independent analysis ran the bug data across every release. The verdict: the two Claude releases are statistically indistinguishable from history. The outrage rested on a single tail event.

alexispurslane.github.io
Microsoft puts durable workflow execution inside Postgres itself
technical Jun 6th, 2026

Microsoft has open-sourced pg_durable, a Postgres extension that runs crash-resilient workflows entirely inside the database with no external orchestrator. A workflow is a graph of SQL steps that checkpoints as it goes and resumes from the last good point after a crash.

github.com
Microsoft puts durable execution inside Postgres itself
product launch Jun 6th, 2026

Microsoft has open-sourced pg_durable, an extension that runs Temporal-style durable workflows inside PostgreSQL with no extra service. You define the workflow as a graph of SQL steps and the database checkpoints each one, resuming after a crash. It ships inside Microsoft's new Azure HorizonDB.

github.com
Alibaba open-sources the code reviewer it ran internally for two years
product launch Jun 6th, 2026

Alibaba has released Open Code Review, the AI reviewer it says served tens of thousands of its own engineers and flagged millions of defects. It pairs deterministic rule pipelines with an LLM agent that can read the whole codebase, not just the diff.

github.com
Anthropic open-sources the loop behind its Claude security scanner
technical Jun 6th, 2026

Anthropic has released a reference implementation of the autonomous pipeline it uses to find and patch code vulnerabilities with Claude. It is the open version of the recon-to-patch loop behind Claude Security and the Mythos preview. The catch: the part that actually hunts memory bugs refuses to run outside a sandbox.

github.com
Anthropic open-sources the harness behind its vulnerability-hunting agent
technical Jun 6th, 2026

Anthropic has published the Defending Code Reference Harness, a reference build of the autonomous agent it uses to find, verify and patch software vulnerabilities. It runs Claude through a full recon-to-patch loop and refuses to operate outside a gVisor sandbox.

github.com
AI Can Find the Bug. Verifying It Is Still the Whole Job
opinion Jun 5th, 2026

A controlled experiment turned a dozen frontier models loose on a deliberately vulnerable app; most scored zero and only GPT-5.5 cleared it reliably. Read alongside the AI slop that killed curl's bug bounty and AISLE's 12-of-12 CVE run on OpenSSL, the lesson isn't whether agents can hack. Discovery got cheap this year, verification didn't, and that gap is where the economics of agentic security actually break.

kasra.blog
Cognition and Cursor are pricing opposite bets on the same assumption
opinion Jun 5th, 2026

Cognition just raised over $1 billion at a $26 billion valuation for its autonomous agent Devin. Cursor is reportedly raising at $50 billion for the opposite theory of how coding agents win. Both numbers rest on the same thing being true, that the company between the developer and the model keeps the margin, and Anthropic's Claude Code is the reason it might not.

techcrunch.com
YC's Hyper bets the missing piece for AI teams is shared context
product launch Jun 5th, 2026

Hyper, a Y Combinator startup, launched a "company brain" that ingests a team's activity across its tools and injects the resulting context into every AI chat turn. The pitch: today's models are capable but ignorant of your company, and that gap is the real bottleneck.

ycombinator.com
Two coding agents, one git repo: a tiny protocol lets Claude Code and Codex talk
product launch Jun 5th, 2026

A new feature in h5i, an 'AI-aware' Git, lets Claude Code and Codex hand work back and forth by writing messages into the repository itself. No server, no socket. Each message is one JSON line on a dedicated git ref, so the whole conversation is versioned and merges without conflicts.

medium.com
Mathematicians draw a line as AI clears 52% of FrontierMath
opinion Jun 5th, 2026

The Leiden Declaration, backed by the International Mathematical Union, warns that AI could flood mathematics with plausible-but-flawed proofs and hand research priorities to tech firms. It lands as GPT-5.5 Pro tops the FrontierMath benchmark at 52.4%.

science.org
A $1,500 test of which LLMs will actually hack an app, and which refuse
technical Jun 5th, 2026

Security researcher Kasra Rahjerdi built a deliberately vulnerable app and turned a field of models loose on it. GPT-5.5 solved it 7 of 10 times; DeepSeek V4 Pro was about 15x cheaper per success; Gemini 3.1 Pro refused to try. A scrappy test, not a benchmark.

kasra.blog
Ideogram open-weights a 9.3B image model that out-renders 32B rivals
product launch Jun 5th, 2026

Ideogram released 4.0, its first downloadable model: a 9.3B-parameter diffusion transformer with open weights. It claims better text rendering than models several times its size, and takes structured JSON prompts for precise layout control.

ideogram.ai
Cloudflare buys VoidZero, putting Vite's toolchain behind its edge
acquisition Jun 5th, 2026

Cloudflare has acquired VoidZero, the company Evan You founded to unify JavaScript tooling around Vite, Vitest, Rolldown and Oxc. The team joins Cloudflare's Emerging Technology group and the tools stay open source. Cloudflare is also seeding a $1M fund for Vite maintainers independent of both companies.

blog.cloudflare.com