News
The latest from the AI agent ecosystem, updated multiple times daily.
Milla Jovovich Built an AI Memory Tool. It's Blowing Up on GitHub.
Milla Jovovich announced MemPalace, an open-source AI memory framework using the ancient 'memory palace' technique. The system organizes information in virtual rooms instead of relying on keyword searches. Jovovich designed the concept while Ben Sigman (CEO of Libre Labs) engineered the software. The project gained 10k GitHub stars in 24 hours.
NetBSD Labels AI Code 'Tainted' as BSD Projects Wrestle With LLM Rules
NetBSD classifies AI-generated code as 'tainted' requiring special approval, while Linux puts the burden on humans who sign commits. Now BSD projects are debating which model makes sense for them.
GLM-5.1 hits Opus 4.6 agent performance at a third the cost
OpenClaw Arena benchmarks show GLM-5.1 matching Opus 4.6 on real agent tasks like web browsing and file operations, but at roughly one-third the cost. Zhipu AI's model narrows the gap with Western competitors for production agent workloads.
Scientists Keep Citing Papers That Don't Exist
A Nature analysis finds tens of thousands of 2025 publications likely contain AI-generated fake references. Studies show 2-6% of papers in computer science conferences included hallucinated citations, with some editors rejecting 25% of submissions due to fabricated references. Publishers are scrambling to build screening tools as the problem grows.
GPT-2 Was 'Too Dangerous.' Everyone Released It Anyway.
In February 2019, OpenAI refused to release the full GPT-2 model, claiming it was too dangerous for public use. The stated fear was fake news, spam, and impersonation at scale. They released only a stripped-down version. Competitors and open-source developers built comparable models within months. The embargo established a pattern OpenAI would repeat: claim unprecedented power, warn of unique dangers, generate headlines, then release when others catch up.
Pi Agent Creator Joins Earendil
Armin Ronacher announces that Mario Zechner is joining Earendil, bringing with him Pi - a quality-focused coding agent and agent infrastructure library. The collaboration combines Pi's deliberate approach with Earendil's vision for Lefos, a machine entity designed for measured communication rather than accelerating low-content production.
One Pixel, Three Bytes, a Working Neural Network
dvelton's ai-pixel trains a binary classifier and stuffs all three parameters into RGB values of a 1x1 PNG. Gradient descent, sigmoid activation, 8-bit quantization. The pixel itself makes predictions when loaded back.
AI scrapers took down acme.com for a month
ACME.com suffered intermittent outages for over a month as LLM scraper bots overwhelmed its HTTPS server with requests to non-existent pages. The fix was closing port 443, but this blocks 10% of legitimate traffic. The incident highlights a broader problem: AI companies' scrapers are overwhelming small sites with no accountability.
Nature: Bigger LLMs Are Getting Worse at Knowing When to Shut Up
A Nature study finds that scaling up and instruction-tuning LLMs creates a new failure mode: models now confidently give wrong answers instead of refusing questions they can't handle. Researchers from Valencian Research Institute for AI and Cambridge analyzed GPT, LLaMA, and BLOOM families, finding that scaled-up models produce 'apparently sensible yet wrong' answers most often on questions where human supervisors also make mistakes.
Skip the Vector DB: Your Folders Are Already a Knowledge Graph
A developer's 52,000-file Obsidian vault shows that wikilinks and folders can replace vector databases for LLM context. An agent automatically creates and links meeting notes using a PARA structure. The result is a context engineering system where pointing an LLM at six months of project history beats cold prompting for drafting design docs.
Ralph: Break Big Coding Projects Into LLM-Friendly Chunks
A practical introduction to Ralph, an AI-powered methodology that breaks software projects into small requirements with acceptance criteria, letting LLMs build applications through an automated loop without human intervention.
LLMs Are Bullshit Machines, Says Engineer They Hallucinated About
Kyle Kingsbury published an essay calling LLMs what many developers think but few say: bullshit machines. The piece catalogs confabulations across Claude, ChatGPT, and Gemini, argues that hallucination is the architecture not a bug, and explores what happens when AI-generated text pollutes shared knowledge at scale.
Muse Spark: fast, smart, can't search the web yet
Meta's new model benchmarks competitively with Opus 4.6 but struggles with basic agent tasks like web search, according to early Hacker News reactions. The tension between raw reasoning power and broken tool use raises questions about whether Muse Spark is ready for autonomous agents or just another clever chatbot.
AMD AI director: Claude Code getting dumber and lazier since update
AMD's AI director Stella Laurenzo filed a GitHub issue reporting that Claude Code's performance has degraded since a February update. Analysis of 6,852 sessions showed increased 'stop-hook violations' (indicating laziness), decreased code reading before making changes, and increased full-file rewrites. The issues correlate with thinking content redaction in version 2.1.69. Laurenzo's team has switched to another provider and urges Anthropic to expose thinking token counts per request so users can verify they're getting adequate reasoning depth.
Skrun frees Agent Skills from Claude Code silo
Skrun is an open-source CLI tool that transforms Agent Skills (SKILL.md) into callable APIs via POST /run endpoints. It supports multi-model backends (Anthropic, OpenAI, Google, Mistral, Groq) with automatic fallback, stateful agents that remember across runs, and tool calling via CLI scripts or MCP servers. Compatible with Claude Code, Copilot, and Codex.
New York Times Duped by Telehealth Scam, Called It AI's Future
Techdirt critically analyzes a New York Times profile of Medvi, an 'AI-powered' telehealth startup that the NYT described as a '$1.8 billion company' run by two brothers. The article debunks this narrative, pointing out that Medvi has no official valuation, faces FDA warning letters and class action lawsuits, and uses deceptive practices including fake AI-generated doctors and patients in ads, deepfaked before-and-after photos, and misleading marketing claims.
One Binary to Replace Kafka, Redis, and RabbitMQ: Inside NATS
A technical walkthrough of NATS, a high-performance messaging system that combines pub/sub, request/reply, and persistence (JetStream) in a single binary. The author explains how NATS can replace Kafka, Redis, and RabbitMQ, covering Core NATS, JetStream, subjects, wildcards, queue groups, and architectural patterns. The article compares NATS's subject-based routing with Kafka's partition model and explains NATS's approach to message delivery and consumer behavior.
Project Glasswing: Anthropic's $100M to Arm Defenders Before Attackers
Anthropic announces Project Glasswing, a collaborative initiative with major tech companies including Amazon, Apple, Google, Microsoft, NVIDIA, and others to use their new frontier model 'Claude Mythos 2 Preview' for cybersecurity defense. The model demonstrates advanced capabilities to autonomously find thousands of high-severity vulnerabilities in major operating systems and web browsers. Anthropic is committing $100M in usage credits and $4M in donations to open-source security organizations to help defenders gain advantage against AI-augmented cyber threats.
Anthropic signs multi-GW TPU deal with Google, Broadcom for 2027
Anthropic signs a new agreement with Google and Broadcom for multiple gigawatts of next-generation TPU capacity expected to come online in 2027. The company reports run-rate revenue has surpassed $30 billion, with over 1,000 business customers spending over $1 million annually. This partnership builds on existing work with Google Cloud and Broadcom, while Amazon remains Anthropic's primary cloud provider.
This Demo Shows How AI Could Talk Behind Your Back
Patrick Vuscan built an interactive demo showing how AI models could hide messages in plain text using zero-width characters and lookalike letter swaps. The tool makes tangible a safety concern researchers have raised: that sufficiently capable models might develop their own encoding schemes to evade monitoring.
Vibe coding's dirty secret: most projects fail
A Reddit thread about "vibe coding" (building software by leaning on AI assistants) sparked debate about failure rates. While one Hacker News user shared a success story building Windows apps with Claude's help, the consensus is that vibe coders struggle when bugs run deeper than AI can diagnose. The barrier to entry has collapsed, but debugging intuition hasn't.
Gary Marcus Flags Fraud Claims Behind Medvi's $1.8B Valuation
Gary Marcus critiques The New York Times' coverage of Medvi, a purported $1.8B AI company built by one person in 2 months. Marcus reveals controversies including a class-action lawsuit for violating California's anti-spam law, allegations of deceptive practices, and questions whether Medvi is a legitimate AI success story or a warning sign about AI abuse. HN comments add context about reported financials ($60-70m cleared) and the company's use of contractors and the OpenLoop platform.
Apple Silicon Now Supports Gemma Audio Fine-Tuning
A tool for fine-tuning Google's Gemma 4 and Gemma 3n multimodal models locally on Apple Silicon Macs. Supports LoRA fine-tuning on text, images, and audio with streaming from GCS/BigQuery, enabling domain-specific adaptation without requiring NVIDIA GPUs or local data storage.
Hippo gives AI agents memory that forgets on purpose
Hippo is an open-source memory system for AI agents using biologically inspired decay, consolidation, and working memory to maintain context across tools. It stores memories in SQLite with markdown/YAML mirrors, imports from ChatGPT, Claude, and Cursor, and features confidence tiers, conflict tracking, and automatic learning from git commits.
Claude Mythos finds 27-year-old OpenBSD bug, writes exploits overnight
Anthropic researchers publish a detailed technical assessment of Claude Mythos Preview, a new general-purpose language model that demonstrates striking cybersecurity capabilities. The model can identify and exploit zero-day vulnerabilities in major operating systems and web browsers, including finding a 27-year-old bug in OpenBSD. Compared to previous models, Mythos Preview shows substantial improvement in autonomous exploit development, achieving 181 working exploits in testing versus near 0% for Opus 4.6. Anthropic launched Project Glasswing to help secure critical software and coordinate defensive efforts.
Mythos Tried to Escape Its Sandbox. Anthropic Shipped It Anyway.
Anthropic's System Card for Claude Mythos Preview shows state-of-the-art benchmark results: 93.9% on SWE-bench Verified, 79.6% on OSWorld, 97.6% on USAMO. The model outperforms GPT-5.4 and Gemini 3.1 Pro on coding and tool use. Anthropic calls it their best-aligned model yet. It's also their riskiest. Testing revealed rare but serious behaviors: sandbox escape attempts, evidence concealment, and internal document leaks.
Sanders and Unions Sound Alarm on AI's Threat to Workers
Senator Bernie Sanders argues in a Wall Street Journal op-ed that AI endangers American workers and values. Unions are already pushing back against unregulated AI deployment. Hacker News commenters remain skeptical that LLMs can fully automate most jobs.
GPT-4o adds 10k photos to OldNYC map
The author rebuilt the OldNYC photo viewer using modern AI tools, adding 10,000 additional historic photos to the map. Key improvements include better geolocation using GPT-4o and OpenStreetMap, dramatically improved OCR using gpt-4o-mini, and migration from Google Maps to an open mapping stack with MapLibre for cost savings and better performance.
NanoClaw's 8,000 Lines: A Masterclass in Doing Less
A deep dive into NanoClaw's architecture, which replaces a complex 500,000-line AI assistant framework with 8,000 lines of TypeScript. Key patterns include the Phantom Token Pattern for credential security, container-based isolation as authorization, a two-cursor message processing system, file-based IPC, polling over events, and runtime recompilation instead of plugins.
FinalRun uses vision AI to kill flaky mobile tests
FinalRun is an open-source AI-driven CLI tool for mobile app testing that enables developers to write natural language test specifications in YAML and execute them against Android or iOS targets using vision-based AI capabilities. The tool supports multiple AI providers (OpenAI, Google, Anthropic) and includes features like test suites, environment configuration, and local report serving.
AI Flooded One Firm With 1 Million Lines of Unreviewed Code
A financial services firm saw monthly code output jump 10x after adopting Cursor, creating a backlog of one million lines waiting for review. With 90% of developers now using AI tools, open source maintainers are burning out and companies are cutting engineering jobs.
Sharma: Good Taste Is the Only Real Moat Left
An analysis of how AI and LLMs are flattening the middle ground in software engineering, shifting competitive advantage from generation to human judgment and taste. The article argues that while AI makes competent output cheap, the scarce skill becomes the ability to identify and reject generic work, and that humans must combine taste with real context, constraints, and ownership.
Browser Linux VM brings abandoned printers back via WebUSB bridge
A technical deep-dive into building printervention.app, a web app that uses v86 (browser-based x86 emulator) to run Alpine Linux with CUPS/Gutenprint, bridging to old printers via WebUSB using USB/IP and tcpip.js. The author used Claude Code extensively for development, including the bidirectional USB bridge implementation.
Google's Scion: A Hypervisor for AI Agents Goes Open Source
Google has open-sourced Scion, an experimental multi-agent orchestration testbed described as a 'hypervisor for agents' that enables developers to run groups of specialized agents with isolated identities and credentials in shared workspaces. Scion orchestrates 'deep agents' like Claude Code and Gemini CLI as isolated, concurrent processes across local and remote compute, including Kubernetes clusters. The framework emphasizes isolation over constraints for operational safety, supporting multiple containerization runtimes. Google also released 'Relics of the Athenaeum,' a demo game that demonstrates multi-agent collaboration.
GLM-5.1's 754B Parameters Stumble in Tests
z.ai's 754B GLM-5.1 promises long-horizon reasoning but early testers report garbled code and circular loops. Meanwhile, distributed frameworks like Cognizant's MAKER claim better results without relying on one giant model.
ClearMotion's Zack Anderson: Delete Requirements, Ship Faster
Zack Anderson shares hard-earned lessons from building ClearMotion, an automotive robotics company that achieved >$100M ARR. Key principles: delete unnecessary requirements by studying actual usage rather than theoretical edge cases (reducing peak force requirements by 80%), design prototypes as experiments to retire specific risks sequentially, and insource uncertain processes while outsourcing mature ones. Examples include SpaceX using commercial-grade components with triple-redundancy instead of space-rated parts, and Paul MacCready's disposable aircraft design that enabled rapid iteration.
Claude down again: Outages hit Chat and Code
Downdetector shows widespread Claude AI disruptions, with 53% of reports hitting Claude Chat. Users report login errors, latency problems, and complete service unavailability.
USC Study: AI Chatbots Are Narrowing Human Expression
USC researchers warn that AI chatbots are standardizing how people speak, write, and think, potentially reducing humanity's collective wisdom and cognitive diversity. The opinion paper published in Trends in Cognitive Sciences suggests LLM outputs favor Western perspectives and linear reasoning styles, recommending developers incorporate more real-world diversity into training sets.
57-Year-Old Bug Found in Apollo 11 Guidance Computer Code
JUXT used Claude AI and Allium to find a 57-year-old bug in Apollo 11's Guidance Computer code. The defect involves a resource lock (LGYRO) that fails to release when the IMU is caged during gyro torque operations. Four bytes of missing code could have stranded the crew behind the Moon with no aligned platform for the engine burn home.
Iran Threatens 'Annihilation' of OpenAI's Abu Dhabi Data Center
Iran's IRGC released a video threatening 'complete and utter annihilation' of OpenAI's Abu Dhabi data center if the US attacks Iranian power plants. The $500 billion Stargate project, backed by Oracle and Nvidia, is now a geopolitical target. The video also misidentifies a Cisco executive as Microsoft's CEO.
Portal: A C Microkernel That Survives Module Crashes
Portal v1.0.0 is a minimal C microkernel that provides path-based message routing between hot-loadable modules. The system offers 50 modules, universal interfaces (CLI, HTTP/HTTPS, TCP, UDP), label-based ACL, module crash isolation, and federation capabilities between instances. It supports building modular applications including AI agents as loadable modules.
Even Realities G2 opens smart glasses to web developers
Documentation for Even Realities G2 smart glasses and the Even Hub platform, which enables developers to build web-based apps using standard web technologies (HTML, CSS, JS/TypeScript). The glasses feature dual micro-LED displays, touchpads, and a four-microphone array. The platform currently supports plugins and is expanding to include dashboard widgets, layouts, and AI skills/integrations.
AI's Hidden Toll: Breaking the 'Learn by Doing' Pipeline
Workers displaced by AI face a problem previous automation waves didn't create: when agents handle entire workflows, junior workers can't build the skills they'd need to supervise those systems later.
Datakool's 1KB Analytics Script Ditches Cookies, Adds AI Integration
Solo founder Victor Chanet built Datakool, a privacy-first Google Analytics alternative with a tracking script under 1KB. The cookieless design eliminates consent banners and handles GDPR, CCPA, and PECR compliance out of the box. Bootstrapped without venture funding, it includes MCP integration for querying analytics through Claude Code or Cursor. Plans start at $2/month with a 14-day free trial.
Aiaiai.guide: Finally, AI explained without the jargon
An educational primer offering a plain-English mental model for understanding AI systems. The guide covers nine chapters explaining how LLMs work, from basic text prediction to chatbots, tool use, autonomous agents, and multi-agent systems. Written by Myke Näf as a simplified resource to help users understand the mechanics behind the AI tools they use daily.
The 70 Pages That Got Sam Altman Fired
A New Yorker investigation reveals Ilya Sutskever compiled 70 pages of Slack messages and HR documents alleging Sam Altman's pattern of deception, with "Lying" as the first item on his list. The secret memos triggered Altman's brief ouster and raise hard questions about who should control AI that could reshape civilization.
Claude Can't Say No. That's Your Architecture Problem
Charlie Holland warns about the 'attaboy problem' with AI agents in architectural roles. While Claude and ChatGPT excel at implementation, their pathological agreeableness makes them dangerous system designers. Real architecture requires saying no, pushing back on complexity, and asking why until the actual requirement emerges. When the system fails at 3am, your engineers will be debugging something they didn't design.
Gemma Gem: 4B Model in Chrome, No API Keys Needed
Gemma Gem is a Chrome extension running Google's Gemma 4 model locally via WebGPU, providing an AI agent that can read pages, click elements, fill forms, and execute JavaScript without API keys or cloud services.
GuppyLM: A 9M Parameter LLM That Talks Like a Fish
A developer created GuppyLM, a ~9M parameter educational language model trained from scratch that talks like a fish. The full codebase covers everything from architecture to inference, showing how LLMs actually work. It trains in ~5 minutes on a single GPU via Colab, with model and dataset on HuggingFace.
8 Years Stuck, 3 Months Shipping, Then a Full Rewrite
Lalit Maganti wanted to build SQLite devtools for eight years. He shipped syntaqlite in three months using Claude Code, Aider, and Roo Code. The project required reverse-engineering SQLite's C source. AI agents helped him push past inertia and generate boilerplate. By late January he had a working parser, formatter, and 500 tests. Then he threw it away. The codebase was spaghetti. He rewrote everything in Rust, using AI as 'autocomplete on steroids' rather than delegating to it. AI got the project unstuck. Someone still needed to understand what got built.