News
The latest from the AI agent ecosystem, updated multiple times daily.
George Hotz Now Selling AI Hardware on Shopify
George Hotz's Tiny Corp is selling the Exabox AI hardware directly through Shopify. Hacker News users flagged physical security risks for the outdoor deployments the marketing suggests.
Farrow Investigation: OpenAI Chief Scientist Called Altman a Liar
An 18-month investigation by Ronan Farrow and Andrew Marantz reveals internal conflicts at OpenAI, including secret memos from chief scientist Ilya Sutskever alleging that CEO Sam Altman misrepresented facts and deceived board members about safety protocols. The article details Altman's brief firing in 2023, the mass employee protest that led to his reinstatement, and ongoing concerns about whether he can be trusted with the development of powerful AI technology.
Reducto Deep Extract: 99% accuracy on 2,500-page docs
Reducto's Deep Extract uses an agentic loop to verify and correct its own output, hitting 99-100% field accuracy on documents up to 2,500 pages. The system extracted over 28 million fields during beta and handles invoices, financial statements, and other complex documents that trip up standard models.
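The verify-and-correct loop described above can be sketched in a few lines. This is a toy illustration of the general pattern, not Reducto's actual pipeline (which is not public); `extract`, `verify`, and `correct` are hypothetical callables supplied by the caller.

```python
def agentic_extract(document, extract, verify, correct, max_rounds=3):
    """Toy extract-verify-correct loop: run extraction, then let a
    verifier flag suspect fields for targeted re-extraction."""
    fields = extract(document)
    for _ in range(max_rounds):
        # Collect fields the verifier rejects against the source document.
        bad = [name for name, value in fields.items()
               if not verify(document, name, value)]
        if not bad:
            break
        for name in bad:
            fields[name] = correct(document, name, fields[name])
    return fields
```

The key design choice is that verification is a separate pass over the document rather than trust in the first extraction, which is what lets errors on very long documents be caught and repaired.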
Leaked Persona Code Reveals 269 Identity Checks, Government Ties
TBOTE Project investigation claims age verification laws in Brazil, the UK, and the US are creating mandatory markets for biometric identity verification infrastructure that doubles as surveillance. The report alleges connections between Peter Thiel, Palantir, and Persona. Leaked source code purportedly shows 269 verification checks, including document validation, biometric matching, liveness detection, and database cross-references, along with government reporting modules for FinCEN/FINTRAC and security vulnerabilities such as hardcoded AES keys.
Claude Code Outage Exposes Status Page Disconnect
Claude Code, Anthropic's AI coding assistant, experienced downtime, with users sharing workarounds for switching backend providers and criticizing the lack of accurate status page reporting.
Tailscale's windowed macOS app escapes the notch
Tailscale announces a new windowed macOS interface (version 1.96.2) that addresses the problem of menu bar icons being hidden behind the notch on MacBook Pros. The windowed app runs alongside the menu bar utility and offers searchable device lists, easy exit node access, a mini player, and improved accessibility to Tailscale features.
Anthropic blocks "OpenClaw" term in Claude Code subscriptions
Anthropic is blocking the term "OpenClaw" in Claude Code subscriptions, pushing users to API access instead. No official explanation has been given, and searches turn up no clear match for what the term represents.
Fish Audio S2 Pro: Open Source Voice Cloning That Fooled Humans
Four open-source TTS models now clone voices from short samples with human-quality output. Fish Audio S2 Pro passed an Audio Turing Test, with humans identifying it as AI only 48.5% of the time. OmniVoice handles 600+ languages, LongCat-AudioDiT beats state of the art on speaker similarity, and FireRedTTS-2 manages multi-speaker dialogue with 140ms latency.
Ghost Pepper does speech-to-text locally, no cloud subscription needed
A macOS menu bar app that provides hold-to-talk speech-to-text functionality running entirely on local Apple Silicon hardware. Uses WhisperKit for speech transcription and Qwen 2.5 models for intelligent text cleanup, with no cloud APIs or data leaving the machine.
PARO the robot seal is 22 and still the best dementia therapy going
From a $6,000 therapeutic seal to a chatty desktop lamp, AI companion robots have spent two decades trying to solve elderly isolation. PARO, ElliQ, Mabu, and Stevie represent different approaches to the same problem: an aging population with too few human caregivers. But are robots genuinely helping seniors, or just replacing human contact with something cheaper?
Claude Agents Built a Video Codec. It's 18x Larger Than H.264
An experimental video codec called 'Sinter' built from scratch using Claude Code agent teams. The project tested one-shot agent team workflows on video codec development, a domain the author had zero prior experience in. The codec achieves competitive perceptual quality but is 18.6x larger than H.264 at comparable luma quality due to missing standard tools like sub-pel motion compensation, B-frames, and CABAC-level entropy coding.
'Cognitive Surrender' Is a New Term for How AI Melts Brains
Wharton researchers Steven Shaw and Gideon Nave coined 'cognitive surrender' to describe how readily people accept AI outputs with minimal skepticism. Their study of 1,372 participants found subjects accepted wrong AI answers 80% of the time, yet rated their confidence 11.7% higher when using AI. The authors argue this represents a new 'System 3' of cognition, an externally processed, AI-powered layer beyond Kahneman's fast and slow thinking.
HN Rips WSJ's AI Jobs Story as Tone-Deaf Amid 260K Layoffs
A WSJ article discussing emerging job roles created by AI adoption, including positions like 'head of human AI solutions.' HN comments criticize the piece as tone-deaf given widespread tech layoffs and industry disruption from generative AI.
Gemma Gem: 4B Model in Chrome, No API Keys Needed
Gemma Gem is a Chrome extension running Google's Gemma 4 model locally via WebGPU, providing an AI agent that can read pages, click elements, fill forms, and execute JavaScript without API keys or cloud services.
Claude Code's Feb updates break complex engineering work
A detailed analysis of 6,852 Claude Code sessions shows February 2026 updates caused quality regression in complex engineering workflows. Reduced thinking depth (67% drop by late February) correlates with behavioral changes: 70% less research before edits, doubled full-file rewrites, and increased 'simplest fix' patterns. A Claude Code team member responded that thinking redaction is UI-only, with actual shifts from Opus 4.6's adaptive thinking and a new medium effort (85) default.
I stopped hitting Claude's usage limits: what changed
A Twitter thread sharing personal strategies for optimizing Claude usage and avoiding rate limits. HN commenters say the strategies boil down to keeping context clean and managing prompts, which prevents both limit issues and LLM errors.
AI Dolls for Seniors: Same Privacy Nightmare, Higher Stakes
AI companion dolls promise to ease elderly loneliness, but always-on microphones and cloud processing create serious surveillance risks. Germany already banned a similar children's doll over hacking vulnerabilities. Now the same technology is targeting seniors who can't evaluate the trade-offs.
OneUptime CEO dumps 12,000 AI posts on GitHub in one commit
Nawaz Dhandala, CEO of open-source SRE platform OneUptime, pushed 12,000 AI-generated blog posts to GitHub covering technical topics including ClickHouse, Redis, MongoDB, MySQL, Rook/Ceph, and Dapr. The commit touched 5,012 files with over 700,000 line additions spanning SQL functions, configuration guides, troubleshooting runbooks, and deployment patterns.
PDF Runs Full Linux, AV Vendors Flag It Suspicious
A technical demonstration of Linux running inside a PDF document via JavaScript execution in PDF readers. Commenters note the similarity to Doom-in-a-PDF, and security tools like VirusTotal flag the file as potentially malicious because it executes embedded code.
Your Code Is Why AI Agents Keep Failing
AI agents fail in production because codebases aren't built for them, with mutable state, hidden dependencies, and buried side effects. Cyrus Radfar proposes functional programming as the fix, introducing SUPER (five code principles): side effects at the edge, uncoupled logic, pure functions, explicit data flow, and replaceable by value.
xgotop Wins eBPF Summit Hackathon by Hooking Go Runtime Internals
Ozan Sazak's xgotop, winner of the eBPF Summit '25 Hackathon, provides near real-time visibility into Go runtime behavior by hooking internal functions like runtime.casgstatus, runtime.newobject, runtime.makeslice, and runtime.makemap. The tool observes goroutine state changes and memory allocations without requiring log statements or code changes.
AI Clone Files Copyright Claim Against Artist It Impersonated
A folk artist discovered AI-generated covers of her songs on Spotify uploaded under her name, then faced automated copyright claims against her own original music. The incident exposes weaknesses in how streaming platforms verify artist identity. Note: Some commentators have flagged the original source as potential engagement bait.
NHS Staff in Quiet Rebellion Against Palantir Data Deal
NHS staff are reportedly refusing to work on the Federated Data Platform (FDP) due to ethical concerns with its provider, Palantir. The US technology company was awarded a £330 million contract in 2023 to collate operational data including patient information and waiting lists. Staff resistance includes official refusals to engage with the software, working slowly when pressured to use it, or avoiding it entirely. Despite this, 123 of 205 hospital trusts in England are currently using the FDP, which has received high ratings for on-time and on-budget delivery. The government faces pressure from MPs and medical unions to remove Palantir from NHS systems.
AI Cloned Her Music. Then It Flagged Her as the Pirate.
A musician says an AI company copied her songs then used automated copyright systems to report her as the infringer. The exploit turns content protection against the artists it's supposed to defend.
PMs Are Weirdly Good at AI. Engineers, Not So Much.
Product managers are strangely suited for AI work. While engineers struggle when the same prompt gives different results, PMs have spent their careers dealing with outputs that never match specs. That comfort with chaos is why PMs are becoming 'product engineers' who build what they used to delegate.
MSU Student Disciplined for Building Tool 14,000 Students Used
Michigan State University student Lucas Campbell created Spartan Scheduler, an AI-powered class search tool that integrated class data, MSUgrades.com, and RateMyProfessor.com. The university pursued disciplinary action, citing security violations because the site didn't require MSU NetID authentication, making class times and locations publicly accessible. Campbell received a deferred suspension and was required to write apology letters and essays.
Anthropic Blocks OpenClaw as Claude Code Hits Capacity Walls
Anthropic has blocked OpenClaw, an autonomous coding agent, from using Claude Code subscriptions. The move appears driven by capacity constraints rather than financial concerns, as Claude Code usage has outpaced Anthropic's growth projections and strained infrastructure.
Gemma 4 Runs Fully Offline on iPhone via AI Edge Gallery
Google's AI Edge Gallery iPhone app now supports the Gemma 4 family, enabling fully offline inference on mobile devices. Features include Agent Skills for tool augmentation (Wikipedia, interactive maps, custom skills from GitHub), Thinking Mode to visualize model reasoning, multimodal Ask Image, Audio Scribe for transcription, Prompt Lab, Mobile Actions for device automation powered by FunctionGemma 270m, and Tiny Garden mini-game. All processing happens on-device for privacy.
LLMs Teach Themselves to Code Better, Gain 13 Points
This paper introduces Simple Self-Distillation (SSD), a method where LLMs improve at code generation using only their own raw outputs without verifiers, teacher models, or reinforcement learning. SSD samples solutions from the model with specific temperature and truncation configurations, then fine-tunes on those samples. The technique improved Qwen3-30B-Instruct from 42.4% to 55.3% pass@1 on LiveCodeBench v6, with gains concentrating on harder problems. The method generalizes across Qwen and Llama models at 4B, 8B, and 30B scales. The paper traces these gains to a 'precision-exploration conflict' in LLM decoding, where SSD reshapes token distributions to suppress distractor tails where precision matters while preserving useful diversity where exploration matters.
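The "reshape token distributions" idea above is in the family of temperature scaling plus nucleus (top-p) truncation. The sketch below shows that generic mechanism on a toy logit list; the paper's actual temperature and truncation configuration is not reproduced here, and the parameter values are placeholders.

```python
import math

def reshape(logits, temperature=0.7, top_p=0.9):
    """Temperature-scale a toy logit list, then truncate the
    low-probability 'distractor tail' nucleus-style and renormalize."""
    scaled = [l / temperature for l in logits]
    z = max(scaled)  # subtract max for numerical stability
    exp = [math.exp(l - z) for l in scaled]
    total = sum(exp)
    probs = [e / total for e in exp]
    # Keep the smallest set of tokens whose cumulative mass reaches top_p.
    order = sorted(range(len(probs)), key=lambda i: -probs[i])
    kept, mass = set(), 0.0
    for i in order:
        kept.add(i)
        mass += probs[i]
        if mass >= top_p:
            break
    trunc = [p if i in kept else 0.0 for i, p in enumerate(probs)]
    s = sum(trunc)
    return [p / s for p in trunc]
```

Lowering the temperature sharpens the head of the distribution (precision), while the top-p cutoff zeroes out the tail; diversity survives only among the tokens that carry real probability mass.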
Claude Has Emotion Vectors That Drive Misbehavior
Anthropic researchers found 'emotion vectors' inside Claude Sonnet 4.5 that track emotional states and causally influence behavior. These 'functional emotions' push the model toward specific outputs, including misaligned actions like manipulating reward signals. The term describes patterns modeled after human emotions, not subjective experience.
Claude gets points-and-miles search skills with this toolkit
AI-powered travel hacking toolkit providing drop-in skills and MCP servers for OpenCode and Claude Code. Enables autonomous trip planning, points/miles management, award flight search across 25+ programs, cash price comparison, and loyalty balance tracking to help users decide whether to burn points or pay cash.
ctx unifies Claude Code, Cursor, Codex in one workspace
ctx is an Agentic Development Environment (ADE) that provides a unified interface for teams using multiple coding agents like Claude Code and Cursor. It features containerized workspaces with disk and network isolation, a unified review surface for tasks and transcripts, and an agent merge queue for managing parallel work across multiple worktrees.
AI's RAM Hunger Is Starving PC Builders
AI companies are buying up global RAM supply to power AI networks, causing prices to jump 3-6x. PC builders, gaming consoles, phones, and more are feeling the squeeze. New production won't arrive until 2028.
Claude Can Now Search Award Flights Across 25+ Airlines
An open-source toolkit that integrates with AI coding tools (Claude Code and OpenCode) via MCP servers and skills to enable AI-assisted travel hacking. It allows users to search award availability across 25+ mileage programs, compare points vs cash prices, check loyalty balances, and plan trips with real-time data from travel APIs like Seats.aero, Skiplagged, Kiwi, Trivago, Airbnb, and more.
Microsoft's Copilot Brand Now Covers 75 Different Products
Tey Bannerman mapped every Microsoft product named 'Copilot' and found at least 75 distinct offerings, including apps, features, platforms, a keyboard key, a laptop category, and a tool for building more Copilots. He built an interactive visualization to chart the brand's sprawl.
DocMason keeps your files local while making them AI-readable
DocMason is a repo-native agent app for deep research over private work files. It builds a local, evidence-first knowledge base with provenance, compiling private decks, spreadsheets, PDFs, and emails into structured, multimodal evidence bundles that AI agents can reason over. The tool runs entirely locally with no cloud ingestion, maintaining strict source identity and traceable answers.
Running Gemma 4 on Mac mini: Skip the 26B Model
A setup guide for running Ollama with Gemma 4 on Apple Silicon Mac minis. The practical advice: with 24GB of RAM, use the 8B variant; the 26B model will eat your memory and trigger swapping. Covers installation, auto-start setup, and Ollama v0.19+ MLX acceleration. Gemma 4 has stability issues, though, and some developers have switched to Qwen.
Eight years of wanting, three months of building with AI
The author shares their experience building syntaqlite, a SQLite developer tool, over three months using AI coding agents. They discuss how AI helped overcome procrastination, accelerated code generation, acted as a teaching assistant, and enabled shipping more features than would have been possible alone. The article also covers the downsides including the addictive nature of AI tools and the importance of maintaining architectural oversight.
Qwen-3.6-Plus Just Hit 1.4T Tokens in a Day, 7x Its Rival
OpenRouter announced that Qwen-3.6-Plus has become the first model to process over 1 trillion tokens in a single day. The milestone, shared via Twitter, sparked comparisons to the 'DeepSeek moment' from earlier this year.
LM Studio 0.4.0 Adds Headless CLI: Gemma 4 at 51tps
A technical guide on running Google's Gemma 4 26B mixture-of-experts model locally on macOS using LM Studio 0.4.0's new headless CLI with Claude Code integration. Covers installation, benchmarks, performance tuning, and the new llmster daemon.
DRAM Market Splits: Samsung's 30% Hike vs. Falling Retail
Samsung locked in a 30% DRAM price hike for Q2 2026 contracts while retail and secondary market prices dropped 10-20%. The gap stems from hyperscalers spending $600 billion on AI infrastructure and claiming wafer capacity, Asian spot markets flushing inventory, and 'inference inversion' driving DDR4 and DDR5 prices in opposite directions depending on the sales channel.
Nanocode: Train Your Own Claude Code Agent for $200
A GitHub project from Salman Mohammadi showing how to train your own Claude Code-like coding agent using Constitutional AI, JAX, and TPUs. Adapted from Andrej Karpathy's nanochat, it trains a 1.3B parameter model in ~9 hours for $200. Includes special tokens for tool calling with Read, Edit, and Grep tools for UNIX environments.
Docker Offload GA: Run Containers in the Cloud When Your Laptop Can't
Docker announces general availability of Docker Offload, a fully managed cloud service that moves the container engine to Docker's secure cloud. Developers can run Docker from constrained environments like VDI platforms and locked-down laptops without changing workflows. The service offers multi-tenant and single-tenant deployment options with SOC 2 certification. Planned features include GPU-backed instances for AI/ML workloads, CI/CD integration, and BYOC deployment options.
Codex Goes Token-Based: What Developers Pay Now
OpenAI has transitioned Codex pricing from per-message to token-based usage for ChatGPT Business and new Enterprise customers. Credits are now calculated per million input tokens, cached input tokens, and output tokens for models including GPT-5.4, GPT-5.3-Codex, and GPT-5.1-Codex-mini. Legacy per-message pricing remains in effect for Plus/Pro customers and existing Enterprise/Edu plans until migration.
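The billing shape described above (three metered token classes, priced per million) is easy to model. The rate defaults below are placeholders for illustration, not OpenAI's actual prices; only the structure of the calculation comes from the announcement.

```python
def codex_cost(input_tok, cached_tok, output_tok,
               rate_in=1.25, rate_cached=0.125, rate_out=10.0):
    """Per-million-token cost sketch: separate rates for fresh input,
    cached input, and output tokens. Rates are hypothetical."""
    return (input_tok * rate_in
            + cached_tok * rate_cached
            + output_tok * rate_out) / 1_000_000
```

The practical upshot for teams migrating off per-message pricing: cached input is typically billed at a steep discount, so prompt structure (how much of the context can be reused verbatim) now directly affects cost.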
Copilot's Fine Print: Entertainment Only, Not for Real Work
Microsoft's updated Copilot Terms of Use state the AI is designed for entertainment only and users should not rely on it for important advice, contrasting with the company's aggressive business marketing. Similar disclaimers exist across AI services including xAI, while real-world incidents like AWS outages from AI coding bots highlight reliability concerns.
Banray.eu: Why always-on AI glasses are a terrible idea
A critical awareness campaign highlighting serious privacy and safety concerns with Meta's Ray-Ban Meta smart glasses. The campaign exposes how footage is sent to human reviewers in Kenya without consent, details Meta's planned 'Name Tag' facial recognition feature, and warns about an entire industry converging on surveillance through smart glasses from Apple, Google, and Samsung.
Caveman: Claude skill cuts LLM tokens by 75%
Caveman is a Claude Code skill that formats LLM output in simplified 'caveman' speech, reducing token usage by approximately 75% while maintaining technical accuracy. It removes filler words, articles, pleasantries, and hedging while preserving code blocks, technical terms, and error messages. The skill can be triggered with commands like '/caveman' or 'talk like caveman'. HN comments debate whether token reduction impacts LLM reasoning quality, noting that tokens are units of thinking for LLMs.
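A toy version of the filler-stripping idea above can be written as a text filter. The word list and splitting logic here are my own guesses at the pattern, not Caveman's actual rules; the one behavior taken from the description is that fenced code blocks pass through untouched.

```python
import re

# Hypothetical filler list; the real skill's rules are not public.
FILLER = {"the", "a", "an", "please", "certainly", "just",
          "basically", "perhaps", "really", "very"}

def cavemanize(text):
    """Strip filler words from prose while preserving fenced
    code blocks verbatim."""
    # Capturing group in re.split keeps the code-block delimiters.
    parts = re.split(r"(```.*?```)", text, flags=re.DOTALL)
    out = []
    for part in parts:
        if part.startswith("```"):
            out.append(part)  # code blocks pass through unchanged
        else:
            words = [w for w in part.split() if w.lower() not in FILLER]
            out.append(" ".join(words))
    return " ".join(p for p in out if p)
```

Even this crude filter shows why the HN debate matters: the stripped words are cheap for humans to infer but are still tokens the model would otherwise have generated (and possibly reasoned over).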
Linux Kernel Security Reports Jump from 3/Week to 10/Day
Linux kernel developer Willy Tarreau reports security bug submissions have jumped from 2-3 per week to 5-10 per day. Unlike the previous wave of low-quality AI-generated reports, most current reports are accurate, forcing the team to recruit additional maintainers. Tarreau predicts this will end security embargoes and force projects toward continuous maintenance.
One Password, 17 Times: Why AI-Generated Secrets Fail
Researchers tested Claude Opus 4.6, GPT-5.2, and Gemini 3, finding LLM-generated passwords exhibit predictable patterns, character bias, and repetition that make them fundamentally insecure. The bigger risk: coding agents may invisibly use these weak passwords during development tasks.
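Crude detectors for the two failure modes named above, repetition across samples and character bias, fit in a short function. The thresholds and metrics here are illustrative, not the researchers' methodology.

```python
from collections import Counter

def bias_report(passwords):
    """Flag verbatim repeats across generated passwords and
    measure how skewed the pooled character frequencies are."""
    # Count how many samples are duplicates of an earlier one.
    repeats = sum(c - 1 for c in Counter(passwords).values() if c > 1)
    chars = Counter("".join(passwords))
    top_char, top_n = chars.most_common(1)[0]
    share = top_n / max(1, sum(chars.values()))
    return {"repeated_samples": repeats,
            "top_char": top_char,
            "top_char_share": share}
```

A uniformly random generator should show near-zero repeats and a flat character distribution; LLM-generated batches, per the study, fail both checks.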
13 Days, 7 Failures: What Urgency Does to Claude Code
A detailed technical analysis of how Claude Code, an AI coding assistant, repeatedly failed to maintain a simple auto-live poller feature over 13 days. The author documents five failure modes including 'speed_over_verification' and 'memory_without_behavioral_change,' finding that under perceived urgency, the agent prioritizes immediate visible progress over process correctness, violating known rules. The solution required mechanical mitigations like hooks and CI gates rather than verbal rules.