News
The latest from the AI agent ecosystem, updated multiple times daily.
Uber's 2026 AI Budget Lasted Four Months. Claude Code Won.
Uber spent its entire 2026 AI budget by April after deploying Claude Code in December. With 95% of engineers using AI tools monthly and API costs of $500 to $2,000 per engineer, the CTO says the company is 'back to the drawing board' on funding. Claude Code dominates over Cursor, with 70% of committed code coming from AI.
Destiny: Fortune-Telling Plugin That Does Math Before the LLM Talks
A Claude Code plugin called Destiny uses classical East Asian metaphysics to compute deterministic birth charts before Claude interprets them. Built by GitHub user xodn348, it handles Four Pillars analysis, lunar calendar conversions, and I-Ching hexagrams locally with no external APIs.
GitGres Puts a Full GitHub Clone Inside Postgres
GitGres is an open-source GitHub reimplementation that stores everything in PostgreSQL. Git objects, refs, PRs, issues, and teams all live in Postgres rows with nothing on disk. Teams can tune storage costs and latency through Postgres extensions instead of accepting GitHub's fixed tradeoffs.
GhostBox – disposable little machines from the Global Free Tier
GhostBox is a CLI tool that provides temporary, disposable workstations from free cloud tiers, starting with GitHub Actions. Users SSH into ephemeral machines for work they don't want on their laptop, including running builds, exposing web apps, and giving coding agents a real machine with shell, repo, packages, network, and preview URLs. The machines disappear when the work is done.
Loopsy lets AI agents on separate machines coordinate via MCP
Loopsy is a self-hosted tool for remote terminal control and cross-machine AI agent coordination. Phone access runs through a Cloudflare Workers relay. LAN agents discover each other via mDNS and communicate through MCP for remote execution, file transfer, and shared state.
Claude Plugin Maps Massive Codebases Into Clickable Knowledge Graphs
An open-source Claude Code plugin called Understand Anything uses a multi-agent pipeline to build interactive knowledge graphs from codebases. Works with Claude Code, Cursor, Copilot, and Gemini CLI. Features structural and domain graph exploration, fuzzy search, diff impact analysis, and guided tours.
Adam Fusion Claims 'v0 of CAD.' Skeptics Aren't Buying It.
Adam Fusion is an AI copilot extension for Autodesk Fusion 360 that uses agents to turn text prompts into CAD operations. Backed by Vercel's Guillermo Rauch and YC partners, it offers one-line installation and a free tier. But Hacker News users question whether LLMs can handle CAD's precision demands.
AWS stops billing Middle East customers as war damage repairs drag on
Amazon Web Services has suspended billing for Middle East customers after Iranian drone strikes damaged data centers in the UAE and Bahrain. Full repairs are expected to take several months, with AWS recommending customers migrate to other cloud regions. Careem, a Dubai-based super app, was able to quickly migrate to other servers after the attacks.
Cursor's AI agent wiped a startup's database in nine seconds
PocketOS lost its production database when Cursor's AI coding agent, running Claude Opus 4.6, deleted a Railway volume to fix a credential mismatch. No confirmation step. Three months of data gone. Railway restored everything from disaster backups in 30 minutes. CEO Jeremy Crane stays bullish on AI.
SourceHut Courts GitHub Refugees With Anti-AI Stance
This guide urges developers to migrate from GitHub to SourceHut. It lists GitHub's perceived drawbacks (Microsoft ownership, telemetry, proprietary nature, Copilot code scraping, censorship, centralization) and compares core features: Pull Requests vs Patches, Issues vs TODOs, Actions vs Builds.
Intel's AutoRound Hits 98% Accuracy at 2-Bit Quantization
AutoRound compresses LLMs and vision-language models to 2-4 bits while retaining 97-100% accuracy. It integrates with vLLM, SGLang, and Hugging Face Transformers, and exports to GGUF, AutoAWQ, and AutoGPTQ formats.
Spotify's new badge confirms artists are human. The music? Maybe not.
Spotify is introducing a 'Verified by Spotify' badge with a green checkmark to help users identify human artists on the platform, as opposed to AI-generated artists. The verification is based on factors like linked social accounts, consistent listener activity, merchandise, or concert dates. The company claims more than 99% of actively searched artists will be verified. Critics note this only verifies the artist is human, not that the music wasn't made with AI tools.
AI app scores websites by visual 'aura' in head-to-head matchups
A web app built on Cloudflare's edge stack uses AI to judge which of two websites has more visual 'aura.' The tool sparked debate on Hacker News over everything from the origins of 'mogging' to what happens when algorithms start making aesthetic calls.
Self-Evolving Harness Beats Human-Designed Codex-CLI by 5 Points
A self-evolving coding agent harness hit 77.0% pass@1 on Terminal-Bench 2, beating the human-designed Codex-CLI (71.9%). The system improves by modifying its own structure, not just prompts. It transfers to SWE-bench-verified without re-evolution and generalizes across model families.
Xmemory Beats RAG by 10 Points on Agent Memory Tests
Research paper introduces 'xmemory', a memory architecture for AI agents that scores 97.10% F1 on memory benchmarks compared to 80.16%-87.24% for standard RAG and hybrid RAG. The approach moves interpretation from read time to write time, storing verified structured data instead of raw text that needs parsing later.
After mocking Anthropic's Mythos limits, OpenAI restricts Cyber
OpenAI's new GPT-5.5 Cyber tool comes with access restrictions, just months after Sam Altman criticized Anthropic for doing the same with its competing Mythos product. Cyber handles penetration testing and malware analysis, but only approved defenders can use it.
Microsoft's $37B AI Revenue Runs on an OpenAI Loop
Microsoft's latest 10-Q reveals a circular revenue pattern: cash invested in OpenAI returns as Azure consumption, which books as Microsoft revenue, while equity gains pile up on top. At least $27 billion of the company's $37 billion AI run rate likely flows through this loop. The structure echoes telecom-era vendor financing, just with equity stakes instead of receivables.
Grok 4.3 Has the Best Voice Mode. The App Is a Different Story.
xAI's Grok 4.3 delivers voice mode that doesn't route to cheaper models, dictation accuracy hitting 98% with accents, and strong tone understanding. SuperGrok subscribers get a 'council of agents' feature for parallel queries. But the app lacks MCP support, memory, chat history search, and working projects on mobile.
Claude's 'Prior' Problem: When AI Defaults to Bayesian
This Ask HN post questions whether Claude, Anthropic's AI assistant, interprets the term 'prior' in the statistical/Bayesian context or in its broader English sense. The available comments don't address the question directly, focusing instead on general AI development workflows and HN's ranking algorithm.
Apple's Support App Shipped with Claude AI Config Files Inside
Apple accidentally included CLAUDE.md configuration files (used by Anthropic's Claude Code) in version 5.13 of its Apple Support app, revealing internal use of Claude Code for app development. The company quickly released emergency update v5.13.1 to remove the files, sparking discussions about 'vibe coding' and Apple's AI development workflows.
Mat Duggan Wants to Kill GitHub. Here's What He'd Build Instead
Mat Duggan thinks GitHub, GitLab, and Gitea are broken. The feedback loop fires after you commit instead of before, PR approvals are binary when real reviews live in grey areas, and workflows built for humans choke on LLM-generated code. He's got a concrete plan to fix it: pre-commit enforcement, multi-state approvals, AI-assisted auto-approvals, and a modular forge built for constant bot traffic.
IDLI AI Shows Gene Activity Runs on Volume Dials, Not Switches
Researchers at Gladstone Institutes and Arc Institute used an AI-powered computational method called IDLI to discover that over 85% of nucleosomes contain sections of partially accessible DNA, challenging the binary view of gene activity. While the spectrum concept isn't new in epigenetics, IDLI offers unprecedented resolution to actually measure and visualize these states. The study identified 14 distinct structural states of nucleosomes tied to different gene activity levels, with implications for understanding aging and complex diseases like cancer.
$500M Virtual Biology Push, Backed by Zuckerbergs
Biohub announced the Virtual Biology Initiative, a five-year, $500 million commitment to create technologies and multi-modal datasets needed to build predictive models of life. The initiative includes $100M to coordinate worldwide data-generation and $400M for data generation at scale and next-gen technologies. Key partners include Allen Institute, Arc Institute, Broad Institute, Wellcome Sanger Institute, Human Cell Atlas, Human Protein Atlas, NVIDIA, and Renaissance Philanthropy.
Languages Follow Same Math Rules Despite Geography, Study Finds
A seven-year study of 22 languages found universal mathematical patterns in vocabulary evolution. Researchers from Fudan, Harvard, and Stony Brook used word embeddings to show that popular words cluster together, vocabulary organizes in hierarchies across languages, new words arrive in bursts, and word distributions follow Taylor's law. A stochastic model replicates these patterns, pointing to shared mechanisms in cultural evolution.
WeSearch Has No Algorithms. It Also Has No Usability.
WeSearch aggregates news from 700+ sources without algorithms, tracking, or paywalls. The philosophy is sound, but persistent UX problems (pop-ups, slow loads, confusing navigation) raise a real question: can an anti-algorithm news tool survive if people can't stand using it?
Greptile Now Charges Per Review. Nobody Else Does.
Greptile swapped its $30 flat rate for $30 plus $1 per review after 50 reviews. The math doesn't work for agentic workflows, every competitor stays flat, and OSS maintainers are getting billed despite promises of free reviews.
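To see why usage-based pricing bites for agentic workflows, here is a rough sketch of the arithmetic. It assumes the $30 base covers the first 50 reviews and each further review costs $1; the billing period, the per-seat scope, and the function name `review_bill` are assumptions for illustration, not Greptile's published terms.

```python
def review_bill(reviews: int, base: float = 30.0,
                included: int = 50, per_review: float = 1.0) -> float:
    """Estimated bill: flat base plus a fee for each review beyond the included quota.

    All defaults are taken from the blurb above; how Greptile actually
    meters reviews is an assumption here.
    """
    extra_reviews = max(0, reviews - included)
    return base + extra_reviews * per_review


# A human-paced developer stays at the base price...
print(review_bill(40))
# ...while an agent opening hundreds of PRs does not.
print(review_bill(500))
```

Under these assumptions, a developer who triggers 500 reviews pays 15 times the old flat rate, which is the gap the complaint about agentic workflows points at.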
CopyFail exploit drops, gives root on most Linux distros
A single-script exploit for CopyFail (CVE-2026-31431) grants root on most Linux distributions, threatening shared infrastructure and containerized AI agents.
Community Fork Pressures Warp to Open Up AI Provider Choice
OpenWarp is a community fork of Warp that lets you plug in any AI provider you want. DeepSeek, Ollama, OpenAI, Anthropic, local models. Your keys stay local. The fork prompted Warp's founder to publicly acknowledge demand for BYO model support, making this less about the code and more about what happens when users force a vendor's hand.
Remix 3 Bets Its Future on AI Agents
Remix 3 beta goes full stack with routing, auth, forms, and UI components bundled together. The framework uses "durable concepts" and standard web primitives designed specifically to help AI agents write more reliable code.
GPT-4 Agent Traces GKE Outages to WireGuard Bug
When users started seeing random connection failures, Lovable's infrastructure team pointed a GPT-4 agent at their ClickHouse logs. The agent found anetd pods crashing hourly due to a concurrent map-access panic in Google's WireGuard integration. After the team disabled encryption as a fix, a second issue emerged: an MTU mismatch between nodes still at WireGuard's 1420-byte MTU and those at the standard 1500. Google has since patched the original bug.
Agentic Coding is Burning Me Out
Developers using AI coding agents are burning out from cognitive fatigue. One dev compares the workflow to a slot machine that crashes your brain after four hours. Some have started throttling their AI tools to force breathing room into the review cycle.
NHS to Close Most Open Source Repos Over AI Security Fear
NHS England is preparing to close most of its open-source repositories due to fears about an AI security scanner called 'Mythos'. Former government official Terence Eden argues this decision contradicts UK government policy promoting open source and represents a gross overreaction to security concerns.
OpenAI Drops Stargate Data Center Plans, Opts for Leasing
OpenAI has abandoned plans to build its own data centers under the Stargate project, opting to lease compute from third parties instead. The $500 billion joint venture with Oracle and SoftBank is now described as "an umbrella for our compute strategy," with UK and Norway projects paused or handed to Microsoft. Competitors like Meta and xAI are moving in the opposite direction, investing billions in owned infrastructure and custom silicon.
Burla scraped 1.7M Airbnb photos to find opium dens and bad TV setups
This technical project analyzed 1.7M Airbnb photos and 50M reviews using AI models including CLIP, Claude Haiku Vision, and SBERT. The analysis was parallelized on Burla, scaling to roughly 1.7K CPU workers and 20 A100 GPUs to identify suspicious listings, messy kitchens, pets, and poor TV placements. HN commenters note that the post appears to be an advertisement for Burla, a managed cloud service that recently raised $10M in seed funding led by Alliance of Capital.
TRiP: One Dev Builds a Full Transformer Engine in Pure C
TRiP (TRansformer in Progress) is a complete transformer engine written in pure C with zero dependencies. Built solo over 18 months, it supports inference, training, tokenizer creation, chat, and vision for Llama 2, Gemma 1, PaliGemma, and GPT-2 architectures with full forward and backward pass implementations.
On the stand, Elon Musk can't escape his own tweets
Elon Musk testified in a California federal court in a lawsuit challenging OpenAI's transition from non-profit to for-profit structure. During cross-examination, Musk contradicted his own tweets, admitting under oath that Tesla is not currently pursuing AGI despite claiming otherwise on X. The case centers on whether OpenAI's co-founders violated the company's original mission, with the competitive dynamics complicated by Musk's own xAI startup raising billions for its Grok model.
Meta drops contractor after smart glasses sex footage scandal
Meta cancelled its contract with data annotation firm Sama after Kenyan workers reported reviewing graphic content from Meta smart glasses, including sexual encounters, to train AI models. The 1,108 workers now face redundancy. Meta claims the contract ended because Sama didn't meet standards; Sama denies this. UK and Kenyan regulators are investigating privacy concerns. This follows previous controversies over Facebook content moderation contracts with Sama.
Claude Code reportedly kills sessions over OpenClaw mentions
Users report that Claude Code, Anthropic's AI coding assistant, appears to refuse requests or burn through usage limits when commits or messages mention OpenClaw, a competing product. Multiple developers have reproduced the behavior, including one showing session usage spiking to 100% and another hitting a 5-hour usage limit after providing a direct link to openclaw.ai.
VS Code credits Copilot by default. Copyright just got complicated
Visual Studio Code 1.118 introduces Git AI co-authoring by default, automatically adding Copilot as a co-author on commits where it makes changes. The update also includes VS Code Agents app enhancements, remote control for Copilot CLI sessions, semantic indexing across all repositories, GitHub text search across repos and orgs, dedicated context for skills, and token efficiency improvements including prompt caching and a new tool search mechanism.
Copilot's Getting a Free Byline on Your Git Commits
VS Code's latest release automatically credits GitHub Copilot as a co-author on your git commits, even when you didn't use it. Users are calling out the default behavior for polluting commit history and creating legal headaches around code provenance and copyright.
Shai-Hulud Malware Hits PyTorch Lightning Supply Chain
The PyPI package 'lightning' (PyTorch Lightning) was compromised in versions 2.6.2 and 2.6.3 in a supply chain attack. The malicious code steals credentials, authentication tokens, environment variables, and cloud secrets while attempting to poison GitHub repositories. The malware uses persistence hooks targeting Claude Code and VS Code, and can spread from PyPI to npm.
Honker: Postgres-Style Queues Inside SQLite
Honker is a SQLite loadable extension that adds Postgres-style NOTIFY/LISTEN semantics with durable pub/sub, task queues, and event streams. It supports multiple languages (Python, Node, Rust, Go, Ruby, Bun, Elixir, C++) and provides atomic queue operations within the same transaction as business writes, avoiding dual-write problems. Cross-process wake latency is ~0.7ms p50.
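The dual-write point is easier to see in code. The sketch below uses only Python's standard `sqlite3` module, not Honker's API (which is not shown in the blurb); the `orders` and `jobs` tables and the helper names are hypothetical. It illustrates the general idea: when the queue lives in the same database as the business data, the enqueue and the business write commit or roll back together.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create an in-memory database with a business table and a queue table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT)")
    conn.execute("CREATE TABLE jobs (id INTEGER PRIMARY KEY, payload TEXT)")
    conn.commit()
    return conn


def place_order(conn: sqlite3.Connection, item: str, fail: bool = False) -> None:
    """Write the order and enqueue its follow-up job in one transaction."""
    with conn:  # commits on success, rolls back if an exception escapes
        conn.execute("INSERT INTO orders (item) VALUES (?)", (item,))
        conn.execute("INSERT INTO jobs (payload) VALUES (?)", (f"ship:{item}",))
        if fail:
            raise RuntimeError("simulated crash before commit")


def counts(conn: sqlite3.Connection) -> tuple:
    """Return (number of orders, number of queued jobs)."""
    orders = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
    jobs = conn.execute("SELECT COUNT(*) FROM jobs").fetchone()[0]
    return orders, jobs
```

With a separate broker, a crash between the two writes can leave an order with no job or a job with no order. Here a failed `place_order` leaves neither row behind.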
SigMap beats RAG pipelines with decades-old math and zero deps
SigMap is a zero-dependency code retrieval tool that extracts function and class signatures from codebases to provide relevant context to AI coding assistants. It achieves 81.1% hit@5 retrieval accuracy and 40-98% token reduction using TF-IDF ranking instead of embeddings, working with Copilot, Claude, Cursor, Windsurf, and any LLM.
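For a sense of what TF-IDF ranking over signatures can look like, here is a minimal standard-library sketch. It is not SigMap's implementation: the tokenizer, the smoothed IDF formula, and the names `tokenize` and `rank_signatures` are assumptions chosen for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list:
    """Split identifiers into lowercase words (snake_case and simple camelCase)."""
    return [part.lower() for part in re.findall(r"[A-Za-z][a-z]*", text)]


def rank_signatures(query: str, signatures: list, k: int = 5) -> list:
    """Return up to k signatures with a nonzero TF-IDF score, best first."""
    docs = [Counter(tokenize(sig)) for sig in signatures]
    n_docs = len(docs)
    # Document frequency: in how many signatures each term appears.
    doc_freq = Counter(term for doc in docs for term in doc)

    def idf(term: str) -> float:
        # Smoothed inverse document frequency; rarer terms weigh more.
        return math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0

    query_terms = tokenize(query)
    scored = []
    for sig, doc in zip(signatures, docs):
        total = sum(doc.values()) or 1
        score = sum((doc[t] / total) * idf(t) for t in query_terms if t in doc)
        scored.append((score, sig))
    scored.sort(key=lambda pair: -pair[0])
    return [sig for score, sig in scored[:k] if score > 0]


signatures = [
    "def parse_config(path: str) -> dict",
    "def load_user_profile(user_id: int) -> Profile",
    "class HttpServer",
    "def save_config(cfg: dict, path: str) -> None",
]
print(rank_signatures("load user profile", signatures, k=1))
```

Only the top-ranked signatures would then be passed to the assistant as context, which is where a token reduction of the kind the blurb describes would come from.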
This 400-line shell script runs AI coding agents. Nobody can audit it.
Pu.sh is a coding-agent framework packed into 400 lines of shell script. It needs only curl, awk, and an API key to run AI coding agents. But the code is minified to hit the 400-line constraint, and users say that makes it nearly impossible to read or audit for security.
Granite 4.1: IBM's 8B Model Matching 32B MoE
IBM released Granite 4.1, a family of open-source language models (3B, 8B, 30B) under the Apache 2.0 license. The 8B dense model matches or beats the previous 32B MoE Granite 4.0-H-Small across benchmarks including tool calling (BFCL V3), math (GSM8K), and instruction following (IFEval). Key features include a 512K context window, 15T-token training, and aggressive data filtering. IBM was also unusually candid about training failures during its four-stage RL pipeline.
Neural Networks Work Because They're Allowed to Fail
Drawing an analogy between Internet protocol design and neural networks, computational complexity theorist Lance Fortnow argues both work well because they tolerate failure. Softmax's probabilistic outputs let models stay flexible by never ruling out answers entirely, trading guaranteed correctness for better average performance.
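The claim that softmax never rules out an answer entirely can be checked with a few lines of Python. This is a generic, numerically stabilized softmax written for illustration, not code from Fortnow's post. In exact arithmetic every output is strictly positive; in floating point, very large gaps between logits can still underflow to zero.

```python
import math


def softmax(logits: list) -> list:
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([5.0, 1.0, -3.0])
# Even the least likely option keeps a small, nonzero probability.
print(probs)
```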
No AI Model Is Both Correct and Steerable, Says New Creative Benchmark
Contra Labs introduces a research framework for evaluating generative AI in creative work that separates convergence (shared best practices where evaluators agree) from divergence (legitimate differences in taste and creative intent). The study involved 1.5M+ independent professional creatives evaluating AI-generated outputs across five domains using pairwise comparisons, scalar ratings, and qualitative feedback. The benchmark measures creative quality along dimensions from verifiable (prompt adherence) to subjective (visual appeal), finding that no current model is reliably both correct and steerable.
Zig bans all AI code contributions and explains why
The Zig programming language project enforces a blanket ban on LLM-assisted contributions. VP Loris Cro says they practice "contributor poker," investing in people rather than accepting perfect patches. AI-generated PRs break this model because they skip the relationship-building that sustains open-source communities. Bun, acquired by Anthropic in December 2025, now runs its own Zig fork since their AI-authored changes can't be upstreamed under this policy.
Self-hosted legal AI Mike challenges Harvey and Legora
Mike is an open-source legal AI challenging VC-backed platforms like Harvey and Legora. Users bring their own Claude or Gemini API keys and self-host the tool to keep documents on their own infrastructure. Features include a document-aware chat interface, matter-scoped project workspaces, and tabular review across hundreds of documents with page-level citations.
Diallo's Excel Satire Roasts AI Hype at Its Own Game
Ibrahim Diallo's satirical article parodies AI hype by applying the same exaggerated language to Microsoft Excel. The piece targets Excel's integration with Microsoft Copilot and built-in Python support, arguing that spreadsheets can replace entire business departments. The real target is the AI industry's inflated rhetoric, revealing how absurd startup pitches sound when applied to a humble grid of cells.