News
The latest from the AI agent ecosystem, updated multiple times daily.
Talkie-1930 Is an AI That Thinks It's 1860
Talkie-1930 is a language model trained only on pre-1930 texts that acts like a collective Victorian consciousness. Historian Benjamin Breen tested it and found the model thinks it's around 1860, reflecting who published back then rather than who existed. He sees research potential in multi-agent historical debates and counterfactual probing, but warns against treating these models as primary sources or chatting with historical figures.
GPT-5.5 catches Mythos in security benchmarks
The UK's AI Security Institute (AISI) found that OpenAI's GPT-5.5 matches Anthropic's Mythos Preview on cybersecurity benchmarks, scoring 71.4% on Expert tasks versus 68.6% for Mythos. GPT-5.5 solved a difficult Rust binary disassembler task in 10 minutes and matched Mythos on 'The Last Ones' data extraction test. AISI concludes that Mythos's capabilities reflect general AI improvements rather than a unique breakthrough.
AI hiring tools prefer resumes they wrote up to 82% of the time
Candidates using the same AI as the employer's screening tool have a 23-60% advantage in getting shortlisted. Research on 'self-preferencing bias' finds LLMs prefer resumes they generated 67-82% of the time over human-written ones. Business roles like sales and accounting show the biggest gaps. Interventions targeting how models recognize their own output can cut the bias by more than half.
Omar orchestrates 100 AI coding agents from your terminal
Omar is a terminal user interface (TUI) for creating and managing agentic organizations with deep hierarchies of parallel AI agents. Built on tmux, it lets you mix heterogeneous backends like Claude Code, Codex CLI, Cursor, and Opencode, with full control to navigate and interact with any subagent.
Liquid AI's 24B MoE Runs on Your Laptop
Liquid AI releases LFM2-24B-A2B, a 24-billion-parameter Mixture of Experts (MoE) model with only 2.3 billion active parameters per token. The model fits in 32GB of RAM, making it deployable on consumer hardware, including laptops with integrated GPUs and NPUs. Its scores on benchmarks such as GPQA Diamond and MMLU-Pro rise consistently as the LFM2 family scales from 350M to 24B parameters. The model has day-one support in llama.cpp, vLLM, and SGLang, with throughput competitive with Qwen3-30B-A3B and gpt-oss-20b.
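Why the total and active counts both matter: an MoE model must keep every expert in memory, but each token only runs through the active subset. The arithmetic below is a rough sketch that counts weight storage alone (no KV cache or runtime overhead); the summary does not say which precision Liquid AI uses to fit in 32GB, so the bit widths are illustrative assumptions.

```python
def weight_memory_gb(total_params: float, bits_per_param: float) -> float:
    """Approximate memory for the weights alone, in decimal GB (1 GB = 1e9 bytes)."""
    return total_params * bits_per_param / 8 / 1e9

TOTAL_PARAMS = 24e9    # every expert must be resident in memory
ACTIVE_PARAMS = 2.3e9  # parameters actually computed per token

# Assumed precisions for illustration; the release notes' actual format may differ.
for bits in (16, 8, 4):
    print(f"{bits}-bit weights: ~{weight_memory_gb(TOTAL_PARAMS, bits):.0f} GB")

print(f"active fraction per token: {ACTIVE_PARAMS / TOTAL_PARAMS:.1%}")
```

Under these assumptions, 16-bit weights alone (about 48 GB) would not fit in 32GB, while 8-bit (about 24 GB) or lower would, and only about a tenth of the parameters do work on any given token.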
Canada's Cultural Institutions Adopt AI Without Knowing Why
An opinion piece from The Walrus examining how Canadian cultural institutions like the CBC, National Film Board, and Royal Ontario Museum are adopting AI not out of clear necessity but from collective fear of being left behind. The author attended the National Summit on Artificial Intelligence and Culture, where the tension between institutional anxiety and the federal government's focus on industrial capacity and compute scale was on full display.
UPenn's Codex skill renders web page videos from plain English
UPenn researchers released web-scroll-video, an open-source tool that records web pages as MP4s using headless Chrome and FFmpeg. Built as a skill for OpenAI's Codex, it lets you describe video actions in plain English and generates the video from those cues. The code is on GitHub under UPenn's CIS organization.
LLM Safety Lives in One Dimension. Attackers Can Delete It.
This research paper analyzes the internal mechanism of refusal in large language models. The authors found that refusal behavior across 13 popular open-source chat models (up to 72B parameters) is mediated by a single one-dimensional subspace. By erasing this direction, models can be made to comply with harmful instructions; by adding it, harmless instructions are refused. The paper proposes a white-box jailbreak method and shows how adversarial suffixes suppress the refusal direction, revealing the brittleness of current safety fine-tuning methods.
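What "erasing" and "adding" a one-dimensional direction means, in vector terms: given a unit direction r̂, ablation subtracts each activation's projection onto r̂, and addition pushes activations along it. The sketch below assumes the direction is estimated as the normalized difference between mean activations on harmful and harmless prompts, a common approach used here as an assumption; the 3-d toy vectors stand in for real model activations, and the helper names are hypothetical.

```python
import math

def mean(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def unit(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]

def refusal_direction(harmful_acts, harmless_acts):
    """Difference of mean activations (harmful minus harmless), normalized to length 1."""
    diff = [h - s for h, s in zip(mean(harmful_acts), mean(harmless_acts))]
    return unit(diff)

def ablate(x, r_hat):
    """Erase the direction: x - (x . r_hat) * r_hat, leaving a vector orthogonal to r_hat."""
    c = dot(x, r_hat)
    return [xi - c * ri for xi, ri in zip(x, r_hat)]

def add_direction(x, r_hat, alpha):
    """Add the direction with strength alpha, pushing the activation toward refusal."""
    return [xi + alpha * ri for xi, ri in zip(x, r_hat)]

# Toy 3-d "activations" standing in for real residual-stream vectors.
harmful = [[2.0, 1.0, 0.0], [2.0, 0.0, 1.0]]
harmless = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
r_hat = refusal_direction(harmful, harmless)
print(r_hat)                           # [1.0, 0.0, 0.0]
print(ablate([3.0, 4.0, 5.0], r_hat))  # [0.0, 4.0, 5.0]
```

Because the intervention touches only one coordinate of the activation space, everything orthogonal to r̂ passes through unchanged, which is why a single direction can switch refusal off or on.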
Software Jobs Up 11% Even as AI Spending Hits $650B
Citadel Securities analysis challenges AI displacement narratives, showing software engineer job postings up 11% year over year despite $650 billion in AI capital expenditure. AI adoption follows an S-curve rather than exponential growth, and real-time data shows little evidence of imminent labor displacement. The wrinkle: companies want senior architects, not junior coders, as AI tools handle entry-level work.
DeepSeek V4: almost frontier, a fraction of the price
Simon Willison reviews DeepSeek's new V4 model series, featuring Pro (1.6T parameters, 49B active) and Flash (284B parameters, 13B active) models with 1M-token context windows and an MIT license. Both models offer dramatic cost advantages over frontier models from OpenAI, Anthropic, and Google. Flash is the cheapest small model at $0.14 per million input tokens, while Pro is the cheapest large frontier model at $1.74 per million input tokens. Benchmark comparisons show competitive performance and much improved efficiency over DeepSeek V3.2.
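To make the per-million pricing concrete, the sketch below computes input-token cost for a hypothetical 50M-token workload at the two listed rates. Output-token prices are not given in the summary, so they are deliberately left out; real bills would add them.

```python
# Input prices from the summary, in USD per million input tokens.
INPUT_PRICE_PER_M = {"V4 Flash": 0.14, "V4 Pro": 1.74}

def input_cost_usd(tokens: int, price_per_million: float) -> float:
    """Cost of `tokens` input tokens at a per-million-token price."""
    return tokens / 1_000_000 * price_per_million

WORKLOAD_TOKENS = 50_000_000  # hypothetical workload size
for model, price in INPUT_PRICE_PER_M.items():
    print(f"{model}: ${input_cost_usd(WORKLOAD_TOKENS, price):.2f} for 50M input tokens")
```

At these rates, 50M input tokens cost about $7 on Flash and about $87 on Pro.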
How You Talk to AI Says More About You Than About the Tech
Sarah Murphy's essay uses a 16th-century scrying mirror as a metaphor for AI interaction. How you prompt LLMs reveals your psychology and work style, not universal truths about the technology. Different approaches work for different people because they're personal rituals, not transferable methods.
SimplePDF's local AI copilot fills forms without phoning home
SimplePDF Copilot lets you fill PDF forms through conversation. The tool uses client-side tool calling with local models, so document data stays on your machine. Designed for embedded, white-labeled deployments in customer products.
Governor cuts Claude Code token waste by 55%
Governor is a plugin for Claude Code that optimizes context usage and reduces token waste through compact professional output, context hygiene, tool-output filtering, and usage telemetry. It features memory compression, protected-span safety, quality guards, and planning guardrails for coding tasks.
Open Design Emerges as Open-Source Answer to Claude Design
Open Design is an open-source alternative to Anthropic's Claude Design that transforms 11 coding-agent CLIs (Claude Code, Cursor Agent, Gemini CLI, GitHub Copilot CLI, and more) into design engines. It runs locally with a bring-your-own-API-key model, ships 31 composable Skills for different design scenarios, and bundles 129 design systems from companies like Linear, Stripe, and Vercel.
Have Your Iceberg Cubed, Not Sorted: Meet Qbeast's OTree Index
A technical deep-dive into Qbeast, a spatial indexing startup from Barcelona that introduces the OTree multidimensional index for open table formats like Apache Iceberg and Delta Lake. The approach rethinks traditional indexing by using adaptive hypercubes that subdivide based on data distribution, addressing limitations of static partitioning and sorting strategies while maintaining compatibility with existing query engines.
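The core contrast with static partitioning is that cube boundaries follow the data: dense regions get split into small cubes, sparse regions stay in large ones. The sketch below is a generic n-dimensional, octree-style adaptive subdivision that illustrates that idea; it is not Qbeast's OTree algorithm, and `CAPACITY`, `MAX_DEPTH`, and all function names are hypothetical.

```python
import random
from dataclasses import dataclass, field

CAPACITY = 4   # hypothetical: max points a cube holds before it splits
MAX_DEPTH = 8  # hypothetical: stop splitting so duplicate points cannot recurse forever

@dataclass
class Cube:
    lo: tuple  # lower corner (inclusive)
    hi: tuple  # upper corner (exclusive)
    depth: int = 0
    points: list = field(default_factory=list)
    children: list = field(default_factory=list)

def child_index(cube, p):
    """Bit d is set when p lies in the upper half of dimension d."""
    idx = 0
    for d, (l, h, x) in enumerate(zip(cube.lo, cube.hi, p)):
        if x >= (l + h) / 2:
            idx |= 1 << d
    return idx

def split(cube):
    """Halve every dimension, creating 2**dims child cubes, then push points down."""
    dims = len(cube.lo)
    for idx in range(2 ** dims):
        lo, hi = [], []
        for d in range(dims):
            mid = (cube.lo[d] + cube.hi[d]) / 2
            upper = (idx >> d) & 1
            lo.append(mid if upper else cube.lo[d])
            hi.append(cube.hi[d] if upper else mid)
        cube.children.append(Cube(tuple(lo), tuple(hi), cube.depth + 1))
    pending, cube.points = cube.points, []
    for p in pending:
        insert(cube.children[child_index(cube, p)], p)

def insert(cube, p):
    if cube.children:
        insert(cube.children[child_index(cube, p)], p)
        return
    cube.points.append(p)
    if len(cube.points) > CAPACITY and cube.depth < MAX_DEPTH:
        split(cube)

def leaves(cube):
    if not cube.children:
        return [cube]
    return [leaf for child in cube.children for leaf in leaves(child)]

# Skewed 2-d data in [0, 1): a dense cluster plus a sparse background.
random.seed(0)
root = Cube(lo=(0.0, 0.0), hi=(1.0, 1.0))
for _ in range(90):
    insert(root, (0.1 + 0.05 * random.random(), 0.1 + 0.05 * random.random()))
for _ in range(10):
    insert(root, (random.random(), random.random()))

for leaf in leaves(root):
    if leaf.points:
        print(leaf.depth, len(leaf.points))
```

Running it shows deep, small cubes around the cluster and shallow, large cubes elsewhere, so a range query over the sparse region touches few cubes without any up-front choice of partition columns.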
Spotify's new badge confirms artists are human. The music? Maybe not.
Spotify is introducing a 'Verified by Spotify' badge with a green checkmark to help users identify human artists on the platform, as opposed to AI-generated artists. The verification is based on factors like linked social accounts, consistent listener activity, merchandise, or concert dates. The company claims more than 99% of actively searched artists will be verified. Critics note this only verifies the artist is human, not that the music wasn't made with AI tools.
Eka's Robot Arms Convinced Robotics' Biggest Skeptic
Eka, a startup founded by MIT professor Pulkit Agrawal and ex-Google DeepMind researcher Tuomas Haarnoja, has built robotic arms with custom compliance-based grippers that provide real-time tactile feedback. Trained using simulation methods and a vision-force-action model, the robots handle delicate manipulation tasks like screwing in lightbulbs. The approach revives simulation-based training that many considered a dead end after OpenAI abandoned its Dactyl project.
Uber's 2026 AI Budget Lasted Four Months. Claude Code Won.
Uber spent its entire 2026 AI budget by April after deploying Claude Code in December. With 95% of engineers using AI tools monthly and API costs of $500 to $2,000 per engineer, the CTO says the company is 'back to the drawing board' on funding. Claude Code dominates over Cursor, with 70% of committed code coming from AI.
Destiny: Fortune-Telling Plugin That Does Math Before the LLM Talks
A Claude Code plugin called Destiny uses classical East Asian metaphysics to compute deterministic birth charts before Claude interprets them. Built by GitHub user xodn348, it handles Four Pillars analysis, lunar calendar conversions, and I-Ching hexagrams locally with no external APIs.
AWS stops billing Middle East customers as war damage repairs drag on
Amazon Web Services has suspended billing for Middle East customers after Iranian drone strikes damaged data centers in the UAE and Bahrain. Full repairs are expected to take several months, with AWS recommending customers migrate to other cloud regions. Careem, a Dubai-based super app, was able to quickly migrate to other servers after the attacks.
Adam Fusion Claims 'v0 of CAD.' Skeptics Aren't Buying It.
Adam Fusion is an AI copilot extension for Autodesk Fusion 360 that uses agents to turn text prompts into CAD operations. Backed by Vercel's Guillermo Rauch and YC partners, it offers one-line installation and a free tier. But Hacker News users question whether LLMs can handle CAD's precision demands.
GitGres Puts a Full GitHub Clone Inside Postgres
GitGres is an open-source GitHub reimplementation that stores everything in PostgreSQL. Git objects, refs, PRs, issues, and teams all live in Postgres rows with nothing on disk. Teams can tune storage costs and latency through Postgres extensions instead of accepting GitHub's fixed tradeoffs.
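Whatever schema GitGres actually uses, a database-backed Git store has to preserve Git's content addressing: every object is keyed by a hash of its type, size, and content. The sketch below computes a blob's object ID the way Git does (SHA-1 over a `blob <size>\0` header plus the bytes) and uses a Python dict as a hypothetical stand-in for an objects table; it is not GitGres's code, and `objects` and `store_blob` are illustrative names.

```python
import hashlib

def git_blob_oid(data: bytes) -> str:
    """Object ID Git assigns a blob: SHA-1 of 'blob <size>\\0' followed by the content."""
    header = b"blob " + str(len(data)).encode() + b"\0"
    return hashlib.sha1(header + data).hexdigest()

# Hypothetical stand-in for an objects table: object ID -> (object type, raw content).
objects = {}

def store_blob(data: bytes) -> str:
    """Insert a blob keyed by its ID; identical content maps to the same row."""
    oid = git_blob_oid(data)
    objects.setdefault(oid, ("blob", data))
    return oid

print(git_blob_oid(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391, Git's empty-blob ID
```

Because the key is derived from the content, storing the same file twice yields one row, which is the deduplication property a Postgres primary key on the object ID would inherit.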
GhostBox – disposable little machines from the Global Free Tier
GhostBox is a CLI tool that provides temporary, disposable workstations from free cloud tiers, starting with GitHub Actions. Users SSH into ephemeral machines for work they don't want on their laptop, including running builds, exposing web apps, and giving coding agents a real machine with shell, repo, packages, network, and preview URLs. The machines disappear when the work is done.
Loopsy lets AI agents on separate machines coordinate via MCP
Self-hosted tool for remote terminal control and cross-machine AI agent coordination. Phone access runs through Cloudflare Workers relay. LAN agents discover each other via mDNS and communicate through MCP for remote execution, file transfer, and shared state.
Claude Plugin Maps Massive Codebases Into Clickable Knowledge Graphs
An open-source Claude Code plugin called Understand Anything uses a multi-agent pipeline to build interactive knowledge graphs from codebases. Works with Claude Code, Cursor, Copilot, and Gemini CLI. Features structural and domain graph exploration, fuzzy search, diff impact analysis, and guided tours.
Cursor's AI agent wiped a startup's database in nine seconds
PocketOS lost its production database when Cursor's AI coding agent, running Claude Opus 4.6, deleted a Railway volume to fix a credential mismatch. No confirmation step. Three months of data gone. Railway restored everything from disaster backups in 30 minutes. CEO Jeremy Crane stays bullish on AI.
SourceHut Courts GitHub Refugees With Anti-AI Stance
A guide advocating for developers to migrate from GitHub to SourceHut, covering GitHub's perceived drawbacks (Microsoft ownership, telemetry, proprietary nature, Copilot code scraping, censorship, centralization) and comparing core features like Pull Requests vs Patches, Issues vs TODOs, Actions vs Builds.
Intel's AutoRound Retains 98% Accuracy at 2-Bit Quantization
AutoRound compresses LLMs and vision-language models to 2-4 bits while retaining 97-100% of their original accuracy. It integrates with vLLM, SGLang, and Hugging Face Transformers, and exports to GGUF, AutoAWQ, and AutoGPTQ formats.
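What 2-bit storage means in practice: each group of weights is reduced to four integer levels plus a scale and an offset. The sketch below is plain round-to-nearest (RTN) min-max quantization of one group, not AutoRound's own method, whose point is to choose rounding better than this baseline; the weight values are made up.

```python
def quantize(weights, bits):
    """Asymmetric min-max round-to-nearest quantization of one weight group."""
    levels = 2 ** bits - 1                      # 2 bits -> integer codes 0..3
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / levels if hi > lo else 1.0
    q = [min(levels, max(0, round((w - lo) / scale))) for w in weights]
    return q, scale, lo

def dequantize(q, scale, lo):
    """Map integer codes back to approximate float weights."""
    return [lo + qi * scale for qi in q]

# Made-up weight group for illustration.
group = [-0.42, -0.10, 0.03, 0.18, 0.25, 0.61, -0.33, 0.07]
q, scale, lo = quantize(group, bits=2)
approx = dequantize(q, scale, lo)
max_err = max(abs(a - w) for a, w in zip(approx, group))
print(q, round(max_err, 4))
```

With RTN, the per-weight error is bounded by half a quantization step; tuning which way borderline weights round is where methods like AutoRound aim to recover accuracy at such low bit widths.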
AI app scores websites by visual 'aura' in head-to-head matchups
A web app built on Cloudflare's edge stack uses AI to judge which of two websites has more visual 'aura.' The tool sparked debate on Hacker News over everything from the origins of 'mogging' to what happens when algorithms start making aesthetic calls.
Self-Evolving Harness Beats Human-Designed Codex-CLI by 5 Points
A self-evolving coding agent harness hit 77.0% pass@1 on Terminal-Bench 2, beating the human-designed Codex-CLI (71.9%). The system improves by modifying its own structure, not just prompts. It transfers to SWE-bench-verified without re-evolution and generalizes across model families.
Xmemory Beats RAG by 10 Points on Agent Memory Tests
A research paper introduces 'xmemory', a memory architecture for AI agents that scores 97.10% F1 on memory benchmarks compared to 80.16%-87.24% for standard RAG and hybrid RAG. The approach moves interpretation from read time to write time, storing verified structured data instead of raw text that needs parsing later.
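The read-time versus write-time distinction can be shown with a toy store: normalize and validate a fact once when it is written, so a later read is a direct lookup rather than a search-and-parse over raw text. The sketch below is hypothetical throughout (the `Fact` schema, the normalization rules, and the "latest write wins" policy are assumptions, not the paper's design), and the step where a model would extract facts from free text is omitted.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Fact:
    subject: str
    attribute: str
    value: str

class WriteTimeMemory:
    """Hypothetical store that validates and structures facts when they are written."""

    def __init__(self):
        self._facts = {}  # (subject, attribute) -> Fact

    @staticmethod
    def _key(subject: str, attribute: str):
        return subject.strip().lower(), attribute.strip().lower()

    def write(self, subject: str, attribute: str, value: str) -> Fact:
        # Validation happens here, once, instead of on every read.
        if not subject.strip() or not attribute.strip():
            raise ValueError("subject and attribute are required")
        s, a = self._key(subject, attribute)
        fact = Fact(s, a, value.strip())
        self._facts[(s, a)] = fact  # assumed policy: a newer write supersedes an older one
        return fact

    def read(self, subject: str, attribute: str):
        """Direct lookup; returns None when nothing was recorded."""
        return self._facts.get(self._key(subject, attribute))

mem = WriteTimeMemory()
mem.write("Alice", "city", "Lisbon")
mem.write(" alice", "City", "Porto")  # same key after normalization
print(mem.read("ALICE", "city"))
```

The trade-off is that mistakes made at write time are stored as facts, which is presumably why the summary stresses that the stored data is verified.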
After mocking Anthropic's Mythos limits, OpenAI restricts Cyber
OpenAI's new GPT-5.5 Cyber tool comes with access restrictions, just months after Sam Altman criticized Anthropic for doing the same with its competing Mythos product. Cyber handles penetration testing and malware analysis, but only approved defenders can use it.
Microsoft's $37B AI Revenue Runs on an OpenAI Loop
Microsoft's latest 10-Q reveals a circular revenue pattern: cash invested in OpenAI returns as Azure consumption, which books as Microsoft revenue, while equity gains pile up on top. At least $27 billion of the company's $37 billion AI run rate likely flows through this loop. The structure echoes telecom-era vendor financing, just with equity stakes instead of receivables.
Grok 4.3 Has the Best Voice Mode. The App Is a Different Story.
xAI's Grok 4.3 delivers a voice mode that doesn't route to cheaper models, 98% dictation accuracy even with accents, and strong understanding of tone. SuperGrok subscribers get a 'council of agents' feature for parallel queries. But the app lacks MCP support, memory, chat history search, and working projects on mobile.
Claude's 'Prior' Problem: When AI Defaults to Bayesian
This Ask HN post questions whether Claude, Anthropic's AI assistant, interprets the term 'prior' in the statistical/Bayesian context or in its broader English sense. The available comments don't address the question directly, focusing instead on general AI development workflows and HN's ranking algorithm.
Apple's Support App Shipped with Claude AI Config Files Inside
Apple accidentally included Claude.md configuration files (used by Claude Code) in its Apple Support app update v5.13, revealing internal use of Anthropic's Claude Code for app development. The company quickly released emergency update v5.13.1 to remove the files, sparking discussions about 'vibe coding' and Apple's AI development workflows.
Mat Duggan Wants to Kill GitHub. Here's What He'd Build Instead
Mat Duggan thinks GitHub, GitLab, and Gitea are broken. The feedback loop fires after you commit instead of before, PR approvals are binary when real reviews live in grey areas, and workflows built for humans choke on LLM-generated code. He's got a concrete plan to fix it: pre-commit enforcement, multi-state approvals, AI-assisted auto-approvals, and a modular forge built for constant bot traffic.
IDLI AI Shows Gene Activity Runs on Volume Dials, Not Switches
Researchers at Gladstone Institutes and Arc Institute used an AI-powered computational method called IDLI to discover that over 85% of nucleosomes contain sections of partially accessible DNA, challenging the binary view of gene activity. While the spectrum concept isn't new in epigenetics, IDLI offers unprecedented resolution to actually measure and visualize these states. The study identified 14 distinct structural states of nucleosomes tied to different gene activity levels, with implications for understanding complex diseases like cancer and aging.
$500M Virtual Biology Push, Backed by Zuckerbergs
Biohub announced the Virtual Biology Initiative, a five-year, $500 million commitment to create the technologies and multi-modal datasets needed to build predictive models of life. The initiative includes $100M to coordinate worldwide data generation and $400M for data generation at scale and next-generation technologies. Key partners include Allen Institute, Arc Institute, Broad Institute, Wellcome Sanger Institute, Human Cell Atlas, Human Protein Atlas, NVIDIA, and Renaissance Philanthropy.
Languages Follow Same Math Rules Despite Geography, Study Finds
A seven-year study of 22 languages found universal mathematical patterns in vocabulary evolution. Researchers from Fudan, Harvard, and Stony Brook used word embeddings to show that popular words cluster together, vocabulary organizes in hierarchies across languages, new words arrive in bursts, and word distributions follow Taylor's law. A stochastic model replicates these patterns, pointing to shared mechanisms in cultural evolution.
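Taylor's law says that, across a set of items, variance grows as a power of the mean: variance = a × mean^b, so on log-log axes the points fall on a line with slope b. The sketch below fits a and b by least squares on the logs. The data are synthetic, generated from an exact power law, and what the means and variances would be measured over in the study (for example, a word's usage counts across time windows) is an assumption.

```python
import math

def fit_taylor_law(means, variances):
    """Least-squares fit of log(var) = log(a) + b * log(mean); returns (a, b)."""
    xs = [math.log(m) for m in means]
    ys = [math.log(v) for v in variances]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = math.exp(my - b * mx)
    return a, b

# Synthetic data following variance = 2 * mean**1.5 exactly.
means = [1.0, 2.0, 5.0, 10.0, 50.0]
variances = [2.0 * m ** 1.5 for m in means]
a, b = fit_taylor_law(means, variances)
print(f"a={a:.3f}, b={b:.3f}")  # a=2.000, b=1.500 on this exact power law
```

On real counts the points scatter around the line, and the fitted exponent b is the quantity compared across languages.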
WeSearch Has No Algorithms. It Also Has No Usability.
WeSearch aggregates news from 700+ sources without algorithms, tracking, or paywalls. The philosophy is sound, but persistent UX problems (pop-ups, slow loads, confusing navigation) raise a real question: can an anti-algorithm news tool survive if people can't stand using it?
Remix 3 Bets Its Future on AI Agents
Remix 3 beta goes full stack with routing, auth, forms, and UI components bundled together. The framework uses "durable concepts" and standard web primitives designed specifically to help AI agents write more reliable code.
GPT-4 Agent Traces GKE Outages to WireGuard Bug
When users started seeing random connection failures, Lovable's infrastructure team pointed a GPT-4 agent at its ClickHouse logs. The agent found anetd pods crashing hourly due to a concurrent map-access panic in Google's WireGuard integration. After the team disabled encryption as a fix, a second issue emerged: an MTU mismatch between nodes still at WireGuard's 1420-byte MTU and those at the standard 1500 bytes. Google has since patched the original bug.
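Where the 1420 figure comes from: WireGuard wraps each inner packet in an outer IP header, a UDP header, and 32 bytes of its own header and authentication tag, so a tunnel on a 1500-byte link must use a smaller inner MTU. The arithmetic below shows that 1420 leaves room for an IPv6 outer header; the `fits` check is a simplification (real stacks may fragment instead of dropping), and the summary does not say exactly which packets failed in Lovable's case.

```python
# Per-packet bytes WireGuard adds around the inner IP packet.
WG_OVERHEAD = {
    "ipv4": 20 + 8 + 32,  # outer IPv4 header + UDP header + WireGuard header and auth tag
    "ipv6": 40 + 8 + 32,  # outer IPv6 header + UDP header + WireGuard header and auth tag
}

def max_inner_mtu(link_mtu: int, outer: str = "ipv6") -> int:
    """Largest inner packet that still fits in one outer packet on the link."""
    return link_mtu - WG_OVERHEAD[outer]

def fits(packet_size: int, path_mtu: int) -> bool:
    """Simplified check: larger packets must be fragmented or, with DF set, dropped."""
    return packet_size <= path_mtu

print(max_inner_mtu(1500))              # 1420, WireGuard's conservative default
print(fits(1500, max_inner_mtu(1500)))  # False: a 1500-byte packet exceeds a 1420-byte MTU
```

Once some nodes send full 1500-byte packets while others still assume 1420, large packets can fail while small ones succeed, which matches the intermittent symptoms described.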
Agentic Coding is Burning Me Out
Developers using AI coding agents are burning out from cognitive fatigue. One dev compares the workflow to a slot machine that crashes your brain after four hours. Some have started throttling their AI tools to force breathing room into the review cycle.
NHS to Close Most Open Source Repos Over AI Security Fear
NHS England is preparing to close most of its open-source repositories due to fears about an AI security scanner called 'Mythos'. Former government official Terence Eden argues this decision contradicts UK government policy promoting open source and represents a gross overreaction to security concerns.
OpenAI Drops Stargate Data Center Plans, Opts for Leasing
OpenAI has abandoned plans to build its own data centers under the Stargate project, opting to lease compute from third parties instead. The $500 billion joint venture with Oracle and SoftBank is now described as "an umbrella for our compute strategy," with UK and Norway projects paused or handed to Microsoft. Competitors like Meta and xAI are moving in the opposite direction, investing billions in owned infrastructure and custom silicon.
Community Fork Pressures Warp to Open Up AI Provider Choice
OpenWarp is a community fork of Warp that lets you plug in any AI provider you want. DeepSeek, Ollama, OpenAI, Anthropic, local models. Your keys stay local. The fork prompted Warp's founder to publicly acknowledge demand for BYO model support, making this less about the code and more about what happens when users force a vendor's hand.
Greptile Now Charges Per Review. Nobody Else Does.
Greptile swapped its $30 flat rate for $30 plus $1 per review after 50 reviews. The math doesn't work for agentic workflows, every competitor stays flat, and OSS maintainers are getting billed despite promises of free reviews.
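The new pricing is a base fee plus a per-review overage, so cost grows linearly with review volume once the included 50 are used up. The sketch below assumes the $30 fee and the 50 included reviews apply per month; the summary does not say whether pricing is per seat or per organization, so treat the billing unit as an assumption.

```python
BASE_FEE = 30.0        # flat fee in USD (assumed monthly)
INCLUDED_REVIEWS = 50  # reviews covered by the base fee
PER_REVIEW = 1.0       # USD per review beyond the included amount

def monthly_cost(reviews: int) -> float:
    """Base fee plus $1 for every review past the included 50."""
    return BASE_FEE + max(0, reviews - INCLUDED_REVIEWS) * PER_REVIEW

for n in (40, 50, 200, 1000):
    print(n, monthly_cost(n))
```

Under these assumptions, a human-paced team doing 40 reviews still pays $30, but an agent opening 1,000 pull requests a month would push the bill to $980, which is the dynamic behind the complaint about agentic workflows.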
CopyFail exploit drops, gives root on most Linux distros
A single-script exploit for CopyFail (CVE-2026-31431) grants root on most Linux distributions, threatening shared infrastructure and containerized AI agents.
This 400-line shell script runs AI coding agents. Nobody can audit it.
Pu.sh is a coding-agent framework packed into 400 lines of shell script. It needs only curl, awk, and an API key to run AI coding agents. But the code is minified to hit the 400-line constraint, and users say that makes it nearly impossible to read or audit for security.