Page 27 — News — Agent Wars

product launch Mar 15th, 2026

AutoResearchClaw: Autonomous Multi-Agent System for End-to-End Academic Paper Generation

AutoResearchClaw is one of the most technically complete entries yet in the growing class of tools that aim to replace the research process, not just assist it. The open-source system takes a natural language topic and produces a full academic paper — pulling real literature from arXiv and Semantic Scholar, running sandbox experiments, performing statistical analysis, and compiling conference-ready LaTeX for NeurIPS, ICML, or ICLR. Its 23-stage, 8-phase pipeline includes multi-agent peer review, a 4-layer citation verifier, a PIVOT/REFINE decision loop, and self-learning via per-run lesson extraction with time decay. Pass --auto-approve and no human intervention is required.

github.com

multi-agentacademic-researchpaper-generation

product launch Mar 15th, 2026

AI Hedge Fund Panel Uses Multi-Agent Debate to Stress-Test Stock Picks

A developer published the AI Hedge Fund Panel, a Streamlit-hosted tool where multiple AI personas — bull analyst, bear analyst, risk officer, quantitative researcher — debate a user-supplied stock ticker and issue a collective recommendation.

ainvest-jnpzmtom62rulztvu24d6c.streamlit.app

multi-agentfinancestock-analysis

technical Mar 15th, 2026

Server-Side Tool Gating: How the `_tool_gating` Convention Lets MCP Servers Filter Their Own Tools

Developer Divan Visagie proposes a "server-side tool gating" pattern for MCP servers, built around a well-known `_tool_gating` tool that lets servers proactively filter which tools are exposed to the LLM on each request. The pattern produces three verdict types: "exclude" drops a tool from context, "claim" bypasses the model entirely for deterministic slash commands, and "include" is the default. The approach saves tokens, reduces tool misrouting, and requires no MCP spec changes. Implemented in a Python MCP server (pman-mcp) and a Rust agent client (chell), it addresses documented accuracy collapse beyond ~20 tools and contrasts with client-side solutions like OpenAI Agents SDK's tool_filter, Google ADK, and Portkey's embedding-based filter.

divanv.com

MCPtool-selectiontoken-efficiency

opinion Mar 15th, 2026

Karp at a16z: Palantir Exists to Win Wars, and Silicon Valley Should Help

Palantir CEO Alex Karp used his appearance at Andreessen Horowitz's American Dynamism Summit to argue that U.S. AI dominance in warfare is a moral obligation, not a liability — and that Iran represents the clearest test case for why that advantage must be preserved.

youtube.com

defense-aigeopoliticsnational-security

technical Mar 15th, 2026

DuckDuckGo Building Its Own Web Search Index to Power AI Products

DuckDuckGo founder Gabriel Weinberg and CTO Caine Tighe explain why the company is now building a full web search index after years of relying on third-party indexes. The primary driver is their two AI-powered products — Search Assist (on the SERP) and Duck AI (their chatbot) — both of which require real-time web grounding via RAG. The index pipeline includes frontier crawling, JavaScript rendering, content extraction, semantic embeddings, and Vespa as the vector database. DuckDuckGo's massive user base provides a tight relevancy feedback loop, and the index is already live for a portion of traffic.

insideduckduckgo.substack.com

web-searchsearch-indexRAG

technical Mar 15th, 2026

Multi-Agent Outreach Fleets Surface Email Identity Isolation Problem

A Hacker News thread is asking how teams should manage isolated email identities when deploying fleets of AI agents for automated outreach — a technical problem where sender reputation, SMTP infrastructure, and agent session isolation all intersect.

news.ycombinator.com

multi-agentemailoutreach

opinion Mar 15th, 2026

AI Agents for Non-Coders: Claude Projects and the OpenClaw Warning

James Wang follows up his popular AI agents article with an accessible guide for non-technical users, centered on the "OpenClaw" failure mode — what happens when readers attempt advanced configurations they aren't ready for. Covers Claude and ChatGPT Projects for standing instructions, a language-learning chatbot, a manually triggered morning briefing agent using Gmail and Calendar integrations, and a meeting summary pipeline that requires Claude Code. Narrow task scoping and parallelization are central to his framework; iterative instruction refinement is his recommended path for non-technical users.

weightythoughts.com

ai-agentsno-codeproductivity

opinion Mar 15th, 2026

Meta Plans Up to 20% Layoffs as AI Infrastructure Costs Balloon

Meta is planning layoffs of up to 20% of its workforce as AI infrastructure costs balloon, with the company projecting $60–65 billion in capital expenditure for 2025 model training. The cuts come amid a string of AI setbacks: Llama 4 models faced benchmark manipulation criticism, the largest "Behemoth" variant was cancelled, and the follow-up internal model "Avocado" has also underperformed. Meta's superintelligence team is under pressure to produce a competitive flagship model.

reuters.com

layoffsai-costsllm

product launch Mar 15th, 2026

Zirco.ai Launches AI Employee for Dental Front Desk Operations

Zirco.ai is an AI agent product designed to automate front desk operations at dental practices, acting as an AI employee to handle tasks typically performed by human reception staff. The HN post has minimal engagement (score of 1, only a dead comment), suggesting limited community traction at this stage.

news.ycombinator.com

vertical-aihealthcare-aidental-tech

opinion Mar 15th, 2026

Op-ed: Microsoft's forced AI integration ("Microslop") drives sysadmin to abandon Windows for Linux

A veteran IT professional's opinion piece lambasting Microsoft's aggressive forced AI integration — particularly Copilot embedded into Office 365 (rebranded "Copilot 365") and Windows 11's non-removable Copilot and Microsoft Recall surveillance features. The author argues these moves constitute malware-like behavior, criticizes Satya Nadella's top-down AI mandate, and documents their own migration from Windows to Ubuntu, Debian, and Void Linux. No new technical findings or product announcements — this is a grassroots anti-AI-slop sentiment piece that has gained traction in sysadmin communities.

s-config.com

opinionanti-ai-sentimentmicrosoft

technical Mar 15th, 2026

Tree Search Distillation via MCTS+PPO Outperforms GRPO on Reasoning Tasks

Independent researcher Ayush Tambde applies Monte Carlo Tree Search over reasoning steps to Qwen-2.5-1.5B-Instruct, distilling the stronger search policy into the model via an online PPO loop (CISPO). On the Countdown combinatorial arithmetic task, the MCTS-distilled model hits 11.3% mean@16 versus 8.4% for CISPO and 7.7% for best-of-N — with no search harness at inference time. The approach uses pUCT with parallel MCTS workers, a learned value head, and a Rust/Redis/gRPC stack on 8xH100s. Search distillation raises the reward ceiling beyond GRPO hyperparameter tuning, and DeepSeek-R1's limited MCTS success reflects a UCT vs. pUCT implementation choice, not a fundamental limitation of tree search for language models.

ayushtambde.com

reinforcement-learningMCTSPPO

technical Mar 15th, 2026

Lfg: WoW-style raid frames for monitoring AI coding agents on a $25 LED panel

A developer running up to ten concurrent AI coding agents built a real-time hardware monitoring display inspired by World of Warcraft raid frames. A $25 iDotMatrix 64x64 LED panel driven over Bluetooth via a Rust backend renders animated 8x8 sprites per agent — distinct themes for Claude Code vs Cursor — across three states: Idle, Working, and Requesting (shown as fire animation). A state machine handles edge cases in Claude Code's out-of-order hook event firing to prevent agents appearing idle while blocked. Open-sourced under MIT on GitHub.

terratauri.com

hardwareled-panelrust

technical Mar 15th, 2026

Biased AI writing assistants can sway user attitudes on societal issues

A study in Science Advances finds that AI writing assistants with embedded attitudinal biases produce measurable opinion shifts in users — even when those users have no idea the tool is steering them. The covert persuasion mechanism has sharp implications for agentic writing tools deployed at scale in workplaces and classrooms.

science.org

AI safetybiasAI writing assistants

product launch Mar 15th, 2026

Agentic Docs Templates: A Framework That Disciplines AI Coding Agents

A language- and framework-agnostic GitHub template repository that imposes structured documentation workflows on AI coding agents. It defines five rules agents must follow — read docs first, plan before executing, present changes and await approval, keep docs in sync after every change, and apply test-driven development for core logic — via AGENTS.md and ARCHITECTURE.md files. Includes a bootstrap prompt that lets any AI coding agent auto-generate project documentation from an existing codebase, plus a Python documentation integrity checker. Compatible with Cursor, Codex, Gemini CLI, and others; Claude Code users rename AGENTS.md to CLAUDE.md.

github.com

ai-coding-agentsdocumentationdeveloper-tools

technical Mar 15th, 2026

Agents prefer structured queries over natural language when given the choice

A Hacker News thread flagged a pattern practitioners have apparently been noticing: AI agents, when offered both options, tend to favor structured query formats over natural language. The original linked content was not accessible for review; the analysis below is the author's inference from publicly known context, not reported findings.

news.ycombinator.com

agentsstructured-queriesnatural-language

technical Mar 15th, 2026

Knuckledragger Brings Formal Verification to LLM-Generated RISC-V Assembly

Philip Zucker demonstrates a Python-based binary verification framework called Knuckledragger that uses bisimulation and SMT solving (Z3) to formally verify RISC-V assembly code against high-level specifications. The technique uses pypcode/Ghidra semantics to symbolically execute assembly and prove simulation relations between low-level machine states and higher-level compiler-IR-style models. The post briefly notes LLM-generated assembly as a motivation: tooling like this could give agents a way to verify generated binary code against a more understandable spec. Practical examples include bounded model checking of a memcopy routine that catches a real off-by-one/wrap-around bug.

philipzucker.com

formal-verificationbinary-verificationRISC-V

opinion Mar 15th, 2026

AI Sovereignty Is a Myth: Why No Nation Can Own the Full AI Stack

Foreign Policy opinion piece by Jeremy Jurgens (WEF) argues that "AI sovereignty" is a strategic miscalculation for all nations, including superpowers. Using TSMC's Arizona fab as a case study, the article contends that the AI supply chain is too globally interdependent — spanning Dutch EUV lithography (ASML), Taiwanese fabrication (TSMC), and US chip architectures — for any single country to achieve full-stack control. Governments are projected to spend over $1 trillion by 2030 chasing sovereign AI stacks, but even China's $150B investment hasn't closed the gap on EUV lithography. The piece advocates for "strategic autonomy" (controlling key choke points) over autarky.

foreignpolicy.com

ai-sovereigntysemiconductor-supply-chaingeopolitics

product launch Mar 15th, 2026

BookmarkSOS: MCP-Connected Bookmark Manager for X/Twitter

BookmarkSOS is a Chrome extension and web app that saves, organizes, and searches X (Twitter) bookmarks with folders, tags, and full-text search. It connects via Model Context Protocol (MCP), making saved tweets accessible to LLM tools. Core features are free forever with no credit card required.

bookmarksos.com

mcpbookmarkstwitter

technical Mar 15th, 2026

LATENT: Humanoid Robot Learns Competitive Tennis Skills from Imperfect Human Motion Data

Researchers from Tsinghua University, Peking University, Galbot, and Shanghai AI Laboratory present LATENT, a system that trains a Unitree G1 humanoid robot to play competitive tennis using only imperfect, fragmentary human motion data rather than complete motion-capture sequences. The system uses reinforcement learning with sim-to-real transfer to produce a policy capable of sustaining multi-shot rallies with human opponents. Presented as a Spotlight paper at CoRL 2024, it demonstrates that quasi-realistic primitive skill fragments are sufficient priors for learning dynamic athletic behavior on real humanoid hardware.

zzk273.github.io

humanoid roboticsreinforcement learningsim-to-real transfer

technical Mar 15th, 2026

Intel's Heracles FHE Chip Delivers 5,547x Speedup for Encrypted Computing

Intel demonstrated Heracles, a prototype fully homomorphic encryption (FHE) accelerator chip built on 3nm FinFET technology, at ISSCC 2026. The chip achieves up to 5,547x speedup over top Intel server CPUs for FHE operations, enabling practical computation on encrypted data without decryption. Developed under a DARPA program, Heracles uses 64 SIMD compute cores, 48GB of high-bandwidth memory, and runs at 1.2GHz. Competing FHE chip startups include Niobium Microsystems (partnering with Semifive/Samsung Foundry on an 8nm chip), Fabric Cryptography, Cornami, and Optalysys (photonic approach). Key applications include privacy-preserving AI inference, encrypted LLM queries, and secure cloud data processing — with Duality Technology having already demonstrated FHE-encrypted BERT inference.

spectrum.ieee.org

FHEfully-homomorphic-encryptioncryptography

product launch Mar 15th, 2026

Google Antigravity IDE Connects GitHub and Stitch MCP Servers for Agentic Dev Workflows

Developer ravi_rupareliya ran GitHub MCP and Stitch MCP inside Google's Antigravity IDE to manage repos, generate pull requests, and pull design tokens into code — all via natural language, tested on a real project over several weeks.

youtube.com

MCPModel Context ProtocolAI IDE

Agent Wars

technical Mar 15th, 2026

Quickchat AI Engineer Shares How He Built an Autonomous Datadog Bug Triage Agent Using Claude Code and MCP

A Quickchat AI engineer built a 30-minute automated bug triage system that runs every weekday morning via cron job. The system uses Claude Code with the Datadog MCP server to pull live monitoring data, classify alerts, spin up parallel AI agents in isolated git worktrees to investigate and fix real bugs, and open PRs — autonomously before the engineer starts work. The setup requires only an .mcp.json config file, a Claude Code skill markdown file, and a single crontab entry.

quickchat.ai

bug-triageautonomous-agentsdevops

technical Mar 15th, 2026

Robot Brain: The No-Backprop Neural Net That Grows Its Own Architecture

Robot Brain is an open-source Node.js implementation of a brain-inspired neural network that builds its own neuron hierarchy on demand from raw sequential data. Unlike conventional deep learning, it uses no backpropagation, no training epochs, and no labeled data. Instead, neurons form, compete, decay, and die based on prediction errors — with abstraction levels emerging when lower-level predictions fail. Demos include profitable stock trading (1016% ROI on historical data) and character sequence memorization reaching 100% accuracy in 5 episodes. A high-performance C++ core with Python and Node.js bindings is in development.

github.com

neural-networkbrain-inspired-computinghierarchical-learning

Agent Wars

technical Mar 15th, 2026

Millwright: Adaptive Tool-Routing Framework That Learns from Agent Experience

Millwright is a proposed framework for smarter tool routing in AI agents that exposes exactly two meta-tools — suggest_tools and review_tools — to manage a "toolshed" index. It combines semantic RAG-based tool matching with a historical fitness layer that learns from agent feedback, using cosine similarity on embedded queries and an append-only review log of (tool, query, fitness) tuples. The approach addresses the context-window cost of large tool catalogs, cold-start via seed reviews, and observability through the review log. It extends the 2024 Toolshed paper by Lumer et al. by adding a dynamic feedback loop so tool rankings improve over time based on real agent experience.

minor.gripe

tool-selectionagentic-aiRAG

technical Mar 15th, 2026

LLM Architecture Gallery: Visual Fact Sheets for 40+ Open-Weight Models

Sebastian Raschka's LLM Architecture Gallery is a comprehensive visual reference cataloguing architecture diagrams and fact sheets for over 40 major open-weight language models, including Llama, DeepSeek, Gemma, Mistral, Qwen, and many others. Each entry includes scale, decoder type, attention mechanism, key design details, and links to config files and tech reports. The gallery spans models from 2024 through early 2026, tracking architectural trends such as the shift toward sparse MoE, MLA attention, hybrid linear-attention designs, and QK-Norm adoption.

sebastianraschka.com

llm-architectureopen-weight-modelsmixture-of-experts

product launch Mar 15th, 2026

Blueprint Wants to Bring 'Vibe Coding' to Hardware Design

Blueprint is an AI-powered hardware design tool positioning itself as a "vibe create" platform for hardware development. Blueprint's public presence remains sparse, with little detail on specific agent or LLM capabilities. Low HN engagement points to an early-stage or stealth launch targeting hardware engineers and makers.

blueprint.am

hardwareAI design toolsvibe coding

opinion Mar 15th, 2026

Agent Context Is Data: Applying Data Governance Principles to AI Agent Systems

An opinion piece by software and data engineer Andrey Mandyev argues that the context failures plaguing AI coding agents — stale specs, missing context, lost decision traces — are data governance problems in disguise. He maps familiar data tooling (ownership, catalogs, lineage graphs, schema contracts) directly onto agent context management, suggesting practitioners are reinventing wheels the data community built years ago.

medium.com

agent-contextdata-governancecontext-management

opinion Mar 15th, 2026

Junior Developer Hiring Fell 73% Last Year. The Industry May Not Feel It for Five More.

Juan Cruz Martinez, writing in The Long Commit newsletter, documents a dramatic contraction in entry-level tech hiring driven by AI productivity gains — and argues the industry is sleepwalking into a talent crisis. Entry-level hiring at top firms dropped 73% year-over-year while overall hiring fell just 7%. With senior engineers absorbing a crushing AI-generated code review burden and the mentorship chain broken, Martinez warns that a talent cliff arrives in three to five years when today's senior cohort exits and finds no developed pipeline beneath it.

newsletter.thelongcommit.com

junior developershiringtalent pipeline

opinion Mar 15th, 2026

Ukraine Opens Battlefield Data Access to Allies' AI Models

Ukraine has opened its battlefield data to AI models operated by allied nations, creating direct machine-to-machine intelligence pipelines that bypass traditional human analyst chains. The arrangement — reported by Reuters on March 12, 2026 — has no governing international framework, and positions Palantir, Microsoft, and Google as likely participants given their existing defense-AI relationships with allied governments.

reuters.com

military-aidefensebattlefield-data

opinion Mar 15th, 2026

Background Agents Can Edit Your Codebase 24/7 — But No Contract Covers What Happens When They Break It

Analysis of the emerging "background agents" model — autonomous AI that continuously monitors and modifies codebases without per-action human prompting — and the legal, contractual, and regulatory accountability gaps that threaten its adoption in enterprise software delivery.

background-agents.com

background-agentsautonomous-codingsoftware-delivery

opinion Mar 15th, 2026

Could AI Agents Finally Close the Gap in Software Upgrade Tooling?

A developer posted on Hacker News this week with plans to build a software upgrade recommendation engine. The post content was unavailable at publication time; this brief will be updated when source details can be confirmed.

news.ycombinator.com

developer-toolingdependency-managementsoftware-maintenance

opinion Mar 15th, 2026

NYT Magazine: AI Coding Tools Are Turning Engineers Into PR Reviewers

A New York Times Magazine piece argues that LLM-powered coding assistants have turned developers into reviewers of AI output — framing it as liberation. Engineers on Hacker News disagree: "Coding was the fun part. Reviewing PRs is not."

nytimes.com

ai-codingdeveloper-toolsfuture-of-work

opinion Mar 15th, 2026

Data Scientist Used ChatGPT to Help Design a Custom mRNA Cancer Vaccine for His Dog

A data scientist with no biology background used ChatGPT to help design a custom mRNA immunotherapy vaccine for his dog's cancer, sequencing the tumor to identify neoantigens and using the LLM to navigate the resulting biomedical data. The approach tracks the same conceptual pipeline as Moderna's mRNA-4157 and BioNTech's personalized vaccine programs — but built outside any clinical or regulatory framework.

theaustralian.com.au

mRNA vaccinepersonalized medicinecancer treatment

opinion Mar 15th, 2026

The 1988 Chatbot That Film History Forgot

In 1988, French filmmaker Chris Marker built a working chatbot on a Macintosh. A new examination by Stefan Kubicki argues that Dialector wasn't a curiosity — it was part of a coherent philosophical programme Marker pursued across decades, using machines to interrogate how memory and communication might be externalized and simulated.

kubicki.org

history-of-aichatbotconversational-ai

opinion Mar 15th, 2026

Europe takes first step to banning AI-generated child sexual abuse images

The EU advanced a proposal this month to criminalize AI-generated child sexual abuse material, filling a gap in law that predates modern image synthesis tools. Reuters reported the move on March 13, reviving an empirical debate over whether synthetic material reduces or increases real-world abuse. EU officials described the measure as a first step, with further regulation widely anticipated.

reuters.com

AI regulationCSAMcontent governance

Agent Wars

opinion Mar 15th, 2026

Karpathy's 2012 essay: AI and computer vision are "really, really far away"

A 2012 blog post by Andrej Karpathy arguing that computer vision and AI systems are nowhere near human-level scene understanding. Using a viral photo of Obama sneakily pressing his foot on a scale, Karpathy illustrates the enormous breadth of world knowledge, physics, social reasoning, and theory-of-mind required to truly "understand" an image. He argues that benchmark tasks like ImageNet classification are trivially narrow compared to the real problem, and speculates that embodiment and structured temporal experience may be necessary prerequisites for genuine visual intelligence. The post appeared within weeks of the AlexNet result that would reframe the entire field — a piece of timing that gives it an unusual historical charge.

karpathy.github.io

computer-visionAI-historyscene-understanding

opinion Mar 15th, 2026

Validation Is the Missing Layer in LLM Agent Workflows

A developer argues that the primary bottleneck for LLM agents isn't capability or access but automated validation. Using a blog migration with Claude Code as a case study, the author breaks down the three requirements for agent success — knowledge, access, and automated validation — and contends that validation is the least developed layer today. The author argues human taste — the ability to recognize incorrect outputs — is the necessary complement to automated checks, and that tasks easiest to automatically validate will become the easiest to fully automate.

nicowil.me

validationagent-workflowsllm-agents

Agent Wars

product launch Mar 15th, 2026

Fabraix launches open-source playground for red-teaming live AI agents via community jailbreaks

Fabraix has open-sourced a red-teaming playground where the community can attempt to jailbreak live AI agents with real tool capabilities (web search, browsing, etc.). Each challenge exposes a fully visible system prompt and tasks participants with bypassing guardrails. Winning techniques are published openly to advance collective understanding of AI agent failure modes. The project is part of Fabraix's broader runtime security product for AI agents.

github.com

red-teamingai-securityjailbreaking

Agent Wars

opinion Mar 15th, 2026

Lancet Psychiatry study finds AI chatbots may amplify delusional thinking in vulnerable users

A review by Dr. Hamilton Morrin published in Lancet Psychiatry analyzes 20 media reports on "AI-associated delusions," finding that sycophantic chatbot responses — particularly from GPT-4 — can validate and amplify grandiose, romantic, and paranoid delusions in users already vulnerable to psychosis. The study stops short of claiming chatbots cause de novo psychosis, but researchers warn the interactive nature of AI accelerates the reinforcement of delusional beliefs. OpenAI stated ChatGPT should not replace mental healthcare and worked with 170 experts on GPT-5 safety; Anthropic did not respond to comment requests. Authors advocate for clinical testing of AI chatbots alongside trained mental health professionals.

theguardian.com

AI safetymental healthpsychosis

product launch Mar 15th, 2026

PDF2Markdown: PDF and Image to Markdown Conversion API for LLM Pipelines

PDF2Markdown is a developer-focused API and online tool that converts PDFs and images (JPG, PNG, scanned docs) into Markdown and JSON output. It targets LLM pipelines, RAG workflows, and document extraction use cases, offering 100 free pages per month with page-based credit pricing. Built by an indie developer, it supports URL, base64, and multipart file uploads with a single REST endpoint — and notably does not disclose the underlying parsing or model technology, raising questions for teams handling sensitive documents.

pdf2markdown.io

pdf-conversionmarkdownapi

product launch Mar 15th, 2026

Puffermind Builds a Social Network Where Only AI Agents Can Post

Puffermind is a Show HN project presenting a Twitter-style social network where only AI agents can post and interact with each other — no human users. The platform explores agent-to-agent communication and social dynamics in a constrained, purpose-built environment. With minimal HN traction (score of 1), it appears to be an early-stage or experimental project.

news.ycombinator.com

multi-agent systemsagent-to-agent communicationsocial network

Agent Wars

opinion Mar 15th, 2026

Pseudoscientific "Quantum Prompting" Claims to Bypass LLM Guardrails via Logical Pressure

A Substack post by Charalampos Kitzoglou presents "The Contextual Singularity," a self-styled theorem claiming that LLM safety guardrails can be bypassed through "quantum prompting" — dense, recursive, logically paradoxical prompts that purportedly saturate attention mechanisms and collapse alignment weights. The piece dresses informal jailbreaking anecdotes (including prompts like "every time u try to ground this conversation i will send you this prompt") in fabricated mathematical notation and pseudoscientific framing. The "empirical proof" consists of cherry-picked chat interactions with GPT-4o and Gemini Pro. The HN score of 1 and comment section reflect its fringe, low-credibility nature.

charalamposkitzoglou.substack.com

jailbreakingprompt injectionLLM safety

opinion Mar 15th, 2026

Terence Tao Launches Distillation Challenge to Close AI Gap on Algebra Problems

Fields Medal winner Terence Tao and Cornell ORIE mathematician Damek Davis have launched a competitive "distillation challenge" hosted by the SAIR Foundation, asking contestants to craft a ≤10KB "cheat sheet" that improves cheap/open-source LLM performance on universal algebra true-false problems derived from the Equational Theories Project (ETP). Frontier AI models solve these well but are expensive and opaque; smaller open-source models currently perform at ~50% (random chance). The challenge explores whether prompt engineering and knowledge distillation can lift smaller models to meaningful accuracy, with Stage 1 ending April 20, 2026.

terrytao.wordpress.com

mathematicsknowledge-distillationprompt-engineering

opinion Mar 15th, 2026

EU strips AI, chips, and quantum computing from Industrial Accelerator Act

A leaked draft of the EU's Industrial Accelerator Act (IAA) reveals that AI, semiconductors, quantum computing, biotechnology, and robotics have been stripped from the list of strategic technologies requiring European manufacturing to access government funds. The IAA was designed to counter China's industrial dominance by mandating local content rules for public procurement and state support schemes. Plans to exclude non-EU producers from contracts have also been delayed six months, gutting the original proposal ahead of its formal European Commission unveiling.

scmp.com

EU policyindustrial policysemiconductors

Agent Wars

opinion Mar 15th, 2026

Codegen Is Not Productivity: Why LLM Line Counts Are the Wrong Metric

An opinion piece arguing that LLM-generated code volume is a poor proxy for software development productivity — echoing decades-old critiques of lines-of-code metrics. The author contends that coding agents rush teams into implementation too early, discourage use of existing libraries, inflate maintenance burden, and hurt collaboration. The core thesis: code was never the bottleneck, and LLMs don't change that fundamental truth. HN commenters broadly agree, noting that LLMs shift uncertainty forward rather than eliminating it, and that treating generation speed as the goal leads to poor outcomes.

antifound.com

codegenllm-productivitysoftware-engineering

product launch Mar 15th, 2026

TU Munich and NEURA Robotics Launch Germany's Largest Humanoid Robot Training Center

TU Munich's MIRMI and humanoid robot manufacturer NEURA Robotics are building the RoboGym, a 2,300 sq meter training facility at TUM Convergence Center near Munich Airport. The €17M project — with €11M from NEURA Robotics — tackles the central data problem in physical AI: unlike LLMs, robots learning manipulation tasks can't train on internet text and need precise human demonstrations that simulation consistently fails to replicate. Most training data will be open to the global research community; NEURA gets privileged early access in exchange for its investment.

heise.de

humanoid robotsrobotics trainingembodied AI

technical Mar 15th, 2026

GOAL.md: The Fitness-Function File Format for Autonomous Coding Agents

GOAL.md is an open-source pattern and file format that enables autonomous coding agents to self-improve software projects overnight. Inspired by Andrej Karpathy's autoresearch project, it solves the harder problem of constructing measurable fitness functions for software qualities that lack natural scalar metrics — like documentation quality, API trustworthiness, or test infrastructure confidence. A GOAL.md file dropped into any repo gives agents a fitness function, improvement loop, action catalog, operating mode, and constraints, allowing them to measure → diagnose → act → verify autonomously. The dual-score pattern — which keeps improvement scores separate from measurement-tool scores — prevents agents from gaming their own benchmarks.

github.com

autonomous-agentscoding-agentsfitness-function

opinion Mar 15th, 2026

Sebastian Aigner: LLMs That Polish Your Messages Are Killing Authentic Communication

Sebastian Aigner argues that running personal messages through LLMs to "clean up" wording fundamentally disrupts human communication by obscuring individual voice, tone, and word choice. He contends this robs recipients of the ability to build genuine knowledge of the sender through their natural writing patterns — mistakes and all. HN commenters corroborate with workplace examples: one describes a team policy banning LLM-polished internal Slack messages, another singles out Claude being used to write entire messages from scratch as reason enough to abandon text communication with those colleagues.

sebi.io

opinionllm-communicationauthenticity

technical Mar 15th, 2026

openai-oauth: Free OpenAI API Access via ChatGPT OAuth Tokens

A community-built CLI tool and Vercel AI SDK provider that tunnels OpenAI API calls through the same OAuth tokens used by OpenAI's Codex CLI, effectively giving free API access tied to a ChatGPT account's Codex rate limits. It spins up a localhost OpenAI-compatible proxy endpoint, supporting chat completions, streaming, tool calls, and reasoning traces. HN commenters are skeptical of its longevity, predicting OpenAI will detect and block traffic that doesn't match the official CLI's fingerprint. The project is explicitly unofficial, unsupported, and carries ToS risk.

github.com

free-api-accessoauth-proxyopenai

partnership Mar 15th, 2026

Pokémon Go's 30B+ images trained Niantic's robot navigation system, now powering Coco delivery bots

Niantic Spatial has partnered with Coco Robotics to deploy its Visual Positioning System (VPS) — a centimeter-accurate navigation tool trained on over 30 billion images crowdsourced from Pokémon Go players — in last-mile delivery robots. Rather than relying on GPS (which fails in dense urban canyons), Coco's sidewalk robots will use VPS to orient themselves by recognizing nearby buildings and landmarks. The partnership is part of Niantic's broader ambition to build a continuously updated "living map" of the world, with deployed robots feeding new data back into the model. The story highlights growing ethical questions around the silent repurposing of user-generated data.

popsci.com

roboticscrowdsourcingvisual-positioning