The summer of 2025 changed how mathematicians think about AI. In July, several AI models solved five out of six problems at the International Mathematical Olympiad. That got attention. Mathematicians who had written off AI as too error-prone started experimenting, and they found something surprising: these tools could help with real research, not just puzzles with known answers. Terence Tao, the UCLA mathematician, put it plainly: "2025 was the year when AI really started being useful for many different tasks."
AlphaEvolve, built by researchers at Google DeepMind, uses genetic algorithms to evolve Python programs, scoring each candidate against a mathematical objective and keeping and mutating the best performers. PatternBoost, described in a recent arXiv paper by Charton, Ellenberg, Wagner, and Williamson, alternates between classical search and transformer networks to find mathematical constructions; it produced a counterexample to a conjecture that had sat open for 30 years. Ernest Ryu used ChatGPT to prove a decades-old conjecture in optimization theory. These aren't toy results. Some are on par with what you'd find in professional mathematical journals.
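The full AlphaEvolve pipeline isn't reproduced here, but the evolutionary loop behind it is simple to sketch. The toy below evolves a vector of five numbers rather than program source, and uses a made-up objective (spreading points in [0, 1] so the smallest pairwise gap is large); every name and parameter is hypothetical, chosen only to show the select-mutate-score cycle.

```python
import random

def score(candidate):
    # Toy mathematical objective: reward configurations whose closest
    # pair of points is as far apart as possible.
    gaps = [abs(a - b) for i, a in enumerate(candidate)
            for b in candidate[i + 1:]]
    return min(gaps) if gaps else 0.0

def mutate(candidate, rate=0.3):
    # Perturb some coordinates with Gaussian noise, clipped to [0, 1].
    return [min(1.0, max(0.0, x + random.gauss(0, 0.1)))
            if random.random() < rate else x
            for x in candidate]

def evolve(pop_size=50, length=5, generations=200):
    population = [[random.random() for _ in range(length)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=score, reverse=True)
        parents = population[:pop_size // 5]  # keep the fittest fifth
        population = parents + [mutate(random.choice(parents))
                                for _ in range(pop_size - len(parents))]
    return max(population, key=score)

best = evolve()
# Five equally spaced points in [0, 1] give the optimal min gap of 0.25;
# the search should climb toward that value.
```

The real system replaces the toy mutation step with an LLM that rewrites program source, but the skeleton, score candidates, keep the best, mutate them, repeat, is the same.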
Akshay Venkatesh, a Fields Medalist at the Institute for Advanced Study, worries about what gets lost when mathematicians lean on AI. "There are valuable things in our culture which we should try to keep," he said. The risk is that AI agents bypass genuine understanding: the tools accelerate output without the mental development that comes from working through problems oneself.

Meanwhile, researchers are leaving academia for OpenAI, Google, and startups like Harmonic and Axiom Math, and journals are scrambling to update policies. Elsevier and Springer Nature now require authors to disclose AI use, and editors are contending with AI-generated citations and fabricated references. The journal Experimental Mathematics, long a home for computer-aided proofs, faces a distinct problem: verifying results from non-deterministic AI models rather than from deterministic code. Some are pushing for formal proof assistants such as Lean or Coq to verify AI-generated logic before publication.
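To make that proposal concrete, here is a minimal Lean example (the theorem name is arbitrary). The point is that the proof checker's kernel either certifies the proof term or rejects it, and the verdict does not depend on whether a human or an AI wrote the proof.

```lean
-- Commutativity of addition on natural numbers, checked by Lean's kernel.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```

A journal could, in principle, require that an AI-generated argument arrive with such a formal certificate, shifting trust from the author of the proof to the checker.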
Tao compares the current moment to having "a shovel and a pickax" where before you had only bare hands. The models let mathematicians attack thousands of problems at once and run statistical studies that were previously impossible. In a February 2026 challenge called First Proof, AI models solved more than half of its 10 research-level questions within a week. The questions were chosen specifically because they were unlikely to appear in training data, so the models couldn't rely on memorization. Daniel Litt, who analyzed those results, wrote: "It's very likely that this technology is bigger than the computer." Tao sees the tools as augmentation, not replacement. Either way, the practice of mathematics is about to look very different.