Four AI math-reasoning milestones in 30 days — AlphaEvolve, FrontierMath Tier 4, WorldReasonBench, OpenAI Erdős — research-automation thesis becomes data
Between April 21 and May 20, 2026, four distinct AI mathematical-reasoning milestones landed: AlphaEvolve's record-beating results on production problems, Google DeepMind's FrontierMath Tier 4 solve, WorldReasonBench's video-reasoning benchmark, and OpenAI's disproof of an Erdős conjecture. The 30-day cadence is the news: the "research automation" framing is no longer speculation; it is a measurable trend.
The four results are different in kind, which is what makes the cumulative argument strong. AlphaEvolve is autonomous discovery on production combinatorial-optimization problems (CDCL solver kernels, matrix multiplication tilings), where the model wrote code that beat established human-tuned algorithms. FrontierMath Tier 4 is graduate-research-level mathematics solved end-to-end by a Gemini-line model. WorldReasonBench is multimodal video reasoning, and the benchmark itself is novel. The Erdős result is the disproof of a canonical conjecture in discrete geometry. Different surfaces, same direction.
The downstream policy framing, set out in Jack Clark's Cosmos Lecture two days before this PM cycle, is that recursive self-improvement now appears in official research documents at a probability of 60% or higher by the end of 2028. The four April-May milestones are the data points underneath that probability claim. The mathematics community had been the most skeptical professional cohort about AI's ability to do genuine research; that skepticism is no longer the consensus. The Gowers endorsement is what moved the needle on Erdős; the cumulative four-in-thirty-days run is what is moving it on the broader research-automation thesis.
Sources: Tech Jacks Solutions, "AI Math Results: Four Reasoning Breakthroughs in 30 Days" · OpenAI, "OpenAI model disproves discrete geometry conjecture" · Phys.org, "AI makes major breakthrough in math problem"