Four math milestones in thirty days — research automation graduates from speculation to data
Between April 21 and May 20, four distinct AI mathematical-reasoning milestones landed: AlphaEvolve's production records, a FrontierMath Tier 4 solve, WorldReasonBench, and OpenAI's Erdős disproof. The cadence is the news. Research automation is no longer speculation; it's a measurable trend with publication-grade artifacts.
The skepticism through 2024 was about whether AI could do research at all — whether the "reasoning" outputs of frontier models reflected genuine novel mathematical work or sophisticated retrieval from training data. Four results in thirty days end that debate. The remaining debate is about pace, not direction.
Four AI math-reasoning milestones in 30 days: AlphaEvolve's production records on combinatorial-optimization problems (CDCL solver kernels, matrix multiplication tilings); Google DeepMind's FrontierMath Tier 4 graduate-research-level solve; WorldReasonBench's multimodal video-reasoning benchmark; OpenAI's disproof of the 80-year-old Erdős unit-distance conjecture, endorsed by Tim Gowers as "a milestone in AI mathematics."
Why the cadence matters more than any single result
One milestone is a press release. Four milestones in thirty days, across several labs and four different problem surfaces, is a phase transition. The labs are not coordinating; they are converging because the underlying capability has crossed a threshold that lets multiple research programs produce comparable results simultaneously. The threshold appears to be around the Gemini 3 / Claude Opus 4.7 / GPT-5.5 capability tier.
The strongest single signal is the Gowers companion paper, which extends the Erdős construction beyond the unit-distance case to a broader family of combinatorial-geometry problems. Gowers is a Fields medalist who has engaged seriously with AI-assisted mathematics, including through the AlphaProof and AlphaGeometry collaborations in 2024-2025, which makes him one of the most credible voices on the subject. His "milestone" framing reads as a considered judgment from inside the mathematics community, not press-release rhetoric.
What this implies for the research-automation thesis
Jack Clark's Cosmos Lecture, delivered two days before this PM cycle, put a 60%+ probability on recursive self-improvement by end-2028 (covered in the AM cycle). The four April-May math milestones are the underlying data points. If frontier models can produce publication-grade mathematics in four distinct sub-fields within a month, the "model trains its successor" framing is no longer a hypothetical scenario; it's a near-term operational question.
The next twelve months will probably produce more open-conjecture resolutions across mathematics, the first independently discovered new theorems in major sub-fields (algebra, number theory, combinatorics) credited to AI systems, and at least one Fields-medal-level result whose discovery process explicitly includes AI as a collaborator rather than a tool. Each will be controversial in the mathematics community for distinct reasons; each will move the consensus further from "AI doesn't do research" toward "AI is research infrastructure."
The policy framing that hasn't caught up
If research automation is the trajectory, regulators and academic institutions have to work through the implications. AI systems contributing to publishable mathematics raise questions about authorship attribution, peer review of AI-generated proofs (especially in areas like number theory, where proofs can be too complex for full human verification), and the funding model for academic mathematics in a world where the marginal cost of producing a new mathematics paper drops by an order of magnitude.
None of these questions has an institutional answer yet. They will accumulate over the next two to three years as the field works out new norms. The mathematics community moves slowly on norm changes; AI capability is moving faster. The friction is going to be visible.
OpenAI: OpenAI model disproves discrete geometry conjecture · Phys.org: AI breakthrough in math problem · Tech Jacks Solutions: AI Math Reasoning Milestones 30 Days