
GPT-5.4 Pro Cracks a 7-Year-Old Unsolved Math Problem — FrontierMath Scores Hit 50%

Anatoliy Kolodkin

25 Mar 2026 • 1 min read

For seven years, a Ramsey-style hypergraph conjecture posed by mathematicians Will Brian and Paul Larson sat unsolved, resisting human experts and earlier AI systems alike. That changed in March 2026, when OpenAI's GPT-5.4 Pro cracked it. Epoch AI independently verified the solution and confirmed that it is genuinely novel rather than a retrieval or recombination of known results. Brian and Larson plan to publish the proof, and they're weighing whether to credit the AI's contribution in the paper itself.

The broader context makes this milestone even more striking. When Epoch AI launched the FrontierMath benchmark in 2024, as a suite of problems deliberately designed to resist brute-force approaches, GPT-4 scored roughly 5%. GPT-5.4 Pro now scores 50%, and it isn't alone at the frontier: Gemini 3.1 Pro and Claude Opus 4.6 (max) can solve the same problem at least some of the time. What was once a clean dividing line between "AI as calculator" and "AI as mathematician" has blurred considerably, and the pace of that blurring (from roughly 5% to 50%, a tenfold increase in under two years) suggests the line may soon vanish entirely.

The implications reach well beyond competitive benchmarks. Peer-reviewed mathematics has historically been one of the domains most resistant to the "AI accelerating science" thesis, because it requires creative leaps that pattern-matching systems were assumed to be incapable of. A verified solution to a previously open research problem challenges that assumption directly. Whether this is an inflection point or an outlier is the question the research community is now actively debating.

Read the full article at Epoch AI →
