AI for Mathematics: 2025–2026 Digest

A curated guide to recent papers, news, and developments in AI-assisted mathematics.

Brief overview

Recent work in AI for mathematics is developing along several connected directions. One major area is autoformalization: using AI to translate ordinary mathematical writing, such as theorem statements, LaTeX papers, and textbooks, into code that a proof assistant can check. A second area is AI-assisted theorem proving, where models try to prove formal statements, receive feedback from systems such as Lean, Isabelle, or Rocq/Coq, and then repair their proofs.

The field is also moving beyond contest-style problem solving. Some projects now aim at research-level mathematics: discovering useful constructions, proposing or resolving conjectures, searching for counterexamples, and verifying difficult proofs. At the same time, formalization has expanded into serious mathematical areas such as analysis, topology, number theory, mathematical physics, combinatorics, category theory, and statistical learning theory.

Evaluation is becoming more realistic as well. New benchmarks test AI systems not only on Olympiad-style problems, but also on unfinished Lean projects, recent arXiv papers, applied mathematics, and graduate-level proofs. The ecosystem around AI for mathematics is also growing, with new funding programs, university projects, workshops, fellowships, prizes, and infrastructure initiatives.

Key terms

A compact glossary of the most useful terms for browsing the AI-for-mathematics landscape covered in this index.

TermSimple description
AI for MathematicsThe use of AI systems to help with mathematical reasoning, proof, computation, discovery, or formalization.
FormalizationTranslating mathematics into a precise language that a proof assistant can check.
AutoformalizationUsing AI to automatically translate informal mathematical text into formal proof-assistant code.
Proof assistantSoftware that checks whether a mathematical proof is logically correct.
LeanA popular proof assistant widely used for modern formal mathematics and AI-for-math research.
mathlibLean’s large community-built library of formalized mathematics.
Formal theorem provingProving mathematical statements inside a proof assistant or another formal logic system.
Automated theorem provingUsing algorithms or AI systems to find proofs automatically.
Interactive theorem provingA proof process where humans and software work together step by step.
Neural theorem provingThe use of neural networks or LLMs to guide proof search or generate formal proofs.
LLMA large language model, such as ChatGPT, trained to generate and reason with text and code.
Agentic systemAn AI system that works in multiple steps, uses tools, checks errors, and revises its output.
Neuro-symbolic systemA system combining neural models, such as LLMs, with exact symbolic tools such as proof assistants or CAS.
Symbolic computationExact computation using algebraic rules, as in Wolfram Language, Mathematica, SageMath, or SymPy.
CASA computer algebra system that performs exact symbolic calculations.
Proof verificationChecking whether a proposed proof is correct according to formal rules.
Proof searchThe process of looking for a sequence of steps that proves a given statement.
Proof repairFixing a failed or incomplete formal proof using feedback from a proof assistant.
TacticA command or strategy used inside a proof assistant to make progress on a proof.
BenchmarkA test set used to measure how well AI systems perform on mathematical tasks.
DatasetA collection of examples used to train or evaluate AI systems.
miniF2FA widely used benchmark of formalized Olympiad-style mathematics problems.
FrontierMathA benchmark designed to test AI systems on difficult, research-style mathematical problems.
Mathematical discoveryFinding new conjectures, constructions, counterexamples, proofs, or mathematical patterns.
Human-AI workflowA research process where humans guide, inspect, and judge work done partly by AI systems.
Semantic reviewHuman checking of whether a formal proof or statement captures the intended informal mathematics.

1. News and commentary

Public reporting and commentary on AI-assisted mathematics, with 2025 items placed before 2026 items where relevant.

1.1 General reporting

The mathematicians teaching AI to reason

IBM ThinkOct 13, 2025AI reasoning

Summary: Explains why mathematics has become a testbed for AI reasoning, linking theorem proving, collaborative mathematical workflows, and the push toward verified AI-generated arguments.

Easy summary: Mathematics is becoming one of the places where researchers test whether AI can really reason, not just imitate text.

Artificial Mathematical Intelligence in 2025

Luis Scoccola blogDec 1, 2025field map

Summary: A technical but accessible blog post mapping “Artificial Mathematical Intelligence” into tasks such as informal theorem proving, formal theorem proving, autoformalization, informalization, counterexample search, and discovery. Its useful warning is the semantic gap between informal mathematical intention and formal or AI-produced output.

Easy summary: A helpful map of what “AI doing mathematics” can mean, and why matching meaning is difficult.

Learning the Mathematical Process

Computational Complexity blogDec 11, 2025math process

Summary: Lance Fortnow argues that IMO-style success is only one part of mathematics: future AI-for-math systems must learn the process of finding useful definitions, questions, and research directions.

Easy summary: Solving contest problems is impressive, but real mathematical research also requires learning how mathematicians choose paths.

How AI is Emerging as a Teammate for Mathematicians

Polytechnique InsightsJan 21, 2026commentary

Summary: Discusses the role of AI as a mathematical assistant, its current limits, and examples of AI-supported formalization such as Math Inc.’s Gauss work.

Easy summary: AI is becoming useful as a collaborator, but not yet a replacement for professional mathematicians.

The AI Revolution in Math Has Arrived

QuantaApr 13, 2026research-level AI math

Summary: A broad report on the rapid shift from contest-style AI mathematics to research-level mathematical assistance. It discusses the First Proof challenge, AI math startups, formal proof systems, and changing expectations among mathematicians.

Easy summary: Quanta says AI is no longer just solving school or Olympiad problems; it is starting to help with real research-style mathematics.

1.2 Models and benchmarks

DeepMind AI crushes tough maths problems on par with top human solvers

Nature NewsFeb 7, 2025AlphaGeometry2

Summary: Nature reported that Google DeepMind’s AlphaGeometry 2 reached the level of gold-medal students on International Mathematical Olympiad geometry problems, an early 2025 signal that specialized AI systems were becoming very strong at formal-style mathematical reasoning.

Easy summary: A geometry-focused AI system became strong enough to rival the best human Olympiad geometry solvers.

ByteDance BFS-Prover sets a MiniF2F record

ByteDance SeedFeb 25, 2025Lean theorem proving

Summary: ByteDance Seed announced BFS-Prover, a lightweight Lean-focused automated theorem prover using expert iteration, DPO from compiler feedback, and breadth-first search; the announcement reported a then-record MiniF2F score.

Easy summary: A comparatively lightweight theorem-proving model showed strong results by learning from Lean’s compiler feedback.

OpenAI’s o3 and the FrontierMath benchmark debate

TechCrunchApr 20, 2025benchmark transparency

Summary: Coverage of the gap between OpenAI’s earlier o3 FrontierMath headline number and independent Epoch AI results after launch. The episode became important because FrontierMath was meant to test advanced mathematical reasoning, and it raised broader questions about benchmark conditions, compute budgets, and transparency.

Easy summary: The story showed that AI math benchmark scores need careful interpretation, not just headline percentages.

DeepSeek-Prover-V2 released for Lean theorem proving

DeepSeekApr 30 / May 12, 2025open-source prover

Summary: DeepSeek released Prover-V2, an open-source model for formal theorem proving in Lean 4. Coverage and the associated paper emphasized recursive theorem proving, subgoal decomposition, and reinforcement learning from formal feedback.

Easy summary: DeepSeek pushed open-source AI theorem proving further by training a model specifically to write Lean proofs.

AlphaEvolve: a Gemini-powered coding agent for algorithm discovery

Google DeepMindMay 14, 2025algorithm discovery

Summary: Google DeepMind introduced AlphaEvolve, an evolutionary coding agent that uses Gemini models to propose and test code. The announcement emphasized algorithmic and mathematical discovery, including faster matrix multiplication algorithms and progress on optimization-style mathematical problems.

Easy summary: Instead of only writing proofs, AI was used to search for better algorithms and mathematical constructions.

At a secret math meeting, researchers struggled to outsmart AI

Scientific AmericanJun 6, 2025FrontierMath

Summary: A report on mathematicians designing very hard FrontierMath-style problems for AI systems. The article captured the growing anxiety and excitement around models that could solve problems faster than expected, while also showing how hard it is to build uncontaminated, meaningful evaluations.

Easy summary: Mathematicians were trying to write problems hard enough to expose where AI still fails.

Google and OpenAI models reach gold-medal level at IMO 2025

Nature / Reuters / DeepMindJul 21–24, 2025IMO gold level

Summary: Google DeepMind’s Gemini Deep Think and an experimental OpenAI model both reached gold-medal-level performance on the 2025 International Mathematical Olympiad, solving five of six problems. DeepMind emphasized that its system worked directly in natural language within the official time limit, unlike earlier formal-language systems that required translation and much longer computation.

Easy summary: 2025 was the first year AI systems reached top-student level on the IMO in a highly visible way.

ByteDance Seed Prover achieves IMO 2025 silver-medal score

ByteDance SeedJul 23, 2025formal IMO proof

Summary: By official invitation, ByteDance Seed reported that its Lean-based Seed Prover fully solved four of the six IMO 2025 problems and partially solved one, receiving an IMO-certified silver-medal-equivalent score.

Easy summary: Alongside Google and OpenAI, ByteDance showed that formal theorem-proving systems were also reaching very high Olympiad performance.

DeepSeek’s self-correcting AI model aces tough maths proofs

Nature NewsDec 4, 2025self-correction

Summary: Nature reported on a DeepSeek mathematical reasoning model that could identify and correct its own errors and achieved strong performance on prestigious mathematics competitions. The story is important because self-correction is a central requirement for reliable AI reasoning.

Easy summary: A model that can notice and repair its own mistakes is a step toward more dependable AI mathematics.

Seed Prover 1.5 extends Lean-based theorem proving

ByteDance SeedDec 24, 2025agentic theorem proving

Summary: ByteDance Seed announced Seed Prover 1.5, describing an agentic Lean theorem-proving architecture aimed at stronger undergraduate-level formal reasoning and more efficient interaction with proof environments.

Easy summary: A later Seed Prover version moved from single-shot proof generation toward a more agentic Lean workflow.

1.3 Formalization

A Lean companion to Analysis I

Terence Tao blogMay 31, 2025Lean education/formalization

Summary: Terence Tao launched a Lean companion to his textbook Analysis I, translating many definitions, theorems, and exercises into Lean and giving students an alternate formal route through the material.

Easy summary: A major mathematician turned a standard analysis text into a Lean-based learning and formalization project.

Proof assistants

Plus MagazineAug 14, 2025Lean / proof assistants

Summary: Plus Magazine published a proof-assistant collection connected to Kevin Buzzard and the 2025 Big Proof workshop. It introduced proof assistants, Lean, and the idea that AI could use such systems to check and produce mathematics more reliably.

Easy summary: A friendly introduction to why proof assistants matter in the AI-for-math story.

To Have Machines Make Math Proofs, Turn Them Into a Puzzle

QuantaNov 10, 2025SAT solving / computer proof

Summary: Quanta profiled Marijn Heule’s SAT-based approach to computer-assisted proof and discussed how SAT solving might combine with LLMs. The article is not only about LLMs, but it is important for the wider AI-for-math picture because it shows a complementary path: turn hard proof search into a machine-solvable combinatorial puzzle.

Easy summary: Some mathematical proofs can be found by translating the problem into a huge but checkable search puzzle.

Formalization of Erdős problems

Xena ProjectDec 5, 2025Erdős / Lean

Summary: A guest post on the Xena Project surveys the growing formalization activity around Erdős problems, including links between problem databases, OEIS, Lean statements, and AI-related proof efforts.

Easy summary: Erdős problems became a practical testing ground for connecting databases, formal statements, and AI-assisted proof work.

In Math, Rigor Is Vital. But Are Digitized Proofs Taking It Too Far?

QuantaMar 25, 2026Lean/formalization

Summary: A sociological and philosophical report on Lean and formal proof. It discusses the benefits of digitized proofs and concerns that formal systems may influence notation, definitions, research taste, or field priorities.

Easy summary: Formal proof systems can make proofs very reliable, but they may also change how mathematicians choose and express mathematics.

Math Long Resisted a Digital Disruption. AI Is Poised to Change That

Science NewsApr 15, 2026proof verification

Summary: Explains how AI-assisted formalization and proof assistants may change proof verification by forcing hidden assumptions and implicit steps into machine-checkable form.

Easy summary: AI and proof assistants may make checking mathematical proofs more like checking code.

A Watershed Moment for AI–Human Collaboration in Math

IEEE Spectrum2026sphere packing

Summary: Discusses AI-assisted proof verification, especially around high-profile formalization efforts such as sphere packing.

Easy summary: Human mathematicians and AI tools are beginning to share the work of making hard proofs checkable.

AI just verified a proof related to a Fields Medal-winning result

Live Science2026popular report

Summary: Popular coverage of AI-assisted verification involving sphere packing and formal proof technology.

Easy summary: A famous hard proof area is being checked with help from modern formal tools.

1.4 Discovery stories

Mathematical exploration and discovery at scale

Terence Tao blogNov 5, 2025AlphaEvolve in research math

Summary: Terence Tao discussed a paper with Bogdan Georgiev, Javier Gómez-Serrano, and Adam Zsolt Wagner on using AlphaEvolve across 67 mathematical problems in analysis, combinatorics, geometry, and related areas. The post is important because it describes both successes and limitations in a research-mathematics context.

Easy summary: AlphaEvolve was tested not just on toy tasks, but on a wide range of serious mathematical search problems.

AI is solving “impossible” math problems — can it best top mathematicians?

Live ScienceDec 19, 2025research-math outlook

Summary: A broad popular report on AI-assisted mathematical discovery, including dynamical-systems/Lyapunov-function work, DeepMind-related search, topology conjecture generation, and cautious expert reactions from mathematicians.

Easy summary: AI is beginning to suggest useful mathematical objects and conjectures, but experts still see human guidance as essential.

Using AI, Mathematicians Find Hidden Glitches in Fluid Equations

QuantaJan 9, 2026mathematical physics

Summary: Reports on AI-assisted exploration of candidate singularities/blowup behavior in simplified fluid equations, connected to the wider Navier–Stokes regularity landscape.

Easy summary: AI is being used like a searchlight to find unusual patterns in fluid equations that humans might miss.

Chinese AI solves 12-year-old math problem in 80 hours

TechRadarApr 2026Rethlas/Archon coverage

Summary: Popular coverage of the Rethlas/Archon commutative algebra result, emphasizing minimal human intervention and formal proof checking.

Easy summary: A news report about an AI system that reportedly found and checked a proof of an open algebra problem.

Liam Price / ChatGPT-assisted Erdős problem story

news/discussionApr 2026AI-assisted human work

Summary: Widely discussed story about a young mathematician/amateur using ChatGPT assistance on a long-standing Erdős-type problem. Use the eventual mathematical writeup for technical claims.

Easy summary: A human used AI as a helper in attacking a hard mathematical problem.

1.5 Startups

Harmonic announces Aristotle and formally verified IMO gold-level performance

Business WireJul 28, 2025formal verification startup

Summary: Harmonic announced Aristotle, describing it as a mathematical-superintelligence system built around formal verification, and claimed formally verified gold-medal-level performance on IMO 2025. Because this was a company announcement, the claims should be read alongside independent verification and public artifacts when available.

Easy summary: A startup framed formal verification as the path to reliable AI mathematical reasoning.

Math Inc. introduces Gauss and the strong Prime Number Theorem formalization

Math Inc.Sep 2025autoformalization

Summary: Math Inc. presented Gauss, an AI-assisted autoformalization agent, and a Lean formalization of the strong Prime Number Theorem with substantial complex-analysis infrastructure generated under human guidance.

Easy summary: An AI formalization agent helped turn a serious analytic number theory project into checkable Lean code at large scale.

Harmonic raises $120 million and is valued at $1.45 billion

ReutersNov 25, 2025AI math startup

Summary: Reuters reported that Harmonic raised $120 million at a $1.45 billion valuation. The company’s pitch centered on mathematical reasoning, Lean 4 formal verification, and reducing hallucinations by making AI output checkable reasoning code.

Easy summary: Investor interest in AI-for-math became large enough to create a high-valuation startup around formal reasoning.

A New AI Math Startup Just Cracked 4 Previously Unsolved Problems

Wired2026startup claim

Summary: Reports on Axiom/AxiomProver claims about solving previously unsolved problems using AI plus formal verification-style methods. This should be treated as news until matched against technical papers and proof artifacts.

Easy summary: A startup claims its AI solved new math problems, but the technical details need careful verification.

1.6 Public debate

Machine-Assisted Proof

Notices AMSJan 2025machine-assisted mathematics

Summary: Terence Tao’s Notices article surveys computer-assisted proof, interactive theorem provers, machine learning, generative AI, and the changing role of machines in mathematical research.

Easy summary: A leading mathematician gives a careful map of how computers, proof assistants, and AI are beginning to change proof work.

Mathematical Beauty, Truth and Proof in the Age of AI

QuantaApr 30, 2025philosophy of proof

Summary: Quanta explored how AI, proof assistants, and machine-generated patterns are changing mathematicians’ ideas about beauty, truth, understanding, and rigor. The article is useful because it captures the cultural shift around AI-for-math before the July IMO-gold moment.

Easy summary: It asks what happens to mathematical taste and understanding when machines help find or check proofs.

What Happens When AI Starts To Ask the Questions?

QuantaApr 30, 2025AI-generated questions

Summary: Quanta discussed the move from AI as a calculation tool toward AI as a system that can suggest questions, methods, and directions. Although broader than mathematics alone, it is relevant to AI-assisted discovery because problem selection is a major part of research.

Easy summary: AI may not only answer mathematical questions; it may also help decide which questions to ask.

Mathematics in the Context Window

commentaryAug 15, 2025research practice

Summary: Thomas Kahle reflects on how LLM context windows are changing day-to-day mathematical research, while warning that talk of a full revolution may be premature.

Easy summary: A grounded discussion of where AI is already useful in math research and where the hype still runs ahead of reality.

AI at IMO 2025: a round-up

Xena ProjectAug 3, 2025IMO commentary

Summary: Kevin Buzzard’s Xena Project gave a mathematically informed roundup of the 2025 IMO AI results, including the different roles of natural-language reasoning, formal proof, and official validation.

Easy summary: A proof-assistant expert explains what the IMO AI results do and do not show.

AI Took on the Math Olympiad—But Mathematicians Aren’t Impressed

Scientific AmericanAug 7, 2025critical commentary

Summary: Emily Riehl argued that the IMO-gold stories should not be confused with replacing mathematicians. The article emphasized differences between human competition conditions, benchmark-style AI testing, and the deeper standards mathematicians use for proof, understanding, and research contribution.

Easy summary: The AI results were impressive, but the article explains why mathematicians still want proof, context, and human judgment.

Is Math the Next AI Frontier? A Conversation with Terence Tao

Renaissance PhilanthropySep 26, 2025Tao interview

Summary: A public Q&A with Terence Tao on AI for formalization, mathematical discovery, trustworthy workflows, bottlenecks, and the role of philanthropy in building AI-for-math infrastructure.

Easy summary: Tao argues that AI could become very useful in mathematics, especially when tied to formal verification and better research workflows.

Formal or not formal? The question in AI theorem proving

Xena ProjectOct 22, 2025formal vs informal proof

Summary: Kevin Buzzard discusses the distinction between AI systems that use formal mathematics internally and systems whose outputs are formalized for external verification, while stressing the remaining library and review bottlenecks.

Easy summary: The key question is not just whether AI can find proofs, but whether they are produced or checked in a formal system humans can trust.

Mathematicians put AI model AlphaProof to the test

Nature News & ViewsNov 12, 2025AlphaProof

Summary: Nature’s News & Views coverage of AlphaProof discussed AI trained to use computational proof tools and why such systems might accelerate mathematical discovery. It connected the AlphaProof methodology to broader questions about proof assistants, formal verification, and AI search.

Easy summary: AlphaProof is important because it combines AI search with machine-checkable mathematical proof.

2. Grants and programs

Funding calls, funded teams, workshops, fellowships, prizes, challenges, and infrastructure initiatives.

2.1 Funding programs

DARPA expMath: Exponentiating Mathematics

DARPA2025 call / 2026 active programAI co-author

Summary: A DARPA program to accelerate pure mathematics by building AI systems that can propose useful abstractions, decompose hard problems, and prove mathematical claims with formal or semi-formal methods.

Easy summary: DARPA is funding teams to build AI systems that can help mathematicians discover and prove new results.

AI for Math Fund

Renaissance Philanthropy / XTX2025 inaugural grants / 2026 roundgrants

Summary: Philanthropic funding program for AI/ML projects that accelerate mathematics. In September 2025, Renaissance Philanthropy and XTX Markets announced $18 million in inaugural grants to 29 project teams, covering formalized frontier-math datasets, autoformalization, educational proof tools, and AI-assisted discovery infrastructure.

Easy summary: A major private fund supports projects that use AI to help mathematics progress faster.

Google DeepMind and Google.org AI for Math Initiative

Google DeepMind / Google.orgOct 29, 2025research initiative

Summary: Google announced an AI for Math Initiative with partner institutions including Imperial College London, the Institute for Advanced Study, IHES, the Simons Institute, and TIFR. The initiative combines funding, institutional partnerships, and access to tools such as Gemini Deep Think, AlphaEvolve, and AlphaProof.

Easy summary: Google moved from individual AI-math demos toward a formal institutional program with leading mathematics centers.

2.2 Funded teams/projects

MIT affiliates win AI for Math Fund grants

MIT NewsSep 22, 2025grant recipients

Summary: MIT reported that mathematics researchers David Roe and Andrew Sutherland were among the inaugural AI for Math Fund recipients, along with several MIT alumni. Their project connects the L-Functions and Modular Forms Database with Lean’s mathlib, showing how AI-for-math funding can support concrete bridges between mathematical databases and formal libraries.

Easy summary: One funded direction is to connect existing mathematical databases with Lean so AI systems can use and verify richer knowledge.

UMD–MIT GENIUS / DARPA expMath

UMD / MITMar 2026DARPA

Summary: Public university news describes a DARPA-funded effort led by UMD with MIT collaboration to accelerate mathematical discovery using AI.

Easy summary: Another expMath team is working on AI systems for mathematical discovery.

UCLA ALPHA / DARPA expMath

UCLAApr 2026$5M DARPA contract

Summary: UCLA announced a three-year DARPA contract for ALPHA, aimed at AI tools for mathematical discovery, formal proof synthesis, and verification.

Easy summary: UCLA is one of the teams funded to build AI systems for serious mathematical proof work.

NYU expMath participation

NYUApr 2026DARPA team

Summary: NYU announced participation in DARPA’s expMath inaugural cohort, focused on building AI collaborators for mathematics.

Easy summary: NYU is also part of the DARPA-funded AI-for-math effort.

EPFL AI for Math

EPFL2026Lean/document formalization

Summary: Project focused on open-source tools for formalizing mathematical documents using Lean, supported by AI-for-math funding.

Easy summary: EPFL is building tools to turn mathematical documents into checkable Lean code.

2.3 Workshops/conferences

Informal Workshop on Lean Prover and Math-AI at IISc

IISc MathematicsApr 21, 2025Lean / Math-AI workshop

Summary: The IISc Mathematics department hosted an informal workshop on Lean Prover and Math-AI, with demos, tutorials, and working sessions on Lean, mathlib, and AI-for-mathematics topics.

Easy summary: The AI-for-math ecosystem expanded through hands-on Lean and Math-AI workshops beyond the usual US/Europe centers.

2nd AI for Math Workshop @ ICML 2025

WorkshopJul 18, 2025AI for math

Summary: The second AI for Math workshop at ICML 2025 brought together researchers working on autoformalization, theorem proving, benchmarks, mathematical reasoning, and AI-assisted discovery.

Easy summary: AI-for-math had become established enough to have a recurring workshop at a major machine-learning conference.

AI × Mathematics 2026 residency

ICMS2026residency

Summary: Interdisciplinary residency on conjecture generation, autoformalization, automated theorem proving, and neuro-symbolic systems.

Easy summary: A focused event where researchers work on how AI can help real mathematics.

3rd AI for Math Workshop @ ICML 2026

WorkshopICML 2026AI for math

Summary: Workshop on formal theorem proving, mathematical reasoning, real-world mathematical research, and autoformalization.

Easy summary: The machine-learning community is hosting a dedicated workshop on AI for mathematics.

SIMIS AI for Mathematics Workshop

SIMISApr 2026workshop

Summary: Workshop covering theorem proving, conjecture formulation, language processing, formalization, and mathematical discovery.

Easy summary: Another major event bringing together AI and mathematics researchers.

AI for Maths and Open Science

Newton Institute2026workshop

Summary: Workshop on how AI can assist mathematicians and improve open science practices around mathematics.

Easy summary: A workshop on making AI-assisted mathematics more open and useful to the research community.

2.4 Fellowships/training

UW Winter 2026 Math AI Lab Projects

University of WashingtonWinter 2026student/research projects

Summary: Math AI lab projects involving Lean-specific AI assistants, autoformalization, math search, datasets, and mathlib contributions.

Easy summary: Students and researchers are being organized into practical AI-for-math projects.

AI for Math Summer Fellowship 2026

Fellowship2026training pipeline

Summary: Fellowship placing students and early-career researchers into AI-for-math projects involving theorem proving, proof assistants, datasets, and infrastructure.

Easy summary: A funded pipeline for young researchers to work on AI and mathematics.

2.5 Prizes/challenges

NVIDIA team wins AIMO progress-prize competition with a reasoning model

NVIDIAApr 15, 2025AIMO / Kaggle

Summary: NVIDIA reported that its NemoSkills team won the latest AI Mathematical Olympiad progress-prize competition on Kaggle, showing continued progress on open mathematical-reasoning competitions.

Easy summary: Open math-AI competitions were no longer just benchmarks; major teams were actively competing and improving models through them.

AIMO Progress Prize 3 launched

AIMO Prize / KaggleNov 2025open math competition

Summary: The third AI Mathematical Olympiad progress prize launched on Kaggle with a larger prize pot and harder problems, continuing the open competition track toward IMO-level AI systems.

Easy summary: The AIMO prize series kept pushing public, competitive progress in mathematical reasoning models.

AMS Milestone Prize for Foundational Work in Formal Mathematics

AMSformal mathematicsprize

Summary: Recognition infrastructure for major achievements in formal mathematics, reestablished with support from the AI For Math Fund.

Easy summary: Formalized mathematics is becoming important enough to get dedicated prize recognition.

2.6 Infrastructure initiatives

Creating a database of motivated proofs

Gowers blogSep 22, 2025AI training data

Summary: Timothy Gowers proposes collecting proofs that expose the motivations and choices behind arguments, arguing that AI for mathematics may need better kinds of mathematical data, not merely more data.

Easy summary: For AI to learn mathematics well, it may need to see why proofs are found, not only the polished final proof.

3. Literature

Papers, preprints, benchmarks, formalization reports, and workflow papers.

3.1 Surveys/overviews

Autoformalization in the Era of Large Language Models: A Survey

surveyMay/Jul 2025autoformalization

Summary: Surveys LLM-era autoformalization from mathematical and machine-learning perspectives, covering workflows, data, model design, evaluation, applications, and open challenges.

Easy summary: A map of how AI systems are being trained to translate ordinary mathematics into formal, checkable language.

Mathematics: the Rise of the Machines

overviewNov 2025AI for mathematics

Summary: Broad overview of how AI can assist mathematics through theorem proving, conjecture formulation, language processing, and pattern discovery across mathematical sciences.

Easy summary: A high-level survey of the ways AI is beginning to help mathematicians discover, prove, and organize mathematics.

On the mechanical creation of mathematical concepts

conceptualJul 2025discovery

Summary: Conceptual paper arguing that explicit concept creation is a central step in mathematical discovery and analyzing why current AI systems mostly form concepts implicitly.

Easy summary: It asks whether AI can invent new mathematical concepts, not only solve problems stated in existing language.

AI for Mathematics: Progress, Challenges, and Prospects

surveyJan/Feb 2026

Summary: Broad survey of AI-for-math, covering mathematical reasoning, autoformalization, formal theorem proving, benchmarks, and LLM-based systems.

Easy summary: A map of the entire AI-for-math field and its main challenges.

Formal Reasoning Meets LLMs: Toward AI for Mathematics and Verification

CACM2026

Summary: Discusses how LLMs and formal reasoning systems can combine to improve mathematical reasoning and verification.

Easy summary: LLMs are useful, but they need formal systems to check correctness.

Towards Autonomous Mathematics Research

roadmapFeb 2026

Summary: Proposes levels of autonomy for AI mathematical research and discusses emerging systems and milestones.

Easy summary: A paper asking how close AI is to doing parts of mathematical research on its own.

A Survey on Mathematical Reasoning and Optimization with Large Language Models

surveyMar 2025LLM math reasoning

Summary: Survey of LLM-based mathematical reasoning, theorem proving, optimization, symbolic/tool-augmented methods, and their limits.

Easy summary: A broad 2025 survey of how LLMs are being used for mathematical reasoning and optimization.

Advancing mathematics research with generative AI

overviewSep/Nov 2025research workflows

Summary: Discusses how generative AI can assist advanced mathematics through pattern finding, code generation, conjecturing, examples, CAS integration, and formal proof assistants.

Easy summary: A mathematician-facing overview of how generative AI may become a useful research assistant.

Mathematics with large language models as provers and verifiers

case studyOct 2025Lean verification

Summary: Reports a protocol using LLM prover and verifier instances, with final Lean checking and human semantic review, on IMO and number-theory tasks.

Easy summary: A case-study style paper on using LLMs as both proof generators and proof checkers.

3.2 Autoformalization

Mathesis: Towards Formal Theorem Proving from Natural Languages

LeanJun 2025end-to-end

Summary: Introduces an end-to-end pipeline that autoformalizes natural-language problems, scores formalization quality, and then generates Lean proofs; also introduces the Gaokao-Formal benchmark.

Easy summary: The system starts from ordinary problem statements and tries to reach verified Lean proofs.

KELPS: Verified Multi-Language Autoformalization via Semantic-Syntactic Alignment

Lean/Coq/IsabelleJul 2025multi-language

Summary: Proposes a neuro-symbolic autoformalization framework using an intermediate Knowledge Equation language to synthesize data for Lean, Coq, and Isabelle.

Easy summary: It tries to make autoformalization work across several proof assistants, not only Lean.

CriticLean: Critic-Guided Reinforcement Learning for Lean 4 Formalizations

LeanJul 2025semantic fidelity

Summary: Trains critic models to judge whether Lean formalizations preserve the intended meaning, introduces CriticLeanBench, and builds the FineLeanCorpus dataset.

Easy summary: It teaches a model to detect when a Lean translation compiles but means the wrong thing.

FormaRL: Enhancing Autoformalization with No Labeled Data

RLAug 2025autoformalization

Summary: Uses reinforcement learning with unlabeled data to improve autoformalization, addressing scarcity of paired informal/formal training examples.

Easy summary: It tries to improve theorem-to-Lean translation without needing many hand-labeled examples.

Aria: Retrieval and Iterative Auto-Formalization via Graph-of-Thought Reasoning

LeanOct 2025retrieval

Summary: Presents an agent for research-level Lean statement formalization using dependency decomposition, retrieval, iterative repair, and term-level grounding checks.

Easy summary: It breaks a theorem into dependency pieces, looks up the right Lean concepts, and repairs the formalization.

SITA: A Framework for Structure-to-Instance Theorem Autoformalization

LeanNov 2025research-level

Summary: Develops a method for formalizing concrete instances of abstract mathematical structures in Lean using reusable templates, typeclass mechanisms, and LLM-assisted refinement.

Easy summary: It helps transfer abstract Lean theories to concrete mathematical examples.

Towards a Common Framework for Autoformalization

AAAI 2026autoformalization framework

Summary: Reviews different forms of autoformalization and proposes a unified framework to connect approaches across fields.

Easy summary: A paper trying to give everyone a common language for autoformalization.

M2F: Automated Formalization of Mathematical Literature at Scale

LeanFeb 2026document-scale

Summary: Agentic framework for converting long mathematical sources into a compiling Lean project, reporting 153,853 lines from 479 pages of real/convex analysis textbooks.

Easy summary: A system that tries to turn entire math textbooks into Lean code automatically.

MerLean: An Agentic Framework for Autoformalization in Quantum Computation

LeanFeb 2026quantum computation

Summary: Extracts statements from LaTeX quantum-computation papers, formalizes them into Lean 4, and translates them back to LaTeX for semantic review.

Easy summary: A tool that reads quantum-computing papers and tries to make their math checkable in Lean.

Automatic Textbook Formalization

LeanMar/Apr 2026textbook-scale

Summary: Case study of automatically formalizing a 500+ page graduate algebraic combinatorics textbook into Lean.

Easy summary: An AI system attempts to convert a whole advanced textbook into a proof-assistant project.

Decompose, Structure, and Repair: A Neuro-Symbolic Framework for Autoformalization via Operator Trees

LeanApr 2026PRIME benchmark

Summary: Uses structured decomposition and operator trees to improve translation of mathematical statements into Lean, and introduces the PRIME benchmark.

Easy summary: Instead of translating a theorem as one block of text, the system breaks it into logical pieces first.

SFT-GRPO Data Overlap as a Post-Training Hyperparameter for Autoformalization

ML trainingApr 2026Lean

Summary: Studies how overlap between supervised fine-tuning data and GRPO prompts affects Lean 4 autoformalization performance.

Easy summary: A training-method paper about making models better at translating math into Lean.

Improving Lean4 Autoformalization via Cycle Consistency Fine-tuning

LeanMar 2026training

Summary: Uses natural language → Lean → natural language consistency as a training signal for better autoformalization.

Easy summary: The model checks whether translating to Lean and back preserves the meaning.

ProofBridge: Auto-Formalization of Natural Language Proofs in Lean

ICLR 2026Leanproof-level

Summary: Framework for translating natural-language theorems and proofs into Lean 4 using joint embeddings and retrieval.

Easy summary: It tries to turn not only theorem statements, but also human proof explanations, into Lean.

Automated Formalization via Conceptual Retrieval

updated 2026retrieval

Summary: Uses retrieval of relevant formal concepts, definitions, and theorem names to improve translation from informal mathematics into formal systems.

Easy summary: Autoformalization works better when the AI can look up the right existing formal concepts.

Surface Sensitivity in Lean 4 Autoformalization

LeanApr 2026robustness

Summary: Studies how small changes in the natural-language wording of a theorem affect Lean formalization outcomes.

Easy summary: The same theorem worded slightly differently may confuse an autoformalization system.

Lean-ing on Quality: How High-Quality Data Beats Diverse Multilingual Data in AutoFormalization

autoformalizationFeb 2025data quality

Summary: Studies back-translation and proof-state-informed prompting, arguing that high-quality synthetic data can outperform much larger multilingual datasets for autoformalization.

Easy summary: For autoformalization, carefully made data may matter more than simply having lots of data.

FMC: Formalization of Natural Language Mathematical Competition Problems

autoformalizationJul 2025Lean dataset

Summary: Introduces a training-free LLM pipeline with error feedback for formalizing competition problems into Lean and curates a paired natural-language/Lean dataset.

Easy summary: A dataset and pipeline for turning Olympiad-style problems into Lean statements.

Autoformalizer with Tool Feedback

autoformalizationOct 2025Lean feedback

Summary: Introduces ATF, which uses Lean compiler feedback and multi-LLM semantic checks to refine natural-language-to-Lean formalization.

Easy summary: The formalizer learns to repair its Lean translations using tool feedback.

FIRMA: Bidirectional Formal-Informal Mathematical Language Alignment with Proof-Theoretic Grounding

MathNLP 2025formal/informal alignment

Summary: Proposes a bidirectional formal-informal alignment system intended to preserve proof-theoretic structure when moving between mathematical text and formal language.

Easy summary: A paper on aligning ordinary mathematical language with formal proof language.

3.3 Theorem proving

BFS-Prover: Scalable Best-First Tree Search for LLM-based Automatic Theorem Proving

LeanFeb 2025tree search

Summary: Studies best-first search for Lean theorem proving with expert iteration, compiler-error feedback, preference optimization, and length-normalized exploration.

Easy summary: A simpler tree-search strategy becomes highly competitive when scaled carefully.

Gold-medalist Performance in Solving Olympiad Geometry with AlphaGeometry2

DeepMindFeb 2025geometry

Summary: Presents AlphaGeometry2, extending the AlphaGeometry language and symbolic engine to solve much harder Olympiad geometry problems at gold-medalist level.

Easy summary: DeepMind’s geometry system became strong enough to match top Olympiad geometry solvers.

Goedel-Prover: A Frontier Model for Open-Source Automated Theorem Proving

LeanFeb 2025open-source prover

Summary: Introduces an open-source Lean prover trained from large-scale formalized statements and iteratively generated formal proofs, improving miniF2F and PutnamBench performance.

Easy summary: An open prover learns by generating more Lean proofs and training on the successful ones.

Self-play LLM Theorem Provers with Iterative Conjecturing and Proving

self-playFeb 2025theorem proving

Summary: Designs a Self-play Theorem Prover where a conjecturer and prover co-train, generating new conjectures/exercises and proofs as mutual training signals.

Easy summary: The prover improves by inventing problems for itself and trying to solve them.

MA-LoT: Multi-Agent Lean-based Long Chain-of-Thought Reasoning enhances Formal Theorem Proving

LeanMar 2025multi-agent

Summary: Combines multi-agent long-chain reasoning with Lean verification feedback, balancing informal high-level reasoning and formal proof generation.

Easy summary: Several agents cooperate: some reason informally, while Lean checks whether the formal proof works.

DeepSeek-Prover-V2: Advancing Formal Mathematical Reasoning via Reinforcement Learning for Subgoal Decomposition

LeanApr 2025DeepSeek

Summary: Introduces DeepSeek-Prover-V2, using recursive subgoal decomposition and reinforcement learning to combine informal reasoning with Lean proof generation.

Easy summary: The model decomposes hard problems into smaller Lean-verifiable subproblems.

APOLLO: Automated LLM and Lean Collaboration for Advanced Formal Reasoning

LeanMay 2025proof repair

Summary: A modular pipeline in which LLMs generate Lean proofs, compiler feedback locates errors, subgoals are isolated and repaired, and successful fragments are recombined.

Easy summary: APOLLO makes Lean proof generation more efficient by repairing failed proof pieces instead of restarting from scratch.

DeepTheorem: Advancing LLM Reasoning for Theorem Proving

informal proofsMay 2025benchmark/data

Summary: Develops an informal theorem-proving framework with a large IMO-level theorem/proof dataset and reinforcement-learning methods for natural-language proof reasoning.

Easy summary: It improves LLM theorem proving in ordinary mathematical language rather than directly in Lean.

ProofNet++: A Neuro-Symbolic System for Formal Proof Verification with Self-Correction

neuro-symbolicMay 2025self-correction

Summary: Combines LLMs with proof verification, symbolic proof-tree supervision, verifier-guided reinforcement learning, and iterative self-correction.

Easy summary: The model writes proof attempts, checks them formally, and learns how to correct its mistakes.

An Agent-Based Framework for Formal Mathematical Proofs

LeanJun 2025Prover Agent

Summary: Introduces Prover Agent, coordinating informal reasoning, formal proof generation, auxiliary lemma construction, and feedback from Lean.

Easy summary: The agent uses both natural-language reasoning and Lean feedback to build proofs.

DREAM: Towards Advanced Mathematical Reasoning for LLMs via First-Order Logic Theorem Proving

FOLJun 2025Lean 4 dataset

Summary: Uses first-order-logic theorem proving, strategy diversification, and sub-proposition error feedback to improve LLM mathematical reasoning.

Easy summary: It tests and improves LLM reasoning on multi-step logical theorem-proving tasks.

Delta Prover: Solving Formal Math Problems by Decomposition and Iterative Repair

LeanJul 2025agentic prover

Summary: An agentic framework that uses reflective decomposition, iterative proof repair, and a Lean-based DSL for subproblem management without needing a specialized prover model.

Easy summary: A general LLM is guided to break Lean proofs into smaller tasks and repair them step by step.

Leanabell-Prover-V2: Verifier-integrated Reasoning for Formal Theorem Proving via Reinforcement Learning

LeanJul 2025RL

Summary: Post-trains Lean prover models using verifier feedback, multi-turn interactions, feedback masking, and reinforcement learning over reasoning trajectories.

Easy summary: The Lean verifier becomes a training signal that teaches the model to correct itself.

Seed-Prover: Deep and Broad Reasoning for Automated Theorem Proving

LeanJul 2025IMO 2025

Summary: Presents Seed-Prover and Seed-Geometry, combining Lean feedback, lemma-style reasoning, self-summarization, and test-time strategies for IMO-level formal theorem proving.

Easy summary: A ByteDance system used Lean-centered reasoning to prove most formalized IMO-style problems.

Goedel-Prover-V2: Scaling Formal Theorem Proving with Scaffolded Data Synthesis and Self-Correction

LeanAug 2025open-source prover

Summary: Improves Goedel-Prover through scaffolded synthetic tasks, Lean-verifier-guided self-correction, and model averaging, setting strong open-source prover results.

Easy summary: The second Goedel-Prover version uses synthetic curricula and Lean feedback to become much stronger.

Hilbert: Recursively Building Formal Proofs with Informal Reasoning

LeanSep 2025recursive decomposition

Summary: Combines an informal reasoner, a specialized Lean prover, a verifier, and semantic theorem retrieval to recursively decompose and solve proof goals.

Easy summary: Hilbert lets natural-language reasoning guide the construction of checkable Lean proofs.

GAR: Generative Adversarial Reinforcement Learning for Formal Theorem Proving

RLOct 2025co-evolution

Summary: Uses co-evolution of problem generation and solving under verifiable environments to improve Lean theorem-proving models such as Goedel-Prover-V2 and DeepSeek-Prover-V2.

Easy summary: One part generates harder proof tasks while another learns to solve them.

Ax-Prover: A Deep Reasoning Agentic Theorem Prover

LeanOct 2025multi-agent

Summary: Presents a multi-agent Lean theorem-proving system designed for autonomous or human-collaborative formal proof generation across scientific domains.

Easy summary: A multi-agent system attempts to solve scientific and mathematical problems by producing Lean proofs.

Seed-Prover 1.5: Mastering Undergraduate-Level Theorem Proving via Learning from Experience

LeanDec 2025agentic RL

Summary: Uses large-scale agentic reinforcement learning and efficient test-time scaling to improve undergraduate and graduate-level formal theorem proving.

Easy summary: The model accumulates experience from Lean feedback and becomes stronger at proving hard formal problems.

Discover and Prove: An Open-source Agentic Framework for Hard Mode Automated Theorem Proving in Lean 4

LeanApr 2026Hard Mode

Summary: Introduces “Hard Mode,” where the system must discover the answer before proving it formally, and releases hard-mode benchmarks.

Easy summary: The AI cannot just prove a known answer; it must first find the answer.

OptProver

LeanApr 2026optimization

Summary: Develops a Lean theorem-proving benchmark and prover for undergraduate optimization theory.

Easy summary: This pushes AI theorem proving into optimization problems.

Numina-Lean-Agent

LeanJan 2026tool-use agent

Summary: Open general tool-use agent for formal theorem proving, treating Lean as an interactive environment similar to coding.

Easy summary: An AI agent that works inside Lean by trying, checking, and fixing proofs.

A Minimal Agent for Automated Theorem Proving

agent baselineFeb 2026

Summary: Presents a simple baseline architecture for theorem-proving agents using iterative refinement, context management, and library search.

Easy summary: A stripped-down AI theorem prover used to understand what parts really matter.

TheoremForge

formal data synthesisJan 2026

Summary: Low-budget agentic pipeline for formal data synthesis, proof generation, premise selection, correction, and proof sketching.

Easy summary: A pipeline to generate and repair formal theorem-proving data.

Neural Theorem Proving for Verification Conditions / NTP4VC

Lean/Isabelle/RocqJan 2026verification

Summary: Benchmark for proving verification conditions across several formal languages, based partly on real-world software-verification tasks.

Easy summary: Tests AI proof systems on real formal verification tasks, not just math puzzles.

On Reasoning-Centric LLM-based Automated Theorem Proving

ATPApr 2026

Summary: Technical paper on reasoning-centered LLM approaches for automated theorem proving, including integration with Lean-oriented tools.

Easy summary: A paper about making LLMs reason more effectively when proving theorems.

Quantum automated theorem proving

Jan 2026logicspeculative

Summary: Proposes quantum-resolution approaches to propositional and first-order automated theorem proving.

Easy summary: A speculative attempt to combine quantum computation ideas with theorem proving.

Olympiad-level formal mathematical reasoning with AlphaProof

Nature2025Lean/RL

Summary: Presents AlphaProof, a reinforcement-learning-based formal reasoning system working in Lean, and reports strong performance on olympiad-level problems.

Easy summary: DeepMind’s AlphaProof paper shows formal proof search reaching high-level olympiad performance.

Kimina-Prover Preview: Towards Large Formal Reasoning Models with Reinforcement Learning

LeanApr 2025RL prover

Summary: Introduces Kimina-Prover, a Lean 4 formal theorem prover trained with large-scale reinforcement learning and structured formal reasoning patterns.

Easy summary: A major 2025 open formal-reasoning model for Lean theorem proving.

Lean-STaR: Learning to Interleave Thinking and Proving

ICLR 2025Leanthought + tactics

Summary: Trains models to generate informal thoughts before Lean proof steps, combining retrospective rationale generation with expert iteration.

Easy summary: The model learns to think informally while constructing formal Lean proofs.

LeanAgent: Lifelong Learning for Formal Theorem Proving

ICLR 2025Leanlifelong learning

Summary: Proposes a lifelong-learning theorem-proving framework that continually generalizes across expanding Lean repositories while reducing forgetting.

Easy summary: A Lean prover that keeps learning across many projects instead of staying fixed on one dataset.

Leanabell-Prover: Posttraining Scaling in Formal Reasoning

LeanApr 2025post-training

Summary: Studies continual training and reinforcement learning with Lean compiler rewards to improve whole-proof generation for formal theorem proving.

Easy summary: A post-training approach for making Lean provers stronger using verifier feedback.

ProofAug: Efficient Neural Theorem Proving via Fine-grained Proof Structure Analysis

Isabelle/LeanJan 2025proof augmentation

Summary: Uses fine-grained proof-structure analysis and proof augmentation to improve neural theorem proving, with results in Isabelle and a Lean 4 version.

Easy summary: A method for getting more value from proof structure when training theorem provers.

REAL-Prover: Retrieval Augmented Lean Theorem Proving

LeanMay 2025retrieval

Summary: Introduces a retrieval-augmented stepwise Lean 4 prover aimed at more advanced, college-level mathematics.

Easy summary: A Lean prover that uses retrieval to find relevant facts while proving.

HybridProver: Augmenting Theorem Proving with LLM-Driven Proof Synthesis and Refinement

IsabelleMay 2025hybrid prover

Summary: Combines whole-proof synthesis and tactic-based refinement in Isabelle, showing improved performance on miniF2F.

Easy summary: A theorem prover that mixes full proof sketches with step-by-step tactic repair.

FANS: Formal Answer Selection for Natural Language Math Reasoning Using Lean4

LeanMar 2025answer verification

Summary: Uses Lean 4 formalization and proving to help select correct answers among LLM-generated responses to natural-language math questions.

Easy summary: Lean is used as a judge to choose the correct answer from possible LLM outputs.

LeanTree: Accelerating White-Box Proof Search with Factorized States in Lean 4

LeanJul 2025white-box search

Summary: Introduces Lean tooling and a dataset for factorizing proof states into simpler branches, supporting more efficient white-box proof search.

Easy summary: A way to split Lean proof states so search can work on smaller pieces.

StepFun-Prover Preview: Let’s Think and Verify Step by Step

LeanJul 2025tool-integrated RL

Summary: Develops a tool-integrated formal prover that interleaves reasoning, Lean snippets, REPL feedback, and reinforcement learning from proof outcomes.

Easy summary: A Lean prover that learns to use feedback from the proof environment step by step.

Let’s Reason Formally: Natural-Formal Hybrid Reasoning Enhances LLM’s Math Capability

LeanMay 2025hybrid reasoning

Summary: Combines natural-language reasoning with Lean-formalized existence problems so that LLM math answers can benefit from formal proof attempts.

Easy summary: A hybrid method where natural-language math solving is helped by Lean-formal reasoning.

Modeling Tactics as Operators: Effect-Grounded Representations for Lean Theorem Proving

MathNLP 2025Lean tactics

Summary: Represents Lean tactics by the changes they induce on proof states, producing more effect-aware tactic embeddings.

Easy summary: Instead of treating tactics as words, it studies what each tactic does to a proof state.

Synthetic Proofs with Tool-Integrated Reasoning: Contrastive Alignment for LLM Mathematics with Lean

MathNLP 2025Leansynthetic proofs

Summary: Uses Lean-based synthetic proof tasks, tool-integrated reasoning, and contrastive alignment to improve LLM mathematical proving behavior.

Easy summary: Lean-generated proof tasks are used to train models toward better mathematical reasoning.

3.4 Benchmarks/datasets

A Combinatorial Identities Benchmark for Theorem Proving in Lean

LeanCombFeb 2025combinatorics

Summary: Constructs LeanComb, a benchmark for combinatorial identities, and uses an automated theorem generator to create a large enhanced dataset of formal proofs.

Easy summary: A benchmark and data generator focused on combinatorial identities in Lean.

FormalMATH: Benchmarking Formal Mathematical Reasoning of Large Language Models

Lean4May 2025large benchmark

Summary: Introduces a 5,560-problem Lean4 benchmark spanning Olympiad to undergraduate mathematics, built with human-in-the-loop autoformalization and semantic checks.

Easy summary: A large Lean benchmark for testing how well AI systems can prove formal mathematics.

Ineq-Comp: Benchmarking Compositional Reasoning in Automated Theorem Proving on Inequalities

inequalitiesMay 2025benchmark

Summary: Tests whether formal provers understand compositional inequality reasoning by transforming elementary inequalities into multi-step compositional variants.

Easy summary: It checks whether AI provers can combine familiar inequalities in the way humans naturally do.

FormalStep: Step-Level Formal Verification for Mathematical Reasoning

Lean 4Jun 2025step verification

Summary: Part of the Safe framework, FormalStep provides 30,809 formal statements for checking the correctness of individual reasoning steps in generated solutions.

Easy summary: Instead of only checking the final answer, it checks whether each mathematical step is valid.

Lean Meets Theoretical Computer Science

Lean4Aug 2025formal-informal pairs

Summary: Uses theoretical computer science domains such as Busy Beaver and mixed Boolean arithmetic to generate verified formal/informal theorem-proving challenge pairs.

Easy summary: It creates many hard but checkable Lean proof tasks from theoretical computer science.

FormalML: A Benchmark for Formal Subgoal Completion in Machine Learning Theory

Lean4Sep/Oct 2025subgoal completion

Summary: Introduces a benchmark of nearly 5,000 subgoal-completion problems extracted from formalized optimization and probability inequalities in machine-learning theory.

Easy summary: It tests whether AI can fill missing proof steps inside serious ML-theory formalizations.

RLMEval: Evaluating Research-Level Neural Theorem Proving

LeanOct 2025research-level benchmark

Summary: Builds an evaluation suite from real Lean Blueprint formalization projects to test theorem proving and proof autoformalization at research level.

Easy summary: A benchmark drawn from real formalization projects, not just contest-style examples.

FATE: A Formal Benchmark Series for Frontier Algebra of Multiple Difficulty Levels

formal algebraNov 2025research-level benchmark

Summary: Introduces FATE-H and FATE-X, formal algebra benchmarks ranging from undergraduate to beyond PhD qualifying-exam difficulty, exposing large gaps in current provers.

Easy summary: A hard algebra benchmark showing that research-level formal theorem proving is still difficult for AI.

IndiMathBench: Autoformalizing Mathematical Reasoning Problems with a Human Touch

Lean4Nov/Dec 2025human-verified

Summary: Introduces a human-verified Lean 4 benchmark of Indian Mathematics Olympiad problems curated through AI-assisted formalization and expert validation.

Easy summary: A carefully checked benchmark for testing both autoformalization and Lean theorem proving.

MiniF2F-Dafny: LLM-Guided Mathematical Theorem Proving via Auto-Active Verification

DafnyDec 2025benchmark

Summary: Translates miniF2F to Dafny and evaluates LLM-generated proof hints, highlighting a division of labor between LLM guidance and automated verification.

Easy summary: It brings a standard math benchmark into Dafny and tests how much automation plus LLM hints can solve.

FormalProofBench: Can Models Write Graduate Level Math Proofs That Are Formally Verified?

benchmarkMar 2026graduate-level

Summary: Benchmark testing whether models can produce Lean 4 proofs for graduate-level mathematics.

Easy summary: A test of whether AI can write proof-assistant proofs for advanced math problems.

Can AI Provers Complete Real-World Lean Theorems? / SorryDB

LeanMar 2026real-world benchmark

Summary: Creates a benchmark from unfinished `sorry`s across real Lean projects, testing whether AI can complete actual formalization gaps.

Easy summary: Instead of artificial tests, it asks AI to fill real missing proofs in Lean projects.

ArXivLean

benchmarkApr 2026research-level

Summary: Benchmark built from recent arXiv papers, translating research-level statements into Lean tasks to reduce contamination and measure current capabilities.

Easy summary: A test where AI tries to prove Lean versions of fresh research-paper statements.

AMBER: A Benchmark for Applied Mathematics in Lean 4

LeanFeb 2026applied math

Summary: Benchmark for constructive applied mathematics tasks requiring algorithms, representation transformations, or explicit computations in Lean.

Easy summary: A benchmark that tests applied-math reasoning, not just pure theorem statements.

LeanCat: A Benchmark Suite for Formal Category Theory

Leancategory theory2026 version

Summary: Benchmark suite targeting formal category theory, a domain with distinctive abstractions and proof patterns.

Easy summary: A Lean benchmark for category-theory proofs.

k-server-bench: Automating Potential Discovery for the k-Server Conjecture

Apr 2026benchmarkdiscovery

Summary: Benchmark in which agents search for potential functions satisfying linear-inequality systems related to the k-server conjecture.

Easy summary: AI is tested on finding the right hidden mathematical object, not just proving a given formula.

HorizonMath: Measuring AI Progress Toward Mathematical Discovery with Automatic Verification

Mar 2026benchmarkautomatic verification

Summary: Benchmark designed to evaluate progress toward discovery-like mathematics with automatically verifiable outcomes.

Easy summary: A test for whether AI is moving toward real mathematical discovery.

CombiBench: Benchmarking LLM Capability for Combinatorial Mathematics

benchmarkMay 2025Lean combinatorics

Summary: Introduces 100 Lean 4 combinatorics problems across a range of difficulty levels, including IMO-style combinatorics.

Easy summary: A Lean benchmark focused specifically on combinatorial mathematics.

MATP-BENCH: Can MLLM Be a Good Automated Theorem Prover for Multimodal Problems?

benchmarkJun 2025multimodal theorem proving

Summary: Builds a multimodal, multi-level, multi-language theorem-proving benchmark with formalizations in Lean 4, Coq, and Isabelle.

Easy summary: A benchmark for theorem proving when problems include diagrams or visual information.

miniF2F-Lean Revisited: Reviewing Limitations and Misalignments

benchmark analysisNov 2025miniF2F

Summary: Analyzes mismatches between informal and formal statements in miniF2F-Lean and explains why end-to-end performance can be lower than isolated autoformalization/proving scores.

Easy summary: A careful audit showing that benchmark statements themselves can be a major source of difficulty.

Reliable Evaluation and Benchmarks for Statement Autoformalization

benchmark/evaluationEMNLP 2025autoformalization

Summary: Introduces improved metrics and benchmark resources for evaluating statement autoformalization into Lean, including ProofNetVerif.

Easy summary: A benchmark-and-metrics paper for measuring whether autoformalization is actually faithful.

LeanGeo: Formalizing Competition Geometry Problems in Lean

LeanAug 2025geometry benchmark

Summary: Introduces a Lean 4 geometry formalization framework and LeanGeo-Bench for IMO and advanced geometry problems.

Easy summary: A Lean benchmark and library for formalizing geometry problems.

MSC-180: A Benchmark for Automated Formal Theorem Proving from Mathematical Subject Classification

benchmarkDec 2025research breadth

Summary: Creates 180 expert-refined formal verification problems across 60 MSC2020 mathematical branches to test domain breadth of LLM provers.

Easy summary: A benchmark that tests whether provers can handle many branches of mathematics, not just familiar contest areas.

VeriBench-FTP: A Formal Theorem Proving Benchmark in Lean 4 for Code Verification

benchmark2025Lean/code verification

Summary: A Lean 4 benchmark evaluating theorem-proving capabilities on code-verification style problems, complementing pure-math benchmarks.

Easy summary: A benchmark for Lean theorem proving in verified-code settings.

3.5 Discovery

AlphaEvolve: A Coding Agent for Scientific and Algorithmic Discovery

DeepMindJun 2025evolutionary search

Summary: White paper describing AlphaEvolve, an LLM-guided evolutionary coding agent for improving algorithms, scientific computations, and mathematical constructions through automated evaluation.

Easy summary: An AI system proposes code changes, tests them, and evolves better algorithms or constructions.

LeanConjecturer: Automatic Generation of Mathematical Conjectures for Theorem Proving

Lean4Jun 2025conjecture generation

Summary: Generates university-level mathematical conjectures in Lean using LLMs, rule-based context extraction, and methods aimed at reducing data scarcity in theorem proving.

Easy summary: The system tries to invent useful Lean conjectures that can become training or proof targets.

Lemmanaid: Neuro-Symbolic Lemma Conjecturing

Isabelle/HOLApr 2025lemma discovery

Summary: Uses a fine-tuned LLM to suggest lemma templates and symbolic methods to instantiate them, improving lemma conjecturing in Isabelle/HOL and AFP settings.

Easy summary: It helps discover useful intermediate lemmas, which are often the missing steps in proofs.

Enumerate-Conjecture-Prove: Formally Solving Answer-Construction Problems in Math Competitions

LeanMay/Oct 2025neuro-symbolic

Summary: Combines LLM enumeration, pattern-driven conjecturing, and Lean formal proving to solve answer-construction problems; introduces ConstructiveBench.

Easy summary: The system searches for the right mathematical object and then formally proves that it works.

Discovering New Theorems via LLMs with In-Context Proof Examples

Lean4Sep 2025new theorem discovery

Summary: Proposes a conjecturing–proving loop in which LLMs generate new theorem statements and attempt to prove them in Lean 4.

Easy summary: It asks whether an AI can move from proving known theorems to proposing new ones.

Learning Interestingness in Automated Mathematical Discovery

RLNov 2025theory discovery

Summary: Introduces FERMAT, a reinforcement-learning environment for open-ended concept discovery and theorem proving, with an emphasis on learning what makes results interesting.

Easy summary: It tries to teach AI not only to prove things, but to find results worth caring about.

Automated Conjecture Resolution with Formal Verification

Apr 2026Rethlas + ArchonLean

Summary: Two-agent system combining informal reasoning/search with formal Lean verification, claiming to resolve an open commutative algebra problem.

Easy summary: One AI searches for the proof idea; another checks it formally.

Resolution of Erdős Problem #728: a writeup of Aristotle’s autonomous resolution

Jan 2026Erdős problemLean

Summary: Presents a writeup of an AI-assisted/autonomous resolution of an Erdős problem, with formalization in Lean.

Easy summary: A named open problem was reportedly solved with an AI system and checked in Lean.

Agentic Neurosymbolic Collaboration for Mathematical Discovery: A Case Study in Combinatorial Design

Mar 2026combinatorial designLean/symbolic solvers

Summary: Human-AI-symbolic collaboration that produced a new result on imbalance of Latin squares, using symbolic solvers and Lean verification.

Easy summary: Humans, AI, and exact tools worked together to find and verify a new combinatorics result.

Doubly Saturated Ramsey Graphs: A Case Study in Computer-Assisted Mathematical Discovery

Apr 2026Ramsey theorycomputer-assisted

Summary: Case study in combinatorial discovery using computation/AI-assisted conjecture generation and proof support.

Easy summary: Computers help discover and check new facts about Ramsey graphs.

FlowBoost Reveals Phase Transitions and Spectral Structure in Finite Free Information Inequalities

Apr 2026inequalitiesexperimental discovery

Summary: Uses computational/AI-style exploration to study extremal inequality structures and phase transitions.

Easy summary: AI-like computation helps explore complicated inequality landscapes.

Discovering Mathematical Concepts Through a Multi-Agent System

Mar 2026multi-agentconcept discovery

Summary: Multi-agent framework for mathematical concept and conjecture discovery, constraining generation by provability.

Easy summary: Multiple AI agents interact to suggest mathematical ideas that can actually be proved.

Yanasse: Finding New Proofs from Deep Vision’s Analogies

Apr 2026analogyproof discovery

Summary: Attempts proof discovery by transferring proof-strategy patterns across mathematical areas using analogy-like mechanisms.

Easy summary: It tries to find new proofs by recognizing patterns similar to old proofs.

Generating Hadamard Matrices with Transformers

Apr 2026combinatorial construction

Summary: Uses transformer models to generate combinatorial designs related to Hadamard matrices.

Easy summary: AI is used to search for special matrices with difficult combinatorial properties.

Using Reasoning Models to Generate Search Heuristics for Combinatorial Design

discoveryMay 2025combinatorial design

Summary: Uses reasoning LLMs to generate and refine search heuristics for combinatorial design problems, including open instances.

Easy summary: LLMs write search strategies that help solve difficult combinatorial design cases.

A Matter of Interest: Understanding Interestingness of Math Problems in Automated Mathematical Discovery

discoveryNov 2025interestingness

Summary: Studies how to characterize or learn “interestingness” in mathematical problems, a key issue for automated mathematical discovery.

Easy summary: A discovery system must not only find true statements, but also recognize which ones are interesting.

3.6 Formalized mathematics

A Formal Proof of the Irrationality of ζ(3) in Lean 4

LeanFeb/Aug 2025number theory

Summary: Formalizes Beukers’ proof of the irrationality of zeta(3) in Lean 4, adding supporting analytic number theory and shifted Legendre polynomial infrastructure.

Easy summary: A famous irrationality theorem is made machine-checkable in Lean.

On the Averaging Problem of Ideal Families Related to Frankl's Conjecture with Formal Proof by Lean 4

LeanApr 2025combinatorics

Summary: Proves an average-rarity result for ideal set families related to Frankl’s union-closed sets conjecture and verifies the proof in Lean 4.

Easy summary: A combinatorics result related to Frankl’s conjecture is checked in Lean.

Formalizing a classification theorem for low-dimensional solvable Lie algebras in Lean

LeanMay 2025Lie algebras

Summary: Formalizes the classification of solvable Lie algebras of dimension at most three over arbitrary fields, expanding mathlib’s Lie theory and linear algebra infrastructure.

Easy summary: Lean is used to verify a concrete classification theorem from Lie algebra theory.

A Formalization of Divided Powers in Lean

LeanJul 2025commutative algebra

Summary: Formalizes the basic theory of divided powers in Lean 4, including morphisms, sub-divided power ideals, quotients, sums, and supporting power-series infrastructure.

Easy summary: A foundational construction used in algebraic geometry and number theory is formalized in Lean.

Formalizing dimensional analysis using the Lean theorem prover

LeanSep 2025formalized science

Summary: Formalizes dimensions, units, dimensional homogeneity, SI units, constants, the Buckingham Pi theorem, and examples such as the Lennard-Jones potential.

Easy summary: Lean is used to check that scientific equations are dimensionally meaningful.

Formalization of De Giorgi–Nash–Moser Theory in Lean

LeanApr 2026PDE

Summary: Formalizes major PDE regularity results, including local boundedness, weak Harnack inequality, Moser Harnack inequality, and interior Hölder regularity.

Easy summary: A major piece of modern PDE theory is translated into machine-checkable Lean.

A Milestone in Formalization: The Sphere Packing Problem in Dimension 8

LeanApr 2026sphere packing

Summary: Reports a milestone in formalizing Viazovska’s dimension-8 sphere-packing result in Lean, with AI assistance in final stages.

Easy summary: A famous Fields Medal-level result is being made machine-checkable.

A formal proof of the Ramanujan–Nagell theorem in Lean 4

LeanApr 2026number theory

Summary: Formalizes the classical theorem on integer solutions of x² + 7 = 2ⁿ, requiring substantial algebraic number theory infrastructure.

Easy summary: A famous number-theory theorem is fully checked by Lean.

A formalization of the Gelfond–Schneider theorem

LeanMar 2026transcendental number theory

Summary: Formalizes Hilbert’s seventh problem / the Gelfond–Schneider theorem in Lean 4.

Easy summary: Lean checks a deep theorem about transcendental numbers.

A Prime-Generated Formalization of Nagata’s Factoriality Theorem in Lean 4

LeanApr 2026commutative algebra

Summary: Formalizes Nagata’s factoriality theorem and UFD applications in Lean/mathlib with no placeholder axioms.

Easy summary: A serious commutative algebra theorem is now checkable in Lean.

Munkres’ General Topology Autoformalized in Isabelle/HOL

Isabelle/HOLApr 2026topology

Summary: LLM-assisted formalization of Munkres’ general topology, reporting over 85,000 lines, 806 formal results, and zero sorries.

Easy summary: A standard topology textbook is turned into machine-checkable Isabelle/HOL mathematics.

A formal proof of the Sands–Sauer–Woodrow theorem using Rocq and MathComp/SSReflect

Rocq/CoqApr 2026graph theory

Summary: Formalizes a theorem in infinite directed graph theory using Rocq and MathComp/SSReflect.

Easy summary: A graph-theory theorem is checked in the Rocq proof assistant.

Synthetic Differential Geometry in Lean

LeanMar 2026geometry

Summary: Formalizes synthetic differential geometry in Lean/mathlib, including Taylor-style reasoning around infinitesimal neighborhoods.

Easy summary: Lean is used for a nonstandard style of differential geometry.

Formalization of QFT

LeanMar 2026mathematical physics

Summary: Formalizes a foundational result in constructive quantum field theory in Lean 4, with discussion of AI-assisted formalization.

Easy summary: Formal proof assistants are reaching into mathematical physics.

Semi-Autonomous Formalization of the Vlasov–Maxwell–Landau Equilibrium

LeanMar 2026PDE/plasma physics

Summary: Uses AI reasoning and agentic coding with Lean verification under human supervision for a mathematical physics formalization.

Easy summary: AI helps formalize a difficult equation system from plasma physics.

Statistical Learning Theory in Lean 4: Empirical Processes from Scratch

LeanFeb 2026statistical learning theory

Summary: Formalizes empirical-process foundations relevant to statistical learning theory using a human-AI collaborative workflow.

Easy summary: Lean is used to check mathematical foundations of machine learning theory.

Formalizing Gröbner Basis Theory in Lean

LeanFeb 2026computational algebra

Summary: Formalizes polynomial division, Buchberger’s criterion, reduced Gröbner bases, and related algebraic infrastructure.

Easy summary: Lean gets formal versions of central tools from computational algebra.

Formalizing the Classical Isoperimetric Inequality in the Two-Dimensional Case

LeanMar 2026analysis/geometry

Summary: Formalizes Hurwitz’s analytic proof of the planar isoperimetric inequality in Lean 4.

Easy summary: Lean checks the classical theorem that the circle encloses maximal area for fixed perimeter.

A Blueprint for the Formalization of Seymour’s Matroid Decomposition Theorem

Lean blueprintJan 2026matroid theory

Summary: Lays out a dependency blueprint for formalizing Seymour’s matroid decomposition theorem in Lean.

Easy summary: A plan for formalizing a major theorem in matroid theory.

ZFLean: a framework for set-level mathematics in Lean

LeanApr 2026ZFC infrastructure

Summary: Provides infrastructure for doing ZFC-style set-level mathematics inside Lean 4 while preserving some mathlib ergonomics.

Easy summary: A project to make set-theoretic mathematics more natural in Lean.

Bipartite Exact Matching in P

LeanApr 2026algorithmic/combinatorial

Summary: Algorithmic/combinatorial work accompanied by Lean formalization of key proof skeleton and supporting infrastructure.

Easy summary: A new algorithmic proof is backed by machine-checked Lean components.

Hamilton decompositions of the directed 3-torus

LeanMar 2026combinatorics

Summary: Combinatorics paper accompanied by a Lean 4 formalization of the construction/proof components.

Easy summary: A finite combinatorial construction is checked with Lean.

The Chase in Lean — Crafting a Formal Library for Existential Rule Research

LeanApr 2026logic/databases

Summary: Builds a Lean framework for existential rules and the chase procedure, formalizing known results and termination conditions.

Easy summary: Lean is used to verify logical tools important in databases and knowledge representation.

On the Formalization of Network Topology Matrices in HOL

Isabelle/HOLMar 2026network matrices

Summary: Formalizes graph/network matrices such as adjacency, degree, Laplacian, and incidence matrices in HOL.

Easy summary: A formal proof system is used to verify matrix concepts from network theory.

Tutte’s theorem as an educational formalization project

LeanApr 2025graph theory

Summary: Formalizes Tutte’s theorem in Lean and discusses an educational framework for formalization projects.

Easy summary: A key graph-theory theorem is formalized in Lean and used as a learning model.

A Formalization of the Generalized Quantum Stein’s Lemma in Lean

LeanOct 2025quantum information

Summary: Formalizes the Generalized Quantum Stein’s Lemma in Lean, contributing to foundations for quantum information formalization.

Easy summary: A difficult quantum information theorem is made machine-checkable in Lean.

A Modular Lean 4 Framework for Confluence and Strong Normalization

LeanDec 2025metatheory

Summary: Develops a Lean 4 library for confluence and strong normalization of lambda calculi, including several mechanized case studies.

Easy summary: Lean is used to build reusable infrastructure for formalizing programming-language metatheory.

3.7 Symbolic integration

Saturation-Driven Dataset Generation for LLM Mathematical Reasoning in the TPTP Ecosystem

TPTP/E-proverSep 2025symbolic data

Summary: Uses E-prover saturation over TPTP axioms to generate guaranteed-valid theorem data for entailment, premise selection, and proof reconstruction tasks.

Easy summary: Classic automated theorem proving is used as a reliable data engine for training and testing LLM reasoning.

Structuring Definitions in Mathematical Libraries

Lean/mathlibSep/Nov 2025library design

Summary: A survey-style discussion of how to choose usable formal definitions in proof-assistant libraries, with comparisons to computer algebra library design.

Easy summary: It explains why picking the right formal definition is often the hardest part of building a math library.

Automated Tactics for Polynomial Reasoning in Lean 4

LeanApr 2026CAS certificates

Summary: Connects Lean with external computational algebra tools such as SageMath/SymPy via checkable certificates for polynomial reasoning.

Easy summary: Let a CAS do algebra, but make Lean check the result.

Lean Finder: Semantic Search for Mathlib That Understands User Intents

LeanOct 2025semantic search

Summary: Builds a semantic search engine for mathlib that aligns with mathematicians’ query intent and can support LLM-based theorem provers.

Easy summary: A better search tool for finding the right Lean theorem or definition.

Tree-Based Premise Selection for Lean4

LeanNeurIPS 2025 workshoppremise selection

Summary: Uses expression-tree structure and similarity methods for training-free premise selection in Lean 4.

Easy summary: A method for choosing useful Lean lemmas based on theorem structure.

LeanDojo-v2: A Comprehensive Library for AI-Assisted Theorem Proving in Lean

LeanMATH-AI 2025tooling

Summary: Presents an updated library integrating machine learning workflows with Lean theorem proving.

Easy summary: Infrastructure for connecting Lean proof work with AI tooling.

3.8 Human-AI workflows

Safe: Enhancing Mathematical Reasoning in LLMs through Step-Aware Formal Verification

Lean 4Jun 2025verification workflow

Summary: Uses Lean 4 to formalize and verify individual claims in generated mathematical reasoning, aiming to detect hallucinated or invalid proof steps.

Easy summary: It checks an AI’s mathematical solution step by step rather than trusting the final answer.

Advancing Mathematical Research via Human-AI Theorem Proving and Discovery

human-in-the-loopDec 2025scientific computing

Summary: Proposes a human-in-the-loop workflow where LLM outputs serve as raw material for interactive theorem proving, discovery, and rigorous scientific-computing workflows.

Easy summary: It frames AI as a research assistant whose suggestions still need human judgment and formal checking.

A Practical Guide to AI-Assisted Research in Mathematics

guideMar 2026

Summary: Practical advice and examples for mathematicians using AI tools responsibly in research workflows.

Easy summary: A user guide for mathematicians who want to use AI without being misled by it.

A correspondence problem for mathematical proof

conceptualMar 2026

Summary: Analyzes the problem of whether a formal proof actually corresponds to the intended informal proof and theorem.

Easy summary: Even if a proof assistant checks something, we must ensure it checked the right thing.

Understanding Tool-Augmented Agents for Lean Formalization: A Factorial Analysis

LeanApr 2026tool analysis

Summary: Studies which tools help Lean formalization agents: expert drafts, symbol/definition search, compiler feedback, and combinations thereof.

Easy summary: A careful test of what actually helps AI write Lean code.

Lean Atlas: An Integrated Proof Environment for Scalable Human-AI Collaborative Formalization

LeanApr 2026semantic review

Summary: Introduces Lean Atlas and Lean Compass to help humans inspect dependency graphs and semantically review AI-generated formalizations.

Easy summary: It helps humans check whether AI-generated Lean code means what it is supposed to mean.

LeanTutor: Towards a Verified AI Mathematical Proof Tutor

LeanJun 2025education/workflow

Summary: Combines LLM natural-language interaction with Lean verification to prototype a proof tutor that can give mathematically checkable feedback.

Easy summary: A proof tutor that tries to make AI explanations formally checkable.

Lean Copilot: Large Language Models as Copilots for Theorem Proving in Lean

Lean2025 versioncopilot

Summary: Framework for running LLM inference from inside Lean, enabling proof automation tools that fit directly into Lean users’ workflow.

Easy summary: An LLM copilot interface built into the Lean theorem-proving workflow.

Step-Wise Formal Verification for LLM-Based Mathematical Problem Solving

verification workflowMay 2025SymPy/Z3

Summary: Introduces MATH-VF, a training-free step-by-step verification framework using a formal language, LLM formalization, and external tools such as SymPy and Z3.

Easy summary: A workflow for checking individual steps in an LLM’s math solution.