Papers, preprints, benchmarks, formalization reports, and workflow papers.
3.1 Surveys/overviews
Autoformalization in the Era of Large Language Models: A Survey
surveyMay/Jul 2025autoformalization
Summary: Surveys LLM-era autoformalization from mathematical and machine-learning perspectives, covering workflows, data, model design, evaluation, applications, and open challenges.
Easy summary: A map of how AI systems are being trained to translate ordinary mathematics into formal, checkable language.
Mathematics: the Rise of the Machines
overviewNov 2025AI for mathematics
Summary: Broad overview of how AI can assist mathematics through theorem proving, conjecture formulation, language processing, and pattern discovery across mathematical sciences.
Easy summary: A high-level survey of the ways AI is beginning to help mathematicians discover, prove, and organize mathematics.
On the mechanical creation of mathematical concepts
conceptualJul 2025discovery
Summary: Conceptual paper arguing that explicit concept creation is a central step in mathematical discovery and analyzing why current AI systems mostly form concepts implicitly.
Easy summary: It asks whether AI can invent new mathematical concepts, not only solve problems stated in existing language.
AI for Mathematics: Progress, Challenges, and Prospects
surveyJan/Feb 2026
Summary: Broad survey of AI-for-math, covering mathematical reasoning, autoformalization, formal theorem proving, benchmarks, and LLM-based systems.
Easy summary: A map of the entire AI-for-math field and its main challenges.
Formal Reasoning Meets LLMs: Toward AI for Mathematics and Verification
CACM2026
Summary: Discusses how LLMs and formal reasoning systems can combine to improve mathematical reasoning and verification.
Easy summary: LLMs are useful, but they need formal systems to check correctness.
Towards Autonomous Mathematics Research
roadmapFeb 2026
Summary: Proposes levels of autonomy for AI mathematical research and discusses emerging systems and milestones.
Easy summary: A paper asking how close AI is to doing parts of mathematical research on its own.
A Survey on Mathematical Reasoning and Optimization with Large Language Models
surveyMar 2025LLM math reasoning
Summary: Survey of LLM-based mathematical reasoning, theorem proving, optimization, symbolic/tool-augmented methods, and their limits.
Easy summary: A broad 2025 survey of how LLMs are being used for mathematical reasoning and optimization.
Advancing mathematics research with generative AI
overviewSep/Nov 2025research workflows
Summary: Discusses how generative AI can assist advanced mathematics through pattern finding, code generation, conjecturing, examples, CAS integration, and formal proof assistants.
Easy summary: A mathematician-facing overview of how generative AI may become a useful research assistant.
Mathematics with large language models as provers and verifiers
case studyOct 2025Lean verification
Summary: Reports a protocol using LLM prover and verifier instances, with final Lean checking and human semantic review, on IMO and number-theory tasks.
Easy summary: A case-study style paper on using LLMs as both proof generators and proof checkers.
3.2 Autoformalization
Mathesis: Towards Formal Theorem Proving from Natural Languages
LeanJun 2025end-to-end
Summary: Introduces an end-to-end pipeline that autoformalizes natural-language problems, scores formalization quality, and then generates Lean proofs; also introduces the Gaokao-Formal benchmark.
Easy summary: The system starts from ordinary problem statements and tries to reach verified Lean proofs.
KELPS: Verified Multi-Language Autoformalization via Semantic-Syntactic Alignment
Lean/Coq/IsabelleJul 2025multi-language
Summary: Proposes a neuro-symbolic autoformalization framework using an intermediate Knowledge Equation language to synthesize data for Lean, Coq, and Isabelle.
Easy summary: It tries to make autoformalization work across several proof assistants, not only Lean.
CriticLean: Critic-Guided Reinforcement Learning for Lean 4 Formalizations
LeanJul 2025semantic fidelity
Summary: Trains critic models to judge whether Lean formalizations preserve the intended meaning, introduces CriticLeanBench, and builds the FineLeanCorpus dataset.
Easy summary: It teaches a model to detect when a Lean translation compiles but means the wrong thing.
FormaRL: Enhancing Autoformalization with No Labeled Data
RLAug 2025autoformalization
Summary: Uses reinforcement learning with unlabeled data to improve autoformalization, addressing scarcity of paired informal/formal training examples.
Easy summary: It tries to improve theorem-to-Lean translation without needing many hand-labeled examples.
Aria: Retrieval and Iterative Auto-Formalization via Graph-of-Thought Reasoning
LeanOct 2025retrieval
Summary: Presents an agent for research-level Lean statement formalization using dependency decomposition, retrieval, iterative repair, and term-level grounding checks.
Easy summary: It breaks a theorem into dependency pieces, looks up the right Lean concepts, and repairs the formalization.
SITA: A Framework for Structure-to-Instance Theorem Autoformalization
LeanNov 2025research-level
Summary: Develops a method for formalizing concrete instances of abstract mathematical structures in Lean using reusable templates, typeclass mechanisms, and LLM-assisted refinement.
Easy summary: It helps transfer abstract Lean theories to concrete mathematical examples.
Towards a Common Framework for Autoformalization
AAAI 2026autoformalization framework
Summary: Reviews different forms of autoformalization and proposes a unified framework to connect approaches across fields.
Easy summary: A paper trying to give everyone a common language for autoformalization.
M2F: Automated Formalization of Mathematical Literature at Scale
LeanFeb 2026document-scale
Summary: Agentic framework for converting long mathematical sources into a compiling Lean project, reporting 153,853 lines from 479 pages of real/convex analysis textbooks.
Easy summary: A system that tries to turn entire math textbooks into Lean code automatically.
MerLean: An Agentic Framework for Autoformalization in Quantum Computation
LeanFeb 2026quantum computation
Summary: Extracts statements from LaTeX quantum-computation papers, formalizes them into Lean 4, and translates them back to LaTeX for semantic review.
Easy summary: A tool that reads quantum-computing papers and tries to make their math checkable in Lean.
Automatic Textbook Formalization
LeanMar/Apr 2026textbook-scale
Summary: Case study of automatically formalizing a 500+ page graduate algebraic combinatorics textbook into Lean.
Easy summary: An AI system attempts to convert a whole advanced textbook into a proof-assistant project.
Decompose, Structure, and Repair: A Neuro-Symbolic Framework for Autoformalization via Operator Trees
LeanApr 2026PRIME benchmark
Summary: Uses structured decomposition and operator trees to improve translation of mathematical statements into Lean, and introduces the PRIME benchmark.
Easy summary: Instead of translating a theorem as one block of text, the system breaks it into logical pieces first.
SFT-GRPO Data Overlap as a Post-Training Hyperparameter for Autoformalization
ML trainingApr 2026Lean
Summary: Studies how overlap between supervised fine-tuning data and GRPO prompts affects Lean 4 autoformalization performance.
Easy summary: A training-method paper about making models better at translating math into Lean.
Improving Lean4 Autoformalization via Cycle Consistency Fine-tuning
LeanMar 2026training
Summary: Uses natural language → Lean → natural language consistency as a training signal for better autoformalization.
Easy summary: The model checks whether translating to Lean and back preserves the meaning.
ProofBridge: Auto-Formalization of Natural Language Proofs in Lean
ICLR 2026Leanproof-level
Summary: Framework for translating natural-language theorems and proofs into Lean 4 using joint embeddings and retrieval.
Easy summary: It tries to turn not only theorem statements, but also human proof explanations, into Lean.
Automated Formalization via Conceptual Retrieval
updated 2026retrieval
Summary: Uses retrieval of relevant formal concepts, definitions, and theorem names to improve translation from informal mathematics into formal systems.
Easy summary: Autoformalization works better when the AI can look up the right existing formal concepts.
Surface Sensitivity in Lean 4 Autoformalization
LeanApr 2026robustness
Summary: Studies how small changes in the natural-language wording of a theorem affect Lean formalization outcomes.
Easy summary: The same theorem worded slightly differently may confuse an autoformalization system.
Lean-ing on Quality: How High-Quality Data Beats Diverse Multilingual Data in AutoFormalization
autoformalizationFeb 2025data quality
Summary: Studies back-translation and proof-state-informed prompting, arguing that high-quality synthetic data can outperform much larger multilingual datasets for autoformalization.
Easy summary: For autoformalization, carefully made data may matter more than simply having lots of data.
FMC: Formalization of Natural Language Mathematical Competition Problems
autoformalizationJul 2025Lean dataset
Summary: Introduces a training-free LLM pipeline with error feedback for formalizing competition problems into Lean and curates a paired natural-language/Lean dataset.
Easy summary: A dataset and pipeline for turning Olympiad-style problems into Lean statements.
Autoformalizer with Tool Feedback
autoformalizationOct 2025Lean feedback
Summary: Introduces ATF, which uses Lean compiler feedback and multi-LLM semantic checks to refine natural-language-to-Lean formalization.
Easy summary: The formalizer learns to repair its Lean translations using tool feedback.
FIRMA: Bidirectional Formal-Informal Mathematical Language Alignment with Proof-Theoretic Grounding
MathNLP 2025formal/informal alignment
Summary: Proposes a bidirectional formal-informal alignment system intended to preserve proof-theoretic structure when moving between mathematical text and formal language.
Easy summary: A paper on aligning ordinary mathematical language with formal proof language.
3.3 Theorem proving
BFS-Prover: Scalable Best-First Tree Search for LLM-based Automatic Theorem Proving
LeanFeb 2025tree search
Summary: Studies best-first search for Lean theorem proving with expert iteration, compiler-error feedback, preference optimization, and length-normalized exploration.
Easy summary: A simpler tree-search strategy becomes highly competitive when scaled carefully.
Gold-medalist Performance in Solving Olympiad Geometry with AlphaGeometry2
DeepMindFeb 2025geometry
Summary: Presents AlphaGeometry2, extending the AlphaGeometry language and symbolic engine to solve much harder Olympiad geometry problems at gold-medalist level.
Easy summary: DeepMind’s geometry system became strong enough to match top Olympiad geometry solvers.
Goedel-Prover: A Frontier Model for Open-Source Automated Theorem Proving
LeanFeb 2025open-source prover
Summary: Introduces an open-source Lean prover trained from large-scale formalized statements and iteratively generated formal proofs, improving miniF2F and PutnamBench performance.
Easy summary: An open prover learns by generating more Lean proofs and training on the successful ones.
Self-play LLM Theorem Provers with Iterative Conjecturing and Proving
self-playFeb 2025theorem proving
Summary: Designs a Self-play Theorem Prover where a conjecturer and prover co-train, generating new conjectures/exercises and proofs as mutual training signals.
Easy summary: The prover improves by inventing problems for itself and trying to solve them.
MA-LoT: Multi-Agent Lean-based Long Chain-of-Thought Reasoning enhances Formal Theorem Proving
LeanMar 2025multi-agent
Summary: Combines multi-agent long-chain reasoning with Lean verification feedback, balancing informal high-level reasoning and formal proof generation.
Easy summary: Several agents cooperate: some reason informally, while Lean checks whether the formal proof works.
DeepSeek-Prover-V2: Advancing Formal Mathematical Reasoning via Reinforcement Learning for Subgoal Decomposition
LeanApr 2025DeepSeek
Summary: Introduces DeepSeek-Prover-V2, using recursive subgoal decomposition and reinforcement learning to combine informal reasoning with Lean proof generation.
Easy summary: The model decomposes hard problems into smaller Lean-verifiable subproblems.
APOLLO: Automated LLM and Lean Collaboration for Advanced Formal Reasoning
LeanMay 2025proof repair
Summary: A modular pipeline in which LLMs generate Lean proofs, compiler feedback locates errors, subgoals are isolated and repaired, and successful fragments are recombined.
Easy summary: APOLLO makes Lean proof generation more efficient by repairing failed proof pieces instead of restarting from scratch.
DeepTheorem: Advancing LLM Reasoning for Theorem Proving
informal proofsMay 2025benchmark/data
Summary: Develops an informal theorem-proving framework with a large IMO-level theorem/proof dataset and reinforcement-learning methods for natural-language proof reasoning.
Easy summary: It improves LLM theorem proving in ordinary mathematical language rather than directly in Lean.
ProofNet++: A Neuro-Symbolic System for Formal Proof Verification with Self-Correction
neuro-symbolicMay 2025self-correction
Summary: Combines LLMs with proof verification, symbolic proof-tree supervision, verifier-guided reinforcement learning, and iterative self-correction.
Easy summary: The model writes proof attempts, checks them formally, and learns how to correct its mistakes.
An Agent-Based Framework for Formal Mathematical Proofs
LeanJun 2025Prover Agent
Summary: Introduces Prover Agent, coordinating informal reasoning, formal proof generation, auxiliary lemma construction, and feedback from Lean.
Easy summary: The agent uses both natural-language reasoning and Lean feedback to build proofs.
DREAM: Towards Advanced Mathematical Reasoning for LLMs via First-Order Logic Theorem Proving
FOLJun 2025Lean 4 dataset
Summary: Uses first-order-logic theorem proving, strategy diversification, and sub-proposition error feedback to improve LLM mathematical reasoning.
Easy summary: It tests and improves LLM reasoning on multi-step logical theorem-proving tasks.
Delta Prover: Solving Formal Math Problems by Decomposition and Iterative Repair
LeanJul 2025agentic prover
Summary: An agentic framework that uses reflective decomposition, iterative proof repair, and a Lean-based DSL for subproblem management without needing a specialized prover model.
Easy summary: A general LLM is guided to break Lean proofs into smaller tasks and repair them step by step.
Leanabell-Prover-V2: Verifier-integrated Reasoning for Formal Theorem Proving via Reinforcement Learning
LeanJul 2025RL
Summary: Post-trains Lean prover models using verifier feedback, multi-turn interactions, feedback masking, and reinforcement learning over reasoning trajectories.
Easy summary: The Lean verifier becomes a training signal that teaches the model to correct itself.
Seed-Prover: Deep and Broad Reasoning for Automated Theorem Proving
LeanJul 2025IMO 2025
Summary: Presents Seed-Prover and Seed-Geometry, combining Lean feedback, lemma-style reasoning, self-summarization, and test-time strategies for IMO-level formal theorem proving.
Easy summary: A ByteDance system used Lean-centered reasoning to prove most formalized IMO-style problems.
Goedel-Prover-V2: Scaling Formal Theorem Proving with Scaffolded Data Synthesis and Self-Correction
LeanAug 2025open-source prover
Summary: Improves Goedel-Prover through scaffolded synthetic tasks, Lean-verifier-guided self-correction, and model averaging, setting strong open-source prover results.
Easy summary: The second Goedel-Prover version uses synthetic curricula and Lean feedback to become much stronger.
Hilbert: Recursively Building Formal Proofs with Informal Reasoning
LeanSep 2025recursive decomposition
Summary: Combines an informal reasoner, a specialized Lean prover, a verifier, and semantic theorem retrieval to recursively decompose and solve proof goals.
Easy summary: Hilbert lets natural-language reasoning guide the construction of checkable Lean proofs.
GAR: Generative Adversarial Reinforcement Learning for Formal Theorem Proving
RLOct 2025co-evolution
Summary: Uses co-evolution of problem generation and solving under verifiable environments to improve Lean theorem-proving models such as Goedel-Prover-V2 and DeepSeek-Prover-V2.
Easy summary: One part generates harder proof tasks while another learns to solve them.
Ax-Prover: A Deep Reasoning Agentic Theorem Prover
LeanOct 2025multi-agent
Summary: Presents a multi-agent Lean theorem-proving system designed for autonomous or human-collaborative formal proof generation across scientific domains.
Easy summary: A multi-agent system attempts to solve scientific and mathematical problems by producing Lean proofs.
Seed-Prover 1.5: Mastering Undergraduate-Level Theorem Proving via Learning from Experience
LeanDec 2025agentic RL
Summary: Uses large-scale agentic reinforcement learning and efficient test-time scaling to improve undergraduate and graduate-level formal theorem proving.
Easy summary: The model accumulates experience from Lean feedback and becomes stronger at proving hard formal problems.
Discover and Prove: An Open-source Agentic Framework for Hard Mode Automated Theorem Proving in Lean 4
LeanApr 2026Hard Mode
Summary: Introduces “Hard Mode,” where the system must discover the answer before proving it formally, and releases hard-mode benchmarks.
Easy summary: The AI cannot just prove a known answer; it must first find the answer.
OptProver
LeanApr 2026optimization
Summary: Develops a Lean theorem-proving benchmark and prover for undergraduate optimization theory.
Easy summary: This pushes AI theorem proving into optimization problems.
Numina-Lean-Agent
LeanJan 2026tool-use agent
Summary: Open general tool-use agent for formal theorem proving, treating Lean as an interactive environment similar to coding.
Easy summary: An AI agent that works inside Lean by trying, checking, and fixing proofs.
A Minimal Agent for Automated Theorem Proving
agent baselineFeb 2026
Summary: Presents a simple baseline architecture for theorem-proving agents using iterative refinement, context management, and library search.
Easy summary: A stripped-down AI theorem prover used to understand what parts really matter.
TheoremForge
formal data synthesisJan 2026
Summary: Low-budget agentic pipeline for formal data synthesis, proof generation, premise selection, correction, and proof sketching.
Easy summary: A pipeline to generate and repair formal theorem-proving data.
Neural Theorem Proving for Verification Conditions / NTP4VC
Lean/Isabelle/RocqJan 2026verification
Summary: Benchmark for proving verification conditions across several formal languages, based partly on real-world software-verification tasks.
Easy summary: Tests AI proof systems on real formal verification tasks, not just math puzzles.
On Reasoning-Centric LLM-based Automated Theorem Proving
ATPApr 2026
Summary: Technical paper on reasoning-centered LLM approaches for automated theorem proving, including integration with Lean-oriented tools.
Easy summary: A paper about making LLMs reason more effectively when proving theorems.
Quantum automated theorem proving
Jan 2026logicspeculative
Summary: Proposes quantum-resolution approaches to propositional and first-order automated theorem proving.
Easy summary: A speculative attempt to combine quantum computation ideas with theorem proving.
Olympiad-level formal mathematical reasoning with AlphaProof
Nature2025Lean/RL
Summary: Presents AlphaProof, a reinforcement-learning-based formal reasoning system working in Lean, and reports strong performance on olympiad-level problems.
Easy summary: DeepMind’s AlphaProof paper shows formal proof search reaching high-level olympiad performance.
Kimina-Prover Preview: Towards Large Formal Reasoning Models with Reinforcement Learning
LeanApr 2025RL prover
Summary: Introduces Kimina-Prover, a Lean 4 formal theorem prover trained with large-scale reinforcement learning and structured formal reasoning patterns.
Easy summary: A major 2025 open formal-reasoning model for Lean theorem proving.
Lean-STaR: Learning to Interleave Thinking and Proving
ICLR 2025Leanthought + tactics
Summary: Trains models to generate informal thoughts before Lean proof steps, combining retrospective rationale generation with expert iteration.
Easy summary: The model learns to think informally while constructing formal Lean proofs.
LeanAgent: Lifelong Learning for Formal Theorem Proving
ICLR 2025Leanlifelong learning
Summary: Proposes a lifelong-learning theorem-proving framework that continually generalizes across expanding Lean repositories while reducing forgetting.
Easy summary: A Lean prover that keeps learning across many projects instead of staying fixed on one dataset.
Leanabell-Prover: Posttraining Scaling in Formal Reasoning
LeanApr 2025post-training
Summary: Studies continual training and reinforcement learning with Lean compiler rewards to improve whole-proof generation for formal theorem proving.
Easy summary: A post-training approach for making Lean provers stronger using verifier feedback.
ProofAug: Efficient Neural Theorem Proving via Fine-grained Proof Structure Analysis
Isabelle/LeanJan 2025proof augmentation
Summary: Uses fine-grained proof-structure analysis and proof augmentation to improve neural theorem proving, with results in Isabelle and a Lean 4 version.
Easy summary: A method for getting more value from proof structure when training theorem provers.
REAL-Prover: Retrieval Augmented Lean Theorem Proving
LeanMay 2025retrieval
Summary: Introduces a retrieval-augmented stepwise Lean 4 prover aimed at more advanced, college-level mathematics.
Easy summary: A Lean prover that uses retrieval to find relevant facts while proving.
HybridProver: Augmenting Theorem Proving with LLM-Driven Proof Synthesis and Refinement
IsabelleMay 2025hybrid prover
Summary: Combines whole-proof synthesis and tactic-based refinement in Isabelle, showing improved performance on miniF2F.
Easy summary: A theorem prover that mixes full proof sketches with step-by-step tactic repair.
FANS: Formal Answer Selection for Natural Language Math Reasoning Using Lean4
LeanMar 2025answer verification
Summary: Uses Lean 4 formalization and proving to help select correct answers among LLM-generated responses to natural-language math questions.
Easy summary: Lean is used as a judge to choose the correct answer from possible LLM outputs.
LeanTree: Accelerating White-Box Proof Search with Factorized States in Lean 4
LeanJul 2025white-box search
Summary: Introduces Lean tooling and a dataset for factorizing proof states into simpler branches, supporting more efficient white-box proof search.
Easy summary: A way to split Lean proof states so search can work on smaller pieces.
StepFun-Prover Preview: Let’s Think and Verify Step by Step
LeanJul 2025tool-integrated RL
Summary: Develops a tool-integrated formal prover that interleaves reasoning, Lean snippets, REPL feedback, and reinforcement learning from proof outcomes.
Easy summary: A Lean prover that learns to use feedback from the proof environment step by step.
Let’s Reason Formally: Natural-Formal Hybrid Reasoning Enhances LLM’s Math Capability
LeanMay 2025hybrid reasoning
Summary: Combines natural-language reasoning with Lean-formalized existence problems so that LLM math answers can benefit from formal proof attempts.
Easy summary: A hybrid method where natural-language math solving is helped by Lean-formal reasoning.
Modeling Tactics as Operators: Effect-Grounded Representations for Lean Theorem Proving
MathNLP 2025Lean tactics
Summary: Represents Lean tactics by the changes they induce on proof states, producing more effect-aware tactic embeddings.
Easy summary: Instead of treating tactics as words, it studies what each tactic does to a proof state.
Synthetic Proofs with Tool-Integrated Reasoning: Contrastive Alignment for LLM Mathematics with Lean
MathNLP 2025Leansynthetic proofs
Summary: Uses Lean-based synthetic proof tasks, tool-integrated reasoning, and contrastive alignment to improve LLM mathematical proving behavior.
Easy summary: Lean-generated proof tasks are used to train models toward better mathematical reasoning.
3.4 Benchmarks/datasets
A Combinatorial Identities Benchmark for Theorem Proving in Lean
LeanCombFeb 2025combinatorics
Summary: Constructs LeanComb, a benchmark for combinatorial identities, and uses an automated theorem generator to create a large enhanced dataset of formal proofs.
Easy summary: A benchmark and data generator focused on combinatorial identities in Lean.
FormalMATH: Benchmarking Formal Mathematical Reasoning of Large Language Models
Lean4May 2025large benchmark
Summary: Introduces a 5,560-problem Lean4 benchmark spanning Olympiad to undergraduate mathematics, built with human-in-the-loop autoformalization and semantic checks.
Easy summary: A large Lean benchmark for testing how well AI systems can prove formal mathematics.
Ineq-Comp: Benchmarking Compositional Reasoning in Automated Theorem Proving on Inequalities
inequalitiesMay 2025benchmark
Summary: Tests whether formal provers understand compositional inequality reasoning by transforming elementary inequalities into multi-step compositional variants.
Easy summary: It checks whether AI provers can combine familiar inequalities in the way humans naturally do.
FormalStep: Step-Level Formal Verification for Mathematical Reasoning
Lean 4Jun 2025step verification
Summary: Part of the Safe framework, FormalStep provides 30,809 formal statements for checking the correctness of individual reasoning steps in generated solutions.
Easy summary: Instead of only checking the final answer, it checks whether each mathematical step is valid.
Lean Meets Theoretical Computer Science
Lean4Aug 2025formal-informal pairs
Summary: Uses theoretical computer science domains such as Busy Beaver and mixed Boolean arithmetic to generate verified formal/informal theorem-proving challenge pairs.
Easy summary: It creates many hard but checkable Lean proof tasks from theoretical computer science.
FormalML: A Benchmark for Formal Subgoal Completion in Machine Learning Theory
Lean4Sep/Oct 2025subgoal completion
Summary: Introduces a benchmark of nearly 5,000 subgoal-completion problems extracted from formalized optimization and probability inequalities in machine-learning theory.
Easy summary: It tests whether AI can fill missing proof steps inside serious ML-theory formalizations.
RLMEval: Evaluating Research-Level Neural Theorem Proving
LeanOct 2025research-level benchmark
Summary: Builds an evaluation suite from real Lean Blueprint formalization projects to test theorem proving and proof autoformalization at research level.
Easy summary: A benchmark drawn from real formalization projects, not just contest-style examples.
FATE: A Formal Benchmark Series for Frontier Algebra of Multiple Difficulty Levels
formal algebraNov 2025research-level benchmark
Summary: Introduces FATE-H and FATE-X, formal algebra benchmarks ranging from undergraduate to beyond PhD qualifying-exam difficulty, exposing large gaps in current provers.
Easy summary: A hard algebra benchmark showing that research-level formal theorem proving is still difficult for AI.
IndiMathBench: Autoformalizing Mathematical Reasoning Problems with a Human Touch
Lean4Nov/Dec 2025human-verified
Summary: Introduces a human-verified Lean 4 benchmark of Indian Mathematics Olympiad problems curated through AI-assisted formalization and expert validation.
Easy summary: A carefully checked benchmark for testing both autoformalization and Lean theorem proving.
MiniF2F-Dafny: LLM-Guided Mathematical Theorem Proving via Auto-Active Verification
DafnyDec 2025benchmark
Summary: Translates miniF2F to Dafny and evaluates LLM-generated proof hints, highlighting a division of labor between LLM guidance and automated verification.
Easy summary: It brings a standard math benchmark into Dafny and tests how much automation plus LLM hints can solve.
FormalProofBench: Can Models Write Graduate Level Math Proofs That Are Formally Verified?
benchmarkMar 2026graduate-level
Summary: Benchmark testing whether models can produce Lean 4 proofs for graduate-level mathematics.
Easy summary: A test of whether AI can write proof-assistant proofs for advanced math problems.
Can AI Provers Complete Real-World Lean Theorems? / SorryDB
LeanMar 2026real-world benchmark
Summary: Creates a benchmark from unfinished `sorry`s across real Lean projects, testing whether AI can complete actual formalization gaps.
Easy summary: Instead of artificial tests, it asks AI to fill real missing proofs in Lean projects.
ArXivLean
benchmarkApr 2026research-level
Summary: Benchmark built from recent arXiv papers, translating research-level statements into Lean tasks to reduce contamination and measure current capabilities.
Easy summary: A test where AI tries to prove Lean versions of fresh research-paper statements.
AMBER: A Benchmark for Applied Mathematics in Lean 4
LeanFeb 2026applied math
Summary: Benchmark for constructive applied mathematics tasks requiring algorithms, representation transformations, or explicit computations in Lean.
Easy summary: A benchmark that tests applied-math reasoning, not just pure theorem statements.
LeanCat: A Benchmark Suite for Formal Category Theory
Leancategory theory2026 version
Summary: Benchmark suite targeting formal category theory, a domain with distinctive abstractions and proof patterns.
Easy summary: A Lean benchmark for category-theory proofs.
k-server-bench: Automating Potential Discovery for the k-Server Conjecture
Apr 2026benchmarkdiscovery
Summary: Benchmark in which agents search for potential functions satisfying linear-inequality systems related to the k-server conjecture.
Easy summary: AI is tested on finding the right hidden mathematical object, not just proving a given formula.
HorizonMath: Measuring AI Progress Toward Mathematical Discovery with Automatic Verification
Mar 2026benchmarkautomatic verification
Summary: Benchmark designed to evaluate progress toward discovery-like mathematics with automatically verifiable outcomes.
Easy summary: A test for whether AI is moving toward real mathematical discovery.
CombiBench: Benchmarking LLM Capability for Combinatorial Mathematics
benchmarkMay 2025Lean combinatorics
Summary: Introduces 100 Lean 4 combinatorics problems across a range of difficulty levels, including IMO-style combinatorics.
Easy summary: A Lean benchmark focused specifically on combinatorial mathematics.
MATP-BENCH: Can MLLM Be a Good Automated Theorem Prover for Multimodal Problems?
benchmarkJun 2025multimodal theorem proving
Summary: Builds a multimodal, multi-level, multi-language theorem-proving benchmark with formalizations in Lean 4, Coq, and Isabelle.
Easy summary: A benchmark for theorem proving when problems include diagrams or visual information.
miniF2F-Lean Revisited: Reviewing Limitations and Misalignments
benchmark analysisNov 2025miniF2F
Summary: Analyzes mismatches between informal and formal statements in miniF2F-Lean and explains why end-to-end performance can be lower than isolated autoformalization/proving scores.
Easy summary: A careful audit showing that benchmark statements themselves can be a major source of difficulty.
Reliable Evaluation and Benchmarks for Statement Autoformalization
benchmark/evaluationEMNLP 2025autoformalization
Summary: Introduces improved metrics and benchmark resources for evaluating statement autoformalization into Lean, including ProofNetVerif.
Easy summary: A benchmark-and-metrics paper for measuring whether autoformalization is actually faithful.
LeanGeo: Formalizing Competition Geometry Problems in Lean
LeanAug 2025geometry benchmark
Summary: Introduces a Lean 4 geometry formalization framework and LeanGeo-Bench for IMO and advanced geometry problems.
Easy summary: A Lean benchmark and library for formalizing geometry problems.
MSC-180: A Benchmark for Automated Formal Theorem Proving from Mathematical Subject Classification
benchmarkDec 2025research breadth
Summary: Creates 180 expert-refined formal verification problems across 60 MSC2020 mathematical branches to test domain breadth of LLM provers.
Easy summary: A benchmark that tests whether provers can handle many branches of mathematics, not just familiar contest areas.
VeriBench-FTP: A Formal Theorem Proving Benchmark in Lean 4 for Code Verification
benchmark2025Lean/code verification
Summary: A Lean 4 benchmark evaluating theorem-proving capabilities on code-verification style problems, complementing pure-math benchmarks.
Easy summary: A benchmark for Lean theorem proving in verified-code settings.
3.5 Discovery
AlphaEvolve: A Coding Agent for Scientific and Algorithmic Discovery
DeepMindJun 2025evolutionary search
Summary: White paper describing AlphaEvolve, an LLM-guided evolutionary coding agent for improving algorithms, scientific computations, and mathematical constructions through automated evaluation.
Easy summary: An AI system proposes code changes, tests them, and evolves better algorithms or constructions.
LeanConjecturer: Automatic Generation of Mathematical Conjectures for Theorem Proving
Lean4Jun 2025conjecture generation
Summary: Generates university-level mathematical conjectures in Lean using LLMs, rule-based context extraction, and methods aimed at reducing data scarcity in theorem proving.
Easy summary: The system tries to invent useful Lean conjectures that can become training or proof targets.
Lemmanaid: Neuro-Symbolic Lemma Conjecturing
Isabelle/HOLApr 2025lemma discovery
Summary: Uses a fine-tuned LLM to suggest lemma templates and symbolic methods to instantiate them, improving lemma conjecturing in Isabelle/HOL and AFP settings.
Easy summary: It helps discover useful intermediate lemmas, which are often the missing steps in proofs.
Enumerate-Conjecture-Prove: Formally Solving Answer-Construction Problems in Math Competitions
LeanMay/Oct 2025neuro-symbolic
Summary: Combines LLM enumeration, pattern-driven conjecturing, and Lean formal proving to solve answer-construction problems; introduces ConstructiveBench.
Easy summary: The system searches for the right mathematical object and then formally proves that it works.
Discovering New Theorems via LLMs with In-Context Proof Examples
Lean4Sep 2025new theorem discovery
Summary: Proposes a conjecturing–proving loop in which LLMs generate new theorem statements and attempt to prove them in Lean 4.
Easy summary: It asks whether an AI can move from proving known theorems to proposing new ones.
Learning Interestingness in Automated Mathematical Discovery
RLNov 2025theory discovery
Summary: Introduces FERMAT, a reinforcement-learning environment for open-ended concept discovery and theorem proving, with an emphasis on learning what makes results interesting.
Easy summary: It tries to teach AI not only to prove things, but to find results worth caring about.
Automated Conjecture Resolution with Formal Verification
Apr 2026Rethlas + ArchonLean
Summary: Two-agent system combining informal reasoning/search with formal Lean verification, claiming to resolve an open commutative algebra problem.
Easy summary: One AI searches for the proof idea; another checks it formally.
Resolution of Erdős Problem #728: a writeup of Aristotle’s autonomous resolution
Jan 2026Erdős problemLean
Summary: Presents a writeup of an AI-assisted/autonomous resolution of an Erdős problem, with formalization in Lean.
Easy summary: A named open problem was reportedly solved with an AI system and checked in Lean.
Agentic Neurosymbolic Collaboration for Mathematical Discovery: A Case Study in Combinatorial Design
Mar 2026combinatorial designLean/symbolic solvers
Summary: Human-AI-symbolic collaboration that produced a new result on imbalance of Latin squares, using symbolic solvers and Lean verification.
Easy summary: Humans, AI, and exact tools worked together to find and verify a new combinatorics result.
Doubly Saturated Ramsey Graphs: A Case Study in Computer-Assisted Mathematical Discovery
Apr 2026Ramsey theorycomputer-assisted
Summary: Case study in combinatorial discovery using computation/AI-assisted conjecture generation and proof support.
Easy summary: Computers help discover and check new facts about Ramsey graphs.
FlowBoost Reveals Phase Transitions and Spectral Structure in Finite Free Information Inequalities
Apr 2026inequalitiesexperimental discovery
Summary: Uses computational/AI-style exploration to study extremal inequality structures and phase transitions.
Easy summary: AI-like computation helps explore complicated inequality landscapes.
Discovering Mathematical Concepts Through a Multi-Agent System
Mar 2026multi-agentconcept discovery
Summary: Multi-agent framework for mathematical concept and conjecture discovery, constraining generation by provability.
Easy summary: Multiple AI agents interact to suggest mathematical ideas that can actually be proved.
Yanasse: Finding New Proofs from Deep Vision’s Analogies
Apr 2026analogyproof discovery
Summary: Attempts proof discovery by transferring proof-strategy patterns across mathematical areas using analogy-like mechanisms.
Easy summary: It tries to find new proofs by recognizing patterns similar to old proofs.
Generating Hadamard Matrices with Transformers
Apr 2026combinatorial construction
Summary: Uses transformer models to generate combinatorial designs related to Hadamard matrices.
Easy summary: AI is used to search for special matrices with difficult combinatorial properties.
Using Reasoning Models to Generate Search Heuristics for Combinatorial Design
discoveryMay 2025combinatorial design
Summary: Uses reasoning LLMs to generate and refine search heuristics for combinatorial design problems, including open instances.
Easy summary: LLMs write search strategies that help solve difficult combinatorial design cases.
A Matter of Interest: Understanding Interestingness of Math Problems in Automated Mathematical Discovery
discoveryNov 2025interestingness
Summary: Studies how to characterize or learn “interestingness” in mathematical problems, a key issue for automated mathematical discovery.
Easy summary: A discovery system must not only find true statements, but also recognize which ones are interesting.
3.6 Formalized mathematics
A Formal Proof of the Irrationality of ζ(3) in Lean 4
LeanFeb/Aug 2025number theory
Summary: Formalizes Beukers’ proof of the irrationality of zeta(3) in Lean 4, adding supporting analytic number theory and shifted Legendre polynomial infrastructure.
Easy summary: A famous irrationality theorem is made machine-checkable in Lean.
On the Averaging Problem of Ideal Families Related to Frankl's Conjecture with Formal Proof by Lean 4
LeanApr 2025combinatorics
Summary: Proves an average-rarity result for ideal set families related to Frankl’s union-closed sets conjecture and verifies the proof in Lean 4.
Easy summary: A combinatorics result related to Frankl’s conjecture is checked in Lean.
Formalizing a classification theorem for low-dimensional solvable Lie algebras in Lean
LeanMay 2025Lie algebras
Summary: Formalizes the classification of solvable Lie algebras of dimension at most three over arbitrary fields, expanding mathlib’s Lie theory and linear algebra infrastructure.
Easy summary: Lean is used to verify a concrete classification theorem from Lie algebra theory.
A Formalization of Divided Powers in Lean
LeanJul 2025commutative algebra
Summary: Formalizes the basic theory of divided powers in Lean 4, including morphisms, sub-divided power ideals, quotients, sums, and supporting power-series infrastructure.
Easy summary: A foundational construction used in algebraic geometry and number theory is formalized in Lean.
Formalizing dimensional analysis using the Lean theorem prover
LeanSep 2025formalized science
Summary: Formalizes dimensions, units, dimensional homogeneity, SI units, constants, the Buckingham Pi theorem, and examples such as the Lennard-Jones potential.
Easy summary: Lean is used to check that scientific equations are dimensionally meaningful.
Formalization of De Giorgi–Nash–Moser Theory in Lean
LeanApr 2026PDE
Summary: Formalizes major PDE regularity results, including local boundedness, weak Harnack inequality, Moser Harnack inequality, and interior Hölder regularity.
Easy summary: A major piece of modern PDE theory is translated into machine-checkable Lean.
A Milestone in Formalization: The Sphere Packing Problem in Dimension 8
LeanApr 2026sphere packing
Summary: Reports a milestone in formalizing Viazovska’s dimension-8 sphere-packing result in Lean, with AI assistance in final stages.
Easy summary: A famous Fields Medal-level result is being made machine-checkable.
A formal proof of the Ramanujan–Nagell theorem in Lean 4
LeanApr 2026number theory
Summary: Formalizes the classical theorem on integer solutions of x² + 7 = 2ⁿ, requiring substantial algebraic number theory infrastructure.
Easy summary: A famous number-theory theorem is fully checked by Lean.
A formalization of the Gelfond–Schneider theorem
LeanMar 2026transcendental number theory
Summary: Formalizes Hilbert’s seventh problem / the Gelfond–Schneider theorem in Lean 4.
Easy summary: Lean checks a deep theorem about transcendental numbers.
A Prime-Generated Formalization of Nagata’s Factoriality Theorem in Lean 4
LeanApr 2026commutative algebra
Summary: Formalizes Nagata’s factoriality theorem and UFD applications in Lean/mathlib with no placeholder axioms.
Easy summary: A serious commutative algebra theorem is now checkable in Lean.
Munkres’ General Topology Autoformalized in Isabelle/HOL
Isabelle/HOLApr 2026topology
Summary: LLM-assisted formalization of Munkres’ general topology, reporting over 85,000 lines, 806 formal results, and zero sorries.
Easy summary: A standard topology textbook is turned into machine-checkable Isabelle/HOL mathematics.
A formal proof of the Sands–Sauer–Woodrow theorem using Rocq and MathComp/SSReflect
Rocq/CoqApr 2026graph theory
Summary: Formalizes a theorem in infinite directed graph theory using Rocq and MathComp/SSReflect.
Easy summary: A graph-theory theorem is checked in the Rocq proof assistant.
Synthetic Differential Geometry in Lean
LeanMar 2026geometry
Summary: Formalizes synthetic differential geometry in Lean/mathlib, including Taylor-style reasoning around infinitesimal neighborhoods.
Easy summary: Lean is used for a nonstandard style of differential geometry.
Formalization of QFT
LeanMar 2026mathematical physics
Summary: Formalizes a foundational result in constructive quantum field theory in Lean 4, with discussion of AI-assisted formalization.
Easy summary: Formal proof assistants are reaching into mathematical physics.
Semi-Autonomous Formalization of the Vlasov–Maxwell–Landau Equilibrium
LeanMar 2026PDE/plasma physics
Summary: Uses AI reasoning and agentic coding with Lean verification under human supervision for a mathematical physics formalization.
Easy summary: AI helps formalize a difficult equation system from plasma physics.
Statistical Learning Theory in Lean 4: Empirical Processes from Scratch
LeanFeb 2026statistical learning theory
Summary: Formalizes empirical-process foundations relevant to statistical learning theory using a human-AI collaborative workflow.
Easy summary: Lean is used to check mathematical foundations of machine learning theory.
Formalizing Gröbner Basis Theory in Lean
LeanFeb 2026computational algebra
Summary: Formalizes polynomial division, Buchberger’s criterion, reduced Gröbner bases, and related algebraic infrastructure.
Easy summary: Lean gets formal versions of central tools from computational algebra.
Formalizing the Classical Isoperimetric Inequality in the Two-Dimensional Case
LeanMar 2026analysis/geometry
Summary: Formalizes Hurwitz’s analytic proof of the planar isoperimetric inequality in Lean 4.
Easy summary: Lean checks the classical theorem that the circle encloses maximal area for fixed perimeter.
A Blueprint for the Formalization of Seymour’s Matroid Decomposition Theorem
Lean blueprintJan 2026matroid theory
Summary: Lays out a dependency blueprint for formalizing Seymour’s matroid decomposition theorem in Lean.
Easy summary: A plan for formalizing a major theorem in matroid theory.
ZFLean: a framework for set-level mathematics in Lean
LeanApr 2026ZFC infrastructure
Summary: Provides infrastructure for doing ZFC-style set-level mathematics inside Lean 4 while preserving some mathlib ergonomics.
Easy summary: A project to make set-theoretic mathematics more natural in Lean.
Bipartite Exact Matching in P
LeanApr 2026algorithmic/combinatorial
Summary: Algorithmic/combinatorial work accompanied by Lean formalization of key proof skeleton and supporting infrastructure.
Easy summary: A new algorithmic proof is backed by machine-checked Lean components.
Hamilton decompositions of the directed 3-torus
LeanMar 2026combinatorics
Summary: Combinatorics paper accompanied by a Lean 4 formalization of the construction/proof components.
Easy summary: A finite combinatorial construction is checked with Lean.
The Chase in Lean — Crafting a Formal Library for Existential Rule Research
LeanApr 2026logic/databases
Summary: Builds a Lean framework for existential rules and the chase procedure, formalizing known results and termination conditions.
Easy summary: Lean is used to verify logical tools important in databases and knowledge representation.
On the Formalization of Network Topology Matrices in HOL
Isabelle/HOLMar 2026network matrices
Summary: Formalizes graph/network matrices such as adjacency, degree, Laplacian, and incidence matrices in HOL.
Easy summary: A formal proof system is used to verify matrix concepts from network theory.
Tutte’s theorem as an educational formalization project
LeanApr 2025graph theory
Summary: Formalizes Tutte’s theorem in Lean and discusses an educational framework for formalization projects.
Easy summary: A key graph-theory theorem is formalized in Lean and used as a learning model.
A Formalization of the Generalized Quantum Stein’s Lemma in Lean
LeanOct 2025quantum information
Summary: Formalizes the Generalized Quantum Stein’s Lemma in Lean, contributing to foundations for quantum information formalization.
Easy summary: A difficult quantum information theorem is made machine-checkable in Lean.
A Modular Lean 4 Framework for Confluence and Strong Normalization
LeanDec 2025metatheory
Summary: Develops a Lean 4 library for confluence and strong normalization of lambda calculi, including several mechanized case studies.
Easy summary: Lean is used to build reusable infrastructure for formalizing programming-language metatheory.
3.8 Human-AI workflows
Safe: Enhancing Mathematical Reasoning in LLMs through Step-Aware Formal Verification
Lean 4Jun 2025verification workflow
Summary: Uses Lean 4 to formalize and verify individual claims in generated mathematical reasoning, aiming to detect hallucinated or invalid proof steps.
Easy summary: It checks an AI’s mathematical solution step by step rather than trusting the final answer.
Advancing Mathematical Research via Human-AI Theorem Proving and Discovery
human-in-the-loopDec 2025scientific computing
Summary: Proposes a human-in-the-loop workflow where LLM outputs serve as raw material for interactive theorem proving, discovery, and rigorous scientific-computing workflows.
Easy summary: It frames AI as a research assistant whose suggestions still need human judgment and formal checking.
A Practical Guide to AI-Assisted Research in Mathematics
guideMar 2026
Summary: Practical advice and examples for mathematicians using AI tools responsibly in research workflows.
Easy summary: A user guide for mathematicians who want to use AI without being misled by it.
A correspondence problem for mathematical proof
conceptualMar 2026
Summary: Analyzes the problem of whether a formal proof actually corresponds to the intended informal proof and theorem.
Easy summary: Even if a proof assistant checks something, we must ensure it checked the right thing.
Understanding Tool-Augmented Agents for Lean Formalization: A Factorial Analysis
LeanApr 2026tool analysis
Summary: Studies which tools help Lean formalization agents: expert drafts, symbol/definition search, compiler feedback, and combinations thereof.
Easy summary: A careful test of what actually helps AI write Lean code.
Lean Atlas: An Integrated Proof Environment for Scalable Human-AI Collaborative Formalization
LeanApr 2026semantic review
Summary: Introduces Lean Atlas and Lean Compass to help humans inspect dependency graphs and semantically review AI-generated formalizations.
Easy summary: It helps humans check whether AI-generated Lean code means what it is supposed to mean.
LeanTutor: Towards a Verified AI Mathematical Proof Tutor
LeanJun 2025education/workflow
Summary: Combines LLM natural-language interaction with Lean verification to prototype a proof tutor that can give mathematically checkable feedback.
Easy summary: A proof tutor that tries to make AI explanations formally checkable.
Lean Copilot: Large Language Models as Copilots for Theorem Proving in Lean
Lean2025 versioncopilot
Summary: Framework for running LLM inference from inside Lean, enabling proof automation tools that fit directly into Lean users’ workflow.
Easy summary: An LLM copilot interface built into the Lean theorem-proving workflow.
Step-Wise Formal Verification for LLM-Based Mathematical Problem Solving
verification workflowMay 2025SymPy/Z3
Summary: Introduces MATH-VF, a training-free step-by-step verification framework using a formal language, LLM formalization, and external tools such as SymPy and Z3.
Easy summary: A workflow for checking individual steps in an LLM’s math solution.