
Proof assistant

A proof assistant is an interactive software tool that enables users to formalize mathematical statements, construct rigorous proofs, and mechanically verify their correctness against a formal logical foundation, thereby ensuring reliability in complex reasoning tasks. These systems emerged in the late 1960s with early projects like Automath, developed by Nicolaas de Bruijn and colleagues at Eindhoven University of Technology, which focused on representing mathematical texts in a computer-checkable format to address the growing need for verifiable proofs amid increasing computational involvement in mathematics.

Key features include support for dependent type theories or higher-order logics, interactive proof development via tactics and automation, and the de Bruijn criterion, which requires that generated proofs be independently verifiable by a small, trusted kernel program to minimize errors. Proof assistants distinguish themselves from automated theorem provers by emphasizing human-guided construction of proofs, often incorporating libraries of pre-verified results and supporting program extraction from proofs via the Curry-Howard correspondence, which equates proofs with executable programs.

Prominent examples include Coq, a dependently typed system based on the Calculus of Inductive Constructions, widely used for verifying software and mathematical theorems; Isabelle, a generic framework supporting multiple logics like higher-order logic (HOL); HOL Light, an embeddable prover emphasizing a minimal kernel; Lean, a dependently typed system supporting mathematical formalization with a large library called mathlib; and Mizar, which employs a declarative style close to natural mathematical language.

Historically, proof assistants gained prominence through landmark formalizations, such as the 2005 verification of the Four Color Theorem, originally proved with computer assistance in 1976, and the 2014 completion of the Flyspeck project using HOL Light and Isabelle to confirm Thomas Hales' 1998 proof of the Kepler conjecture.

Today, they play a critical role in fields like formal verification (e.g., certifying operating systems and compilers), certified programming, and advancing mathematical knowledge via large-scale libraries such as the Mathematical Components project in Coq or the Isabelle Archive of Formal Proofs. Ongoing developments as of 2025 focus on improving usability, scalability for large proofs, and integration with artificial intelligence and machine learning to assist in premise selection and proof generation, broadening their adoption beyond specialists.

Fundamentals

Definition and Purpose

A proof assistant is a software system designed to aid in the construction, verification, and checking of formal mathematical proofs through mechanized reasoning. These tools allow users to encode mathematical statements and proofs in a formal language, where the system rigorously validates each step against a specified logical foundation, ensuring that the proof is correct relative to the axioms and inference rules employed.

The core purpose of proof assistants is to mitigate the human error inherent in manual proof-writing, facilitate interactive development of proofs where users guide the process with high-level commands, and support the machine-checked formalization of intricate theorems that might otherwise be prone to subtle mistakes. By providing a trusted kernel for proof validation, these systems enable reliable certification of mathematical results, which is particularly valuable in domains requiring a very high degree of assurance, such as foundational mathematics and the verification of software and hardware.

Proof assistants differ from automated theorem provers, which primarily automate the search for proofs with minimal user intervention, by prioritizing user-directed, verifiable proof steps that allow for detailed control and inspection. In contrast to computer algebra systems, which focus on symbolic manipulation and computation of expressions (e.g., solving equations or simplifying polynomials), proof assistants emphasize logical deduction and proof certification rather than algebraic evaluation alone.

These systems trace their origins to early efforts in automated deduction in the late 1960s and 1970s, such as investigations into mechanical proof generation, with more structured interactive frameworks emerging in the 1980s.

Core Components

Proof assistants rely on a modular architecture centered around several key components that facilitate the construction and verification of formal proofs. Many such systems, particularly those in the LCF tradition or based on type theory, feature a kernel as the trustworthy core responsible for proof checking, implementing a minimal set of primitive rules and type-checking algorithms to validate all user-provided content. This small kernel ensures reliability by reducing complex proofs to basic, manually auditable operations, adhering to the de Bruijn criterion, which requires generating independently checkable proof objects that can be verified by a simple program regardless of the system's complexity. Other systems, such as declarative ones like Mizar, use alternative verification mechanisms without a minimal kernel.

In interactive proof assistants, tactics act as procedural commands that guide the proof process by transforming proof goals into subgoals, such as applying lemmas or performing case analysis, allowing users to build proofs interactively without directly manipulating low-level proof terms. Declarative systems instead emphasize writing proofs in a style close to natural mathematical language, with the system checking justifications against a library of established facts. Libraries provide pre-built collections of theorems, definitions, and axioms, enabling reuse of established mathematical results to accelerate proof development; for instance, Coq's standard libraries include modules like Arith for arithmetic operations. Proof scripts consist of user-written sequences of commands, including tactic applications and declarations, that orchestrate the overall proof construction in a structured, reproducible manner.

In systems with a kernel, the proof verification process hinges on its role in validating large-scale proofs through efficient type checking, ensuring consistency even as proofs grow to megabytes in size. By the de Bruijn criterion, the kernel independently confirms that each proof step adheres to the system's logical rules, minimizing trust assumptions to just this core component and preventing inconsistencies from propagating through higher-level abstractions. This separation allows elaborate user interactions and tactics to be untrusted, as long as they produce verifiable proof objects for kernel inspection.

A typical workflow in interactive proof assistants begins with the user stating a theorem in the system's language, followed by applying a series of tactics to decompose the goal into simpler subgoals, ultimately reducing it to primitive axioms or admitted facts that the kernel verifies. For example, in Coq, a user might declare a theorem, use the apply tactic to invoke a relevant lemma from a library, and then employ intros or destruct to handle variables and cases, with each transformation checked for type correctness. Upon completion, the proof script is executed to generate a proof term, which the kernel type-checks to confirm validity. Declarative systems follow a different process, where users provide detailed justifications that the system verifies against predefined rules and the library.

Automation plays a crucial role in handling routine subtasks through built-in procedures for decidable fragments of the underlying logic, such as linear arithmetic or propositional tautologies, which can resolve subgoals without user intervention. Tactics like Coq's auto or lia (which replaced the older omega tactic) use these decision procedures to discharge simple obligations automatically, integrating with interactive proof construction while still submitting results to kernel verification. Many proof assistants, including those based on dependent type theory, incorporate such mechanisms to balance human guidance with computational efficiency.
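
The workflow described above (state a theorem, decompose the goal with tactics, let the kernel check the resulting proof term, and hand routine goals to decision procedures) can be sketched in Lean 4. This is a minimal illustration rather than a canonical script: the theorem name and_swap_example is hypothetical, and the last line assumes a recent Lean 4 toolchain in which the omega tactic is available without extra imports.

```lean
-- State a theorem; the `by` block is a tactic script.
theorem and_swap_example (p q : Prop) (h : p ∧ q) : q ∧ p := by
  -- Case analysis on the hypothesis splits it into its two components.
  cases h with
  | intro hp hq =>
    -- `constructor` turns the goal `q ∧ p` into two subgoals: `q` and `p`.
    constructor
    · exact hq
    · exact hp

-- Show the proof term the script elaborated to; this term is what the kernel checks.
#print and_swap_example

-- Decidable fragments can be discharged by automation.
example : 2 + 2 = 4 := by decide

-- Linear arithmetic over natural numbers (assumes `omega` is available).
example (a b : Nat) (h : a ≤ b) : a < b + 1 := by omega
```

The tactic script itself is untrusted: only the term shown by #print has to pass the kernel, which is the separation the de Bruijn criterion asks for.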

Historical Development

Origins in Automated Reasoning

The origins of proof assistants can be traced to foundational developments in mathematical logic and computability theory during the 1930s and 1950s, which established the theoretical limits of formal systems and computation. Kurt Gödel's incompleteness theorems, published in 1931, demonstrated that in any sufficiently powerful consistent formal system capable of expressing basic arithmetic, there exist true statements that cannot be proved within the system itself, highlighting the inherent incompleteness of axiomatic mathematics. This result underscored the challenges of fully automating mathematical reasoning and influenced later efforts in formal verification by emphasizing the need for systems that could handle undecidable propositions. Complementing Gödel's work, Alan Turing's 1936 paper on computable numbers introduced the concept of a universal computing machine and proved the undecidability of the halting problem, showing that no general algorithm exists to determine whether an arbitrary program will terminate. Alonzo Church's lambda calculus, developed in the early 1930s and applied in his 1936 demonstration of the unsolvability of the Entscheidungsproblem, provided a formal framework for expressing computable functions and higher-order abstractions, serving as a precursor to typed logics used in verification. These contributions collectively laid the groundwork for automated reasoning by revealing the boundaries of mechanical proof and inspiring computational models for logic.

In the 1960s, the field advanced toward practical automated theorem proving, driven by efforts to mechanize first-order logic for machine implementation. A pivotal development was J.A. Robinson's introduction of the resolution principle in 1965, a sound and refutation-complete inference rule for first-order logic that enabled efficient refutation-based theorem proving by combining clauses through unification and resolution steps. This method, designed explicitly for computer use, became a cornerstone of automated deduction systems and facilitated the integration of logic into early artificial intelligence programs. Concurrently, early AI logic programs, such as extensions of the 1956 Logic Theorist by Newell and Simon, evolved in the 1960s to explore heuristic search in proof generation, though they often struggled with scalability for non-trivial theorems. These systems marked the shift from theoretical foundations to programmatic attempts at automation, yet they primarily operated in propositional or simple first-order settings, with emerging interest in higher-order logic for more expressive reasoning.

The 1970s saw a transition from fully automated provers to semi-automated, interactive approaches, exemplified by the Boyer-Moore theorem prover, which originated at the University of Edinburgh around 1971 and was developed further at Xerox PARC, SRI International, and the University of Texas at Austin. This system emphasized inductive proof techniques and user-guided simplification strategies within a first-order logic framework, allowing human intervention to resolve ambiguities in complex deductions. By focusing on semi-automated deduction, it addressed the practical limitations of pure automation, where exhaustive search often failed due to combinatorial explosion in proof spaces for mathematical theorems. Researchers recognized that fully automatic systems were insufficient for intricate mathematics, as undecidability results and the complexity of real-world proofs necessitated human insight to select lemmas, guide tactics, and manage assumptions, paving the way for interactive proof assistants.

Key Milestones and Systems

The development of proof assistants gained momentum in the 1970s and 1980s with the emergence of early interactive theorem provers that emphasized human-guided proof construction. Earlier, in the late 1960s, the Automath project at Eindhoven University of Technology developed the first interactive proof assistant for formalizing mathematical texts. In the 1970s, Robin Milner's LCF at the University of Edinburgh introduced a foundational interactive theorem prover based on the Logic for Computable Functions, emphasizing a trusted kernel for proof validation. One pivotal system was Nqthm, developed by Robert S. Boyer and J Strother Moore starting in the late 1970s and continuing through the 1980s at the University of Texas at Austin; it introduced techniques for program verification and became influential in industrial applications, such as hardware and software correctness proofs. Concurrently, the HOL system, initiated in 1986 by Mike Gordon at the University of Cambridge, built on the LCF framework to support higher-order logic, enabling more expressive formalizations of hardware and software concepts.

Entering the 1990s, proof assistants evolved toward more robust logical foundations and user-friendly interfaces. Coq, developed at Inria from the mid-1980s by Thierry Coquand, Gérard Huet, Christine Paulin-Mohring, and others and first released publicly in 1989, was based on the Calculus of Inductive Constructions, a dependent type theory that allowed for concise definitions of inductive structures and proofs, and quickly became a standard for formalizing complex mathematics. Isabelle, originating in 1986 under Larry Paulson at the University of Cambridge and maturing through the 1990s, adopted an LCF-style architecture with a generic framework for classical and intuitionistic logics, facilitating theorem proving across diverse domains.

The 1990s and 2000s saw broader adoption in academia and industry, with systems tailored for specific verification needs. PVS, released in 1992 by SRI International, partly under NASA sponsorship, integrated higher-order logic with automated decision procedures, proving effective for software analysis. ACL2, a successor to Nqthm launched in the mid-1990s by Boyer and Moore, focused on industrial-scale hardware and software verification, supporting executable specifications in an applicative subset of Common Lisp with a first-order logic. Agda, introduced in 2007 at Chalmers University, advanced dependently typed programming and proof, drawing from Martin-Löf type theory to enable interactive formalization of functional programs as proofs.

In the 2010s and 2020s, proof assistants emphasized mathematical formalization and automation enhancements. Lean, developed starting in 2013 by Leonardo de Moura at Microsoft Research, targeted large-scale mathematics libraries with its dependent type theory kernel and integration of tactics for efficient proof scripting. During this period, many systems incorporated SAT solvers and other automated tools; for instance, Isabelle's Sledgehammer (from 2008 onward) translated goals to external solvers like Z3, significantly boosting proof automation.

Key milestones underscored these advancements: the Four Color Theorem was fully formalized in Coq in 2005 by Georges Gonthier, resulting in approximately 60,000 lines of code. Similarly, the Kepler conjecture, which identifies the densest sphere packing, was verified in 2014 using HOL Light by the Flyspeck project led by Thomas Hales, confirming Hales' approximately 250-page proof through extensive formal checks.

Logical Foundations

Higher-Order Logic Systems

Higher-order logic (HOL) extends first-order logic by permitting quantification over functions and predicates, treating them as first-class entities alongside individuals. This allows for the direct expression of complex mathematical concepts, such as properties of properties or functions that take other functions as arguments, which are inexpressible in first-order systems.

The formal semantics of HOL is grounded in the simply typed lambda calculus, where terms are assigned types from a hierarchy starting with base types like bool for propositions and ind for individuals, and built up via function types σ → τ. Logical connectives and quantifiers are encoded as higher-typed constants, with key axioms including extensionality (stating that functions are equal if they agree on all inputs), infinity (ensuring the existence of infinite domains, such as the natural numbers), and a choice axiom via the Hilbert ε-operator, which selects an element satisfying a predicate. This framework ensures a conservative extension of classical first-order logic, preserving its theorems while enabling richer expressiveness without introducing inconsistencies.

A core feature of HOL is its support for shallow embeddings of mathematics, where domains like numbers, sets, and structures are defined directly using the type system and lambda abstractions, avoiding deep encodings into a separate meta-language. For instance, sets are represented as predicates of type α → bool, allowing primitive treatment of set-theoretic operations within the logic itself. This approach facilitates concise formalizations of classical mathematics.

Representative HOL-based proof assistants include HOL Light, which employs a minimal kernel with just 10 primitive inference rules to enforce soundness, making it well suited to foundational work in formalized mathematics. In contrast, Isabelle/HOL provides an extensible framework for building theories, integrating automated tactics and a rich library for higher-order unification and classical proof procedures.

The strengths of HOL systems lie in their alignment with classical reasoning, enabling efficient proofs in domains that rely on non-constructive methods. For example, HOL's primitive support for choice and equality axioms has been leveraged to verify algorithms at scale, demonstrating its practicality for industrial applications. These systems typically incorporate interactive theorem proving to guide users through complex derivations.
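
The idea of sets as predicates and the role of extensionality can be illustrated with a short sketch. It is written in Lean 4 rather than in an HOL system, so it is only an analogy: Lean uses Prop where HOL uses bool, and in Lean function extensionality (funext) is a derived theorem while propositional extensionality (propext) is an axiom. The names Pred, inter, and inter_comm are hypothetical, chosen for this example.

```lean
-- A "set" over α is just a predicate on α (HOL would write α → bool).
abbrev Pred (α : Type) := α → Prop

-- Intersection is pointwise conjunction, given by a lambda abstraction.
def inter {α : Type} (s t : Pred α) : Pred α := fun a => s a ∧ t a

-- Two such sets are equal when they agree on every element.
theorem inter_comm {α : Type} (s t : Pred α) : inter s t = inter t s := by
  -- Function extensionality: it suffices to compare the predicates pointwise.
  funext a
  -- Propositional extensionality: equivalent propositions are equal.
  exact propext
    ⟨fun (h : s a ∧ t a) => And.intro h.2 h.1,
     fun (h : t a ∧ s a) => And.intro h.2 h.1⟩
```

No separate encoding of set theory is needed here; the set operations live directly in the logic's function types, which is the sense in which the embedding is shallow.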

Dependent Type Theory Systems

Dependent type theory provides a foundation for proof assistants where types can depend on values, enabling the expression of propositions as types through the Curry-Howard isomorphism. This correspondence, first articulated in the context of typed lambda calculi, identifies proofs of propositions with programs of corresponding types, allowing logical statements to be computationally inhabited. In this framework, introduced by Per Martin-Löf in his intuitionistic type theory, propositions are treated as types, and proofs are terms that inhabit those types, fostering a constructive approach to mathematics where existence implies constructibility.

Key features of dependent type theory include the identification of proofs with programs, which supports the extraction of executable code from formal proofs, and inductive types that model recursive data structures such as natural numbers or lists. Some systems incorporate the univalence axiom from homotopy type theory (HoTT), which equates paths (identities) between types with equivalences, enabling higher-dimensional structures that capture homotopy-theoretic interpretations of equality. To prevent paradoxes like Girard's paradox arising from self-referential universes, these theories employ cumulative or non-cumulative universe levels, stratifying types into a hierarchy where each level classifies types of lower levels. Equality is typically handled via identity types, which form a propositional equality that can be refined in HoTT to include higher paths, or via more intensional variants in core systems.

Representative proof assistants based on dependent type theory include Coq, which implements the Calculus of Inductive Constructions (CIC), extending the Calculus of Constructions with inductive definitions to support both proofs and program extraction. Agda realizes Martin-Löf's type theory, emphasizing dependent pattern matching and termination checking for practical programming alongside theorem proving. Lean employs a dependent type theory with a focus on automation through tactics and a kernel based on CIC-like rules, facilitating large-scale formalizations in mathematics and computer science.

These systems excel in supporting functional programming paradigms and constructive mathematics, as seen in Coq's ability to extract certified algorithms, such as verified sorting implementations, directly from proofs. This integration bridges programming with proof, ensuring that mathematical correctness translates to runtime guarantees without introducing non-constructive axioms by default.
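
A minimal Lean 4 sketch of the two ideas above, propositions as types and types that depend on values, is given below. The names modus_ponens, Vec, and Vec.head are hypothetical and assume a fresh file with no conflicting library definitions.

```lean
-- Curry-Howard: the proposition `q` is a type, and the proof is an ordinary term.
-- Here the proof of `q` is literally function application.
theorem modus_ponens (p q : Prop) (hp : p) (hpq : p → q) : q :=
  hpq hp

-- An inductive type indexed by a value: lists that carry their length in the type.
inductive Vec (α : Type) : Nat → Type where
  | nil  : Vec α 0
  | cons {n : Nat} : α → Vec α n → Vec α (n + 1)

-- `head` only accepts vectors of length n + 1, so the empty case cannot occur
-- and needs no run-time check or error branch.
def Vec.head {α : Type} {n : Nat} : Vec α (n + 1) → α
  | .cons a _ => a

-- Well-typed use; `Vec.head Vec.nil` would be rejected by the type checker.
#eval Vec.head (Vec.cons (7 : Nat) Vec.nil)
```

The type of Vec.head acts as a specification that the type checker enforces, which is how such systems carry mathematical guarantees over into programs.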

System Design and Features

Interactive Theorem Proving

Interactive theorem proving represents the core paradigm in most proof assistants, where users collaboratively construct formal proofs by guiding the system through a series of steps, rather than relying solely on full automation. In this process, the user begins with a stated theorem, which the system treats as an initial goal, and applies tactics, high-level commands that transform the current proof state into simpler subgoals or resolve them entirely. The proof state encompasses the current goals, hypotheses, and context, which the system maintains and updates after each tactic application, allowing users to focus on high-level strategy while the assistant verifies local correctness. This backward reasoning is goal-oriented, starting from the theorem and working towards primitive axioms or admitted facts, enabling the handling of complex, undecidable problems that evade complete automation.

Tactics serve as the primary mechanism for goal decomposition, with users choosing between ad-hoc applications for specific steps and scripted sequences for repetitive or structured proofs. Common examples include induction, which generates subgoals for base cases and inductive steps on recursively defined structures like natural numbers or lists; rewriting, which replaces terms in goals or hypotheses using equalities or definitions to simplify expressions; and case analysis (via destruct in Coq or case_tac in Isabelle), which splits goals based on constructors of inductive types, such as distinguishing empty and non-empty lists. These tactics can be combined using tacticals, such as sequencing (applying one tactic after another) or repetition, to build proof scripts that mirror mathematical reasoning while ensuring type-theoretic consistency. For instance, proving list reversal properties often involves induction followed by rewriting and simplification to handle recursive calls.

To balance user interaction with computational power, proof assistants integrate external automated solvers, such as SMT solvers (e.g., Z3 or veriT) and equality-matching engines, particularly for proving lemmas in decidable fragments like linear arithmetic or propositional logic. These tools are invoked via dedicated tactics, like Sledgehammer in Isabelle, which translates goals to external provers and reconstructs proofs using verifiable certificates checked by the assistant, or bounded invocations that limit solver exploration to avoid non-termination. This hybrid approach automates routine subproofs while deferring creative decomposition to the user, as seen in verifications where an SMT solver handles quantifier-free constraints while the user addresses recursive aspects.

The interactive paradigm offers key benefits, including the ability to manage undecidable logics like higher-order logic by relying on human insight for non-trivial choices, such as selecting hypotheses or case splits, which fosters deeper understanding of proof structures. It also promotes reliability, as the system guarantees that only valid steps advance the proof, reducing errors in formalizing intricate theorems like the Four Color Theorem or the Kepler conjecture. Overall, this user-guided method scales to large formalizations by modularizing proofs into verifiable components, enhancing both mathematical discovery and software reliability.
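
The induction, rewriting, and case-analysis tactics described in this section can be sketched in Lean 4 as follows. The theorem name append_nil_example is hypothetical; List.cons_append is assumed to be the core-library lemma stating that (a :: as) ++ bs = a :: (as ++ bs).

```lean
-- Induction on a list, then rewriting with a library lemma and the
-- induction hypothesis.
theorem append_nil_example {α : Type} (xs : List α) : xs ++ [] = xs := by
  induction xs with
  | nil =>
    -- Base case: `[] ++ [] = []` holds by computation.
    rfl
  | cons x xs ih =>
    -- Step case: push the append under the cons, then use `ih : xs ++ [] = xs`.
    rw [List.cons_append, ih]

-- Case analysis on a Boolean. The tactical `<;>` runs `rfl` on every
-- subgoal that `cases` produces.
example (b : Bool) : (b && true) = b := by
  cases b <;> rfl
```

The combinator <;> plays the role of the sequencing tacticals mentioned above: it composes two tactics into one script step without changing what the kernel ultimately checks.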

User Interfaces and Tools

Proof assistants offer diverse user interfaces to support interactive proof construction, catering to users ranging from novices to experts. These interfaces emphasize ergonomic interaction, enabling step-by-step verification while managing complex proof states.

Text-based environments, integrated with extensible editors like Emacs and Vim, form a foundational approach for many systems. Proof General, an Emacs-based framework, provides a unified interface for assistants such as Coq and Isabelle, featuring syntax highlighting, script parsing, and real-time feedback on proof progress. Agda's Emacs mode similarly facilitates interactive type-checking and goal refinement through structured editing and Unicode input methods.

Dedicated IDEs enhance usability with graphical elements tailored to proof development. CoqIDE serves as Coq's native standalone interface, offering windows for script editing, goal inspection, and query evaluation, with support for syntax highlighting and error localization. Isabelle/jEdit, built on the jEdit editor, delivers asynchronous processing, markup rendering, and plugin extensibility for seamless proof exploration. For Lean, the VS Code extension acts as the primary IDE, incorporating features like hover information and code navigation to streamline theorem stating and tactic application.

Visualization tools play a crucial role in presenting proof dynamics clearly. Goal displays, common across interfaces, show current subgoals, hypotheses, and context, as seen in Proof General's dedicated goals buffer that updates incrementally during script execution. Error highlighting identifies syntax or logical issues with visual cues, reducing debugging time. Specialized visualizers, such as Prooftree for Coq in Proof General, render proof trees as layered diagrams, with nodes representing tactics and branches indicating subgoal evolution; fully proved paths appear in green for at-a-glance verification.

Auxiliary tools augment core interfaces with practical utilities. Proof state debuggers, like Coq's interactive stepping mode, allow users to inspect and retract commands mid-proof, aiding in error diagnosis. Version control integration relies on editor capabilities, such as Git support in VS Code or Emacs packages, to track proof script changes collaboratively. Export functions enable documentation generation; Coq's coqdoc utility converts scripts to HTML or LaTeX, producing polished PDFs for sharing formalized results.

To improve accessibility, especially in educational settings, notebook-style interfaces promote exploratory learning. Lean's lean4-jupyter kernel integrates with Jupyter notebooks, allowing users to interleave code execution, theorem proofs, and visualizations in a single document.

User interfaces have evolved from rudimentary command-line prompts to immersive, web-accessible platforms. Initial systems depended on terminal interactions for input and output, but contemporary developments like jsCoq compile Coq to JavaScript, providing a browser-native IDE with scratchpad editing, key bindings for proof stepping, and embeddable demos, which facilitates remote collaboration without installation.

Comparison Across Systems

Feature and Capability Overview

Proof assistants vary significantly in their core features and capabilities, reflecting their underlying logical foundations and design goals. Major systems such as Coq, Isabelle/HOL, Lean, and HOL4 support distinct logics, ranging from intuitionistic type theories to classical higher-order logics, while providing tactic-based proof construction, varying degrees of automation, and mechanisms for code extraction and proof reuse. These differences influence their suitability for formalizing mathematics, verifying software, or developing readable proofs. The following table summarizes key comparison criteria across these systems, highlighting supported logics, proof languages, automation, libraries, definition mechanisms, extraction, expressiveness, and reuse.

| System | Supported Logic | Tactic Language | Automation Levels | Library Size/Name | Language for Definitions | Extraction Capabilities | Theorem Expressiveness | Proof Reuse via Modules |
|---|---|---|---|---|---|---|---|---|
| Coq | Calculus of Inductive Constructions (intuitionistic) | Ltac (procedural) | High (e.g., auto, omega for arithmetic) | Large (standard library + contrib, thousands of theorems) | Gallina (functional) | Strong to OCaml, Haskell, Scheme | High via dependent types for precise specifications | Modules and functors for parameterized proofs |
| Isabelle/HOL | Classical higher-order logic | Isar (declarative), ML-based | Very high (Sledgehammer integrates external ATPs) | Extensive (Archive of Formal Proofs, 934 entries as of November 2025) | HOL (functional) | Code generation to Scala, Haskell, SML | Supports classical axioms like Peirce's law natively | Locales and theories for modular extensions |
| Lean | Dependent type theory (CIC with quotients, choice) | Native tactics (e.g., simp, rw) | High (simp, linarith, ring; typed tactic framework) | Very large (mathlib: ~120,000 definitions, ~240,000 theorems as of November 2025) | Lean (functional with dependent types) | To C, JavaScript; focuses on verified executables | Universe polymorphism for scalable hierarchies | Type classes and sections for reusable abstractions |
| HOL4 | Classical higher-order logic | ML-based tactics (e.g., REWRITE_TAC) | Moderate (HOLyHammer for premise selection, Metis) | Moderate (standard library for analysis, sets; ~7,000 theorems) | ML (functional) | To SML, OCaml via embeddings | Balanced for automation in classical settings | Theories and lemmas for incremental building |

Coq exemplifies strong extraction capabilities, allowing verified functional programs to be compiled from proofs, such as extracting a certified solver to a target language like OCaml. Isabelle/HOL emphasizes readable proofs through Isar, enabling structured, human-verifiable arguments that resemble natural language derivations. Lean's mathlib library prioritizes comprehensive mathematical formalization, supporting proof reuse via type classes that automatically resolve instances in algebraic structures. HOL4 facilitates proof reuse through its theory hierarchy, where new developments extend base logics without compromising soundness. These features enable theorem statement expressiveness tailored to their logics, such as Coq's dependent types for encoding program specifications directly in types.
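
Proof reuse through type classes can be sketched in Lean 4 with a toy structure. This is not how mathlib defines its algebraic hierarchy; MyMonoid and one_mul_twice are hypothetical names used only to show the mechanism of instance resolution.

```lean
-- A toy algebraic structure packaged as a type class.
class MyMonoid (α : Type) where
  one : α
  mul : α → α → α
  one_mul : ∀ a : α, mul one a = a

-- Registering natural numbers under multiplication as an instance.
instance : MyMonoid Nat where
  one := 1
  mul := fun a b => a * b
  one_mul := Nat.one_mul

-- A lemma proved once, for every type that has a `MyMonoid` instance.
theorem one_mul_twice {α : Type} [MyMonoid α] (a : α) :
    MyMonoid.mul MyMonoid.one (MyMonoid.mul MyMonoid.one a) = a := by
  rw [MyMonoid.one_mul, MyMonoid.one_mul]

-- Reuse at `Nat`: the instance is found automatically by type class resolution.
example :
    MyMonoid.mul MyMonoid.one (MyMonoid.mul MyMonoid.one (5 : Nat)) = 5 :=
  one_mul_twice (5 : Nat)
```

The generic lemma never mentions Nat, yet it applies there without any extra proof work, which is the kind of reuse that large libraries depend on.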

Formalization Scale and Performance

Proof assistants have enabled formalizations of increasing scale, with major libraries comprising hundreds of thousands of definitions and theorems. For instance, Lean's mathlib library, a comprehensive mathematical library, contains ~120,000 definitions and ~240,000 theorems as of November 2025. Similarly, Coq's ecosystem, including its standard library and contributed packages, contains extensive formal developments across various domains, reflecting the cumulative efforts of its community. Isabelle/HOL's Archive of Formal Proofs (AFP) hosts 934 entries as of November 2025, each often spanning thousands of lines of formal developments, covering topics from algebra to verified software.

Performance in proof assistants is evaluated through metrics such as proof checking time, memory usage, and support for parallelism, which are critical for handling large-scale libraries. Isabelle/HOL incorporates multi-threading for parallel proof processing, allowing efficient exploitation of multi-core hardware and reducing overall verification time for extensive theories. In Coq, proof checking for substantial developments can require significant computational resources, with benchmarks showing that unoptimized large libraries may take hours to verify fully, though targeted optimizations can yield improvements of several orders of magnitude in runtime. Memory usage similarly scales with library size; for example, verifying complex sessions in Isabelle may demand gigabytes of memory, influenced by the depth of proof dependencies.

Benchmarks for kernel verification highlight efficiency differences across systems. HOL Light's compact kernel, consisting of around 400 lines of code, enables rapid verification, often in seconds for individual proofs, while prioritizing a minimal trusted computing base. Scalability challenges in large libraries include prolonged edit-check cycles and dependency resolution overheads, where modifying a foundational definition can necessitate re-verifying thousands of dependent proofs, increasing time and resource demands.

Key factors affecting performance include kernel size and optimization techniques. Smaller kernels, as in HOL Light, enhance trustworthiness by limiting the code subject to manual audit but may require more user effort in proofs; larger kernels in systems like Coq support richer logical features at the cost of extended verification times. Techniques such as caching, evident in Coq's compiled .vo files that store pre-verified modules and in Isabelle's session graphs for incremental rebuilding, mitigate recomputation and significantly accelerate iterative development in expansive libraries.

Applications and Impact

Notable Formalized Theorems

One of the landmark achievements in proof assistants is the formalization of the Four Color Theorem, which states that any planar graph can be colored with at most four colors such that no two adjacent vertices share the same color. This theorem was mechanized in the Coq proof assistant in 2005 by Georges Gonthier at Microsoft Research, with contributions from Benjamin Werner, building on the 1997 informal proof by Robertson, Sanders, Seymour, and Thomas. The formalization spanned several years of effort, involving approximately 60,000 lines of Coq proof scripts, and required refactoring extensive parts of the original proof to fit Coq's constructive framework, including the development of libraries for hypermaps and graph theory. The verification process took three days on contemporary hardware and provided mechanical assurance against errors in the original computer-assisted case analysis, which had previously relied on unverified software; this formal proof eliminated trust in external code and enabled independent reproducibility.

Another significant formalization is that of the Kepler conjecture, asserting that the densest packing of equal spheres in three-dimensional Euclidean space has a density of \pi / \sqrt{18} (about 0.74), achieved by the face-centered cubic lattice. The Flyspeck project, led by Thomas Hales, completed this in 2014 using primarily the HOL Light proof assistant, with parts verified in Isabelle for auxiliary results on tame graphs. The effort involved a team of over 20 mathematicians and computer scientists across multiple institutions, spanning more than a decade from 2003, culminating in about 120,000 lines of code; it addressed the original 1998 informal proof's reliance on extensive computer casework by formalizing its components, including the discrete classification. This mechanization confirmed the conjecture's validity, demonstrating proof assistants' capacity to handle hybrid human-computer mathematical reasoning and yielding reusable libraries in areas including geometry.
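
The density value \pi / \sqrt{18} quoted for the face-centered cubic lattice can be checked with a short calculation. The derivation below is standard geometry and is not taken from the Flyspeck development; the symbols r (sphere radius), a (edge of the conventional cubic cell), and \delta_{\mathrm{FCC}} (density) are notation introduced here.

```latex
% In the FCC arrangement, spheres touch along a face diagonal of the cubic cell:
% 4r = a\sqrt{2}, so a = 2\sqrt{2}\,r. The cell contains 4 spheres in total
% (8 corners counted 1/8 each, plus 6 faces counted 1/2 each).
\[
\delta_{\mathrm{FCC}}
  = \frac{4 \cdot \tfrac{4}{3}\pi r^{3}}{\left(2\sqrt{2}\,r\right)^{3}}
  = \frac{\tfrac{16}{3}\pi r^{3}}{16\sqrt{2}\,r^{3}}
  = \frac{\pi}{3\sqrt{2}}
  = \frac{\pi}{\sqrt{18}}
  \approx 0.7405
\]
```

The conjecture itself is the much harder claim that no other arrangement, lattice or not, exceeds this value.
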
The Feit-Thompson Theorem, also known as the Odd Order Theorem, states that every finite group of odd order is solvable, and is a cornerstone in the classification of finite simple groups. A complete formalization in Coq was achieved in 2012 by a collaborative team of 15 authors led by Georges Gonthier and including Andrea Asperti, following a six-year effort that started around 2006. The project produced over 150,000 lines of code, with the core proof comprising about 40,000 lines, and developed foundational libraries in finite group theory, linear algebra, and character theory within the Mathematical Components framework. This mechanization, which is constructive and relies on no axioms beyond Coq's underlying logic, verified the proof first published in 255 pages in 1963, uncovering minor gaps and providing a reusable basis for further formalizations in finite group theory.

In recent developments, the Prime Number Theorem, which states that the number of primes up to x is asymptotically x / \log x, has been targeted for formalization in the Lean proof assistant during the 2020s. A project announced in January 2024, led by Terence Tao and Alex Kontorovich, mechanizes elementary, analytic, and Fourier-based proofs, involving a collaborative team of mathematicians and building on Lean's mathlib library for number theory; as of November 2025, it has formalized significant results in analytic number theory. Separately, Math, Inc. reported an AI-assisted formalization of the strong Prime Number Theorem (with error term), completing a challenge that Tao and Kontorovich set in January 2024. Using its Gauss autoformalization agent and a small expert team, the company produced approximately 25,000 lines of verified Lean code, comprising over 1,000 interconnected theorems and definitions, in three weeks, after the necessary complex analysis infrastructure had been developed. These efforts underscore the growing integration of AI in proof assistants to accelerate the formal verification of analytic number theory.
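The asymptotic statement of the Prime Number Theorem above can be explored numerically. The sketch below is only an empirical illustration, unrelated to the Lean developments; the function names `prime_count_table` and `pnt_ratio` are made up for this example. It counts primes with a sieve of Eratosthenes and compares the count with x / \log x.

```python
import math


def prime_count_table(limit):
    """Return a list `counts` where counts[n] is the number of primes <= n."""
    is_prime = [True] * (limit + 1)
    is_prime[0] = False
    if limit >= 1:
        is_prime[1] = False
    # Sieve of Eratosthenes: cross out multiples of each prime up to sqrt(limit).
    for p in range(2, math.isqrt(limit) + 1):
        if is_prime[p]:
            for multiple in range(p * p, limit + 1, p):
                is_prime[multiple] = False
    counts, running = [], 0
    for flag in is_prime:
        running += flag
        counts.append(running)
    return counts


def pnt_ratio(x, counts):
    """Ratio of the prime count at x to the approximation x / log x."""
    return counts[x] / (x / math.log(x))


counts = prime_count_table(10**6)
for x in (10**3, 10**4, 10**5, 10**6):
    print(x, counts[x], round(pnt_ratio(x, counts), 4))
```

On these inputs the ratio stays above 1 and shrinks slowly as x grows, which is consistent with, but of course no proof of, the limit being 1.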

Broader Uses in Verification and Mathematics

Proof assistants extend beyond pure mathematical formalization to practical applications in software and hardware verification, where they ensure correctness and safety in complex systems. A prominent example is CompCert, a formally verified optimizing compiler for a subset of the C programming language, developed using the Coq proof assistant. Initiated in 2005 and detailed in a 2009 publication, CompCert includes a machine-checked proof that its compilation passes preserve the semantics of the source code. It compiles Clight, a large subset of C99, to assembly for architectures such as PowerPC and ARM, thereby guaranteeing that safety properties of the source program hold in the executable. Similarly, the seL4 microkernel, verified in 2009 using Isabelle/HOL, is the first general-purpose operating system kernel with a comprehensive machine-checked proof of functional correctness from an abstract specification down to its C implementation. The proof covers 8,700 lines of C code and rules out undefined behaviors and crashes under stated assumptions.

In hardware verification, proof assistants like ACL2 and HOL have been instrumental in certifying processor components and circuit designs. AMD employed ACL2 starting in the late 1990s to verify the floating-point adder of the Athlon processor, using a custom register-transfer level (RTL) modeling language translated into ACL2 for mechanical proofs of compliance with the IEEE 754 floating-point standard, demonstrating scalability to industrial designs with thousands of gates. HOL theorem provers, such as HOL4, support verification of combinational and sequential circuits through libraries that enable refinement proofs between behavioral specifications and gate-level implementations, as illustrated in a 2017 framework for generic circuit verification that automates much of the proof process while handling complex arithmetic operations.

Ongoing projects leverage proof assistants to formalize advanced concepts collaboratively. The Liquid Tensor Experiment, conducted in the early 2020s using Lean, formalized a key result about liquid vector spaces from condensed mathematics, culminating in July 2022 in a complete machine-checked proof of a theorem of Dustin Clausen and Peter Scholze. The project advanced the formalization of research-level mathematics and demonstrated Lean's capacity for handling intricate analytic arguments. Community efforts like Formal Abstracts aim to bridge informal mathematical literature with formal proofs by developing a language to parse and formalize abstracts into systems like Lean, fostering reusable formal libraries through collaboration on theorems and definitions since 2018.

Proof assistants also serve educational purposes, particularly in the interactive teaching of logic. They provide immediate feedback on proof construction, helping students grasp concepts such as quantifiers through hands-on work, as evidenced by surveys of undergraduate courses in which such tools enhance understanding of formal reasoning without requiring advanced programming skills.

Industrial adoption highlights their reliability for mission-critical systems. NASA employs the PVS proof assistant for verifying algorithms and protocols in safety-critical aerospace software, supporting safety analyses in projects like aircraft control systems through its specification and theorem-proving capabilities. Microsoft, through its Research division, develops and applies Lean for large-scale formalization of mathematical foundations, including potential extensions to computational systems, as part of efforts to build verified libraries exceeding one million lines of code.
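The semantic-preservation guarantee described earlier in this section for CompCert can be illustrated on a toy scale. The sketch below is a hypothetical miniature, not CompCert's Clight or its Coq development: it defines a tiny expression language, a compiler to a stack machine, and a check that compiled code computes the same value as the source. The check here is exhaustive testing over small expressions, which is much weaker than the machine-checked proof over all programs that a proof assistant provides.

```python
from dataclasses import dataclass
from itertools import product
from typing import Union


@dataclass(frozen=True)
class Num:
    value: int


@dataclass(frozen=True)
class Add:
    left: "Expr"
    right: "Expr"


@dataclass(frozen=True)
class Mul:
    left: "Expr"
    right: "Expr"


Expr = Union[Num, Add, Mul]


def evaluate(expr):
    """Source semantics: evaluate the expression tree directly."""
    if isinstance(expr, Num):
        return expr.value
    if isinstance(expr, Add):
        return evaluate(expr.left) + evaluate(expr.right)
    return evaluate(expr.left) * evaluate(expr.right)


def compile_expr(expr):
    """Compile an expression to stack-machine instructions in postfix order."""
    if isinstance(expr, Num):
        return [("push", expr.value)]
    op = "add" if isinstance(expr, Add) else "mul"
    return compile_expr(expr.left) + compile_expr(expr.right) + [(op, None)]


def run(program):
    """Target semantics: execute instructions on a stack, return the top value."""
    stack = []
    for op, arg in program:
        if op == "push":
            stack.append(arg)
        else:
            right, left = stack.pop(), stack.pop()
            stack.append(left + right if op == "add" else left * right)
    return stack[-1]


def all_exprs(depth, leaves=(0, 1, 2)):
    """Enumerate every expression up to the given nesting depth."""
    if depth == 0:
        return [Num(v) for v in leaves]
    smaller = all_exprs(depth - 1, leaves)
    combined = [node(l, r) for node in (Add, Mul)
                for l, r in product(smaller, repeat=2)]
    return smaller + combined


def preserves_semantics(depth=2):
    """Check run(compile(e)) == evaluate(e) for all small expressions e."""
    return all(run(compile_expr(e)) == evaluate(e) for e in all_exprs(depth))


print(preserves_semantics())
```

In a proof assistant, the same property would be stated once as a theorem quantified over all expressions and proved by induction on the expression, so no finite test bound is needed.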

Challenges and Future Directions

Current Limitations

Proof assistants, while powerful, face significant barriers that impede their adoption beyond specialized communities. The steep learning curve stems primarily from the need to master domain-specific tactics and formal languages, which are often unintuitive and require understanding complex theoretical concepts like dependent types. For instance, users frequently encounter unfamiliar design choices, such as Unicode-based syntax and whitespace sensitivity in systems like Agda, leading to confusion and frustration during initial engagement. Inadequate tooling exacerbates these issues, with unclear error messages and buggy interfaces hindering effective exploration. Surveys of novice users, including students, highlight these obstacles as major deterrents, with mean severity ratings exceeding 4.5 on a 7-point scale for unusual design choices and ecosystem deficiencies.

Scalability challenges further limit the practicality of proof assistants in handling extensive formalizations. Proof maintenance becomes particularly burdensome in evolving libraries, where updates to definitions or lemmas can propagate through the library, causing distant proofs to fail without clear indications of the root cause. In large-scale projects, such as L4.verified with approximately 390,000 lines and 22,000 lemmas, the edit-check cycle can extend to hours or even days, dominated by re-verification of unaffected components due to non-local dependencies. This is compounded by the difficulty of refactoring proofs, as moving lemmas often requires manual reconstruction amid opaque automation contexts. Such issues have been observed in verification efforts like Verisoft, involving over 500,000 lines, where maintenance phases significantly outlast initial development.

Trust and completeness remain core concerns, as the reliability of proof assistants hinges on the correctness of their trusted kernels. Users must place faith in these small trusted cores, such as Coq's or Nuprl's, to accurately check proofs, yet historical bugs, including Coq's 2013 termination analysis flaw and 2015 inductive type computation error, have enabled invalid proofs of falsehood. Moreover, gaps persist in formalizing informal mathematics, where translating arguments into rigorous syntax reveals ambiguities or incompletenesses not apparent in pen-and-paper proofs. Cross-verification efforts, like embedding Nuprl's rules in Coq, mitigate some risks but underscore the inherent limitations of kernel reliance: by Gödel's incompleteness theorems, a consistent and sufficiently expressive logic cannot prove its own consistency, so absolute assurance is out of reach.

Resource demands pose another substantial hurdle, with high computational costs impeding work on large proofs. In systems like Coq, operations such as rewriting can scale exponentially with input size, frequently leading to out-of-memory failures during the checking of substantial developments. Proof-term elaboration incurs overhead from substitutions and let-bindings, while repeated imports and notation overloads amplify slowdowns, making full checks of mature libraries resource-intensive. Lean faces analogous issues, lacking robust performance guarantees that could ensure scalability for real-world applications.

Finally, gaps in coverage restrict the applicability of proof assistants to certain mathematical domains, notably continuous mathematics and probability. Formalizing real analysis reveals limitations in handling infinite structures, with axiomatic approaches in Coq and PVS risking inconsistencies and constructive methods in C-CoRN forgoing classical principles like the excluded middle. Systems like HOL Light struggle with intuitive constructions of real numbers via sequences, while Isabelle/HOL's Cauchy sequences lack seamless ties to Dedekind cuts, complicating continuity and differentiability proofs. Probability theory faces similar hurdles, as measure-theoretic formalizations remain underdeveloped compared to discrete domains, with incomplete libraries for stochastic processes and integration limiting comprehensive verifications. These deficiencies highlight the need for enhanced automation in non-discrete areas, where current tools fall short in supporting advanced analyses.

Future Directions

Recent advancements in proof assistants increasingly incorporate artificial intelligence to enhance automation and usability. Machine learning techniques for tactic suggestion have gained prominence, exemplified by Tactician, a plugin for the Coq proof assistant that employs online learning algorithms to recommend and apply tactics interactively during proof development. This approach allows users to retain control while benefiting from AI-driven guidance, with evaluations showing improved proof efficiency on Coq benchmarks. Similarly, hybrid systems integrating large language models (LLMs) with Lean have emerged, such as LeanDojo, which facilitates training LLMs on Lean proofs to generate proofs that Lean then checks. These integrations, building on frameworks like CoqGym, enable iterative proof refinement by combining informal LLM reasoning with formal verification.

New foundational paradigms are also shaping proof assistant development. Cubical Agda, an extension of Agda with cubical type theory primitives enabled by the --cubical option, provides a computational interpretation of the univalence axiom, supporting univalent foundations and synthetic homotopy theory. Ongoing refinements, including automated boundary filling in cubical type theories, address scalability issues in higher-dimensional proofs. In parallel, proof assistants are adapting to specialized domains like quantum computing, with systems such as Isabelle/HOL supporting verification of quantum circuits and protocols through extensions like the Quantum Hoare Logic library.

Community-driven initiatives are fostering greater interoperability and scale in formal mathematics. Efforts to standardize translations between systems, such as exporting libraries from HOL4 to Isabelle/HOL via tools like OpenTheory, enable reuse of formalized content across provers. Large-scale formalization projects, including the UniMath library in Coq, advance univalent foundations by formalizing significant portions of foundational texts. Complementary endeavors, like the FormalGL project for group law formalizations, contribute to modular libraries for algebraic structures. These collaborations align concepts across diverse assistants, reducing duplication and accelerating the growth of theorem libraries.

As of 2025, there is heightened emphasis on verifying machine learning models using proof assistants, with initiatives developing provable safety guarantees for neural networks through formal semantics in systems such as Isabelle. Web-accessible provers are proliferating, allowing browser-based interaction with tools such as Lean and Coq through platforms like the Lean Web Editor and jsCoq, democratizing access for education and collaboration.

Looking ahead, automation in proof assistants is progressing toward human-level proficiency for routine proofs, with machine-learning-driven methods generating millions of verified theorems via exploration of proof spaces. For example, DeepMind's AlphaProof contributed to silver-medal-level performance at the 2024 International Mathematical Olympiad by solving competition problems formalized in Lean. Integration with proof mining techniques promises to extract constructive bounds and quantitative insights from classical proofs, bridging proof theory with applied analysis.
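The tactic-suggestion idea mentioned above for tools such as Tactician can be pictured as a nearest-neighbour lookup over previously solved goals. The sketch below is a toy under stated assumptions, not the algorithm or API of Tactician, LeanDojo, or any real plugin: goals are plain strings, the feature extraction is a simple token count, and the class name `TacticSuggester`, the tactic names, and the sample goals are all hypothetical.

```python
import re
from collections import Counter


def features(goal):
    """Bag of tokens (identifiers and single symbols) appearing in a goal string."""
    return Counter(re.findall(r"[A-Za-z_]+|[^\sA-Za-z_0-9]", goal))


def similarity(a, b):
    """Jaccard similarity between two token multisets."""
    overlap = sum((a & b).values())
    total = sum((a | b).values())
    return overlap / total if total else 0.0


class TacticSuggester:
    def __init__(self):
        self.examples = []  # list of (feature bag, tactic) pairs

    def record(self, goal, tactic):
        """Online learning step: remember which tactic worked on a goal."""
        self.examples.append((features(goal), tactic))

    def suggest(self, goal, k=3):
        """Rank tactics by summed similarity over the k nearest recorded goals."""
        query = features(goal)
        scored = sorted(
            ((similarity(query, bag), tactic) for bag, tactic in self.examples),
            key=lambda pair: pair[0],
            reverse=True,
        )[:k]
        votes = Counter()
        for score, tactic in scored:
            if score > 0:  # ignore goals that share no tokens with the query
                votes[tactic] += score
        return [tactic for tactic, _ in votes.most_common()]


suggester = TacticSuggester()
suggester.record("a + b = b + a", "ring")
suggester.record("x * y = y * x", "ring")
suggester.record("P ∧ Q → Q ∧ P", "tauto")
suggester.record("A ∨ B → B ∨ A", "tauto")
suggester.record("n ≤ n + 1", "omega")
print(suggester.suggest("c + d = d + c"))
```

The user stays in control in this design: the suggester only ranks candidate tactics, and the proof assistant's kernel still decides whether an applied tactic yields a valid proof step.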
