Formal proof
A formal proof is a finite sequence of well-formed formulas derived within a formal deductive system, where each formula is either an axiom or follows from preceding formulas according to specified rules of inference, culminating in the theorem to be established.[1] This structure ensures that the proof is mechanically verifiable and guarantees the theorem's validity relative to the system's axioms, providing a rigorous foundation for logical and mathematical reasoning.[2] Formal proofs are central to proof theory, a branch of mathematical logic that investigates the properties of deductive systems, including soundness (every provable statement is semantically valid) and completeness (every valid statement is provable).[3] They operate within formal languages, such as those of first-order predicate logic, using inference rules like modus ponens or universal generalization to build derivations step by step.[4]

The axiomatic method underlying formal proofs traces back to ancient Greek mathematics, particularly Euclid's Elements (c. 300 BCE), which organized geometry through postulates and logical deductions, though without modern symbolic notation.[5] The modern notion of formal proof emerged in the late 19th century amid efforts to rigorize mathematics and logic. Gottlob Frege's Begriffsschrift (1879) introduced the first complete formal system for predicate logic, using a two-dimensional notation to represent inferences and quantify over variables, enabling precise proofs of arithmetic truths.[6] This was extended by Giuseppe Peano's axiomatization of arithmetic (1889)[7] and further developed in the early 20th century by Bertrand Russell and Alfred North Whitehead in Principia Mathematica (1910–1913), which aimed to derive all mathematics from logical axioms,[8] and by David Hilbert's program for metamathematics (1920s).[5] These advancements formalized proof as an explicit syntactic process, contrasting with informal proofs that rely on natural language, intuition, and unstated assumptions.[9]

In contemporary applications, formal proofs underpin automated theorem proving, where computer programs generate and verify derivations in systems like natural deduction or resolution, aiding software verification and mathematical formalization efforts such as the Flyspeck project, which produced a formal proof of the Kepler conjecture.[10][11] Recent advances include AI systems like AlphaProof (2024), which has solved International Mathematical Olympiad problems at a silver-medal level.[12] However, Kurt Gödel's incompleteness theorems (1931) revealed fundamental limitations: in any consistent formal system capable of expressing basic arithmetic, there exist true statements that cannot be proved within the system itself.
Prerequisites
Formal languages
A formal language serves as the syntactic foundation for formal proofs in mathematical logic, comprising an alphabet (a finite set of symbols) and a set of well-formed formulas (wffs), which are finite strings generated recursively from that alphabet according to specified formation rules.[13] This structure ensures that only valid expressions are considered in proofs, focusing exclusively on their form without regard to interpretation.[13] The key distinction in formal languages lies between syntax, which governs the structural arrangement of symbols to produce wffs, and semantics, which assigns meaning or truth values to those structures; for proofs, syntax is paramount, as it defines the permissible manipulations independent of any external meaning.[13] Formal grammars offer tools to specify the recursive rules for generating wffs within this framework.[14]

Examples of alphabets vary by logical system. In propositional logic, the alphabet typically includes propositional variables such as p, q, r, and logical connectives like \neg (negation), \land (conjunction), \lor (disjunction), \to (implication), and \leftrightarrow (biconditional).[15] In first-order logic, the alphabet expands to include individual variables (e.g., x, y), constant symbols (e.g., a, b), predicate symbols (e.g., P(x) for unary predicates, R(x,y) for binary relations), function symbols (e.g., f(x)), and the quantifiers \forall (universal) and \exists (existential), alongside the propositional connectives.[16]

The set of wffs is defined recursively to build expressions systematically. In propositional logic, the base cases are the atomic formulas, consisting of single propositional variables (e.g., p); compound wffs are then formed by applying connectives to existing wffs, such as (\alpha \land \beta) or \neg \alpha, where \alpha and \beta are wffs.[17] Similarly, in first-order logic, atomic formulas include predicate applications to terms (e.g., P(x) or R(x, f(y))); compound wffs arise from connectives applied to wffs (e.g., (\alpha \land \beta)) or quantifiers binding variables in wffs (e.g., \forall x \, \alpha or \exists y \, \beta).[18] This recursive construction guarantees that every wff is uniquely parsable and adheres strictly to the language's syntax.[14]
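These recursive formation rules translate naturally into a recursive data representation. The sketch below is illustrative Python (not drawn from the cited sources), encoding a fragment of propositional wffs with negation and conjunction as an abstract syntax tree, so that only expressions built through the formation rules can be constructed:

```python
from dataclasses import dataclass
from typing import Union

# Atomic formulas: single propositional variables such as p, q, r.
@dataclass(frozen=True)
class Var:
    name: str

# Compound formulas, built by applying connectives to existing wffs.
@dataclass(frozen=True)
class Not:
    operand: "Wff"

@dataclass(frozen=True)
class And:
    left: "Wff"
    right: "Wff"

Wff = Union[Var, Not, And]

def render(phi: Wff) -> str:
    """Render a wff with full parenthesization, mirroring the formation rules."""
    if isinstance(phi, Var):
        return phi.name
    if isinstance(phi, Not):
        return f"¬{render(phi.operand)}"
    return f"({render(phi.left)} ∧ {render(phi.right)})"

# (p ∧ ¬q): constructed only through the recursive rules, hence well-formed.
print(render(And(Var("p"), Not(Var("q")))))  # (p ∧ ¬q)
```

Because each constructor corresponds to exactly one formation rule, every value of type Wff renders to a uniquely parsable string.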
Formal grammars
A formal grammar is a mathematical structure consisting of a finite set of production rules that specify how to generate strings from a given alphabet, thereby defining the syntax of a formal language. It is typically represented as a four-tuple G = (V, \Sigma, P, S), where V is a finite set of nonterminal symbols (variables that can be replaced), \Sigma is a finite set of terminal symbols (the actual alphabet elements), P is a finite set of production rules of the form \alpha \to \beta with \alpha, \beta \in (V \cup \Sigma)^* and \alpha containing at least one nonterminal, and S \in V is the start symbol from which derivations begin.[19] The language generated by G, denoted L(G), comprises all terminal strings derivable from S by repeated application of the rules in P.[19]

The Chomsky hierarchy classifies formal grammars by the restrictions imposed on production rules, ordering them by decreasing generative power and increasing ease of recognition. Type-0 grammars (unrestricted) allow arbitrary rules \alpha \to \beta with \alpha non-empty; Type-1 grammars (context-sensitive) restrict rules to the form \alpha A \gamma \to \alpha \beta \gamma with \beta non-empty, which makes them non-contracting (the right-hand side is never shorter than the left); Type-2 grammars (context-free) limit rules to A \to \beta, where A is a single nonterminal; and Type-3 grammars (regular) further restrict rules to right-linear or left-linear forms such as A \to aB or A \to a. This hierarchy, introduced by Noam Chomsky, highlights trade-offs between expressive capacity and computational tractability in language recognition.[20][19]

In the context of formal proofs, formal grammars define the valid syntactic structures, particularly well-formed formulas (wffs), ensuring that expressions are properly constructed before logical inference can proceed. Context-free grammars (CFGs) are especially prominent for this purpose: each rule rewrites a single nonterminal irrespective of its surrounding context, enabling the modeling of the recursive and nested structures common in logical expressions, such as parenthesized compounds, while remaining parsable in polynomial time.[21] For instance, a CFG for propositional logic might use terminals like propositional variables (e.g., P, Q), connectives (\neg, \land, \lor), and parentheses, with nonterminals like S (for sentences) to generate only syntactically valid formulas.[21]

A representative example uses Backus-Naur Form (BNF), a notation for specifying CFGs, to describe propositional logic syntax:

<sentence> ::= <atomic> | ¬ <sentence> | ( <sentence> ∧ <sentence> ) | ( <sentence> ∨ <sentence> )
<atomic> ::= True | False | <symbol>
<symbol> ::= P | Q | R | ...

This grammar generates wffs like (P \land Q) or \neg (P \lor R), but rejects malformed strings such as P \land or (\neg Q.[22] Parsing with such grammars involves determining whether a given string belongs to the language L(G), which in proof systems verifies syntactic validity to prevent errors in subsequent deductive steps.
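A direct way to perform this membership test is a recursive-descent recognizer that mirrors the BNF rules one-to-one. The sketch below is illustrative Python, not drawn from the cited sources; the whitespace-based tokenizer and single-letter variable convention are simplifying assumptions:

```python
def tokenize(s: str) -> list[str]:
    # Treat parentheses and connectives as single-character tokens.
    for ch in "()¬∧∨":
        s = s.replace(ch, f" {ch} ")
    return s.split()

def parse_sentence(tokens: list[str], i: int) -> int | None:
    """Try to parse one <sentence> starting at index i; return the index
    just past it, or None if no parse exists (mirrors the BNF rules)."""
    if i >= len(tokens):
        return None
    tok = tokens[i]
    if tok in ("True", "False") or (tok.isalpha() and tok.isupper()):
        return i + 1                                 # <atomic>
    if tok == "¬":
        return parse_sentence(tokens, i + 1)         # ¬ <sentence>
    if tok == "(":
        j = parse_sentence(tokens, i + 1)
        if j is not None and j < len(tokens) and tokens[j] in ("∧", "∨"):
            k = parse_sentence(tokens, j + 1)
            if k is not None and k < len(tokens) and tokens[k] == ")":
                return k + 1                         # ( <sentence> op <sentence> )
    return None

def is_wff(s: str) -> bool:
    tokens = tokenize(s)
    return parse_sentence(tokens, 0) == len(tokens)

print(is_wff("(P ∧ Q)"))   # True
print(is_wff("P ∧"))       # False: rejected as malformed
```

Each clause of parse_sentence corresponds to one production, so the recognizer accepts exactly the strings derivable from <sentence>.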
Algorithms like the Cocke-Younger-Kasami (CYK) parser, which runs in O(n^3) time for CFGs in Chomsky normal form, systematically build a table of possible derivations to recognize valid expressions and identify invalid ones, such as unbalanced parentheses or misplaced operators.[23] This recognition step ensures that only well-formed inputs are accepted for proof construction, maintaining the integrity of formal reasoning.[23]
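The table-driven approach can be sketched as well. The following illustrative Python runs CYK recognition on a toy Chomsky-normal-form grammar for fully parenthesized conjunctions over p and q; the grammar and its nonterminal names (L, R, O, X, Y, Z) are invented for the example:

```python
from itertools import product

# A toy grammar in Chomsky normal form generating strings like "( p ∧ q )".
TERMINAL_RULES = {          # A -> a
    "S": {"p", "q"}, "L": {"("}, "R": {")"}, "O": {"∧"},
}
BINARY_RULES = {            # A -> B C
    ("L", "X"): "S", ("S", "Y"): "X", ("O", "Z"): "Y", ("S", "R"): "Z",
}

def cyk(tokens: list[str], start: str = "S") -> bool:
    n = len(tokens)
    if n == 0:
        return False
    # table[i][j]: set of nonterminals deriving tokens[i : i + j + 1]
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, tok in enumerate(tokens):
        for nt, terms in TERMINAL_RULES.items():
            if tok in terms:
                table[i][0].add(nt)
    for span in range(2, n + 1):            # substring length
        for i in range(n - span + 1):       # start position
            for split in range(1, span):    # split point
                left = table[i][split - 1]
                right = table[i + split][span - split - 1]
                for b, c in product(left, right):
                    if (b, c) in BINARY_RULES:
                        table[i][span - 1].add(BINARY_RULES[(b, c)])
    return start in table[0][n - 1]

print(cyk("( p ∧ q )".split()))   # True
print(cyk("( p ∧".split()))       # False: unbalanced parenthesis
```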
Formal systems
A formal system is an abstract framework in mathematical logic defined as a tuple (L, A, R), where L is a formal language specifying the syntax of expressions, A is a set of axioms serving as initial assumptions, and R is a set of inference rules for deriving new expressions from existing ones.[24] This structure enables the systematic generation of theorems through deductive processes, forming the foundation for rigorous proof construction.[25]

Various types of formal systems exist, differing in how they balance axioms and inference rules. Hilbert-style systems rely heavily on a large collection of axiom schemas with minimal rules, typically limited to modus ponens, making them concise for metatheoretic analysis.[26] In contrast, natural deduction systems prioritize a rich set of inference rules that mirror intuitive reasoning patterns, such as introduction and elimination rules for logical connectives, with axioms playing a subordinate role.[27] Sequent calculus systems, another variant, organize deductions around sequents (structures representing implications between sets of formulas) and emphasize structural rules for managing proof branches.[28] Unlike informal systems, which depend on natural language and human intuition prone to ambiguity, formal systems ensure mechanical verifiability, where every step in a derivation can be checked algorithmically without interpretive discretion.[29] Formal languages and grammars provide the syntactic base for defining well-formed expressions in L.[25]

As an example, consider a basic formal system for propositional logic. The language L includes propositional variables (e.g., p, q) and connectives such as negation (\neg) and implication (\to), with well-formed formulas built recursively (e.g., p \to (q \to p)). The axioms A consist of schemas like (p \to (q \to p)) and ((p \to q) \to ((p \to \neg q) \to \neg p)), while the rules R include modus ponens: from \phi and \phi \to \psi, derive \psi. This setup allows deriving theorems such as \neg p \to (p \to \psi), a schematic form of the principle of explosion (from a contradiction, any \psi follows).[30]
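To make the mechanical-verifiability claim concrete, the sketch below (illustrative Python) checks a classic Hilbert-style derivation of p \to p. It assumes the two standard axiom schemas K: \phi \to (\psi \to \phi) and S: (\phi \to (\psi \to \chi)) \to ((\phi \to \psi) \to (\phi \to \chi)); the S schema is not among the axioms quoted above and is an assumption of the example. Axiom instances are taken as given, and only the modus ponens steps are verified structurally:

```python
# Formulas as nested tuples: ("->", A, B); "p" is a propositional variable.
def imp(a, b):
    return ("->", a, b)

P = "p"
PP = imp(P, P)  # p -> p

# Each proof line: (formula, justification). Axiom instances are
# pre-instantiated; MP steps cite the 1-based lines of their premises.
proof = [
    (imp(P, imp(PP, P)),                  "axiom K"),     # 1
    (imp(imp(P, imp(PP, P)),
         imp(imp(P, PP), PP)),            "axiom S"),     # 2
    (imp(imp(P, PP), PP),                 ("MP", 1, 2)),  # 3
    (imp(P, PP),                          "axiom K"),     # 4
    (PP,                                  ("MP", 4, 3)),  # 5
]

def check(proof):
    """Verify each MP step: cited line j must be ('->', line i, conclusion)."""
    for n, (formula, just) in enumerate(proof, start=1):
        if isinstance(just, tuple) and just[0] == "MP":
            i, j = just[1], just[2]
            antecedent = proof[i - 1][0]
            implication = proof[j - 1][0]
            assert implication == ("->", antecedent, formula), f"line {n} invalid"
    return proof[-1][0]

print(check(proof))  # ('->', 'p', 'p'): the theorem p -> p
```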
Core Elements
Axioms
In formal systems, axioms are the foundational statements assumed to be true without requiring proof, providing the initial premises upon which all logical derivations are built.[13] These axioms must exhibit certain properties to ensure the robustness of the system: consistency, which guarantees that no contradictions can be derived from them, and independence, meaning no single axiom can be logically deduced from the others, preventing redundancy.[31] In propositional logic, standard examples from Hilbert-style systems include the axioms
p \to (q \to p)
and
p \to (q \to (p \land q)),
which capture basic implications and conjunction behaviors.[31] For first-order logic, axioms governing quantifiers include schemas such as
\forall x \, \phi(x) \to \phi(t),
where t is a term substitutable for x in \phi, allowing instantiation of universal statements.[13] Axioms play a crucial role in proof generation by serving as the unproven starting points that initiate chains of derivations, enabling the construction of theorems through subsequent applications of inference rules within the formal system.[31]
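A worked instance of the quantifier schema, with the predicate and term chosen purely for illustration:

```latex
% Instance of \forall x \, \phi(x) \to \phi(t), taking \phi(x) := (x \geq 0)
% and the closed term t := 3 (both illustrative choices):
\forall x \, (x \geq 0) \to (3 \geq 0)
% Here t is trivially substitutable for x, since \phi(x) contains no
% quantifier that could capture a variable occurring in t.
```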
Inference rules
Inference rules are formal schemas that specify how new statements, or conclusions, can be validly derived from existing statements, known as premises, within a formal system. These rules provide the transformative mechanisms essential for constructing proofs, ensuring that derivations proceed step by step from given assumptions or axioms. In logical frameworks such as natural deduction or sequent calculus, inference rules are typically presented as conditional patterns that map a set of premises to a conclusion, preserving the logical structure of the language.[32][13]

Inference rules are categorized into several types based on their function in manipulating logical connectives and quantifiers. Introduction rules allow the formation of complex formulas by adding connectives to simpler ones; for example, the conjunction introduction rule (\land-introduction) permits inferring A \land B from the premises A and B, as schematized below:
\frac{A \quad B}{A \land B} \quad (\land\text{-I})
Elimination rules, conversely, extract components from compound formulas; the disjunction elimination rule (\lor-elimination), for instance, derives a conclusion C from A \lor B, A \to C, and B \to C. Structural rules manage the overall proof structure without altering the logical content, such as the weakening rule, which adds irrelevant premises to a derivation without affecting its validity, for example, from \Gamma \vdash \phi inferring \Gamma, \psi \vdash \phi. These categories, originating in Gerhard Gentzen's foundational work on natural deduction systems, enable systematic proof construction across various logical systems.[32][13]

A key property of inference rules is their preservation of soundness, meaning that if the premises are true under a given interpretation, the conclusion must also be true. This truth-preserving nature ensures that no rule introduces falsehoods, thereby maintaining the reliability of derivations from axioms, which serve as the initial, unproven truths fed into these rules. Soundness is established through semantic analysis, where each rule is verified to hold across all models of the formal system.[13][32]

A paradigmatic example is the modus ponens rule, also known as implication elimination (\to-elimination), which derives B from the premises A and A \to B. Its schema is:
\frac{A \quad A \to B}{B} \quad (\to\text{-E})
In a simple derivation, suppose the premises are P (it is raining) and P \to Q (if it is raining, then the ground is wet). Applying modus ponens yields Q (the ground is wet) as the conclusion, demonstrating how the rule transforms an implication and its antecedent into the consequent. This rule, central to classical and intuitionistic logics, exemplifies the deductive power of inference rules in formal proofs.[13][32]
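These schemas translate directly into code. The sketch below is illustrative Python (the tuple encoding of formulas is an assumption of the example), casting \land-introduction and modus ponens as functions that map premises to a conclusion and reject premises that do not fit the pattern:

```python
# Formulas as nested tuples: ("and", A, B), ("->", A, B); strings are atoms.
def and_intro(a, b):
    """∧-I: from premises A and B, conclude A ∧ B."""
    return ("and", a, b)

def modus_ponens(a, implication):
    """→-E: from A and A → B, conclude B; reject non-matching premises."""
    if implication[0] != "->" or implication[1] != a:
        raise ValueError("premises do not match the modus ponens schema")
    return implication[2]

# The rain example: P = "it is raining", Q = "the ground is wet".
P, Q = "it is raining", "the ground is wet"
print(modus_ponens(P, ("->", P, Q)))   # the ground is wet
print(and_intro(P, Q))                 # ('and', ...): the conjunction A ∧ B
```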
Proof sequences
In formal logic, a proof is a finite sequence of well-formed formulas, where each formula is either an axiom of the system, an assumption, or derived from preceding formulas in the sequence via an application of an inference rule.[33] This structure ensures that every step builds systematically upon prior ones, providing a rigorous justification for the conclusion. Axioms and inference rules serve as the foundational building blocks for constructing such sequences.[33]

For a proof to be valid, every non-axiom or non-assumption step must explicitly cite its justification, including the specific inference rule applied and the line numbers of the premises used, for instance, "by modus ponens from lines 1 and 3." This requirement maintains transparency and verifiability throughout the derivation. The final formula in the sequence constitutes the proven theorem, demonstrating that it logically follows from the given axioms and rules.[33]

A representative example is the proof of the implication (p \land q) \to p in propositional logic using natural deduction, a proof system introduced by Gerhard Gentzen in 1934–35.[34] The sequence proceeds as follows:
1. p \land q (assumption)
2. p (from line 1 by \land-elimination, conjunction elimination)
3. (p \land q) \to p (from lines 1–2 by \to-introduction, implication introduction, discharging the assumption in line 1)
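Such a sequence can be encoded as data and its citations checked mechanically. The following illustrative Python (the rule names and formula encoding are assumptions of the example) verifies that each justification actually licenses its line:

```python
# The derivation above as a checkable sequence. Each entry is
# (formula, rule, cited line numbers); formulas are nested tuples.
CONJ = ("and", "p", "q")
steps = [
    (CONJ,              "assumption", []),
    ("p",               "and_elim",   [1]),
    (("->", CONJ, "p"), "imp_intro",  [1, 2]),
]

def check(steps):
    for n, (formula, rule, cites) in enumerate(steps, start=1):
        if rule == "assumption":
            continue
        if rule == "and_elim":            # from A ∧ B, infer A (or B)
            premise = steps[cites[0] - 1][0]
            assert premise[0] == "and" and formula in (premise[1], premise[2])
        elif rule == "imp_intro":         # discharge the assumption: infer A -> B
            assumption = steps[cites[0] - 1][0]
            conclusion = steps[cites[1] - 1][0]
            assert formula == ("->", assumption, conclusion)
    return steps[-1][0]

print(check(steps))   # ('->', ('and', 'p', 'q'), 'p'): the theorem
```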
Properties and Verification
Soundness
In formal logic, a proof system is sound if every statement provable within it is semantically valid, meaning that if a formula φ is provable from a set of premises Γ (denoted Γ ⊢ φ), then φ holds true in every interpretation satisfying Γ (denoted Γ ⊨ φ).[13] This property establishes a fundamental link between syntactic provability and semantic truth, ensuring that the system's derivations align with logical validity.[35]

Soundness for a formal system is proved by mathematical induction on the structure or length of the proof sequence. In the base case, the axioms of the system are verified to be semantically valid under all relevant interpretations. For the inductive step, each inference rule is shown to preserve semantic validity: if the premises of the rule are true in an interpretation, then the conclusion must also be true in that interpretation.[36] This inductive argument confirms that no invalid formula can be derived.[37]

Soundness has critical implications for formal systems, as it guarantees that no falsehoods can be derived from true premises, thereby underpinning the reliability of deductions in mathematical and computational reasoning.[38] Without soundness, a system could produce contradictory or erroneous results, undermining its utility in verifying arguments.

A concrete example is the soundness theorem for classical propositional logic, which states that if a set of propositional formulas Σ provably entails a formula α (Σ ⊢ α), then α is a logical consequence of Σ under all truth valuations (Σ ⊨ α). The proof relies on structural induction over the proof derivation. The base case covers assumptions: if α is in Σ and Σ is true under a valuation, then α is true. Inductive cases examine the last inference rule applied; for instance, under the conjunction introduction rule (from Σ ⊢ β and Σ ⊢ γ, infer Σ ⊢ β ∧ γ), if a valuation satisfies Σ, it satisfies β and γ, hence β ∧ γ. Similarly, for modus ponens (from Σ ⊢ β → γ and Σ ⊢ β, infer Σ ⊢ γ), truth of the premises implies truth of γ. This outlines how rules preserve validity without detailing every case.[36]
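For propositional rule schemas, the inductive step can be checked exhaustively by machine, since only finitely many valuations exist. The sketch below (illustrative Python) confirms over all valuations that modus ponens and conjunction introduction preserve truth, and that a deliberately invalid rule does not:

```python
from itertools import product

def implies(a, b):
    return (not a) or b

def rule_is_sound(premises, conclusion, num_atoms):
    """True iff every valuation making all premises true makes the conclusion true."""
    return all(
        conclusion(*v)
        for v in product([False, True], repeat=num_atoms)
        if all(p(*v) for p in premises)
    )

# Modus ponens: from A and A -> B, infer B.
print(rule_is_sound([lambda a, b: a, lambda a, b: implies(a, b)],
                    lambda a, b: b, 2))          # True

# Conjunction introduction: from A and B, infer A ∧ B.
print(rule_is_sound([lambda a, b: a, lambda a, b: b],
                    lambda a, b: a and b, 2))    # True

# A deliberately unsound rule: from A -> B, infer A.
# Fails at the valuation a = False (premise true, conclusion false).
print(rule_is_sound([lambda a, b: implies(a, b)],
                    lambda a, b: a, 2))          # False
```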
Completeness
In formal systems, completeness refers to the property that every semantically valid statement is provable, formally expressed as: if a formula \phi is true in all models (\models \phi), then \phi is syntactically derivable from the axioms (\vdash \phi). This ensures that the system's proof mechanism captures all logical truths without omission. Kurt Gödel established this property for first-order logic in his 1929 doctoral dissertation, proving that every valid formula in classical first-order predicate calculus possesses a formal proof within the standard axiomatic system.[39] The theorem resolved a longstanding question in Hilbert's program by confirming that first-order logic is semantically complete, meaning no valid inferences are missed by the deductive apparatus. Completeness complements soundness, the property that all provable statements are semantically valid.

However, this completeness does not extend to richer theories: Gödel's first incompleteness theorem of 1931 demonstrates that any consistent formal system powerful enough to formalize basic arithmetic, such as Peano arithmetic, must be incomplete, as it contains true statements unprovable within the system.

A standard proof of Gödel's completeness theorem proceeds by contraposition: if \not\vdash \phi, then \{\neg\phi\} is consistent, and one shows \not\models \phi by exhibiting a model falsifying \phi. This relies on Lindenbaum's lemma, which guarantees that any consistent set of formulas can be extended to a maximal consistent set \Gamma, where for every formula \psi, exactly one of \psi or \neg\psi belongs to \Gamma.[40] From \Gamma, a model is constructed by defining an interpretation whose domain consists of equivalence classes of terms under the relation induced by \Gamma, with predicates and functions interpreted so as to satisfy the formulas in \Gamma; since \neg\phi \in \Gamma, the resulting model falsifies \phi. This Henkin-style construction, which refines Gödel's original approach, establishes the theorem for countable languages and generalizes to languages of arbitrary cardinality.
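Lindenbaum's lemma can be illustrated in miniature for propositional logic, where the consistency of a finite set is decidable by truth tables. The toy sketch below (illustrative Python, working over a fixed finite stock of formulas rather than a full language) greedily extends a consistent set with every formula whose addition keeps it satisfiable:

```python
from itertools import product

# Formulas over the atoms p, q as (name, evaluator) pairs; consistency of a
# finite set means some valuation satisfies every member.
FORMULAS = [
    ("p",      lambda p, q: p),
    ("¬p",     lambda p, q: not p),
    ("q",      lambda p, q: q),
    ("¬q",     lambda p, q: not q),
    ("p → q",  lambda p, q: (not p) or q),
    ("p ∧ q",  lambda p, q: p and q),
]

def consistent(gamma):
    return any(all(f(p, q) for _, f in gamma)
               for p, q in product([False, True], repeat=2))

def lindenbaum(gamma):
    """Greedily extend a consistent set, keeping it consistent at each step."""
    gamma = list(gamma)
    names = {name for name, _ in gamma}
    for name, f in FORMULAS:
        if name not in names and consistent(gamma + [(name, f)]):
            gamma.append((name, f))
            names.add(name)
    return sorted(names)

# Starting from {p → q, p}: the extension contains q but rejects ¬p and ¬q.
print(lindenbaum([("p → q", lambda p, q: (not p) or q),
                  ("p",     lambda p, q: p)]))
```

After the loop, every formula in the stock is either in the set or inconsistent with it, mirroring the maximality condition of the lemma on this finite fragment.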
Interpretations
In formal logic, an interpretation provides a semantic framework that assigns meaning to the symbols and formulas of a formal language, thereby linking the syntactic structure to truth conditions. Formally, an interpretation consists of a structure comprising a nonempty domain D (a set of objects) and an interpretation function I that maps non-logical symbols, such as predicate symbols, to relations or functions on D, and constant symbols to elements of D. For propositional logic, interpretations are simpler, often defined as assignments of truth values (true or false) to atomic propositions. A valuation v, derived from the interpretation and an assignment of values to variables, extends this to complex formulas inductively: for instance, v(\neg \phi) = \top if v(\phi) = \bot, and v(\phi \land \psi) = \top if both v(\phi) = \top and v(\psi) = \top. This process determines whether a formula holds true in the given interpretation.[41]

A formula is valid if it is true under every possible interpretation, meaning it receives the value true for all admissible structures and valuations. In first-order logic, a sentence \phi (with no free variables) is valid, denoted \models \phi, if for every model M with domain D and interpretation I, M \models \phi. Similarly, in propositional logic, validity corresponds to the formula being a tautology, true regardless of the truth assignments to its atomic components. This notion of validity captures universal semantic truth, independent of specific domains or assignments.[41]

Interpretations play a crucial role in evaluating proofs by distinguishing semantic entailment from syntactic deduction. Semantic entailment, denoted \Gamma \models \phi, holds if every interpretation that makes all formulas in a set \Gamma true also makes \phi true; that is, there is no interpretation satisfying \Gamma while falsifying \phi. In contrast, syntactic deduction \Gamma \vdash \phi means \phi can be derived from \Gamma using the formal system's inference rules. Soundness and completeness theorems establish that these notions coincide for certain logics, ensuring proofs align with semantic consequences.[41]

A representative example in propositional logic is the law of excluded middle, p \lor \neg p, which is valid as a tautology. Under the truth table interpretation, where p can be true or false, the formula evaluates to true in both cases:

| p | \neg p | p \lor \neg p |
|---|---|---|
| \top | \bot | \top |
| \bot | \top | \top |
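The same check is mechanical: a formula is a tautology exactly when it evaluates to true under every valuation of its atoms, as in this illustrative Python sketch:

```python
from itertools import product

def is_tautology(formula, num_atoms):
    """True iff the formula holds under every truth assignment to its atoms."""
    return all(formula(*v) for v in product([False, True], repeat=num_atoms))

print(is_tautology(lambda p: p or (not p), 1))   # True: p ∨ ¬p is valid
print(is_tautology(lambda p: p, 1))              # False: p alone is not
```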