Green–Tao theorem
The Green–Tao theorem states that the sequence of prime numbers contains arithmetic progressions of any given finite length. Specifically, for every positive integer k, there exist infinitely many k-term arithmetic progressions consisting entirely of primes. Proved by mathematicians Ben Green and Terence Tao in 2004 and published in 2008, the theorem resolves a longstanding conjecture in number theory dating back to at least the 18th century.[1][2] The theorem builds on earlier results in additive combinatorics and analytic number theory. In 1975, Endre Szemerédi proved that any subset of the natural numbers with positive upper density contains arithmetic progressions of arbitrary length, but the primes have asymptotic density zero, making direct application impossible.[1] Prior progress on primes in arithmetic progressions included van der Corput's 1939 proof for length 3 and computational discoveries of longer sequences, such as a 10-term progression found in the 1980s.[1] Green and Tao's breakthrough adapts Szemerédi's theorem to the primes using a "transference principle" that translates results from dense sets to pseudorandom subsets with positive relative density.[1] The proof involves several innovative techniques, including the construction of a pseudorandom measure that majorizes the von Mangoldt function (which encodes primes) and the use of ergodic theory to handle uniformity norms.[1] It also relies on work by Goldston and Yıldırım showing that primes can be sieved into sets with positive relative density in short intervals.[1] A key innovation is the "W-trick," which modifies the von Mangoldt function to avoid obstructions from small primes while preserving essential properties.[1] The Green–Tao theorem has had profound impact, earning Green and Tao the Fields Medal in 2006 (for Tao) and inspiring extensions, such as quantitative bounds on the length of progressions and applications to other sparse sets like primes in polynomial progressions.[3] It has over 1,200 citations[3] and spurred computational searches yielding progressions as long as 27 terms (as of 2023). The result bridges additive combinatorics and sieve theory, influencing research in ergodic Ramsey theory and the distribution of primes.[1]Background and History
Early Conjectures
In the 18th century, mathematicians began exploring patterns in prime numbers, including their occurrence in arithmetic progressions. Examples such as the three-term arithmetic progression of primes 5, 11, 17 were noted, suggesting potential for longer such sequences.[4] Shortly thereafter, Joseph-Louis Lagrange and Edward Waring, also in 1770, investigated constraints on these progressions, observing that for a sequence of length k, the common difference must be divisible by all primes less than k to avoid residues modulo those primes that would force compositeness.[1] These early observations highlighted the sparsity of primes yet suggested potential for structured distributions. Lagrange extended related ideas in 1775 through his conjecture that every odd integer greater than 5 can be expressed as the sum of a prime and twice another prime, implicitly linking additive properties of primes to progressions by considering how primes could align in linear forms.[5] This work built on Goldbach's earlier ideas but emphasized prime representations that foreshadowed density questions in arithmetic settings. In 1837, Peter Gustav Lejeune Dirichlet proved a foundational result: if a and d are coprime positive integers, then the arithmetic progression a, a+d, a+2d, \dots contains infinitely many primes, providing the first rigorous confirmation of unbounded primes in specific progressions with fixed difference.[6] By the early 20th century, G. H. Hardy and J. E. Littlewood formalized conjectures on the density of primes in arithmetic progressions. In their 1923 k-tuple conjecture, they proposed asymptotic formulas for the number of prime constellations, including progressions of any fixed length k, predicting that such sequences occur with positive density proportional to a singular series accounting for local obstructions modulo primes.[7] This conjecture implied infinitely many k-term arithmetic progressions of primes for every k, extending earlier speculations. A significant advance came in 1939 when J. W. S. van der Corput proved that there are infinitely many three-term arithmetic progressions among the primes, using exponential sums to establish non-vanishing of certain integrals related to the distribution of primes.[1] These developments laid the groundwork for later results, such as Szemerédi's theorem on arithmetic progressions in dense sets, which would eventually be adapted to the primes.[1]Pre-Green-Tao Results
In 1975, Endre Szemerédi established a foundational result in additive combinatorics, proving that any subset of the natural numbers with positive upper density contains arithmetic progressions of arbitrary finite length. His proof relied on the regularity lemma from graph theory, which partitions large graphs into equitable subsets to control combinatorial structures. This theorem provided a powerful tool for dense sets but could not be directly applied to the primes, as the set of prime numbers has asymptotic density zero. Building on earlier work, such as van der Corput's 1939 demonstration of infinitely many 3-term arithmetic progressions among primes, researchers turned to sieve methods and almost-primes to make progress toward longer progressions. In 1981, D. R. Heath-Brown proved the existence of infinitely many 4-term arithmetic progressions consisting of three primes and one almost-prime (a number with at most two prime factors), with the common difference bounded explicitly by a function involving the size of the progression. This result extended partial knowledge beyond pure 3-term progressions by incorporating sieve techniques to handle the sparsity of primes. A related 1978 paper by Heath-Brown further explored almost-primes in arithmetic progressions, providing explicit asymptotic bounds for their distribution in short intervals and progressions, which strengthened estimates for 3-term configurations with bounded prime factors.[8][9] In 1990, Antal Balog employed advanced sieve methods to address the prime k-tuplets conjecture on average, constructing infinitely many 3-term arithmetic progressions where the terms are primes or closely related forms, with improved quantitative estimates on their frequency compared to prior analytic approaches. This work highlighted the efficacy of sieve theory in capturing average behaviors over residue classes, paving the way for handling pseudorandom majorants of the primes.[10] Concurrently, in a 2004 preprint (published 2005), John Friedlander, Henryk Iwaniec, and others built on these ideas, but a key contribution came from Daniel Goldston, Cem Yalçın Yıldırım, and János Pintz, who developed the GPY sieve to analyze pseudorandom subsets of the primes. Their method demonstrated that there are infinitely many pairs of primes differing by at most some bounded multiple of the logarithm, and extended to short arithmetic progressions of length 3 with common differences o(p log p) for primes p around that scale, revealing structured correlations within the primes despite their zero density. These pre-2004 efforts underscored the limitations of density-based theorems and the need for transference principles to bridge combinatorial and analytic number theory.The Theorem
Formal Statement
The Green–Tao theorem states that the sequence of prime numbers contains arithmetic progressions of arbitrary finite length. Formally, for every positive integer k \geq 1, there exist infinitely many k-term arithmetic progressions consisting entirely of primes, that is, integers a and d > 0 such that a, a+d, \dots, a+(k-1)d are all prime.[1] Although the primes have asymptotic density zero in the natural numbers—that is, \lim_{N \to \infty} \frac{\pi(N)}{N} = 0, where \pi(N) denotes the number of primes up to N—the theorem establishes that subsets of primes with positive relative upper density contain such progressions. Specifically, a subset A \subseteq \mathbb{P} (where \mathbb{P} is the set of primes) has positive relative upper density if \bar{\delta}(A) = \limsup_{N \to \infty} \frac{|A \cap [1,N]|}{\pi(N)} > 0. Any such A contains infinitely many k-term arithmetic progressions for every k \geq 1.[1] Quantitatively, the theorem provides an asymptotic lower bound on the number of k-term arithmetic progressions of primes up to N. Let r_{k,\mathbb{P}}(N) denote the number of triples (a,d,m) with $1 \leq m \leq N, d > 0, and a+(k-1)d \leq N such that a+jd is prime for each $0 \leq j < k. Then, r_{k,\mathbb{P}}(N) \gg_k \frac{N^2}{(\log N)^k}, where the implied constant depends only on k. This count is asymptotically comparable to \mathfrak{S}(k) \frac{N^2}{(\log N)^k} for a positive singular series \mathfrak{S}(k). For a subset A \subseteq \mathbb{P} with \bar{\delta}(A) \geq \delta > 0, r_{k,A}(N) \gg_{k,\delta} \frac{N^2}{(\log N)^k}. [1]Significance
The Green–Tao theorem represents a landmark achievement in number theory by proving that the primes contain arithmetic progressions of arbitrary finite length, thereby resolving a longstanding special case of the Erdős–Turán conjecture for the set of prime numbers, which posits that any subset of the positive integers with divergent reciprocal sum must contain such progressions.[1] Despite the primes having zero natural density, their divergent harmonic series places them within the scope of this conjecture, making the theorem a striking confirmation of structural richness in sparse sets. The theorem's proof ingeniously bridges additive combinatorics and analytic number theory, adapting Szemerédi's density theorem on arithmetic progressions in dense sets to the pseudorandom setting of the primes via relative Szemerédi theorems and Gowers uniformity norms, while incorporating sieve-theoretic tools to handle the primes' multiplicative constraints.[1] This interdisciplinary fusion has established new paradigms for studying additive patterns in number-theoretic sets. In terms of implications, the result bolsters heuristics on prime distribution, such as Cramér's probabilistic model, which anticipates long arithmetic progressions in the primes based on their expected random-like spacing, thereby providing concrete evidence for the model's validity in capturing additive structures.[11] The theorem's methods have also catalyzed major advances, including Yitang Zhang's 2013 demonstration of infinitely many bounded gaps between primes and James Maynard's subsequent refinements achieving gaps as small as 12 under the Elliott–Halberstam conjecture.[12][13] Philosophically, the Green–Tao theorem underscores the primes' pseudorandom nature, revealing substantial additive order amid their sparsity and density-zero limitations, and challenging the intuition that such thin sets lack predictable patterns.[11]Proof Overview
Key Components
The proof of the Green–Tao theorem relies on applying Szemerédi's theorem, which guarantees the existence of arbitrarily long arithmetic progressions in subsets of the integers with positive upper density, to a carefully constructed pseudorandom subset that contains a positive proportion of primes.[1] This subset is designed to mimic the behavior of the full set of integers in terms of arithmetic structure while incorporating the distribution of primes, allowing the transference of progression results from dense sets to the sparser primes.[1] Central to this approach is a decomposition of the primes, represented via the von Mangoldt function, into a "structured" component of low complexity—capturing predictable patterns—and a "pseudorandom" component exhibiting high entropy, which behaves like a random set with respect to arithmetic progressions.[1] The structured part is handled through combinatorial methods that identify potential progressions, while the pseudorandom part requires more advanced tools to ensure it does not disrupt the presence of long progressions. To manage this pseudorandom component, the proof employs techniques inspired by ergodic theory, which provide a framework for analyzing measure-preserving transformations and averaging over multiple scales to detect hidden structures in seemingly random data.[1] The overall argument proceeds iteratively: starting from an initial dense set containing many primes, the proof identifies long arithmetic progressions within it and then transfers this density to a refined subset via a density increment strategy, gradually increasing the relative density of primes in progressions until a full arithmetic progression of primes is isolated.[1] This process terminates after a bounded number of steps, ensuring the existence of progressions of any desired length. A key building block in constructing the initial candidate sets with many primes is the GPY method, developed by Goldston, Pintz, and Yıldırım, which produces pseudorandom sets of almost-primes with positive density.[1]Core Techniques
The proof of the Green–Tao theorem relies on several innovative techniques from additive combinatorics to establish the existence of arbitrarily long arithmetic progressions in the primes. Central among these is the transference principle, also known as the dense model theorem, which enables the mapping of structural properties—such as the presence of arithmetic progressions—from dense subsets of the integers to sparser, pseudorandom subsets like the primes, while preserving the arithmetic progression structure. Specifically, this principle states that if a function f: \mathbb{Z}/N\mathbb{Z} \to [0, \infty) is bounded by a k-pseudorandom measure \nu (meaning \mathbb{E} f \geq \delta > 0), then the expected density of k-term arithmetic progressions in f is at least c(k, \delta) - o_{k,\delta}(1), where c(k, \delta) is a positive constant depending only on k and \delta. This allows results like Szemerédi's theorem, which guarantees arithmetic progressions in dense sets, to be "transferred" to the primes by modeling them within pseudorandom environments that mimic random behavior additively.[1][14] Complementing this is the relative Szemerédi theorem, which adapts Szemerédi's classical result to subsets with positive relative density within a pseudorandom host set, rather than absolute density in the integers. In the context of the Green–Tao proof, it asserts that any subset A of a pseudorandom host set H (in the sense defined in the proof) with positive relative upper density in H contains infinitely many k-term arithmetic progressions for any fixed k. The primes are embedded as a positive relative density subset within such an H constructed via a pseudorandom majorant of the von Mangoldt function. This relative version is crucial because the primes have zero natural density in the integers, so absolute density arguments fail; instead, it leverages the pseudorandom structure of the host set to ensure progressions appear with positive relative frequency. The theorem is established by combining the transference principle with uniformity controls on the host set.[1][15][16] To quantify pseudorandomness and detect sets free of arithmetic progressions, the proof employs Gowers uniformity norms, a family of norms that measure the deviation of a function from being uniformly distributed with respect to higher-order additive structure. For a function f: \mathbb{Z}/N\mathbb{Z} \to \mathbb{C}, the U^{d+1}-norm is defined as \|f\|_{U^{d+1}} = \left( \mathbb{E}_{x, h_1, \dots, h_{d+1} \in \mathbb{Z}/N\mathbb{Z}} \prod_{\omega \in \{0,1\}^{d+1}} \mathcal{C}^{|\omega|} f(x + \omega \cdot \mathbf{h}) \right)^{1/2^{d+1}}, where \mathcal{C} z = \overline{z} is complex conjugation, |\omega| is the number of 1's in the binary vector \omega, and \mathbf{h} = (h_1, \dots, h_{d+1}). A set or measure \nu is k-pseudorandom if \|\nu - 1\|_{U^d} = o(1) for all d \leq k, ensuring it behaves like a random set in controlling correlations along arithmetic progressions or more general linear configurations. These norms are pivotal for identifying arithmetic progression-free sets, as small U^{k+1}-norms imply the absence of structured biases that could prevent progressions. In the Green–Tao framework, the primes are embedded into such pseudorandom sets via the von Mangoldt function \Lambda(n), which indicators primes (and prime powers), allowing the norms to certify the primes' suitability as a host for dense subsets containing progressions.[1][14][15] The Gowers norms directly underpin the uniformity equation, which bounds the expectation of products over linear forms corresponding to arithmetic progressions. For functions f_0, \dots, f_{k-1}: \mathbb{Z}/N\mathbb{Z} \to \mathbb{R} bounded by a k-pseudorandom measure, the expected density of a k-term progression is controlled as \left| \mathbb{E}_{x,r \in \mathbb{Z}/N\mathbb{Z}} \prod_{j=0}^{k-1} f_j(x + j r) \right| \leq \min_{0 \leq j \leq k-1} \|f_j\|_{U^k} + o(1), with the U^{k+1}-norm providing finer control for higher-order uniformity in the inverse theorems underlying the proof (e.g., decomposing non-uniform functions into structured plus uniform parts). This equation ensures that if the functions are sufficiently uniform (small U^{k+1}-norm), the progression count is close to what random behavior predicts, while large norms reveal structured components where progressions can be found via the relative Szemerédi theorem. The bound is derived from multilinearity and Cauchy-Schwarz inequalities over the derivative polynomials defining the norms.[1][15][14] Finally, adaptations of the sieve of Eratosthenes are used to extract primes (or almost-primes) from pseudorandom sets while preserving uniformity. Rather than the classical sieve, which eliminates composites by multiples of small primes, the proof employs a truncated von Mangoldt sieve \Lambda_R(n) = \sum_{d \mid n, d \leq R} \mu(d) \log(R/d), where R is chosen to balance sieve level and uniformity (typically R \approx N^{o(1)}). This creates a pseudorandom measure \nu(n) \approx \phi(W) W \Lambda_R(W n + 1) / (2 \log R) for suitable W (a large primorial to avoid small prime factors), majorizing a positive proportion of the primes via the "W-trick," which shifts arguments to residue classes modulo W to mitigate sieve obstructions. The resulting almost-prime set P_R(N) has relative density \gg 1 / \log R in [N] and inherits the pseudorandomness of the host set, allowing transference to apply. This sieving preserves Gowers norms up to o(1) errors, ensuring the extracted primes support the relative density required for progressions.[1][15][14]Numerical Verification
Known Progressions
The discovery of long arithmetic progressions of primes has advanced through dedicated computational searches, providing concrete verifications of the Green–Tao theorem's implications. An early milestone was the 22-term progression identified by Paul A. Pritchard et al. in 1993.[17] This was surpassed by a 23-term progression found by Markus Frind, Paul Jobling, and Paul Underwood in 2004.[17] In 2007, Jaroslaw Wroblewski extended the record to 24 terms.[17] Further progress came in 2010 with a 26-term progression discovered by Benoît Perichon et al. with PrimeGrid.[17] This was improved to 27 terms in 2019 by Rob Gahan in collaboration with the PrimeGrid project.[18] As of November 2025, the longest known arithmetic progression of primes remains this 27-term sequence, though 2025 saw new records for shorter lengths with much larger primes, such as a 26-term progression by James Jayaputera in September 2025.[17] Notable 2025 discoveries by Norman Luhn include a 17-term progression starting at 11,691,689,348 with common difference 319,091,802 (primes with ~47 digits), and a 16-term progression starting at 50,070,689,738 with common difference 330,062,135 (primes with ~50 digits), both constructed using multiples of primorials to ensure coprimality to small primes.[17] These progressions consist of extremely large primes, often with dozens or hundreds of digits, verified using probabilistic primality tests such as the Miller-Rabin algorithm, which provides high confidence in primality for practical purposes.| Length (k) | Discoverer(s) | Year | Starting Term | Common Difference |
|---|---|---|---|---|
| 22 | Paul A. Pritchard et al. | 1993 | 11,410,337,850,553 | 4,609,098,694,200 |
| 23 | Markus Frind, Paul Jobling, Paul Underwood | 2004 | 25,196,400,000,010,222,840 + 26,342,379,000,000,000 n | 223,092,870 |
| 24 | Jaroslaw Wroblewski | 2007 | (specific form not detailed in records) | (specific form not detailed in records) |
| 26 | Benoît Perichon et al., PrimeGrid | 2010 | (specific form not detailed in records) | (specific form not detailed in records) |
| 27 | Rob Gahan, PrimeGrid | 2019 | 224,584,605,939,537,911 + 81,292,139 × 23# × n for n=0 to 26 | 81,292,139 × 23# |
| 16 | Norman Luhn | 2025 | 50,070,689,738 + 330,062,135 n | 101# + 1 (adjusted form) |
| 17 | Norman Luhn | 2025 | 11,691,689,348 + 319,091,802 n | 97# + 1 (adjusted form) |
| 26 | James Jayaputera | 2025 | 1,688,330,176 + 397,344,516 n | 23# + 138,131,087 |