Fact-checked by Grok 2 weeks ago

Green–Tao theorem

The Green–Tao theorem states that the sequence of prime numbers contains progressions of any given finite length. Specifically, for every positive integer k, there exist infinitely many k-term progressions consisting entirely of primes. Proved by mathematicians Ben Green and in 2004 and published in 2008, the theorem resolves a longstanding in dating back to at least the . The theorem builds on earlier results in additive combinatorics and . In 1975, proved that any subset of the natural numbers with positive upper density contains arithmetic progressions of arbitrary length, but the primes have asymptotic density zero, making direct application impossible. Prior progress on primes in arithmetic progressions included van der Corput's 1939 proof for length 3 and computational discoveries of longer sequences, such as a 10-term progression found in the . Green and Tao's breakthrough adapts to the primes using a "transference principle" that translates results from dense sets to pseudorandom subsets with positive . The proof involves several innovative techniques, including the construction of a pseudorandom measure that majorizes the (which encodes primes) and the use of to handle uniformity norms. It also relies on work by Goldston and Yıldırım showing that primes can be sieved into sets with positive in short intervals. A key innovation is the "W-trick," which modifies the to avoid obstructions from small primes while preserving essential properties. The Green–Tao theorem has had profound impact, earning Green and Tao the in 2006 (for Tao) and inspiring extensions, such as quantitative bounds on the length of progressions and applications to other sparse sets like primes in polynomial progressions. It has over 1,200 citations and spurred computational searches yielding progressions as long as 27 terms (as of 2023). The result bridges additive combinatorics and , influencing research in ergodic and the distribution of primes.

Background and History

Early Conjectures

In the , mathematicians began exploring patterns in prime numbers, including their occurrence in . Examples such as the three-term of primes 5, 11, 17 were noted, suggesting potential for longer such sequences. Shortly thereafter, and Edward Waring, also in 1770, investigated constraints on these progressions, observing that for a sequence of length k, the common difference must be divisible by all primes less than k to avoid residues those primes that would force compositeness. These early observations highlighted the sparsity of primes yet suggested potential for structured distributions. Lagrange extended related ideas in 1775 through his conjecture that every odd integer greater than 5 can be expressed as the sum of a prime and twice another prime, implicitly linking additive properties of primes to progressions by considering how primes could align in linear forms. This work built on Goldbach's earlier ideas but emphasized prime representations that foreshadowed density questions in arithmetic settings. In 1837, proved a foundational result: if a and d are coprime positive integers, then the a, a+d, a+2d, \dots contains infinitely many primes, providing the first rigorous confirmation of unbounded primes in specific progressions with fixed difference. By the early 20th century, and J. E. Littlewood formalized conjectures on the density of primes in arithmetic progressions. In their 1923 k-tuple conjecture, they proposed asymptotic formulas for the number of prime constellations, including progressions of any fixed length k, predicting that such sequences occur with positive density proportional to a singular series accounting for local obstructions modulo primes. This conjecture implied infinitely many k-term arithmetic progressions of primes for every k, extending earlier speculations. A significant advance came in 1939 when J. W. S. van der Corput proved that there are infinitely many three-term arithmetic progressions among the primes, using exponential sums to establish non-vanishing of certain integrals related to the distribution of primes. These developments laid the groundwork for later results, such as on arithmetic progressions in dense sets, which would eventually be adapted to the primes.

Pre-Green-Tao Results

In 1975, established a foundational result in additive combinatorics, proving that any subset of the natural numbers with positive upper density contains arithmetic progressions of arbitrary finite length. His proof relied on the regularity lemma from , which partitions large graphs into equitable subsets to control combinatorial structures. This theorem provided a powerful tool for dense sets but could not be directly applied to the primes, as the set of prime numbers has asymptotic density zero. Building on earlier work, such as van der Corput's 1939 demonstration of infinitely many 3-term progressions among primes, researchers turned to methods and almost-primes to make progress toward longer progressions. In 1981, D. R. Heath-Brown proved the existence of infinitely many 4-term progressions consisting of three primes and one almost-prime (a number with at most two prime factors), with the common difference bounded explicitly by a involving the size of the progression. This result extended partial knowledge beyond pure 3-term progressions by incorporating techniques to handle the sparsity of primes. A related 1978 paper by Heath-Brown further explored almost-primes in progressions, providing explicit asymptotic bounds for their distribution in short intervals and progressions, which strengthened estimates for 3-term configurations with bounded prime factors. In 1990, Antal Balog employed advanced methods to address the prime k-tuplets on average, constructing infinitely many 3-term progressions where the terms are primes or closely related forms, with improved quantitative estimates on their frequency compared to prior analytic approaches. This work highlighted the efficacy of in capturing average behaviors over residue classes, paving the way for handling pseudorandom majorants of the primes. Concurrently, in a preprint (published 2005), John Friedlander, Henryk Iwaniec, and others built on these ideas, but a key contribution came from Daniel Goldston, Cem Yalçın Yıldırım, and János Pintz, who developed the GPY sieve to analyze pseudorandom subsets of the primes. Their method demonstrated that there are infinitely many pairs of primes differing by at most some bounded multiple of the logarithm, and extended to short progressions of length 3 with common differences o(p log p) for primes p around that scale, revealing structured correlations within the primes despite their zero . These pre-2004 efforts underscored the limitations of density-based theorems and the need for principles to bridge combinatorial and .

The Theorem

Formal Statement

The Green–Tao theorem states that the sequence of prime numbers contains arithmetic progressions of arbitrary finite length. Formally, for every positive k \geq 1, there exist infinitely many k-term arithmetic progressions consisting entirely of primes, that is, integers a and d > 0 such that a, a+d, \dots, a+(k-1)d are all prime. Although the primes have asymptotic density zero in the natural numbers—that is, \lim_{N \to \infty} \frac{\pi(N)}{N} = 0, where \pi(N) denotes the number of primes up to N—the theorem establishes that of primes with positive relative upper density contain such progressions. Specifically, a A \subseteq \mathbb{P} (where \mathbb{P} is the set of primes) has positive relative upper density if \bar{\delta}(A) = \limsup_{N \to \infty} \frac{|A \cap [1,N]|}{\pi(N)} > 0. Any such A contains infinitely many k-term arithmetic progressions for every k \geq 1. Quantitatively, the theorem provides an asymptotic lower bound on the number of k-term arithmetic progressions of primes up to N. Let r_{k,\mathbb{P}}(N) denote the number of triples (a,d,m) with $1 \leq m \leq N, d > 0, and a+(k-1)d \leq N such that a+jd is prime for each $0 \leq j < k. Then, r_{k,\mathbb{P}}(N) \gg_k \frac{N^2}{(\log N)^k}, where the implied constant depends only on k. This count is asymptotically comparable to \mathfrak{S}(k) \frac{N^2}{(\log N)^k} for a positive singular series \mathfrak{S}(k). For a subset A \subseteq \mathbb{P} with \bar{\delta}(A) \geq \delta > 0, r_{k,A}(N) \gg_{k,\delta} \frac{N^2}{(\log N)^k}.

Significance

The Green–Tao theorem represents a landmark achievement in number theory by proving that the primes contain arithmetic progressions of arbitrary finite length, thereby resolving a longstanding special case of the Erdős–Turán conjecture for the set of prime numbers, which posits that any subset of the positive integers with divergent reciprocal sum must contain such progressions. Despite the primes having zero natural density, their divergent harmonic series places them within the scope of this conjecture, making the theorem a striking confirmation of structural richness in sparse sets. The theorem's proof ingeniously bridges additive combinatorics and , adapting Szemerédi's density theorem on arithmetic progressions in dense sets to the pseudorandom setting of the primes via relative Szemerédi theorems and Gowers uniformity norms, while incorporating sieve-theoretic tools to handle the primes' multiplicative constraints. This interdisciplinary fusion has established new paradigms for studying additive patterns in number-theoretic sets. In terms of implications, the result bolsters heuristics on prime distribution, such as Cramér's probabilistic model, which anticipates long arithmetic progressions in the primes based on their expected random-like spacing, thereby providing concrete evidence for the model's validity in capturing additive structures. The theorem's methods have also catalyzed major advances, including Yitang Zhang's 2013 demonstration of infinitely many bounded gaps between primes and James Maynard's subsequent refinements achieving gaps as small as 12 under the . Philosophically, the Green–Tao theorem underscores the primes' pseudorandom nature, revealing substantial additive order amid their sparsity and density-zero limitations, and challenging the intuition that such thin sets lack predictable patterns.

Proof Overview

Key Components

The proof of the Green–Tao theorem relies on applying , which guarantees the existence of arbitrarily long arithmetic progressions in subsets of the integers with positive upper , to a carefully constructed pseudorandom that contains a positive proportion of primes. This is designed to mimic the of the full set of integers in terms of structure while incorporating the of primes, allowing the transference of progression results from dense sets to the sparser primes. Central to this approach is a of the primes, represented via the , into a "structured" component of low complexity—capturing predictable patterns—and a "pseudorandom" component exhibiting high , which behaves like a random set with respect to arithmetic progressions. The structured part is handled through combinatorial methods that identify potential progressions, while the pseudorandom part requires more advanced tools to ensure it does not disrupt the presence of long progressions. To manage this pseudorandom component, the proof employs techniques inspired by , which provide a framework for analyzing measure-preserving transformations and averaging over multiple scales to detect hidden structures in seemingly random data. The overall argument proceeds iteratively: starting from an initial containing many primes, the proof identifies long within it and then transfers this to a refined subset via a , gradually increasing the of primes in progressions until a full of primes is isolated. This process terminates after a bounded number of steps, ensuring the existence of progressions of any desired length. A key building block in constructing the initial candidate sets with many primes is the GPY method, developed by Goldston, Pintz, and Yıldırım, which produces pseudorandom sets of almost-primes with positive .

Core Techniques

The proof of the Green–Tao theorem relies on several innovative techniques from additive combinatorics to establish the existence of arbitrarily long arithmetic progressions in the primes. Central among these is the transference principle, also known as the dense model theorem, which enables the mapping of structural properties—such as the presence of arithmetic progressions—from dense subsets of the integers to sparser, pseudorandom subsets like the primes, while preserving the arithmetic progression structure. Specifically, this principle states that if a function f: \mathbb{Z}/N\mathbb{Z} \to [0, \infty) is bounded by a k-pseudorandom measure \nu (meaning \mathbb{E} f \geq \delta > 0), then the expected density of k-term arithmetic progressions in f is at least c(k, \delta) - o_{k,\delta}(1), where c(k, \delta) is a positive constant depending only on k and \delta. This allows results like Szemerédi's theorem, which guarantees arithmetic progressions in dense sets, to be "transferred" to the primes by modeling them within pseudorandom environments that mimic random behavior additively. Complementing this is the relative Szemerédi theorem, which adapts Szemerédi's classical result to subsets with positive relative density within a pseudorandom host set, rather than absolute density in the integers. In the context of the Green–Tao proof, it asserts that any subset A of a pseudorandom host set H (in the sense defined in the proof) with positive relative upper density in H contains infinitely many k-term arithmetic progressions for any fixed k. The primes are embedded as a positive relative density subset within such an H constructed via a pseudorandom majorant of the . This relative version is crucial because the primes have zero natural density in the integers, so absolute density arguments fail; instead, it leverages the pseudorandom structure of the host set to ensure progressions appear with positive relative frequency. The theorem is established by combining the transference with uniformity controls on the host set. To quantify pseudorandomness and detect sets free of arithmetic progressions, the proof employs Gowers uniformity norms, a family of norms that measure the deviation of a function from being uniformly distributed with respect to higher-order additive structure. For a function f: \mathbb{Z}/N\mathbb{Z} \to \mathbb{C}, the U^{d+1}-norm is defined as \|f\|_{U^{d+1}} = \left( \mathbb{E}_{x, h_1, \dots, h_{d+1} \in \mathbb{Z}/N\mathbb{Z}} \prod_{\omega \in \{0,1\}^{d+1}} \mathcal{C}^{|\omega|} f(x + \omega \cdot \mathbf{h}) \right)^{1/2^{d+1}}, where \mathcal{C} z = \overline{z} is complex conjugation, |\omega| is the number of 1's in the binary vector \omega, and \mathbf{h} = (h_1, \dots, h_{d+1}). A set or measure \nu is k-pseudorandom if \|\nu - 1\|_{U^d} = o(1) for all d \leq k, ensuring it behaves like a random set in controlling correlations along arithmetic progressions or more general linear configurations. These norms are pivotal for identifying arithmetic progression-free sets, as small U^{k+1}-norms imply the absence of structured biases that could prevent progressions. In the Green–Tao framework, the primes are embedded into such pseudorandom sets via the von Mangoldt function \Lambda(n), which indicators primes (and prime powers), allowing the norms to certify the primes' suitability as a host for dense subsets containing progressions. The Gowers norms directly underpin the , which bounds the expectation of products over linear forms corresponding to arithmetic progressions. For functions f_0, \dots, f_{k-1}: \mathbb{Z}/N\mathbb{Z} \to \mathbb{R} bounded by a k-pseudorandom measure, the expected density of a k-term progression is controlled as \left| \mathbb{E}_{x,r \in \mathbb{Z}/N\mathbb{Z}} \prod_{j=0}^{k-1} f_j(x + j r) \right| \leq \min_{0 \leq j \leq k-1} \|f_j\|_{U^k} + o(1), with the U^{k+1}-norm providing finer control for higher-order uniformity in the inverse theorems underlying the proof (e.g., decomposing non-uniform functions into structured plus uniform parts). This equation ensures that if the functions are sufficiently uniform (small U^{k+1}-norm), the progression count is close to what random behavior predicts, while large norms reveal structured components where progressions can be found via the relative Szemerédi theorem. The bound is derived from multilinearity and Cauchy-Schwarz inequalities over the derivative polynomials defining the norms. Finally, adaptations of the are used to extract primes (or almost-primes) from sets while preserving uniformity. Rather than the classical sieve, which eliminates composites by multiples of small primes, the proof employs a truncated von Mangoldt sieve \Lambda_R(n) = \sum_{d \mid n, d \leq R} \mu(d) \log(R/d), where R is chosen to balance sieve level and uniformity (typically R \approx N^{o(1)}). This creates a measure \nu(n) \approx \phi(W) W \Lambda_R(W n + 1) / (2 \log R) for suitable W (a large to avoid small prime factors), majorizing a positive proportion of the primes via the "W-trick," which shifts arguments to residue classes modulo W to mitigate sieve obstructions. The resulting almost-prime set P_R(N) has relative density \gg 1 / \log R in [N] and inherits the pseudorandomness of the host set, allowing to apply. This sieving preserves Gowers norms up to o(1) errors, ensuring the extracted primes support the relative density required for progressions.

Numerical Verification

Known Progressions

The discovery of long arithmetic progressions of primes has advanced through dedicated computational searches, providing concrete verifications of the Green–Tao theorem's implications. An early milestone was the 22-term progression identified by Paul A. Pritchard et al. in 1993. This was surpassed by a 23-term progression found by Markus Frind, Paul Jobling, and Paul Underwood in 2004. In 2007, Jaroslaw Wroblewski extended the record to 24 terms. Further progress came in 2010 with a 26-term progression discovered by Benoît Perichon et al. with PrimeGrid. This was improved to 27 terms in 2019 by Rob Gahan in collaboration with the PrimeGrid project. As of November 2025, the longest known arithmetic progression of primes remains this 27-term sequence, though 2025 saw new records for shorter lengths with much larger primes, such as a 26-term progression by James Jayaputera in September 2025. Notable 2025 discoveries by Norman Luhn include a 17-term progression starting at 11,691,689,348 with common difference 319,091,802 (primes with ~47 digits), and a 16-term progression starting at 50,070,689,738 with common difference 330,062,135 (primes with ~50 digits), both constructed using multiples of primorials to ensure coprimality to small primes. These progressions consist of extremely large primes, often with dozens or hundreds of digits, verified using probabilistic primality tests such as the , which provides high confidence in primality for practical purposes.
Length (k)Discoverer(s)YearStarting TermCommon Difference
22Paul A. Pritchard et al.199311,410,337,850,5534,609,098,694,200
23Markus Frind, Paul Jobling, Paul Underwood200425,196,400,000,010,222,840 + 26,342,379,000,000,000 n223,092,870
24Jaroslaw Wroblewski2007(specific form not detailed in records)(specific form not detailed in records)
26Benoît Perichon et al., PrimeGrid2010(specific form not detailed in records)(specific form not detailed in records)
27Rob Gahan, PrimeGrid2019224,584,605,939,537,911 + 81,292,139 × 23# × n for n=0 to 2681,292,139 × 23#
16Norman Luhn202550,070,689,738 + 330,062,135 n101# + 1 (adjusted form)
17Norman Luhn202511,691,689,348 + 319,091,802 n97# + 1 (adjusted form)
26James Jayaputera20251,688,330,176 + 397,344,516 n23# + 138,131,087

Computational Methods

The search for long arithmetic progressions (APs) of primes relies on specialized algorithms that efficiently generate and test candidate sequences, given the sparsity of primes and the vast search space involved. A key technique involves constructing the common difference d as a multiple of the primorial p_m\#, the product of the first m primes, where m is chosen such that the smallest primes up to roughly the desired AP length k divide d. This ensures that none of the first k terms in the progression a + nd (for n = 0, 1, \dots, k-1) are divisible by those small primes, thereby avoiding trivial composites early in the sieve. To handle the immense computational demands, distributed computing frameworks are employed, notably PrimeGrid, a BOINC-based volunteer project that coordinates sieving and testing across thousands of processors worldwide. In PrimeGrid's AP subprojects, candidates are generated in the form a + n \cdot d \cdot 23\# (for searches targeting APs of length 26 or longer), with initial probable prime (PRP) tests using Lucas-Lehmer-R or Miller-Rabin variants to filter composites before deeper verification. This parallelization allows exhaustive searches over ranges up to $10^{18} or beyond, sieving billions of potential progressions per day. Heuristic optimizations guide the search by starting with large initial terms a, typically around N where the prime density \sim 1/\log N aligns with the expected frequency of k-term APs, estimated asymptotically as \sim c_k N^2 / \log^k N for some c_k > 0. This focuses effort on regions where the probability of all k terms being prime is non-negligible, reducing wasted computation on smaller numbers with higher composite likelihood. Sieving proceeds in stages: an initial pass with small primes to eliminate residues their products, followed by wheel factorization or segmented sieves for larger moduli to prune the candidate set efficiently. Major challenges arise from the in the search space; for an AP of length k, the number of candidates scales roughly as N^{k-1} before sieving, compounded by the need to test numbers up to O(N) with k factors of \log N. Efficient primality testing is crucial, with probabilistic methods like strong PRP for rapid screening and deterministic proofs via the Primality Proving (ECPP) algorithm for certification, which constructs a of elliptic curves to verify compositeness or primality in subexponential time. Custom C/C++ programs handle the sieving core, while verification often employs libraries such as PARI/GP for its robust arithmetic functions and isprime routine (integrating ECPP via external calls) or for high-performance testing of large integers.

Polynomial Progressions

The extension of the Green–Tao theorem to polynomial progressions was achieved by and Tamar Ziegler in their 2008 paper. They established that the set of primes contains arbitrarily long polynomial progressions, meaning that for any fixed integer k \geq 1 and any collection of integer-valued polynomials P_1, \dots, P_k \in \mathbb{Z} satisfying P_i(0) = 0 for each i, and for any \epsilon > 0, there are infinitely many pairs of positive integers x, m with $1 \leq m \leq x^\epsilon such that x + P_1(m), \dots, x + P_k(m) are all prime. This result builds directly on the original linear case by allowing the shifts to be given by higher-degree polynomials in a single parameter m, rather than just linear forms. The proof relies on a refined principle adapted to the pseudorandom structure of the primes, combined with and estimates to handle the polynomial complexity. A key aspect of this work is its handling of systems where the polynomials have degrees greater than 1, enabling configurations such as progressions in primes. For instance, one can take polynomials like P_1(m) = 0, P_2(m) = m^2 + m, and P_3(m) = 2m^2 + 2m to capture primes forming an in quadratic terms, ensuring all values x + P_i(m) are prime for infinitely many x, m. Such quadratic examples connect to broader number-theoretic questions, including the distribution of primes represented by polynomials, which arise in studies of elliptic curves over where prime values inform the curve's and torsion. However, the proof is ineffective, providing no explicit bounds, and no computational examples of such non-linear progressions are known. In a related development, and extended these ideas in to multidimensional generalizations involving systems of linear polynomials in multiple variables, proving a multi-dimensional Szemerédi-type for the primes via a that transfers density results from the integers to the primes. This captures more complex patterns, such as corners or other linear configurations in higher dimensions, while preserving the sparsity of the primes.

Multidimensional Generalizations

The multidimensional generalization of the Green–Tao theorem establishes that the primes contain arbitrarily long arithmetic progressions in higher dimensions. Specifically, Tao and Ziegler proved that for any dimension d \geq 1 and any finite configuration of points in \mathbb{Z}^d, there exists a translate of that configuration lying entirely in the d-tuples of primes \mathcal{P}^d, provided the configuration is sufficiently dense. This result extends the original theorem by showing that dense subsets of \mathcal{P}^d contain copies of any prescribed linear configuration, such as parallel arithmetic progressions or more complex lattice-based patterns. Configurational progressions in this context refer to systems of linear forms evaluated simultaneously at primes across multiple dimensions, generalizing the one-dimensional case. For example, configurations like (n, n+m, n+2m) can be lifted to vector forms in \mathbb{Z}^d, such as \mathbf{x}, \mathbf{x} + \mathbf{h}, \mathbf{x} + 2\mathbf{h} where \mathbf{x}, \mathbf{h} \in \mathbb{Z}^d, ensuring all components are prime. These structures capture multidimensional analogs of arithmetic progressions, including shifted or rotated variants, and the theorem guarantees their infinite occurrence in the primes. An analogous result holds for Gaussian primes in the complex integers \mathbb{Z}. Tao established that the Gaussian primes contain infinitely many constellations of any prescribed finite shape and orientation, including arbitrarily long arithmetic progressions along any direction in the Gaussian integer lattice. This extension relies on a transference principle adapted to the ring \mathbb{Z}, confirming that Gaussian primes exhibit similar additive structure to ordinary primes. The proofs of these generalizations draw on density analogs via the multidimensional Szemerédi theorem, which asserts that dense subsets of \mathbb{Z}^d contain all finite linear configurations. A correspondence principle transfers this combinatorial density result to the primes by constructing pseudorandom weights that model prime distributions in multiple dimensions, allowing the use of ergodic and analytic tools. These results have found applications in constructing elliptic curves with specified ranks and other properties. For instance, as of 2025, work by Koymans and Pagano uses variants of the Green-Tao theorem to produce elliptic curves over number fields with high rank, earning Pagano the . Similarly, results on Gaussian primes have been applied to elliptic curves over \mathbb{Q}(i). Despite these advances, obtaining quantitative bounds on the length or location of such multidimensional progressions that are uniform across all dimensions d remains an open question, as current estimates depend heavily on d.