Collatz conjecture
The Collatz conjecture, also known as the 3n + 1 problem or 3x + 1 conjecture, posits that for any positive integer n, the sequence generated by iteratively applying the rule—divide by 2 if n is even, or replace with $3n + 1 if n is odd—will eventually reach the number 1, after which it cycles through 4, 2, 1 indefinitely.[1] Proposed by German mathematician Lothar Collatz in 1937, the conjecture has resisted proof despite its deceptively simple formulation, earning it a reputation as one of mathematics' most enduring unsolved problems.[1] It has been computationally verified to hold for all starting values up to $2^{71} \approx 2.36 \times 10^{21} as of 2025, with no counterexamples found, though this exhaustive checking covers only a minuscule fraction of all positive integers.[2] The problem's sequences, termed hailstone numbers due to their fluctuating "up and down" behavior, have been analyzed for properties like stopping times and maximum excursions, revealing patterns but no general resolution.[1] Despite partial results—such as Riho Terras's 1976 proof that the orbit falls below the starting value for almost all numbers, and Terence Tao's 2019 demonstration that sequences for almost all starting points reach values much smaller than logarithmic in n—the conjecture remains open, with mathematicians like Paul Erdős noting that current tools are inadequate for its solution.[3] Its significance lies in challenging assumptions about dynamical systems and iteration in number theory, inspiring variants and computational projects while highlighting the gap between empirical evidence and rigorous proof.[4]Core Statement and Background
Problem Statement
The Collatz conjecture, proposed by German mathematician Lothar Collatz in 1937, posits that for every positive integer n, repeatedly applying a specific transformation will eventually yield the number 1.[1][5] The transformation, often denoted as the Collatz map C(n), is defined piecewise as follows: C(n) = \begin{cases} \frac{n}{2} & \text{if } n \text{ is even}, \\ 3n + 1 & \text{if } n \text{ is odd}. \end{cases} Starting with any positive integer n_0, the sequence is generated by n_{k+1} = C(n_k) for k \geq 0, and the conjecture asserts that there exists some finite k such that n_k = [1](/page/1), regardless of the initial n_0 > 0.[1] Applying the map to 1 produces the trivial cycle $1 \to 4 \to 2 \to [1](/page/1), which repeats indefinitely and represents the only known cycle among positive integers under this iteration.[1]Historical Context
The Collatz conjecture originated in the work of German mathematician Lothar Collatz, who formulated it in 1937 while working as an assistant and completing his habilitation at the Technical University of Karlsruhe.[6] Collatz was exploring iterative functions on integers, motivated by problems in numerical analysis, and posed the question of whether repeated application of the specified rule would always lead positive integers to 1, though he did not publish it formally at the time.[6] He circulated the problem informally among colleagues, including at the 1950 International Congress of Mathematicians in Cambridge, where it began to attract attention within the mathematical community.[7] The conjecture was independently discovered in 1952 by British mathematician Bryan Thwaites, who arrived at the same iterative process while studying sequences of integers.[7] Thwaites later offered a £1000 prize for its resolution, underscoring its allure despite the lack of progress.[8] By the mid-1950s, early computational efforts, using rudimentary electronic calculators, had verified the conjecture for all starting positive integers up to 100,000, providing initial empirical support but no proof.[9] The problem gained broader popularity in the 1970s, particularly through Martin Gardner's "Mathematical Games" column in Scientific American, which featured it in 1972 and highlighted its deceptive simplicity.[10] Hungarian mathematician Paul Erdős, known for his work in number theory, famously remarked on its challenge, stating, "Mathematics may not be ready for such problems," while offering a $500 prize for a solution or counterexample.[7] This era marked a shift from niche academic interest to wider recognition, though the conjecture remained unsolved.[9]Empirical Observations
Computational Verification
The Collatz conjecture has been extensively tested computationally, with verification efforts demonstrating that all positive integers up to increasingly large bounds converge to the 4-2-1 cycle under the conjecture's rules. Early computations in the 1970s and 1980s confirmed convergence for numbers up to around 10 million, but advancements in hardware and algorithms have dramatically expanded these limits. By the early 2010s, exhaustive checks had verified the conjecture up to approximately $2^{60} (about $1.15 \times 10^{18}), using optimized programs on personal computers. In the 2020s, distributed computing projects and supercomputer clusters pushed these boundaries further, reaching verification up to $2^{68} (roughly $2.95 \times 10^{20}) through parallel processing of congruence classes. A significant milestone came in 2025, when researchers extended this to $2^{71} (approximately $2.36 \times 10^{21}), surpassing prior records by leveraging heterogeneous systems combining CPUs and GPUs across European supercomputers such as MetaCentrum, Karolina, and LUMI. This effort involved thousands of parallel workers coordinated by a central server, processing work units of $2^{40} numbers each in about 5 seconds on modern GPUs. No counterexamples—numbers that fail to converge—have been discovered in any of these verifications, with all tested sequences reaching 1.[2] The efficiency of these computations relies on exhaustive checking optimized by bit manipulation techniques, such as the count-trailing-zeros (ctz) operation to merge multiple division-by-2 steps, and sieving methods like $2^k and $3^k sieves to handle even and odd iterations rapidly. Additionally, modular arithmetic shortcuts, including optimizations modulo 9 for 128-bit integers, allow for concurrent solving of congruence classes, reducing redundant calculations and enabling the 2025 limit to exceed previous benchmarks without proportional increases in computational resources. These approaches ensure rigorous coverage without gaps, confirming convergence for the entire range up to the current threshold.[11]Stopping Times and Records
In the Collatz conjecture, the total stopping time \sigma(n) of a positive integer n is defined as the number of iterations of the Collatz map required to reach 1 starting from n. The dynamic stopping time, often denoted \sigma_0(n), is the number of steps until the sequence first produces a value strictly smaller than n. These measures quantify the length of trajectories in the conjecture's iterative process, with the conjecture asserting that both are always finite for every n \geq [1](/page/1). A well-known example is n=27, which has a total stopping time of 111 steps and reaches a peak value of 9232 before descending to 1; its dynamic stopping time is 41 steps. This sequence exemplifies how even small starting values can produce unexpectedly prolonged paths due to repeated applications of the $3n+1 rule before sufficient divisions by 2 occur. Computational efforts have established records for the largest total stopping times within verified ranges. For instance, exhaustive searches below $10^8 identified the record \sigma(n)=949 for n=63728127. More recent verifications extending to $10^{18} have uncovered even longer trajectories, with the current record for that range being \sigma(n)=2456 for n=28019077177231758495, highlighting the gradual increase in maximum path lengths as larger n are examined. These records are maintained through optimized algorithms that track delay (total stopping time) while confirming convergence.[12] Empirical patterns in stopping times suggest a heuristic growth rate of \sigma(n) \approx O(\log n), reflecting the balanced effect of multiplications by 3 and divisions by 2, which on average reduce the logarithm of the value despite occasional upward spikes. This logarithmic tendency is evident in the record-holding values, where maximum stopping times scale roughly with the logarithm of the search limit, though individual sequences can deviate significantly due to sequences of odd iterations that amplify the number before contraction. Numbers requiring many consecutive odd steps, such as those with long strings of 1s in their binary representation (e.g., Mersenne numbers $2^k - 1 for large k), tend to produce exceptionally long chains by maximizing the $3n+1 applications relative to divisions.Visual Representations
Visual representations of Collatz sequences provide intuitive insights into their dynamic behavior, illustrating the iterative paths from starting integers to 1 and highlighting patterns in convergence. Trajectory plots, often called hailstone plots due to the sequences' upward peaks and downward descents resembling hailstones in a cloud, depict the value of the sequence n_k against the iteration step k. For instance, the trajectory starting from n=27 climbs to a maximum of 9232 before descending to 1 over 111 steps, showcasing multiple oscillations that underscore the conjecture's unpredictable yet conjectured terminating nature.[13] Tree diagrams offer another perspective by constructing the inverse Collatz process, rooting the structure at 1 and branching outward to encompass all positive integers through predecessor mappings. Each node m has a primary predecessor $2m corresponding to the even division rule in reverse, and a secondary branch to (m-1)/3 if this yields a positive integer, representing the inverse of the odd multiplication step; this conditional branch ensures only valid odd predecessors are included, forming an infinite arborescence under the conjecture. Such diagrams reveal the conjecture's implication that every positive integer appears exactly once in the tree, with visualizations limited to finite depths (e.g., orbits of length up to 19) demonstrating the merging of paths toward the root.[14][15] Density plots and heat maps of stopping times—defined as the number of iterations to reach 1—further illuminate structural patterns, particularly when analyzed modulo powers of 2 to capture residue class behaviors. These visualizations, often rendered as scatter plots or color-coded grids, show clusters and logarithmic growth in stopping times across ranges like 10,000 to over 5 million inputs, with modulo $2^k analyses revealing near-uniform distributions that align with probabilistic expectations for convergence. For example, plots modulo higher powers of 2 highlight how sequences in certain residue classes exhibit prolonged or abbreviated paths, providing empirical support for the conjecture's global reach to 1.[15][16] Software tools like Mathematica facilitate the generation of these hailstone plots and interactive demonstrations, enabling users to compute and visualize sequences for ranges up to 5000 or more, with built-in functions for trajectory graphing and tree construction. These computational aids, part of the Wolfram Language, allow dynamic exploration of individual paths or aggregate stopping time distributions, aiding in the identification of record-setting sequences without exhaustive enumeration.[17]Supporting Theoretical Arguments
Probabilistic Models
One of the primary heuristic arguments supporting the Collatz conjecture models the iteration as a random process where the parity of each iterate is independently even or odd with equal probability \frac{1}{2}. Under this assumption, an even step multiplies the number by \frac{1}{2}, while an odd step applies $3n+1 (approximately $3n), followed immediately by a division by 2 since $3n+1 is even, yielding a net factor of approximately \frac{3}{2}. This leads to an expected logarithmic growth per step of \frac{1}{2} \log \frac{1}{2} + \frac{1}{2} \log \frac{3}{2} = \frac{1}{2} \log \frac{3}{4} < 0, approximately -0.1438. The geometric mean growth factor over two steps (one even and one odd) is thus \sqrt{\frac{3}{4}} \approx 0.866 < 1, implying that the orbit decreases geometrically on average and should reach 1 almost surely under the random model. A more refined analysis considers the long-run proportion \alpha of odd steps in an orbit, yielding an expected log growth E[\log C(n)] = \alpha \log \frac{3}{2} + (1 - \alpha) \log \frac{1}{2}. This is negative provided \alpha < \frac{\log 2}{\log 2 + \log (3/2)} \approx 0.631; empirical observations show \alpha \approx 0.5, confirming descent. Probabilistic models further predict that even iterates occur about two-thirds of the time in typical orbits, reinforcing the heuristic. A partially rigorous confirmation of this descent behavior appears in Terence Tao's 2019 result, which proves that for any function f(N) \to \infty as N \to \infty, almost all positive integers n \leq N (in the sense of logarithmic density) have an orbit under the Collatz map that attains a value less than f(N). This establishes that almost all orbits descend to nearly bounded values, providing strong theoretical support for the probabilistic intuition without resolving the full conjecture.[18]Bounds and Growth Estimates
Efforts to establish lower bounds on the length of any nontrivial cycles in the Collatz map have relied on exhaustive computational searches combined with theoretical constraints. Simons and de Weger (2005) proved that no nontrivial cycle exists with length less than 68 by solving associated Diophantine inequalities and bounding the minimal element in potential cycles. Subsequent improvements, including work by Hercher (2023), have extended this to show that there are no nontrivial Collatz m-cycles for m ≤ 91, using enhanced approximations for the growth parameter X_0 ≈ 704 × 2^{60} to rule out shorter cycles via reciprocal sum inequalities.[19] Upper bounds on the growth of Collatz trajectories provide insight into why sequences are expected to remain controlled. Krasikov and Lagarias (2003) introduced a system of difference inequalities to analyze the 3x+1 map, deriving lower bounds on the density of integers that "succeed" in reducing below a threshold after k iterations. Specifically, for any positive integer a not divisible by 3, the number π_a(x) of such integers up to x satisfies π_a(x) ≥ x^{0.84} for sufficiently large x, achieved via optimization with k=11 and growth factor λ ≈ 1.792. This result implies that sequences cannot grow indefinitely without cycling, as a substantial proportion (at least x^{-0.16}) of starting values reach small bounded regions after fixed steps, forcing eventual decrease.[20] These density bounds extend to convergence properties of the iterated map. For certain classes of starting values, particularly those comprising almost all positive integers in the logarithmic density sense, the trajectory returns below the initial value after sufficiently many iterations. Tao (2019) established that for any f: ℕ → ℝ with f(N) → ∞ as N → ∞, the minimal value attained in the Collatz orbit of N is less than f(N) for almost all N. In particular, \min_{m \geq 0} |C^m(N)| < \log \log \log \log N holds for almost all N, demonstrating that |C^m(N)| < N for some large m in these classes and controlling overall growth by ensuring orbits do not diverge.[21]Analysis of Cycles
A cycle in the Collatz conjecture refers to a periodic orbit under the iterated Collatz map, consisting of a finite set of distinct positive integers where each element maps to the next, and the last returns to the first. The only known such cycle is the trivial one of length 3: $1 \to 4 \to 2 \to 1.[22] To determine possible non-trivial cycles, one considers a k-cycle as a solution to a system of equations derived from the Collatz rules applied periodically. Specifically, for integers n_1, n_2, \dots, n_k > 0 forming a cycle, each step satisfies either n_{i+1} = n_i / 2 if n_i is even, or n_{i+1} = (3n_i + 1)/2 if n_i is odd (accounting for the immediate division by 2 since $3n_i + 1 is even), with n_{k+1} = n_1, and all n_i integers. More generally, sequences of consecutive even steps are incorporated by allowing multiple divisions by 2 until an odd number is reached. Solving this nonlinear Diophantine system directly is challenging, but for small k, exhaustive enumeration over possible parity patterns rules out cycles. For larger k, analytical methods transform the problem into finding integer solutions where the net multiplier over the cycle is 1, leading to equations balancing the growth from $3n + 1 steps against divisions by 2.[23] Proofs of the non-existence of small non-trivial cycles often rely on solving these generalized k-cycle equations modulo increasing powers of 2, exploiting the 2-adic structure to derive contradictions for candidate solutions. By assuming a cycle exists and lifting solutions modulo $2^m for successively larger m, inconsistencies arise when the required valuations or congruences cannot hold simultaneously with the oddness conditions. This modular approach, combined with bounds on the minimal element in any cycle, establishes that no non-trivial cycles exist with length less than 1,027,712,276 as of 2021. Subsequent refinements using similar techniques and improved approximations to \log_2 3 have pushed this bound to over $10^{11} as of 2025.[24] The absence of short non-trivial cycles has profound implications for the conjecture. If the Collatz conjecture is false, any counterexample must feature either a non-trivial cycle of extraordinary length—far exceeding current lower bounds—or a divergent trajectory that grows unbounded without periodic behavior, as all finite starting values have been computationally verified to reach the trivial cycle up to immense limits.[22]Equivalent Formulations
Reverse Collatz Process
The reverse Collatz process examines the predecessors of a number under the Collatz map C, where C(n) = n/2 if n is even and C(n) = 3n + 1 if n is odd. This inverse operation allows tracing numbers that map to a given m in one step. Specifically, every m has $2m as a predecessor, since C(2m) = m. Additionally, if m \equiv 4 \pmod{6}, then (m-1)/3 is an integer, and this value is odd and serves as another predecessor, because if k = (m-1)/3 is odd, then C(k) = 3k + 1 = m.[25][26] Formally, the preimage set is given by C^{-1}(m) = \{2m\} \cup \left\{ \frac{m-1}{3} \;\middle|\; m \equiv 4 \pmod{6},\ \frac{m-1}{3} \in \mathbb{N} \right\}. The condition m \equiv 4 \pmod{6} ensures that (m-1)/3 is an odd positive integer when applicable. For m = 1, the preimage is \{2\}, corresponding to the start of the cycle. Applying C^{-1} repeatedly generates the full backward orbit from any starting point.[27] The Collatz tree emerges from iteratively applying this inverse process starting from the root at 1, forming an infinite directed tree structure over the positive integers. Each node m in the tree has outgoing edges to its predecessors in C^{-1}(m), resulting in a tree where most nodes have out-degree 2 (branching via $2m and occasionally (m-1)/3), though some have out-degree 1. The tree is acyclic by construction in the reverse direction and spans an increasingly large portion of the positive integers as depth increases. Computational explorations confirm that the tree includes all integers up to large bounds without repetition, as the forward Collatz map from any tree node leads uniquely toward 1.[27] Under the Collatz conjecture, every positive integer appears exactly once in this tree, with no overlaps or gaps, due to the deterministic nature of the forward map implying unique paths to 1. The conjecture is thus equivalent to the statement that this tree includes all positive integers, as any missing number would imply a divergent orbit or non-trivial cycle in the forward direction. Verification up to $10^{18} supports this coverage empirically, though a full proof remains elusive.[27][26]Parity Sequence Interpretation
The parity sequence interpretation reformulates the Collatz conjecture by encoding the trajectory of each positive integer under the Collatz map as a binary sequence based on the parity of its iterates. For a starting number n, the parity vector \mathbf{p}(n) = (p_0, p_1, \dots, p_k) is defined such that p_i = 0 if the i-th iterate is even and p_i = 1 if odd, with p_0 corresponding to the parity of n itself. This vector captures the sequence of operations: divisions by 2 for even iterates and the $3n+1 transformation for odd ones, which always produces an even number. The Collatz conjecture is equivalent to the statement that, for every positive integer n, the parity vector \mathbf{p}(n) is finite in the sense that it eventually enters the repeating pattern $0, 0, 1 associated with the 4–2–1 cycle. In this framework, the forward dynamics of the Collatz map generate the parity sequence deterministically from n. An odd iterate (marked by a 1) is followed by at least one 0, corresponding to the even result of $3n+1 and subsequent divisions by 2 until the next odd number. Thus, the sequence consists of 1's separated by one or more 0's, reflecting the "full steps" from odd to odd via the Syracuse function, which compresses even divisions: S(n) = \frac{3n+1}{2^m}, where m \geq 1 is the number of trailing 0's in the parity sequence after each 1. This mapping equates the parity vector to a binary encoding of the Syracuse iterations on the odd components of the trajectory.[28] The advantage of this interpretation lies in reducing the problem to the analysis of binary sequences and their associated shifts, allowing estimation of trajectory growth or decay. Specifically, the k-th iterate can be approximated as T^k(n) \approx n \cdot 3^{d(k)} \cdot 2^{-k}, where d(k) is the number of 1's (odd steps) in the first k positions of \mathbf{p}(n). Convergence to 1 occurs if there exists k such that d(k) \log 3 - k \log 2 < 0, implying the sequence decreases below the starting value; the conjecture posits this holds for all n. This binary perspective facilitates studying the distribution of 1's in parity vectors, akin to examining shifts in binary expansions, and has been used to prove that almost all integers reach small values quickly.[28] For example, starting from n = 3, the iterates are 3 (odd), 10 (even), 5 (odd), 16 (even), 8 (even), 4 (even), 2 (even), 1 (odd), yielding \mathbf{p}(3) = (1, 0, 1, 0, 0, 0, 0, 1), which transitions into the cycle pattern after the initial segment. Such representations highlight how the parity vector encodes the entire path without tracking numerical values, emphasizing the structural properties of the sequences generated by the Collatz rules.[29]Tag Systems and Automata
The Collatz conjecture admits an equivalent formulation in terms of tag systems, a class of abstract rewriting systems introduced by Emil Post in the 1920s as models of computation. A tag system operates on finite words over a finite alphabet, with a fixed deletion parameter v that removes the first v symbols from the current word at each step; the production rule is then applied to the very first symbol of the original word, appending the corresponding output string to the remaining part of the word. This process continues until the word has length less than v, at which point the system halts. The simplicity of tag systems belies their expressive power, as they are capable of universal computation, and their halting problem is undecidable in general.[30] A specific 2-symbol tag system with deletion parameter v=2 and production rules $0 \to 1, $1 \to 110 encodes the Collatz iteration directly on the binary representation of the starting positive integer n. The initial word is constructed by taking the binary digits of n in reverse order and prefixing them with a leading 0, ensuring proper alignment for the operations. Each step of the tag system then mimics the Collatz rules: the rule for 0 corresponds to handling even parity (effectively dividing by 2 via binary shift), while the rule for 1 implements the odd case (multiplying by 3 and adding 1, encoded through the appended bits that adjust the binary value accordingly). The length of the word at each step tracks the magnitude of the corresponding Collatz iterate, and the conjecture is equivalent to asserting that, for every such initial word, the system eventually reaches a halting configuration representing 1. This equivalence was established by showing that the tag system's evolution preserves the Collatz dynamics bidirectionally.[31] This tag system formulation underscores the Collatz process as an abstract base-2 computing machine governed by parity-driven operations on binary strings. The machine processes the bits sequentially, with each deletion and appendage altering the string in a way that parallels arithmetic transformations, revealing the conjecture's deep ties to fundamental questions in computability. Liesbeth de Mol further demonstrated that the Collatz problem reduces to the halting and reachability problems for tag systems with v=2, linking it to broader undecidability results in automata theory.[31] For illustration, consider n=3, whose binary representation is 11; the initial word is thus 011. Applying the rules:- First symbol 0, append 1, delete first 2 (01): remaining 1 + 1 = 11.
- First symbol 1, append 110, delete first 2 (11): empty + 110 = 110.
- First symbol 1, append 110, delete first 2 (11): remaining 0 + 110 = 0110.
- First symbol 0, append 1, delete first 2 (01): remaining 10 + 1 = 101.
- Subsequent steps continue, mirroring the trajectory 3 → 10 → 5 → 16 → 8 → 4 → 2 → 1 until halting.