Quadratic sieve
The quadratic sieve (QS) is an integer factorization algorithm designed to efficiently factor large composite numbers, particularly those without small prime factors, by identifying congruences of the form x^2 \equiv y^2 \pmod{n} where x \not\equiv \pm y \pmod{n}, allowing the computation of a nontrivial factor via \gcd(x - y, n).[1] Developed by Carl Pomerance in 1981 as an improvement over earlier methods like the continued fraction algorithm and Schroeppel's linear sieve, it employs a sieving technique to detect B-smooth values of quadratic polynomials Q(x) = (x + \lfloor \sqrt{n} \rfloor)^2 - n over a factor base of small primes p for which the Legendre symbol (n/p) = 1.[2] This process generates relations that are combined using linear algebra over \mathbb{F}_2 (via Gaussian elimination) to produce the desired square congruences.[3]

The algorithm's efficiency stems from its subexponential running time, conjectured to be \exp\left( (1 + o(1)) \sqrt{\ln n \cdot \ln \ln n} \right), which made it the fastest general-purpose factoring method from the early 1980s until the introduction of the number field sieve in 1993.[3] For numbers up to approximately 110 digits, QS and its optimized variants, such as the multiple polynomial quadratic sieve (MPQS), outperform the number field sieve, enabling practical factorizations like the 129-digit RSA-129 challenge in 1994 after eight months of computation on distributed systems.[2]

Historically, it built on Maurice Kraitchik's 1926 framework of generating multiple congruences and John Pollard's 1974 ideas for smooth numbers, surpassing the continued fraction method's complexity of L(n)^{\sqrt{2}} (where L(n) = \exp(\sqrt{\ln n \cdot \ln \ln n})) with a simpler per-step implementation.[1] QS remains significant in computational number theory and cryptanalysis, particularly for assessing the security of RSA encryption against factoring attacks on moduli of 100–120 digits, though it has been largely supplanted for larger numbers by more advanced sieves.[3] Variants like the self-initializing quadratic sieve incorporate optimizations such as partial sieving and large prime handling to further reduce runtime, demonstrating the algorithm's adaptability in high-performance computing environments.[2]

Introduction
Overview and Purpose
The quadratic sieve (QS) is a general-purpose integer factorization algorithm designed to factor a composite integer n by identifying quadratic residues modulo n that are smooth with respect to a chosen factor base of small primes.[1] This approach exploits the structure of quadratic polynomials evaluated near \sqrt{n} to generate candidates for such residues efficiently.[2] The primary aim of QS is to produce pairs of integers (x, y) satisfying the congruence x^2 \equiv y^2 \pmod{n} where x \not\equiv \pm y \pmod{n}, which yields non-trivial factors via \gcd(x - y, n) and \gcd(x + y, n).[1] These pairs arise from combining relations where the quadratic residues factor completely over the factor base, allowing a linear algebra step to construct the desired congruences.[2] QS is particularly suitable for factoring integers up to approximately 100-110 decimal digits; within this range it is generally faster than the elliptic curve method (ECM) when the factors of n are of comparable size, since ECM's running time is governed by the size of the smallest prime factor rather than by n itself.[2] Compared to earlier methods like Dixon's random squares algorithm, QS offers significant speed improvements through optimized sieving; however, for numbers exceeding 110 digits, the general number field sieve (GNFS) is more efficient.[4]
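The congruence-of-squares mechanism can be illustrated in isolation: once any pair (x, y) with x^2 \equiv y^2 \pmod{n} and x \not\equiv \pm y \pmod{n} is known, a single gcd computation splits n. A minimal sketch in Python (the modulus and the pair below are illustrative values chosen for the example, not produced by the sieve):

```python
from math import gcd

def split_from_congruence(n: int, x: int, y: int):
    """Given x^2 = y^2 (mod n), return a nontrivial factor of n, or None if x = +/-y (mod n)."""
    assert (x * x - y * y) % n == 0
    for d in (gcd(x - y, n), gcd(x + y, n)):
        if 1 < d < n:
            return d
    return None

# Example: 10^2 = 100 = 9 = 3^2 (mod 91), and 10 is not congruent to +/-3 (mod 91),
# so the gcd exposes a factor of 91 = 7 * 13.
print(split_from_congruence(91, 10, 3))  # prints 7
```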
Historical Development
The quadratic sieve algorithm was invented by Carl Pomerance in 1981 as an improvement over John D. Dixon's random squares method from the same year, which had introduced a probabilistic approach to finding smooth values but suffered from inefficient smoothness testing. Pomerance's innovation shifted the focus to sieving techniques for detecting smooth quadratic residues modulo the target number n, achieving subexponential time complexity of \exp\left( \sqrt{\log n \log \log n} \right), a significant advance over Dixon's \exp\left( \sqrt{2 \log n \log \log n} \right). This method built on earlier ideas from Richard Schroeppel's linear sieve proposal in the late 1970s, but Pomerance's version became the first practical implementation for large-scale factoring.[4]

Early implementations emerged rapidly in the 1980s, demonstrating the algorithm's viability on emerging supercomputers. In 1982, Joseph Gerver factored a 47-digit number using the quadratic sieve, while Jim Davis and Diane Holdridge adapted it for the Cray-1, establishing initial performance benchmarks. By 1984, a Sandia National Laboratories team factored the 76-digit Mersenne number 2^{251} - 1, earning recognition from the Association for Computing Machinery for advancing cryptographic analysis. Refinements continued, with Peter L. Montgomery introducing the multiple polynomial quadratic sieve (MPQS) variant in 1987, which used multiple quadratic polynomials to shorten sieving intervals and boost efficiency for numbers up to 100 digits, enabling parallelization across distributed systems.[4]

Key milestones underscored the algorithm's impact in the 1990s. In 1994, a distributed computing effort led by Derek Atkins, Arjen K. Lenstra, and over 600 volunteers factored the 129-digit RSA-129 challenge number using MPQS, taking eight months and marking the first internet-coordinated large-scale factorization. This success highlighted the transition to MPQS for handling larger composites, as the original quadratic sieve became less competitive beyond 80 digits. The method's sieving framework directly influenced the development of the number field sieve (NFS) by Arjen K. Lenstra, Hendrik W. Lenstra, Mark Manasse, and John M. Pollard in 1990–1993, which generalized the approach to algebraic number fields for even greater speed on numbers over 100 digits.[4]

The quadratic sieve remains relevant for educational purposes and factoring medium-sized integers up to about 110 digits, despite being superseded by NFS variants for record-breaking efforts. Optimized versions in computational number theory tools, such as msieve and YAFU, continue to demonstrate its pedagogical value in illustrating sieving and linear algebra over finite fields.[4]

Mathematical Foundations
Key Number Theory Concepts
The quadratic sieve algorithm relies on fundamental concepts from number theory, particularly modular arithmetic, to identify congruences that facilitate integer factorization. In modular arithmetic, residues are the equivalence classes of integers modulo n, represented by the least nonnegative integers from 0 to n-1. Quadratic residues modulo n are the values b for which there exists an integer x such that x^2 \equiv b \pmod{n}; these are essential in the quadratic sieve for constructing pairs where a^2 \equiv b^2 \pmod{n}, enabling the discovery of nontrivial factors via the greatest common divisor \gcd(a - b, n).[3][4]

The core problem addressed by the quadratic sieve is the integer factorization of a composite number n, particularly in the semiprime case where n = p \cdot q with p and q large distinct primes, as encountered in cryptographic applications like RSA. The goal is to recover p and q efficiently, as direct methods like trial division require up to \sqrt{n} operations, which becomes infeasible for n exceeding about 30 digits. By finding suitable congruences modulo n, the algorithm splits n without exhaustive search.[3][4]

A key prerequisite is the notion of smooth numbers, specifically B-smooth integers, which are positive integers whose prime factors are all at most B. In the quadratic sieve, B-smooth numbers are crucial for collecting "relations," as the algorithm sieves for values of a quadratic polynomial that factor completely over a small set of primes up to B, allowing the construction of products that form perfect squares modulo n. The importance stems from the pigeonhole principle: if more than \pi(B) (the number of primes up to B) such smooth values are found, a subset must multiply to a square due to dependencies in their factorizations.[3][4]

To identify these squares from the collected relations, the quadratic sieve employs linear algebra over the finite field \mathbb{F}_2 (GF(2)), where arithmetic is performed modulo 2. Each B-smooth number is represented by an exponent vector v = (e_1, e_2, \dots, e_{\pi(B)}) \in \mathbb{F}_2^{\pi(B)}, with e_i being the exponent of the i-th prime modulo 2. A linear dependency among these vectors—i.e., a subset summing to the zero vector—corresponds to a product of the smooth numbers that is a perfect square, as all exponents become even. Solving this involves Gaussian elimination on the matrix of vectors to find a nonzero element of the kernel.[3][4]

Concretely, each relation has the form x^2 \equiv s \pmod{n} with s = \prod p_i^{e_i} a B-smooth value of the quadratic polynomial; recording only the parities of the exponents reduces the search for a square to the linear dependency problem above. Combining the relations in a dependency yields X^2 \equiv Y^2 \pmod{n}, where X is the product of the x values and Y is the integer square root of the product of the smooth values s.[3][4]
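The exponent-vector representation over \mathbb{F}_2 is easy to make concrete. The sketch below (an illustrative helper, not tied to any particular implementation) factors a value over a small factor base and keeps only the exponent parities:

```python
def exponent_vector_mod2(value, factor_base):
    """Return the exponent vector mod 2 of |value| over factor_base,
    or None if value is not smooth over the base."""
    v, vector = abs(value), []
    for p in factor_base:
        e = 0
        while v % p == 0:
            v //= p
            e += 1
        vector.append(e % 2)
    return vector if v == 1 else None   # v == 1 means fully B-smooth

# 140 = 2^2 * 5 * 7 is smooth over {2, 3, 5, 7}; only the odd exponents survive:
print(exponent_vector_mod2(140, [2, 3, 5, 7]))   # [0, 0, 1, 1]
```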
Role of Smooth Numbers and Sieving
In the quadratic sieve algorithm, a central challenge is identifying B-smooth numbers, which are integers whose prime factors are all at most a chosen bound B. These numbers are crucial because the values generated by the quadratic polynomial, typically of size around \sqrt{n}, have a low probability of being B-smooth; specifically, for a random integer v \approx \sqrt{n}, the likelihood is approximately u^{-u} where u = \frac{\ln v}{\ln B} \approx \frac{1}{2} \frac{\ln n}{\ln B}.[5] This rarity motivates efficient detection methods.

Smoothness matters because it enables the construction of relations of the form x_i^2 \equiv \prod_{p \leq B} p^{e_{i,p}} \pmod{n}, where the exponents e_{i,p} form vectors over \mathbb{Z}/2\mathbb{Z}. By finding a linearly dependent set of these vectors—via Gaussian elimination on the relation matrix—a product of corresponding relations yields y^2 \equiv z^2 \pmod{n} for some integers y and z, allowing factorization through \gcd(y - z, n).[3] This dependency exploits the algebraic structure modulo n, transforming the search for factors into a linear algebra problem over smooth factorizations.

To generate candidate smooth values efficiently, the algorithm evaluates the quadratic polynomial f(a) = (a + \lfloor \sqrt{n} \rfloor)^2 - n for integers a in a range around zero, producing values |f(a)| \lesssim 2|a|\sqrt{n} that are small enough to have reasonable smoothness probability. Sieving approximates smoothness by initializing an array over the sieving interval and, for each prime p \leq B, identifying roots of f(a) \equiv 0 \pmod{p} (at most two per p) and incrementing the array at congruent positions. The key update rule is: \text{For each root } r \pmod{p}, \quad \text{add } \log p \text{ to sieve locations } a \equiv r \pmod{p}. Positions where the accumulated sum exceeds a threshold (near \log |f(a)|) flag potential smooth candidates for full factorization, avoiding exhaustive checks.[6]

This sieving approach optimizes efficiency by preprocessing divisibility: trial-dividing every candidate by each of the \pi(B) factor-base primes would cost on the order of O(M \, \pi(B)) operations over an interval of length M, whereas sieving touches each array position only for the primes that actually divide the corresponding value, for a total of approximately O(M \log \log B) work.[3] By concentrating effort on promising locations, sieving leverages the distribution of smooth numbers to make the overall process subexponential in \log n.[6]
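The update rule translates almost directly into code. The sketch below (names are assumptions for the example; root finding is done by brute force for brevity, whereas a real implementation would use Tonelli–Shanks as described in the next section) accumulates \log p along the two arithmetic progressions of each factor-base prime and reports positions whose sum nearly explains \log |f(a)|:

```python
from math import isqrt, log

def sieve_candidates(n, factor_base, M, slack=1.0):
    """Return offsets a in [-M, M] where f(a) = (a + floor(sqrt(n)))^2 - n looks B-smooth."""
    k = isqrt(n)
    sums = [0.0] * (2 * M + 1)                 # index i holds the offset a = i - M
    for p in factor_base:
        roots = [r for r in range(p) if (r + k) ** 2 % p == n % p]   # f(a) = 0 (mod p)
        for r in roots:
            for i in range((r + M) % p, 2 * M + 1, p):
                sums[i] += log(p)              # additive update rule
    candidates = []
    for i, s in enumerate(sums):
        a = i - M
        value = abs((a + k) ** 2 - n)
        if value and s >= log(value) - slack:  # accumulated logs nearly match log|f(a)|
            candidates.append(a)
    return candidates

# Candidates are then verified by trial division over the factor base.
```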
Core Algorithm
Basic Sieving Process
The basic sieving process in the quadratic sieve algorithm begins with the selection of key parameters to balance computational efficiency and the likelihood of finding smooth values. The sieve bound B is chosen such that the factor base consists of all primes p \leq B, with B typically on the order of L(n)^{1/\sqrt{2}}, where L(n) = \exp\left( \sqrt{\log n \log \log n} \right) and n is the integer to be factored.[1] The interval size M is selected to cover a range of trial values, typically M \approx \exp\left( \sqrt{\log n \log \log n} \right), ensuring sufficient candidates while keeping memory usage manageable.[1]

The core of the process uses a single quadratic polynomial f(a) = (a + k)^2 - n, where k = \lfloor \sqrt{n} \rfloor, and a ranges over the interval [1, M] (or equivalently [-M/2, M/2] for symmetry around the minimum).[1] This polynomial generates values Q(a) = f(a) that are expected to be small near a = 0 and grow quadratically, with |Q(a)| \approx 2ka for small a. The goal is to identify those Q(a) that are B-smooth, meaning they factor completely over the primes \leq B; the probability of such smoothness is roughly u^{-u} where u = \log |Q(a)| / \log B, aligning with the density of smooth numbers.[1]

Sieving proceeds by initializing an array of length M with a crude approximation of \log |Q(a)| for each position (these values are all approximately equal, as Q(a) varies smoothly). For each prime p \leq B in the factor base, the roots r_1, r_2 of Q(a) \equiv 0 \pmod{p} (i.e., solutions to (a + k)^2 \equiv n \pmod{p}) are computed, typically using the Tonelli-Shanks algorithm if they exist. Starting from positions congruent to r_1 and r_2 modulo p, the value \log p (or multiples for higher powers of p) is subtracted from the array at every interval of p across the range. This step leaves a residual R(a) \approx \log |Q(a)| - \sum \log p over the small prime factors detected, approximating the logarithm of the cofactor remaining after those factors are removed.[1][2]

After sieving, the array is scanned to identify promising locations. For each a, if the residual satisfies R(a) < c for some small constant c (often around 10–20 to account for approximation errors), the unfactored cofactor, approximately \exp(R(a)), is expected to be small (ideally near 1 if Q(a) is fully smooth). At these locations, trial division by all primes \leq B is performed on Q(a) to verify full B-smoothness and obtain the complete factorization. Only confirmed smooth Q(a) are retained as relations for later processing.[1][2] This selective checking reduces the computational cost, as most a are skipped.
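For each factor-base prime, the sieve start points are square roots of n modulo p shifted by k. A compact Tonelli–Shanks sketch is shown below; this is the standard textbook algorithm rather than any particular QS implementation's routine (odd primes only, and the prime is assumed not to divide n):

```python
def tonelli_shanks(n, p):
    """Return r with r*r = n (mod p), for an odd prime p and n a quadratic residue mod p."""
    n %= p
    if n == 0:
        return 0
    if p % 4 == 3:                                  # simple case
        return pow(n, (p + 1) // 4, p)
    q, s = p - 1, 0                                 # write p - 1 = q * 2^s with q odd
    while q % 2 == 0:
        q //= 2
        s += 1
    z = 2                                           # find a quadratic non-residue z
    while pow(z, (p - 1) // 2, p) != p - 1:
        z += 1
    m, c, t, r = s, pow(z, q, p), pow(n, q, p), pow(n, (q + 1) // 2, p)
    while t != 1:
        i, t2 = 0, t                                # least i with t^(2^i) = 1
        while t2 != 1:
            t2 = t2 * t2 % p
            i += 1
        b = pow(c, 1 << (m - i - 1), p)
        m, c, t, r = i, b * b % p, t * b * b % p, r * b % p
    return r

# Sieve start offsets for prime p: a = (r - k) mod p and a = (-r - k) mod p,
# where k = floor(sqrt(n)) and r = tonelli_shanks(n, p).
```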
Collecting and Processing Relations
In the quadratic sieve algorithm, relations are gathered from the values of the quadratic polynomial Q(a) = (a + \lfloor \sqrt{n} \rfloor)^2 - n that are found to be smooth over the factor base of small primes p \leq B for which n is a quadratic residue modulo p. Each such smooth relation takes the form Q(a) = \pm \prod_{i=1}^k p_i^{e_i}, where the p_i are the distinct primes in the factor base and the e_i are their exponents. To prepare for linear algebra, the sign is either tracked as an extra factor-base entry for -1 or adjusted later, and only the parity of each exponent is kept, leaving an exponent vector (e_1 \mod 2, \dots, e_k \mod 2) over the finite field \mathbb{F}_2, where k = \pi(B) is the number of primes up to the smoothness bound B.[3][2]

The primary goal of collection is to obtain at least k + 1 such relations, ensuring a linear dependence among the exponent vectors over \mathbb{F}_2 by the pigeonhole principle, though in practice slightly more are gathered to account for redundancies and to increase the chances of finding multiple dependencies. Once sufficient relations are collected, a matrix is constructed with rows corresponding to these exponent vectors and columns indexed by the k primes in the factor base. The null space of this matrix is then computed using Gaussian elimination over \mathbb{F}_2, identifying a basis for dependencies where the sum of selected vectors is the zero vector, meaning the total exponents in the corresponding product of relations are even.[3][4][2]

A dependency \sum \mathbf{v}_j = \mathbf{0} in \mathbb{F}_2^k implies that the product \prod Q(a_j) has even exponents for all primes, hence is a perfect square y^2. Writing x_j = a_j + \lfloor \sqrt{n} \rfloor, each individual relation satisfies x_j^2 \equiv Q(a_j) \pmod{n}, so the square congruence \left( \prod x_j \right)^2 \equiv y^2 \pmod{n} follows. Non-trivial factors of n are then extracted by computing \gcd\left( \prod x_j - y, n \right) (and similarly for the plus sign), provided the two square roots are not congruent modulo n. The sieving phase for generating candidate relations has a computational cost of O(M \log \log B), where M is the sieving interval size, while the matrix processing step requires O(k^3) operations with standard Gaussian elimination, reducible to roughly O(k^2) using optimized sparse methods like the block Wiedemann algorithm.[3][4][2]
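A minimal sketch of the dependency search is shown below: exponent vectors are packed into integers as bitmasks and eliminated over \mathbb{F}_2, returning the indices of relations whose product is a square. The function and packing convention are illustrative, not a production kernel solver such as block Lanczos or block Wiedemann:

```python
def find_dependency(vectors, num_primes):
    """vectors: exponent vectors mod 2 packed as ints (bit j = parity at the j-th prime).
    Returns indices of a nonempty subset summing to zero over GF(2), or None."""
    rows = [(v, 1 << idx) for idx, v in enumerate(vectors)]   # (vector, combination mask)
    for bit in range(num_primes):
        pivot = next((row for row in rows if row[0] >> bit & 1), None)
        if pivot is None:
            continue
        rows.remove(pivot)                                    # pivot row is consumed
        rows = [(v ^ pivot[0], c ^ pivot[1]) if v >> bit & 1 else (v, c) for v, c in rows]
    for v, combo in rows:
        if v == 0 and combo:
            return [i for i in range(len(vectors)) if combo >> i & 1]
    return None

# Vectors over the primes (2, 3, 5, 7) for 140, 540 and 315 (parities packed little-endian):
print(find_dependency([0b1100, 0b0110, 0b1100], 4))   # [0, 2]: 140 * 315 = 210^2
```

From the returned subset, X is the product of the corresponding x_j values modulo n and Y the integer square root of \prod Q(a_j), and \gcd(X - Y, n) is computed as described above.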
Optimizations for Efficiency
Multiple Polynomial Approach
In the basic quadratic sieve, the single polynomial f(x) = x^2 - n produces values that grow rapidly as x moves away from \sqrt{n}, resulting in uneven sizes of \log |f(x)| across the sieving interval and requiring a large interval size M (often on the order of the smoothness bound B) to collect sufficient relations, which reduces sieving efficiency. The multiple polynomial quadratic sieve (MPQS), proposed by Peter Montgomery and analyzed by Robert D. Silverman, addresses this by employing a set of distinct quadratic polynomials Q_i(x) over a smaller fixed sieving interval [-M, M], where M is typically reduced to about \sqrt{B} compared to the single-polynomial case, thereby balancing the growth of |Q_i(x)| and improving the probability of smoothness.[5] Each polynomial is of the general form Q(x) = A x^2 + 2 B x + C, where the coefficients satisfy B^2 - A C = n (equivalently B^2 \equiv n \pmod{A}, so that A \cdot Q(x) = (A x + B)^2 - n) and A is chosen as a small product of primes p for which the Legendre symbol (n/p) = 1, making n a quadratic residue modulo A.[2][7]

To generate the polynomials, one chooses A \approx \sqrt{2n}/M so that the values of |Q(x)| stay balanced over the interval, computes B with B^2 \equiv n \pmod{A} (for example via a modular square root when A is a prime or prime power), and sets C = (B^2 - n)/A, yielding Q(x) = \frac{(A x + B)^2 - n}{A}, which is integer-valued and small over the interval. The set consists of typically 10 to 20 such polynomials, chosen to optimize root distributions modulo the factor base primes (ensuring even coverage of quadratic residues), which enhances the uniformity of sieving hits.[2][5]

Sieving proceeds by processing each polynomial separately over [-M, M], using distinct or shared sieve arrays initialized with roots of Q_i(x) \equiv 0 \pmod{p} for factor base primes p; this incurs additional setup costs for polynomial evaluation and root computation but reduces the total sieving operations by a factor of 2 to 3 relative to the single-polynomial method.[7] The trade-off involves higher overhead from managing multiple polynomials, which improves smoothness detection in smaller intervals but demands careful optimization to avoid excessive computation during switches between them.[5]
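A hedged sketch of the coefficient construction, assuming the simplest choice A = q^2 for an odd prime q with n a quadratic residue modulo q: a square root of n mod q is Hensel-lifted to mod q^2, giving B with B^2 \equiv n \pmod{A}. The helper names are illustrative, and the brute-force modular square root stands in for Tonelli–Shanks:

```python
def sqrt_mod_prime(n, q):
    """Brute-force square root of n modulo a small odd prime q (Tonelli-Shanks in practice)."""
    return next(r for r in range(1, q) if r * r % q == n % q)

def mpqs_polynomial(n, q):
    """Build one MPQS polynomial Q(x) = ((A*x + B)^2 - n) / A with A = q^2,
    assuming q is an odd prime not dividing n and (n/q) = 1."""
    A = q * q
    b0 = sqrt_mod_prime(n, q)
    # Hensel lift: find t so that (b0 + t*q)^2 = n (mod q^2)
    t = ((n - b0 * b0) // q) * pow(2 * b0, -1, q) % q
    B = (b0 + t * q) % A
    assert (B * B - n) % A == 0
    C = (B * B - n) // A                    # exact division since B^2 = n (mod A)
    return A, B, C                          # Q(x) = A*x^2 + 2*B*x + C

# A * Q(x) = (A*x + B)^2 - n, so every B-smooth value of Q(x) yields a relation modulo n.
```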
Handling Large Primes
In the quadratic sieve algorithm, fully B-smooth values of the quadratic residues Q(x) are relatively rare, with most candidates featuring one or two prime factors exceeding the smoothness bound B; since roughly k relations are needed for a factor base of size k, this scarcity makes the sieving stage expensive and prompted the development of large prime variations to boost efficiency.[8]

The one large prime (1LP) technique addresses this by accepting partial relations where Q(x) has exactly one prime factor q > B, provided q remains below B^2 to ensure computational feasibility. These partial relations are gathered alongside full B-smooth ones during sieving, without altering the sieving process itself. In the linear algebra step, each unique large prime q is incorporated into an extended factor base, and the exponent vector for such a relation includes an additional coordinate for q's exponent modulo 2. Formally, for a relation involving q, the vector takes the form \mathbf{v} = (e_1, e_2, \dots, e_k, e_q) \pmod{2}, where e_1, \dots, e_k are the exponents modulo 2 of the small primes in the factor base, and e_q = 1 for the large prime. A linear dependency over GF(2) in the augmented matrix then guarantees even exponents for all primes, including the large ones, automatically pairing relations with matching q via the birthday paradox, with a success probability of at least 1/2 that each dependency yields a nontrivial factorization. This expands the matrix to roughly 2k rows by (k + m) columns, where m is the number of distinct large primes (often comparable to k), roughly doubling the relations needed but significantly increasing the overall yield.[2][8]

For relations with two large primes (2LP), an extension allows up to two such factors q_1 and q_2, both exceeding B but bounded above by B^2, treating them similarly in the matrix with additional coordinates for each. Verification of dependencies ensures these large primes pair across relations to form squares, often modeled as cycles in a graph of partial relations, though this further enlarges the matrix and computational cost while enhancing smoothness detection. The combined 1LP and 2LP approaches can reduce sieving time by a factor of up to six compared to pure B-smooth methods, making them essential for practical implementations on large composites.[8][2]
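In practice, single-large-prime partials are often paired directly by hashing on the large prime rather than enlarging the matrix; two partials sharing the same q combine into one full relation because q then appears with an even exponent. A small illustrative sketch (the data layout and names are assumptions for the example, not a specific library's format):

```python
from collections import defaultdict

def combine_partials(partials):
    """partials: list of (x, exponent_vector_mod2, large_prime).
    Pairs partial relations sharing the same large prime into full relations."""
    by_prime = defaultdict(list)
    for rel in partials:
        by_prime[rel[2]].append(rel)
    combined = []
    for q, group in by_prime.items():
        for first, second in zip(group[::2], group[1::2]):            # pair them off
            x = first[0] * second[0]                                   # product of the x values
            vector = [a ^ b for a, b in zip(first[1], second[1])]      # parities add mod 2
            combined.append((x, vector))                               # q^2 has even exponent
    return combined
```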
Advanced Techniques
Partial Relations and Cycles
In the quadratic sieve algorithm, partial relations arise when the polynomial value Q(x) is not fully smooth over the factor base but contains one or two large prime factors exceeding the smoothness bound B yet bounded by a secondary limit B_2, typically up to B^2. These relations are represented as vectors in the exponent matrix over \mathbb{F}_2, where the large primes introduce additional coordinates beyond the factor base primes. To enhance efficiency, the algorithm collects a surplus of such partial relations—often 20-50% more than the number of full smooth relations required—to ensure saturation of the dependency space through combinations.

Cycle finding addresses the challenge of processing these partial relations without constructing a massive exponent matrix that includes all large primes, which would be computationally prohibitive. The method models the large primes (including a special vertex for the "1" representing single-large-prime partials) as nodes in an undirected graph, with edges corresponding to partial relations that connect two large primes (or one to the "1" vertex). A cycle in this graph identifies a set of partial relations whose product has an even exponent at every large prime (each vertex on the cycle is met by exactly two of its edges), so the large primes cancel and the combination behaves like a single full relation over the factor base, contributing to the square congruences sought modulo the target number N.

The algorithm proceeds by first building the factor base and collecting partial relations during sieving, then constructing the large prime graph using a hash table for efficient edge insertion. Cycle detection employs depth-first search (DFS) or union-find structures to identify independent cycles, with the number of fundamental cycles given by e - v + c, where e is the number of edges, v the number of vertices, and c the number of connected components. The relations combined along these cycles, together with the full smooth relations, supply the rows of the exponent matrix whose kernel is then computed, avoiding Gaussian elimination on the entire augmented matrix. This approach extends the basic handling of single large primes by enabling scalable processing of two-large-prime relations.

The primary advantage lies in computational efficiency: traditional Gaussian elimination over the factor base of size k requires O(k^3) time, but cycle finding reduces the dependency resolution to nearly linear time in the number of relations r, approximately O(r \log r) due to sorting and hashing overheads, while being highly parallelizable across the graph components. For numbers exceeding 90 digits, this yields a speedup factor of 2 to 2.5 compared to single-large-prime variants.

A self-initializing variant integrates partial relations iteratively, beginning with a small set of full smooth relations to bootstrap the graph and progressively incorporating new partials as they are sieved, reducing polynomial overhead and enabling distributed implementations.

Mathematically, for a cycle C = \{ v_1, v_2, \dots, v_m \} of partial relation vectors, the coordinates of \sum_{i=1}^m v_i \pmod{2} corresponding to the large primes are all zero, so the large primes contribute only even exponents and the combined relation enters the linear algebra stage as an ordinary factor-base row.
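The fundamental-cycle count e - v + c can be maintained online with a union-find structure: every partial relation is an edge, and an edge whose endpoints are already connected closes exactly one new cycle. A small illustrative sketch (the data representation is an assumption for the example):

```python
class UnionFind:
    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]   # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        """Merge the components of a and b; return True if they were already connected."""
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return True
        self.parent[ra] = rb
        return False

def count_cycles(partial_relations):
    """partial_relations: iterable of (q1, q2) large-prime pairs, with q2 = 1 for
    single-large-prime partials. Each edge that closes a loop adds one fundamental cycle,
    so the running total equals e - v + c for the graph built so far."""
    uf, cycles = UnionFind(), 0
    for q1, q2 in partial_relations:
        if uf.union(q1, q2):
            cycles += 1
    return cycles
```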
Smoothness Detection via Sieving
In the quadratic sieve, the basic sieving process approximates the size of the polynomial values Q(a) on a logarithmic scale, where an array tracks an initial estimate of \log |Q(a)| and subtracts \log p (or multiples thereof) for each prime p dividing Q(a) at sieve locations. However, this approximation relies on precomputed "crude" logarithms, which introduce rounding errors, and often skips sieving for very small primes (e.g., those less than 30) to accelerate the process, leading to an accumulated error of up to about \log P, where P is the product of powers of the skipped moduli. These inaccuracies can result in false positives: candidates where the residual log falls below the threshold but Q(a) is not fully B-smooth over the factor base, necessitating additional verification via trial division.[1][2]

To refine smoothness detection and reduce false positives, double sieving techniques are applied, as introduced in variations by Davis. In this approach, the initial sieve targets small primes across the full interval, followed by a second sieve over arithmetic progressions modulo a fixed medium-sized prime p, which further divides out factors and shrinks the effective residue size for subsequent checks. This refines the candidate set by confirming divisibility patterns more precisely without exhaustive trial division. Complementing this, multi-level sieving extends the process by incorporating larger primes in a secondary pass over wider intervals around promising locations from the first sieve, balancing coverage and computational cost to capture additional smooth values that might otherwise be missed due to error accumulation.[1]

In the multiple polynomial quadratic sieve (MPQS) variant, polynomial-specific adjustments enhance accuracy by calibrating the sieve threshold for each polynomial f_i(x) = a_i x^2 + b_i x + c_i based on its average logarithmic size \log |f_i(a)| over the sieving interval, ensuring the threshold reflects the varying norms of different polynomials and minimizes overlooked smooth candidates. The threshold for reporting a candidate a is typically set as T(a) = \frac{1}{2} \log N + \log M - T \log p_{\max} + \epsilon, where M is the sieve interval length, p_{\max} is the largest factor base prime, T \approx 2 is an empirically tuned multiplier (e.g., 1.5 for 30-digit numbers, 2.6 for 66-digit numbers), and \epsilon bounds the approximation error from log rounding and skipped small primes; locations whose accumulated logarithm reaches T(a) (equivalently, whose residual falls within the allowed slack) proceed to trial division. These per-polynomial calibrations, pioneered by Montgomery, optimize detection by aligning thresholds with the expected smoothness probability for each f_i.[2]

Overall, these refinements significantly reduce the volume of trial divisions—focusing efforts on a small fraction of the original interval—yielding efficiency gains that are essential for scaling to large N, as sieving alone can process millions of values per second on modern hardware.[2]
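The reporting threshold above is a one-line computation; the helper below is an illustrative sketch using the constants quoted in the text (the error allowance \epsilon is left as a parameter, since its tuning is implementation-specific):

```python
from math import log

def report_threshold(N, M, p_max, T=2.0, epsilon=0.0):
    """Approximate accumulated-log value a sieve location must reach before trial division:
    about log(sqrt(N)) + log(M), minus T*log(p_max) of slack for large-prime cofactors,
    plus an allowance epsilon for rounding and skipped small primes."""
    return 0.5 * log(N) + log(M) - T * log(p_max) + epsilon

# Example: a 60-digit N, interval length 10**5, largest factor-base prime near 10**5.
# print(report_threshold(10**60, 10**5, 10**5))
```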
Practical Implementation
Parameter Selection and Examples
In the quadratic sieve algorithm, parameter selection is guided by asymptotic estimates to optimize the balance between sieving effort and the expected number of smooth relations. The smoothness bound B is typically chosen as B \approx \exp\left( \frac{1}{2} \sqrt{\log n \log \log n} \right), which approximates the optimal size based on the density of smooth numbers near \sqrt{n}. The sieving interval length M is set roughly equal to B in basic implementations, though in multiple polynomial variants it may be adjusted smaller to reduce residue sizes.[2] The number of polynomials used is approximately \sqrt{B}, allowing coverage of the interval with smaller individual sieves per polynomial while maintaining efficiency.[2]

For a realistic example with n \approx 2^{64} (about 20 decimal digits), parameters might include B = 10^6 for the factor base and M = 10^7 for the sieving interval, leading to an expectation of roughly 10^5 relations needed to achieve a full set for the linear algebra step.[5] These values balance the computational cost of sieving against the matrix size, with the factor base consisting of all primes up to B plus possibly -1 and 2 handled specially.

A basic worked example illustrates parameter application for small n. Consider factoring n = 7429 (whose factorization 7429 = 17 \times 19 \times 23 is to be recovered) using a small smoothness bound B = 100, so the factor base includes primes up to 97. The floor of \sqrt{n} is 86 (since 86^2 = 7396 < 7429 < 7569 = 87^2), and the quadratic polynomial is f(x) = (x + 86)^2 - 7429. Sieving occurs over an interval of x values around 0, say |x| \leq 50, using a sieve array initialized with approximations of \log |f(x)| and updated by subtracting \log p at roots modulo p for each prime p \leq B. A snippet of the sieve array for x = -5 to x = 5 would show low residuals (indicating potential smoothness) at the positions corresponding to f(-3) = -540, f(1) = 140, and f(2) = 315, where the sieving adjustments bring the entries close to zero, below a small slack threshold.[10]

Checking these candidates via trial division against the factor base yields smooth relations, such as for x = 1 (corresponding to base a = 87): f(1) = 140 = 2^2 \cdot 5 \cdot 7; for x = -3 (base a = 83): f(-3) = -540 = -1 \cdot 2^2 \cdot 3^3 \cdot 5; and for x = 2 (base a = 88): f(2) = 315 = 3^2 \cdot 5 \cdot 7. These three relations involve only the primes 2, 3, 5, and 7 (with the sign of f(-3) tracked as an extra factor of -1).[10]

To find a dependency, form the exponent matrix modulo 2 over the primes 3, 5, 7 (rows; 2 has even exponents everywhere) and the three relations (columns):

| Prime | Rel1 (x=1) | Rel2 (x=-3) | Rel3 (x=2) |
|---|---|---|---|
| 3 | 0 | 1 | 0 |
| 5 | 1 | 1 | 1 |
| 7 | 1 | 0 | 1 |
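Relations 1 and 3 sum to the zero vector modulo 2, so their product is a perfect square: f(1) \cdot f(2) = 140 \cdot 315 = 44100 = 210^2, while (87 \cdot 88)^2 \equiv 140 \cdot 315 \pmod{7429}. Hence \gcd(87 \cdot 88 - 210, 7429) = \gcd(7446, 7429) = 17, splitting 7429 = 17 \cdot 437. The short script below reproduces the whole example end to end; it is a sketch tied to this specific n, with the dependency between the two positive smooth values taken directly from the table above rather than found by a general kernel solver:

```python
from math import gcd, isqrt

def factor_7429():
    n, k = 7429, isqrt(7429)                  # k = 86
    factor_base = [2, 3, 5, 7]                # the primes that actually occur in the example

    def parity_vector(value):
        """Exponent parities of |value| over the factor base, or None if not smooth."""
        v, out = abs(value), []
        for p in factor_base:
            e = 0
            while v % p == 0:
                v //= p
                e += 1
            out.append(e % 2)
        return out if v == 1 else None

    # Collect the positive smooth values of f(x) = (x + k)^2 - n for |x| <= 50.
    relations = []
    for x in range(-50, 51):
        f = (x + k) ** 2 - n
        if f > 0 and parity_vector(f) is not None:
            relations.append((x + k, f))      # pairs (a, f) with a^2 = f (mod n)

    # For this n the only positive smooth values are f(1) = 140 and f(2) = 315,
    # and their parity vectors cancel (Rel1 + Rel3 = 0 in the table above).
    (a1, f1), (a2, f2) = relations[0], relations[1]
    x_side = a1 * a2 % n                      # 87 * 88 mod 7429
    y_side = isqrt(f1 * f2) % n               # sqrt(140 * 315) = 210
    return gcd(x_side - y_side, n)

print(factor_7429())   # 17, and 7429 // 17 == 437
```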
Factoring Records and Applications
The quadratic sieve achieved one of its most notable historical successes in the factorization of the 129-digit RSA-129 challenge number in April 1994, a semiprime posed in Martin Gardner's 1977 Scientific American column that had stumped cryptographers for 17 years. This factorization, accomplished by a distributed team led by Derek Atkins, Michael Graff, Arjen Lenstra, and Paul Leyland, employed the multiple polynomial variant of QS and harnessed approximately 600 volunteers across the internet, consuming 4,000 to 6,000 MIPS-years of computation over eight months. The factors were the 64-digit prime 3490529510847650949147849619903898133417764638493387843990820577 and the 65-digit prime 32769132993266709549961988190834461413177642967992942539798288533.[12][13][4]

In the years following, QS continued to set records for general-purpose factorization of semiprimes up to around 100-110 decimal digits, such as a 116-digit number factored in 1990, before being overtaken by the Number Field Sieve (NFS) for larger inputs. For instance, the 92-digit number factored in 1989 using vectorized QS implementations on supercomputers like the NEC SX-2 represented an early benchmark for optimized sieving on high-performance hardware. By the late 1990s, subsequent RSA challenges shifted to NFS dominance; however, QS remains viable for revisiting smaller records, underscoring its enduring efficiency for medium-sized integers in computational number theory.[5]

Practically, QS finds applications in cryptanalysis, particularly for breaking legacy RSA keys under approximately 350 bits (about 100-110 digits) generated with outdated standards, such as those in early smart cards or protocols vulnerable to modern hardware. It serves as a benchmark tool in number theory research for testing smoothness probabilities and sieving optimizations, often in hybrid setups with methods like the Elliptic Curve Method for initial cofactor reduction in distributed projects such as NFS@Home. Educationally, QS implementations facilitate hands-on exploration of subexponential algorithms, emphasizing conceptual insights into relation collection and kernel construction via linear algebra over \mathbb{F}_2. Despite its limitations—arising from a subexponential time complexity of \exp\left( (1 + o(1)) \sqrt{\log n \log \log n} \right), which makes it slower than NFS for inputs exceeding 110 digits—QS retains value for targeted factoring where simplicity and lower memory demands outweigh asymptotic speed.[14][3]

Software Implementations
Open-Source Tools
Several open-source tools implement the quadratic sieve (QS) algorithm or its variants, providing accessible means for integer factorization in research, education, and practical applications. These implementations vary in scope, from standalone libraries focused on QS to integrated utilities that combine QS with other methods for broader functionality.[15][16][17][18]

Msieve is a C library developed by Jason Papadopoulos that includes a high-performance implementation of the multiple polynomial quadratic sieve (MPQS), a variant of QS optimized for larger numbers. It supports factorization of semiprimes up to approximately 80 digits using MPQS and was last significantly updated in the early 2010s. The library's modular design allows integration into custom applications, and it has been used in significant factoring efforts.[15][19]

YAFU (Yet Another Factoring Utility), developed by Ben Buhrow, is a command-line tool that integrates QS with the self-initializing quadratic sieve (SIQS) for efficient handling of medium-sized integers. Available on GitHub, YAFU features optimizations such as SSE and AVX instruction set support for sieving. Its automated workflow selects QS or SIQS based on input size, making it suitable for general-purpose factorization up to around 100 digits.[16]

CQS is a standalone C implementation of the quadratic sieve, released in 2022 and emphasizing simplicity for educational purposes. Hosted on GitHub, it provides a basic SIQS variant with detailed source code and testing utilities, allowing users to explore core QS mechanics like sieving and dependency resolution without advanced dependencies. The public-domain license facilitates modifications for teaching or experimentation.[17]

Other notable open-source options include Python libraries such as SymPy, whose ntheory module offers a basic QS implementation through the qs function (alongside the general-purpose factorint), enabling straightforward factorization of smaller numbers in scripting environments. These tools' open-source licenses, typically GPL or public domain, promote customization and community contributions, extending their utility beyond default configurations.[18]
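A minimal usage sketch with SymPy, assuming a reasonably recent version in which sympy.ntheory.qs is available with positional arguments (N, prime_bound, M); the example value and its factors follow the library's documented usage, but the exact signature should be checked against the installed version:

```python
# Hedged example: requires SymPy; the qs helper and its argument order are assumptions
# to verify against the installed version's documentation.
from sympy import factorint
from sympy.ntheory import qs

n = 25645121643901801                  # small semiprime: 5394769 * 4753701529
print(factorint(n))                    # SymPy's general-purpose factorization driver
print(qs(n, 2000, 10000))              # direct quadratic-sieve call; returns a set of factors
```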