Fact-checked by Grok 2 weeks ago

Binary GCD algorithm

The greatest (GCD) algorithm, also known as Stein's algorithm or the Euclidean algorithm, is an efficient procedure for computing the GCD of two non-negative integers by leveraging their representations and performing only bit shifts, subtractions, and comparisons, thereby eliminating the need for operations inherent in the classical . Developed by physicist and programmer Josef Stein in 1961, it was first published in 1967 as part of work on computational problems in Racah algebra, a branch of involving calculations. The algorithm's design exploits key number-theoretic properties, such as the fact that if both inputs are even, their GCD is twice the GCD of the halved values; if one is even and the other odd, the GCD equals the GCD of the halved even number and the odd number; and if both are odd, the GCD equals the GCD of the halved and the smaller number. This approach yields a of O((log u + log v)^2) bit operations, making it particularly advantageous for implementations or large integers where is costly, as bit shifts correspond to efficient right-shift instructions on processors. Unlike Euclid's algorithm, which relies on repeated operations, the binary GCD reduces numbers more gradually but avoids multiplications and s entirely, often resulting in fewer operations in practice for certain input distributions. The algorithm has been extensively analyzed and generalized, including extensions to GCDs and its use in like modular inversions, underscoring its enduring relevance in and software libraries.

Background

Greatest Common Divisor

The greatest common divisor (GCD) of two nonzero integers a and b, denoted \gcd(a, b), is the largest positive integer d that divides both a and b without leaving a remainder. This concept assumes a and b are positive for simplicity, though it extends to all integers via absolute values. Key properties include commutativity, where \gcd(a, b) = \gcd(b, a), and the boundary case \gcd(a, 0) = a. Additionally, \gcd(a, b) = \gcd(b, a \mod b), which underpins efficient computation methods. To illustrate, consider \gcd(48, 18). The prime factorization of 48 is $2^4 \times 3, while 18 is $2 \times 3^2; the highest power of each common prime is $2^1 \times 3^1 = 6, confirming \gcd(48, 18) = 6. This example highlights how GCD identifies shared factors, reducing numbers to their common structure. The GCD finds broad applications across and . In , it serves as a cornerstone for analyzing divisibility, prime decompositions, and relations. In , it verifies coprimality (when \gcd(a, b) = 1) essential for secure in systems like . For fraction simplification, dividing both numerator and denominator by their GCD yields the lowest terms, as in reducing \frac{48}{18} to \frac{8}{3}. The offers a standard method for its computation.

Euclidean Algorithm

The Euclidean algorithm for computing the (GCD) of two integers originates in Euclid's Elements, a foundational mathematical composed around . In Books VII and X of the Elements, Euclid describes a method to find the GCD of two lengths by repeated subtraction, which is equivalent to the modern division-based approach. This algorithm laid the groundwork for efficient integer division problems in . The recursive formulation of the states that for two positive integers a and b with a > b > 0, \gcd(a, b) = \gcd(b, a \mod b), where a \mod b is the when a is divided by b. The recursion continues until b = 0, at which point \gcd(a, 0) = a. This principle relies on the fact that any common divisor of a and b also divides a - qb for any integer q, and the modulo operation efficiently captures this by minimizing the remainder. An iterative version avoids recursion by repeatedly updating the values: set a and b as the inputs (assuming a \geq b > 0), then while b \neq 0, replace a with b and b with a \mod b; the final non-zero remainder is the GCD. This loop-based approach uses repeated modulo operations and is suitable for direct implementation in programming. To illustrate, consider computing \gcd(1071, 462):
Step
110714622147
2462147321
31472170
Here, $1071 = 2 \times 462 + 147, $462 = 3 \times 147 + 21, and $147 = 7 \times 21 + 0, so the GCD is 21. The algorithm's advantages include its simplicity, requiring only basic arithmetic operations, and guaranteed termination, as each is strictly smaller than the previous and non-negative, ensuring the process ends in a finite number of steps bounded by the input . However, a key disadvantage is that the division operations, particularly , can be computationally costly in implementations due to their compared to simpler bit-level operations. The binary GCD algorithm addresses this limitation by substituting divisions with bitwise shifts and subtractions.

Algorithm Description

Core Principles

The binary greatest common divisor (GCD) algorithm, also known as Stein's algorithm, operates on non-negative integers represented in , leveraging the inherent properties of arithmetic to compute the GCD efficiently. A fundamental principle is the exploitation of the fact that even numbers are divisible by 2, allowing the algorithm to factor out powers of 2 from the inputs before proceeding. Specifically, if both inputs a and b share k common trailing zeros in their binary representations—indicating divisibility by $2^k—then \gcd(a, b) = 2^k \cdot \gcd(a / 2^k, b / 2^k), reducing the problem to smaller odd integers. This step ensures that subsequent operations deal primarily with odd numbers, simplifying the by isolating the binary factor of 2. The algorithm relies on several key mathematical identities that preserve the GCD while transforming the inputs using only subtraction and halving. For a < b, \gcd(a, b) = \gcd(a, b - a), which simulates the remainder operation without division. Additionally, \gcd(2a, 2b) = 2 \cdot \gcd(a, b) holds when both numbers are even, enabling recursive factoring of 2. When one number is even and the other odd, say a even and b odd, \gcd(a, b) = \gcd(a/2, b), as 2 does not divide b. These identities, grounded in the properties of divisibility and binary structure, allow the algorithm to maintain correctness while avoiding complex arithmetic. A core operational foundation is the avoidance of multiplication and division operations, which are computationally expensive in binary systems, by substituting them with bitwise right shifts for halving (equivalent to division by 2) and repeated subtractions to approximate the . This approach replaces the division step of the classical —known for its inefficiency in binary implementations—with simpler, hardware-friendly operations that align with binary representation. By focusing on parity checks (even or odd) and these transformations, the binary GCD algorithm achieves its efficiency on binary computers.

Step-by-Step Procedure

The GCD algorithm takes two non-negative s a and b. If a = 0, return b; if b = 0, return a. Otherwise, swap a and b if a < b to ensure a \geq b > 0. Next, factor out the common powers of 2 from both a and b. Count the number k of common trailing zeros in the binary representations of both numbers (i.e., k = \min(v_2(a), v_2(b)), where v_2 is the 2-adic valuation). Shift both a and b right by k bits (equivalent to division by $2^k), and remember to multiply the final result by $2^k. This leverages the property that \gcd(a, b) = 2^k \cdot \gcd(a / 2^k, b / 2^k). Now execute the core loop while b \neq 0:
  • If b is even, shift b right by 1 bit.
  • Else if a is even, shift a right by 1 bit.
  • Otherwise (both odd), if a > b, replace a with a - b; else replace b with b - a.
Each iteration reduces at least one of the numbers, progressing toward termination. Upon reaching b = 0, return a multiplied by the initial $2^k. To demonstrate, consider \gcd(48, 18). In binary, $48 = 110000_2 and $18 = 10010_2, both sharing one trailing zero (k=1). Shift both: a = 24 = 11000_2, b = 9 = 1001_2.
  • b = 9 odd, a = 24 even → shift a to $12 = 1100_2.
  • b = 9 odd, a = 12 even → shift a to $6 = 110_2.
  • b = 9 odd, a = 6 even → shift a to $3 = 11_2.
  • Both odd, a = 3 < b = 9 → b = 9 - 3 = 6 = 110_2.
  • b = 6 even → shift b to $3 = 11_2.
  • Both odd, a = 3 = b = 3 → b = 3 - 3 = 0.
Thus, return $3 \times 2^1 = 6, which is correct.

Implementation

Pseudocode

The binary GCD algorithm, originally described by Josef Stein, is typically implemented in an iterative form using bit operations for efficiency on binary hardware. The procedure first handles edge cases such as zero or negative inputs by returning the absolute value of the non-zero input or 0 for both zeros (noting that GCD(0,0) is conventionally 0 despite being mathematically undefined). It then factors out common powers of 2 from both inputs, tracking the shift count to multiply back at the end. The main loop ensures at least one input is odd, shifts out trailing zeros from the even input, and replaces the larger odd input with half the difference of the two odds (which is even), repeating until one input reaches zero. Bitwise operations like right shifts (>>) for division by 2 and bitwise AND (&1) for parity checks optimize the implementation. Here is a standard iterative pseudocode representation:
function binary_gcd(a, b):
    if a == 0:
        return abs(b)
    if b == 0:
        return abs(a)
    a = abs(a)
    b = abs(b)
    
    // Remove common factors of 2
    shift = 0
    while (a & 1) == 0 and (b & 1) == 0:
        a >>= 1
        b >>= 1
        shift += 1
    
    // Main loop: at least one is odd
    while a != 0:
        // Remove factors of 2 from a
        while (a & 1) == 0:
            a >>= 1
        // Remove factors of 2 from b
        while (b & 1) == 0:
            b >>= 1
        // Both odd: subtract and shift
        if a > b:
            a = (a - b) >> 1
        else:
            b = (b - a) >> 1
    
    return b << shift
A recursive variant exists, where the base case returns the non-zero input, common even factors are handled by recursing on halved inputs and doubling the result, even-odd cases shift the even input before recursing, and odd-odd cases recurse on the difference after ensuring the first argument is larger. However, the iterative form is primary due to its efficiency and avoidance of stack overhead, aligning with Stein's original intent for computational applications.

Programming Examples

The binary GCD algorithm lends itself to efficient implementations in various programming languages, leveraging bitwise operations for speed and portability across platforms. Implementations should handle negative inputs by taking absolute values, as GCD is defined for non-negative integers. These examples illustrate practical applications, starting with a C version using unsigned integers to prevent sign-related issues during shifts, followed by Python with integer division for shifts and modulo for parity checks, and a Java variant employing BigInteger for handling large numbers without overflow risks. Each follows the core iterative logic of removing common factors of 2 and iteratively reducing the larger odd number.

C Implementation

The C implementation below uses unsigned integers and bitwise operators such as & for parity checks, >> for right shifts (equivalent to by 2), and for reduction, ensuring operations remain within the unsigned domain.
c
#include <stdio.h>

unsigned int binary_gcd(unsigned int u, unsigned int v) {
    if (u == 0) return v;
    if (v == 0) return u;
    unsigned int shift = 0;
    while (((u | v) & 1) == 0) {
        u >>= 1;
        v >>= 1;
        shift++;
    }
    while ((u & 1) == 0) u >>= 1;
    do {
        while ((v & 1) == 0) v >>= 1;
        if (u > v) {
            unsigned int temp = u;
            u = v;
            v = temp;
        }
        v -= u;
    } while (v != 0);
    return u << shift;
}

int main() {
    unsigned int a = 48, b = 18;
    printf("GCD of %u and %u is %u\n", a, b, binary_gcd(a, b));  // Outputs: 6
    return 0;
}
This code computes binary_gcd(48, 18) == 6, demonstrating correct reduction through even factor removal and odd subtractions.

Python Example

In Python, the implementation replaces bitwise shifts with floor division (// 2) and uses modulo (% 2) to check for evenness, making it straightforward for arbitrary-precision integers. Input handling is included via user prompts for interactivity.
python
def binary_gcd(a, b):
    if a == 0:
        return abs(b)
    if b == 0:
        return abs(a)
    a = abs(a)
    b = abs(b)
    shift = 0
    while (a | b) % 2 == 0:
        a //= 2
        b //= 2
        shift += 1
    while a % 2 == 0:
        a //= 2
    while b != 0:
        while b % 2 == 0:
            b //= 2
        if a > b:
            a, b = b, a
        b -= a
    return a << shift

if __name__ == "__main__":
    try:
        a = int(input("Enter first number: "))
        b = int(input("Enter second number: "))
        print(f"GCD of {a} and {b} is {binary_gcd(a, b)}")  # Example: GCD of 48 and 18 is 6
    except ValueError:
        print("Please enter valid integers.")
Running with inputs 48 and 18 yields 6, confirming the algorithm's correctness for positive integers.

Java Variant

The Java variant utilizes BigInteger to support arbitrarily large numbers, employing methods like shiftRight(1), testBit(0) for even checks (via and(BigInteger.ONE)), and subtract for reductions, thereby avoiding overflow that could occur with primitive types like long for numbers exceeding 64 bits. This ensures reliability for cryptographic or large-scale computations.
java
import java.math.BigInteger;
import java.util.Scanner;

public class BinaryGCD {
    public static BigInteger binaryGcd(BigInteger u, BigInteger v) {
        if (u.equals(BigInteger.ZERO)) return v.abs();
        if (v.equals(BigInteger.ZERO)) return u.abs();
        u = u.abs();
        v = v.abs();
        int shift = 0;
        while (((u.or(v)).and(BigInteger.ONE)).equals(BigInteger.ZERO)) {
            u = u.shiftRight(1);
            v = v.shiftRight(1);
            shift++;
        }
        while (u.and(BigInteger.ONE).equals(BigInteger.ZERO)) {
            u = u.shiftRight(1);
        }
        do {
            while (v.and(BigInteger.ONE).equals(BigInteger.ZERO)) {
                v = v.shiftRight(1);
            }
            if (u.compareTo(v) > 0) {
                BigInteger temp = u;
                u = v;
                v = temp;
            }
            v = v.subtract(u);
        } while (!v.equals(BigInteger.ZERO));
        return u.shiftLeft(shift);
    }

    public static void main(String[] args) {
        Scanner scanner = new Scanner(System.in);
        System.out.print("Enter first number: ");
        BigInteger a = scanner.nextBigInteger();
        System.out.print("Enter second number: ");
        BigInteger b = scanner.nextBigInteger();
        System.out.println("GCD of " + a + " and " + b + " is " + binaryGcd(a, b));
        scanner.close();  // Example: GCD of 48 and 18 is 6
    }
}
For inputs 48 and 18, it outputs 6, and scales seamlessly to large values like BigInteger.valueOf(Long.MAX_VALUE).multiply(BigInteger.valueOf(2)) without overflow.

Performance Notes

In , right-shift operations (>>) on unsigned integers are optimized at compile-time to single CPU instructions on most architectures, contributing to the algorithm's efficiency over division-based alternatives, with worst-case of O((log n)^2) where n is the larger input.

Analysis

Time and Space Complexity

The binary GCD algorithm exhibits a of O((\log (a b))^2) bit operations in the worst case, as the procedure requires O(\log \min(a, b)) iterations to reduce the operands, with each iteration involving bit shifts or subtractions that each cost O(\log (a b)) time. This bound is analogous to the Euclidean algorithm's complexity but leverages simpler operations. The is O(1) extra space beyond the input, achieved through in-place modifications of the operands without auxiliary structures proportional to their size. In comparison to the , the binary GCD shares a similar asymptotic of O((\log n)^2) bit operations, yet it proves faster in hardware implementations lacking dedicated division support, as it substitutes expensive divisions with efficient subtractions and right shifts. The worst-case scenario for the binary GCD mirrors that of the , occurring for inputs akin to consecutive numbers, which demand the maximum \Theta(\log n) steps. For random inputs, the average-case performance of the GCD is markedly faster than the variant, owing to the prevalence of even numbers that trigger early bit shifts, rapidly halving operands and minimizing steps. Empirical evaluations confirm this advantage, showing the GCD outperforming the for operands exceeding 8000 bits in length.

Correctness Proof

The correctness of the binary GCD algorithm rests on two main components: the preservation of the (GCD) as an through each step of the procedure, and the guarantee of termination with the correct base case result. The proof proceeds by on the number of iterations, leveraging key number-theoretic identities that maintain the GCD while reducing the input sizes. Prior to entering the main loop, the algorithm extracts the highest common power of 2 dividing both inputs a and b, denoted $2^k where k = \min(\nu_2(a), \nu_2(b)) and \nu_2(n) is the 2-adic valuation of n (the exponent of 2 in the prime of n). It then computes the GCD of the resulting odd s u = a / 2^{\nu_2(a)} and v = b / 2^{\nu_2(b)}, and multiplies the result by $2^k. This step is justified by the , which states that every positive greater than 1 has a prime (up to ordering of factors). Consequently, the exponent of 2 in \gcd(a, b) is precisely k, and the GCD of the odd parts yields the remaining odd prime factors common to a and b. Assuming without loss of generality that the inputs to the main loop are positive odd integers u and v with u \geq v > 0, the algorithm maintains the invariant \gcd(u, v) = \gcd(a, b) (adjusted for the initial power of 2) across iterations. This preservation relies on the following identities, applied based on the parities of u and v:
  • If u is even and v is odd, then \gcd(u/2, v) = \gcd(u, v). Any common divisor d of u and v must divide v (which is odd, so d is odd) and thus divides u/2, while conversely, any common divisor of u/2 and v divides $2 \cdot (u/2) = u and v.
  • If both u and v are odd and u \geq v, then \gcd((u - v)/2, v) = \gcd(u, v). First, \gcd(u - v, v) = \gcd(u, v) since any common divisor of u - v and v divides u = (u - v) + v and vice versa. Moreover, u - v is even (difference of odds), but v is odd, so 2 does not divide v; thus, the common divisors (all odd) divide (u - v)/2, and dividing by 2 removes no shared prime factors. The case u < v is symmetric by swapping.
For the inductive proof, consider the base case where v = 0: the algorithm returns u, and \gcd(u, 0) = u = \gcd(a, b) by . For the inductive step, assume the holds after t iterations, so \gcd(u_t, v_t) = \gcd(a, b). After the (t+1)-th iteration, one of the above identities applies, yielding new arguments u_{t+1}, v_{t+1} with \gcd(u_{t+1}, v_{t+1}) = \gcd(u_t, v_t) = \gcd(a, b). Thus, by , the holds throughout. Upon termination (when the second argument reaches 0), the first argument is the GCD. Termination is ensured because each strictly decreases the product u \cdot v by at least a factor of 2: shifting divides one argument by 2, while followed by shifting reduces the larger argument to at most half its previous value (since u - v < u and then divided by 2). As u and v are positive integers bounded below by 1, the process cannot continue indefinitely and must reach v = 0 in O(\log(uv)) steps.

Extensions

Multi-Number Variants

The binary (GCD) can be extended to compute the GCD of more than two nonnegative integers by leveraging the of the GCD operation, applying the pairwise iteratively to reduce the problem successively. For a set of integers a_1, a_2, \dots, a_n, the GCD is obtained by computing \gcd(a_1, a_2, \dots, a_n) = \gcd(\gcd(a_1, a_2), a_3, \dots, a_n), where each intermediate GCD uses the binary method's shift and operations to maintain efficiency. This approach avoids the need for , preserving the 's advantages in and software implementations for large integers. An optimization for multiple inputs involves first identifying and factoring out the highest common power of 2, denoted as $2^k, that divides all input numbers, then applying the iterative GCD to the resulting quotients, and finally multiplying the result by $2^k. This global tracking of the common 2-adic valuation reduces redundant shifts across pairwise computations, improving performance especially when many inputs share low powers of 2. Such extensions build on the core principles of Stein's algorithm while adapting to the multi-argument case. For example, consider computing the GCD of , 18, and . Iteratively, \gcd(48, 18) = 6 (since both are even, shift to 24 and 9, then subtract and shift to reach 3, multiply back by 2), and \gcd(6, 30) = 6. With the power-of-2 optimization, the minimum valuation is k=1 (48 divisible by 16, but 18 and 30 by only 2), so divide all by 2 to get 24, 9, 15; the GCD of these quotients is 3 (via \gcd(24, 9) = 3, \gcd(3, 15) = 3); multiply by $2^1 to yield 6. A 2024 extension introduces a tree-based batch GCD for computing all-pairs GCDs efficiently among many large integers, useful for identifying weak keys with shared factors in . In , multi-number variants of the GCD are applied during key generation with multiple primes, where pairwise coprimality must be verified efficiently among large prime factors to ensure the modulus is square-free. The 's speed with big integers makes it suitable for computing modular inverses and GCD checks in this process, as seen in optimized implementations for secure key pair creation.

Optimized Implementations

Software optimizations for the binary GCD algorithm primarily focus on exploiting hardware-supported operations for , which are central to its shift and subtraction steps. A key enhancement involves using intrinsics such as GCC's __builtin_ctz to trailing zeros in integers, enabling efficient removal of factors of 2 without iterative loops. This replaces slower manual counting with a single CPU instruction, significantly reducing execution time in the initialization and loop phases. For instance, implementations incorporating __builtin_ctz achieve up to 55% higher throughput compared to unoptimized versions on processors. Further software refinements utilize intrinsics for operations, particularly in extended GCD variants used for modular inversion. Techniques like mutualizing updates to values (u and v) over multiple iterations and approximating operands with fixed-bit registers minimize and leverage fast opcodes. Inline with x86 extensions such as BMI2 for population count and ADX for wide ensures constant-time execution, reducing cycles per iteration to 6-9 on modern CPUs for 256-bit moduli. These optimizations are implemented in libraries like BearSSL, prioritizing security and performance in cryptographic contexts. In hardware adaptations, the binary GCD benefits from dedicated bit-shift units in FPGA and ASIC designs, which accelerate the right-shift operations inherent to . FPGA implementations often replace traditional shift registers with counters to optimize area and ; for example, an extended binary GCD modular divider for on a Virtex-II FPGA utilizes counters and a modular correction , yielding a 62% speed increase at 58.8 MHz while using only 20% of the device resources. Recent scalable FPGA accelerators, such as FELIX for large-integer extended GCD, employ pipelined systolic arrays with custom shift logic, achieving high throughput for applications on up to 4096-bit operands. ASIC designs similarly adopt Stein's subtraction-based approach over division-heavy variants, resulting in lower gate counts and faster evaluation for verifiable delay functions. For multi-precision integers represented as arrays of limbs (e.g., 64-bit words), variants apply the GCD to multiple independent computations concurrently across cores or GPU threads, leveraging libraries like GMP for limb-level operations. This scales well on multi-core systems and GPUs, with performance analyses showing efficient GCD computation for 1024-bit numbers using methods in environments. Such variants are particularly effective in for requiring big-integer arithmetic. Benchmarks on 64-bit integers demonstrate that optimized GCD implementations outperform the standard by 20-50% in low-end devices, where division is costly compared to shifts and subtractions. On modern embedded systems and microcontrollers lacking hardware division support, the advantage is more pronounced, with variants achieving up to 2-3 times the throughput for random inputs. For example, on Ice Lake processors, a basic GCD processes 7.7 million operations per second versus 7.2 million for std::gcd, while optimizations extend this to 12 million; similar trends hold on ARM-based low-power devices. These gains establish the GCD's suitability for resource-constrained environments like and mobile as of 2024-2025 evaluations.

History

Development by Stein

Josef , a affiliated with the , introduced the binary (GCD) algorithm in 1967 as an improvement over the classical , which relies on division operations that were computationally expensive on early binary computers. Although the algorithm in its modern form was published by , similar methods were known in ancient , including in texts from the 2nd century BCE and possibly in 1st-century . Stein's approach leveraged the binary representation of integers to replace divisions with more efficient subtractions and right shifts, aligning with the hardware capabilities of the era where bit-level operations were faster and division hardware was limited or absent. This motivation stemmed from the need for hardware-efficient computation in numerical algorithms, particularly in contexts like Racah algebra computations where GCD operations were frequent. The algorithm was detailed in Stein's paper titled "Computational problems associated with Racah algebra," published in the Journal of Computational Physics (volume 1, issue 3, pages 397–405). In this work, Stein described a procedure that computes the GCD of two positive integers by repeatedly applying rules based on : if both numbers are even, factor out powers of 2; if one is even, divide by 2; otherwise, subtract the smaller from the larger. This method avoids the operation central to the , reducing the computational overhead in binary arithmetic environments. Stein's background in informed his focus on practical computational efficiency. Influenced by emerging trends in binary computing during the , he tailored the algorithm specifically for positive integers, emphasizing its simplicity and speed for machine implementation without generalizing to negative numbers or zero in the initial formulation. The paper highlighted these operations as a "direct method," underscoring its straightforward implementation using basic arithmetic primitives available in contemporary hardware.

Adoption and Impact

The binary GCD algorithm received early academic validation through its description in Knuth's (first edition, 1969), where it appears as Algorithm B in section 4.5.2 for efficient integer computation. This inclusion helped disseminate the method among computer scientists and algorithm designers during the late and . By the early 1990s, the algorithm was integrated into practical software libraries for multiprecision arithmetic. The GNU Multiple Precision Arithmetic Library (GMP), first released in 1991 by Torbjörn Granlund, adopted a binary-style GCD for operands below a certain threshold, leveraging its shift and operations to optimize performance on hardware with costly division. This implementation in GMP facilitated its use in numerical computing applications, including early cryptographic tools and scientific simulations. In modern software ecosystems, the binary GCD algorithm remains a staple in cryptographic libraries due to its efficiency in modular operations. , a widely used open-source toolkit for secure communications, employs Stein's for GCD calculations in routines like modular inversion, enhancing performance in resource-constrained environments such as TLS handshakes. Its adoption underscores the algorithm's role in reducing timing vulnerabilities and computational overhead in production cryptography. The algorithm's impact extends to embedded systems, where division instructions are often slow or power-intensive; here, the binary GCD's reliance on bitwise shifts and subtractions yields measurable speedups over classical Euclidean variants, sometimes by factors of 2–8x on older processors. Hardware implementations, such as accelerators in Altera's Nios II soft-core processors, further amplify this benefit for real-time applications like signal processing and control systems. As of 2025, the binary GCD has been referenced in hundreds of research papers on arithmetic algorithms and cryptography, influencing optimizations in libraries and secure protocol designs.

References

  1. [1]
    binary GCD
    If u and v are both even, gcd(u,v) = 2 gcd(u/2, v/2). · If u is even and v is odd, gcd(u,v) = gcd(u/2, v). · Otherwise both are odd, and gcd(u,v) = gcd(|u-v|/2, v) ...
  2. [2]
    [PDF] Greatest Common Measure: the last 2500 Years - Stepanov Papers
    Josef Stein (1961): gcd(n, 0) = gcd(0, n) = n gcd(n, n) = n gcd(2n, 2m) ... as an Analogue of the Binary GCD Algorithm, J. Symbolic Computation (2000) ...
  3. [3]
    [PDF] Optimized Binary GCD for Modular Inversion 1 Introduction
    Aug 23, 2020 · The binary GCD is a variant of Euclid's algorithm that performs only comparisons, subtractions and divisions by 2 (i.e. right shifts), and is ...
  4. [4]
  5. [5]
    [PDF] Modular Arithmetic and Elementary Algebra 1 Euclid's Algorithm
    Apr 7, 2015 · Secondly gcd(a,0) = a by definition. Thirdly and most importantly, if a = zb + c where z is an integer then gcd(a, b) = gcd(b, c). Indeed ...
  6. [6]
    [PDF] Lecture 5: Finite Fields (PART 2) - PART 2: Modular Arithmetic ...
    Jan 29, 2025 · – Assuming without loss of generality that a is larger than b, it can be shown that (See Section 5.4.3 for proof) gcd( a, b) = gcd( b, a mod b ).
  7. [7]
  8. [8]
    [PDF] The Art of Computing the Greatest Common Divisor
    Aug 18, 2020 · It is used in elementary mathematics to simplify fractions, and also in basic number theory. Beyond abstract mathematics, it is used to find ...Missing: cryptography | Show results with:cryptography<|control11|><|separator|>
  9. [9]
    [PDF] Number Theory and Cryptography - Computer Science
    The Euclidian algorithm is an efficient method for computing the greatest common divisor of two integers. It is based on the idea that gcd(a,b) is equal to gcd( ...Missing: simplification | Show results with:simplification
  10. [10]
    [PDF] Greatest Common Divisor: Algorithm and Proof
    Aug 9, 2019 · Finding the greatest common divisor of two positive integers is not just needed for simplifying fractions. In fact, the concept of the greatest ...
  11. [11]
    [PDF] Divisibility and greatest common divisor - Keith Conrad
    It uses successive division with remainder in such a way that the remainder keeps dropping and the last nonzero remainder is the greatest common divisor. This ...
  12. [12]
    [PDF] Euclid's Algorithm for the Greatest Common Divisor
    Here we present the translations of (relevant) Definitions, Proposition 1 and Proposition 2 from Book VII of Euclid's Elements as translated by Sir. Thomas L.
  13. [13]
    [PDF] The Euclidean Algorithm by Deniz Gurel - Computer Science
    Dec 5, 2014 · In Euclid's example the remainder of AE/FC is zero, thus GCD(AB,CD). FC. Below is an application of the algorithm using 1071 and 462. མའ. 1071.Missing: source | Show results with:source
  14. [14]
    [PDF] A Fast Large-Integer Extended GCD Algorithm and Hardware Design
    Aug 31, 2022 · Since Euclid-based algorithms have low iteration counts but require expensive divisions, prior work improving Euclid has focused on optimizing ...
  15. [15]
    Computational problems associated with Racah algebra
    Computational problems associated with Racah algebra☆. Author links open overlay panel. J Stein. Show more. Add to Mendeley. Share. Cite.
  16. [16]
    [PDF] The greatest common divisor - University of Waterloo
    Algorithmic Number Theory. Page 23. The Binary gcd Algorithm. J. Stein discovered a \non-Euclidean" algorithm for computing the gcd (1967). It uses the follow ...Missing: core principles<|control11|><|separator|>
  17. [17]
  18. [18]
    [PDF] Algorithm Basics
    Binary Euclid's Algorithm to compute gcd g = 1. /* Remove powers of two from the gcd */ while (a mod 2 = 0) and (b mod 2 = 0) do a = a/2 b = b/2 g = 2 · g.
  19. [19]
    Stein's Algorithm for finding GCD - GeeksforGeeks
    Feb 12, 2025 · Stein's algorithm or binary GCD algorithm is an algorithm that computes the greatest common divisor of two non-negative integers.
  20. [20]
    Finding Greatest Common Divisor in Java | Baeldung
    Feb 14, 2025 · Finally, we can use Stein's algorithm, also known as the Binary GCD algorithm, to find the GCD of two non-negative integers. This algorithm uses ...
  21. [21]
    [PDF] An Efficient All-to-All GCD Algorithm for Low Entropy RSA Key ...
    May 30, 2024 · We will assume that the binary GCD(a, b) operation has ... The time complexity of the product tree construction of step 1 is O ...
  22. [22]
    [PDF] Extending Stein's GCD algorithm
    Stein's binary GCD algorithm [Stein1967] is also a very widely known and used algorithm, so a proof of correctness will not be provided, nor an analysis of the.<|control11|><|separator|>
  23. [23]
    [PDF] From Euclid's GCD to Montgomery Multiplication to the Great Divide
    A binary greatest common divisor algorithm was devised by Stein in 1961 [14]. ... This paper presents an efficient algorithm for computing field divisions ...
  24. [24]
    [PDF] The Fundamental Theorem of Arithmetic
    Jun 14, 2008 · The Fundamental Theorem of Arithmetic says that every integer greater than 1 can be factored uniquely into a product of primes. Euclid's lemma ...
  25. [25]
    [PDF] arXiv:1407.6794v1 [cs.DS] 25 Jul 2014
    Jul 25, 2014 · We extend the Euclid's algorithm and binary GCD algorithm to compute the GCD of more than ... [4] Knuth, D.E., “The art of computer programming, ...Missing: procedure | Show results with:procedure
  26. [26]
    Fastest way to compute the greatest common divisor
    Dec 26, 2013 · The binary GCD algorithm is a fast way to compute GCD, especially for 32-bit integers, but a naive algorithm can be faster if one integer is 1.
  27. [27]
    [PDF] Hardware implementation of elliptic curve Diffie-Hellman key ...
    The Binary GCD algorithm calculates GCD of two non-negative integers and has an advantage over the clas- sic Euclidean algorithm as it replaces the divisions ...
  28. [28]
    [PDF] FELIX (XGCD for FALCON) - Cryptology ePrint Archive - IACR
    FELIX is a novel FPGA-based, scalable, and lightweight accelerator for large integer Extended GCD (XGCD) computation, which is critical in crypto.
  29. [29]
    [PDF] Multiple Precision Integer GCD Performance Analysis on Parallel ...
    Nov 21, 2015 · In this paper, we have proposed the efficient parallel algorithm for the computation of GCD based on Extended. Euclidean GCD, Binary GCD, ...
  30. [30]
    [PDF] Parallel Implementation of the Accelerated Integer GCD Algorithm
    The accelerated integer greatest common divisor (GCD) algorithm has been shown to be one of the most efficient in practice. This paper describes a parallel ...Missing: conquer | Show results with:conquer
  31. [31]
    Greatest common divisor, the extended Euclidean algorithm, and ...
    Apr 13, 2024 · The binary Euclidean algorithm is typically faster than the textbook Euclidean algorithm ... binary GCD algorithm might require many steps if u is ...Missing: comparison | Show results with:comparison
  32. [32]
    Binary GCD Algorithm vs. Euclid's Algorithm on modern computers
    Nov 19, 2011 · On machines with slow division, binary GCD tends to outperform the Euclidean algorithm. I benchmarked it a couple of years ago on a Pentium4 in C, Java and a ...
  33. [33]
    The Mathematical-Function Computation Handbook: Programming ...
    ... Josef Stein is apparently the first to publish a binary variant of the ... University of New Mexico. He made important contributions to the EISPACK and ...
  34. [34]
    [PDF] Elements of Programming - Semantic Scholar
    Jan 9, 2008 · In 1961, Josef Stein discovered the following gcd algorithm 10: template <typename T> requires(Integer(T)). T binary_gcd_nonnegative(T a, T b) ...
  35. [35]
    [PDF] Modern Computer Arithmetic - Mathematical Sciences Institute, ANU
    1.6.3 Half Binary GCD, Divide and Conquer GCD . . . . . . 36. 1.7 Base ... [210] Josef Stein. Computational problems associated with Racah algebra ...
  36. [36]
    Binary GCD (GNU MP 6.3.0)
    Quotients 1, 2 and 3 for example occur 67.7% of the time, see Knuth section 4.5.3 Theorem E. When the implied quotient is large, meaning b is much smaller ...
  37. [37]
    [PDF] Arithmetic Software Libraries - Victor Shoup
    Shortly thereafter, Torbjörn Granlund released (in 1991) the first version of GMP (the GNU MP library) [10]. ... Both libraries use a fast “Half GCD” algorithm.
  38. [38]
    Full Key Recovery on RSA from Noisy Binary GCD Operation ...
    Some versions of OpenSSL use the binary GCD algorithm to compute the greatest common divisor. This algorithm computes the greatest common divisor by repeatedly ...Missing: usage | Show results with:usage
  39. [39]
    Bernstein-Yang's Inverse | a41-labs
    Apr 25, 2025 · For example, an implementation of Stein's binary GCD algorithm in OpenSSL was shown to be susceptible to an RSA key extraction attack. This ...Missing: usage | Show results with:usage
  40. [40]
    Embedded SoPC Design with Nios II Processor and Verilog Examples
    The binary GCD algorithm is a method to obtain the GCD. In this chapter, we implement the algorithm using a software routine and a hardware accelerator and ...