RANDU
RANDU is a multiplicative linear congruential pseudorandom number generator developed by IBM in 1968 as part of its Scientific Subroutine Library for the System/360 mainframe computers.[1] It generates a sequence of pseudorandom integers using the recurrence relation X_{n+1} = (65539 \times X_n) \mod 2^{31}, where the initial seed X_0 is an odd integer between 1 and $2^{31} - 1, and the output is typically scaled to produce uniform variates in the interval [0, 1) by dividing by $2^{31}.[2] The choice of parameters was optimized for efficient computation on early hardware, leveraging binary shifts and additions to perform the multiplication quickly.[3]
Despite its initial widespread adoption in scientific computing during the 1960s and 1970s—due to its speed and inclusion in IBM's standard libraries—RANDU is now regarded as one of the most flawed pseudorandom number generators ever designed.[4] Its primary defects stem from poor statistical properties, particularly in multidimensional applications; for instance, any three consecutive outputs lie on one of only 15 parallel hyperplanes in three-dimensional space, resulting in highly correlated and non-uniform distributions that can invalidate Monte Carlo simulations and other numerical experiments.[3] This lattice structure arises from the multiplier 65539, which factors algebraically and leads to a short effective period in higher dimensions, as well as dependencies like X_{n+2} = 6X_{n+1} - 9X_n \mod 2^{31}.[2] Although it achieves a period of $2^{29} for odd seeds,[5] these issues prompted its deprecation by the 1980s, with modern systems favoring generators like the Mersenne Twister or well-chosen LCGs with superior spectral properties.[4]
Overview and History
Definition and Purpose
RANDU is a linear congruential generator (LCG), a type of pseudorandom number generator designed to produce sequences of numbers that approximate uniform randomness in the interval [0,1).[1] LCGs form a foundational family of algorithms for pseudorandom number generation, with later variants such as the Park-Miller generator offering refinements for improved statistical properties.[6]
Developed by IBM in the late 1960s for inclusion in their Scientific Subroutine Package (SSP) on mainframe systems, RANDU generates 31-bit integers that are subsequently normalized to floating-point values in [0,1) through division by the modulus $2^{31}.[7] This design prioritized computational efficiency on the resource-constrained hardware of the era, enabling straightforward implementation via integer arithmetic operations like shifts and additions.[1]
The primary purpose of RANDU was to supply deterministic yet seemingly random sequences for computational tasks in scientific applications, particularly simulations, Monte Carlo methods, and statistical modeling where true randomness is impractical or unnecessary.[6] It gained widespread adoption in early scientific computing environments, facilitating reproducible results in numerical experiments on IBM systems.[1]
Development and Adoption
RANDU was developed by IBM in the late 1960s as a linear congruential pseudorandom number generator for inclusion in the Scientific Subroutine Package (SSP), a library of mathematical and statistical routines designed for scientific computing on the newly introduced System/360 mainframe computers.[8] The generator was specifically tailored for the binary architecture of these systems, emphasizing computational efficiency through simple bit-shift operations.[9] Its formal documentation appeared in Version III of the SSP Programmer's Manual, published in 1968, marking its official integration into IBM's software ecosystem.[8]
As the default random number generator in IBM's Fortran libraries, RANDU quickly became the standard choice for pseudorandom number generation on System/360 mainframes, which dominated the high-end computing market from the mid-1960s onward.[8] This dominance facilitated its widespread adoption across universities, research institutions, and industrial applications reliant on IBM hardware, where it powered simulations in fields such as physics, engineering, and statistics.[9] The generator's simplicity and speed made it particularly appealing in an era when computational resources were limited and alternative high-quality generators were scarce.
Despite early identifications of its statistical shortcomings in the late 1960s, RANDU persisted in use throughout the 1970s, largely due to entrenched compatibility in existing codebases and the absence of readily available superior replacements for legacy systems.[8] By the late 1970s and into the 1980s, it was gradually supplanted in IBM libraries and broader software environments as more robust generators, such as those proposed by Lewis et al. in 1969, gained traction.[9]
Core Algorithm
RANDU is a multiplicative linear congruential generator (LCG), a class of pseudorandom number generators that produce sequences through iterative linear transformations.[10]
The core recurrence relation for generating the sequence of integers X_n is given by
X_{n+1} = (a \cdot X_n) \mod m,
where a is the multiplier and m is the modulus.[1] To obtain pseudorandom numbers uniformly distributed in the interval [0, 1), each integer X_n is normalized as
U_n = \frac{X_n}{m}.
[10]
The initial seed X_0 must be an odd integer in the range from 1 to $2^{31} - 1, ensuring the sequence does not degenerate to zero or exhibit short periods from even starting values.[10][1]
The generation process proceeds as follows: begin with the odd seed X_0; compute the next integer by multiplying X_n by a and applying the modulo operation with m to yield X_{n+1}, which is a 31-bit integer; repeat iteratively for subsequent terms.[1] This multiplication is implemented efficiently using bit shifts to avoid arithmetic overflow in 32-bit systems of the era, followed by extraction of the low-order 31 bits for the modulo result.[1] Finally, normalize the resulting X_n to a floating-point value U_n in [0, 1) by division with m.[10] The use of 31-bit integers for X_n accommodates the computational constraints of contemporaneous 32-bit hardware while producing the sequence.[1]
Key Parameters
RANDU employs a modulus m = 2^{31} = 2147483648, selected to align with the 32-bit word sizes prevalent in IBM mainframe hardware of the era, facilitating efficient storage and computation while offering potential for a long period up to m.[1] The multiplier a = 65539 = 2^{16} + 3 was chosen for its simplicity in implementation, as multiplication by this value could be performed via a left shift by 16 bits followed by addition of three times the original value, optimizing performance on binary computers without requiring full multiplication operations.[3] This multiplier is odd and thus coprime with the power-of-two modulus, ensuring the generator's iterates do not immediately cycle to zero and aiming to promote adequate mixing of values across the state space, though higher-order correlations were not anticipated at the time.[1] Overall, these parameters enabled straightforward integration into the core LCG formula X_{n+1} = (a X_n) \mod m on 1960s hardware but ultimately contributed to unintended statistical dependencies in generated sequences.[10]
Statistical Properties
Period Length
For linear congruential generators (LCGs) with a power-of-two modulus m = 2^k and zero increment (c = 0, the multiplicative case), the Hull-Dobell theorem implies that a full period of m is impossible, but a maximal period of m/4 = 2^{k-2} can be achieved under specific conditions on the multiplier a. These conditions require a to be odd and congruent to either 3 or 5 modulo 8, ensuring the generator's order in the multiplicative group of residues modulo m reaches the theoretical maximum.[11]
In the case of RANDU, where m = 2^{31} and k = 31, this maximal period is $2^{29} = 536870912. RANDU attains this full period of $2^{29} when the initial seed is odd, as its multiplier a = 65539 satisfies the necessary conditions: it is odd, and a \equiv 3 \pmod{8} (noting that a - 1 = 65538 is divisible by 2 and a + 1 = 65540 is divisible by 4).[13]
However, the period is effectively shorter in practice due to the binary structure of the modulus, as the sequence is confined to odd residues modulo $2^{31} and exhibits rapid cycling in lower-order bits (e.g., the least significant bits repeat with periods as short as $2^2 or $2^3).[14] This maximal cycle length provides adequate independence for one-dimensional uniform sampling but proves insufficient for multidimensional simulations, where longer sequences without correlations are required to avoid detectable patterns.[15]
Low-Dimensional Behavior
In one dimension, RANDU exhibits acceptable uniformity, passing basic chi-squared goodness-of-fit tests for large sample sizes and thus appearing random in single sequences.[16] Due to its structure with a power-of-two modulus, the generator's binary nature produces patterns such as consistent parity—all values are odd when starting from an odd seed—but normalization to the unit interval [0,1] largely masks these in aggregated distributions.
In two dimensions, projections of RANDU sequences reveal minor lattice artifacts from the underlying linear structure, yet these are generally acceptable for simple plotting and visualization tasks.[17] The spectral test in 2D provides a moderate figure of merit, approximately 0.5 when normalized, signifying decent point spacing but suboptimal uniformity compared to modern generators.[18]
Flaws and Criticisms
Multiplier and Modulus Issues
The multiplier a = 65539 in RANDU was selected primarily for its hardware-friendly form, a = 2^{16} + 3, which enables efficient computation through bit shifts and additions on early 32-bit systems. This convenience, however, results in a critical mathematical flaw: (a - 3)^2 = 2^{32} \equiv 0 \pmod{2^{31}}, or equivalently, a^2 - 6a + 9 \equiv 0 \pmod{2^{31}}. As a consequence, consecutive terms in the sequence satisfy the linear relation X_{n+2} \equiv 6X_{n+1} - 9X_n \pmod{2^{31}}, introducing short-range dependencies every three steps that confine triples of successive values to at most 15 parallel hyperplanes in three-dimensional space.
The modulus m = 2^{31}, a power of 2, compounds these multiplier-induced issues by fostering inherent binary correlations within the output sequence. Such moduli cause linear congruential generators to exhibit predictable patterns in the binary digits of the generated numbers, as the modular arithmetic preserves low-order bit dependencies and restricts the possible values in lower significance positions.[19] This amplifies the structural defects from the multiplier, leading to lattice-like artifacts that degrade uniformity across multiple dimensions.
RANDU's parameters violate the conditions of the Hull-Dobell theorem for achieving a full-period linear congruential generator in practice, particularly for higher-dimensional projections. The theorem requires, among other criteria, that the increment c (here 0) and modulus m satisfy coprimality, and that the multiplier a generates the full multiplicative group modulo m; with m a power of 2 and c = 0, these fail, resulting in degenerate subgroups that do not span the full state space even in low dimensions.
Furthermore, the multiplier was not evaluated against established criteria for effective linear congruential generators, such as those outlined by Knuth, which emphasize full-period behavior in successive dimensions to ensure equitable coverage of the unit hypercube. RANDU's design prioritizes speed over these rigorous tests, yielding sequences with poor spectral properties and inadequate randomness for multidimensional applications.
Higher-Dimensional Correlations
One of the most pronounced flaws in RANDU manifests in three dimensions, where consecutive triples of normalized values (U_n, U_{n+1}, U_{n+2}) lie on at most 15 parallel planes within the unit cube, severely undermining the assumption of independence and uniformity. This plane degeneracy stems from the algebraic properties of the multiplier a = 65539, which satisfy a quadratic relation leading to a linear dependence among the coordinates, such as U_{n+2} - 6 U_{n+1} + 9 U_n \equiv 0 \pmod{1}, with the integer values of this combination limited to 15 possible levels (ranging effectively from -5 to 9). George Marsaglia's seminal 1968 analysis revealed this structure, showing that virtually all (over 99%) of the generated points cluster on these few planes, creating vast empty regions and rendering the sequence unsuitable for multidimensional applications like Monte Carlo integration.[20]
In higher dimensions d > 3, the correlations worsen dramatically, as sequences collapse onto low-dimensional sublattices rather than filling the unit hypercube uniformly, violating the independence required for reliable statistical modeling. Marsaglia demonstrated that for multiplicative congruential generators like RANDU, the number of containing hyperplanes grows much slower than the volume of the space—bounded above by (d! \cdot m)^{1/d}, where m = 2^{31}—resulting in, for example, fewer than 41 hyperplanes for d=10, concentrating points in a crystalline lattice that fails to approximate true randomness.[20]
These structural defects cause standard chi-squared tests for uniformity to reject the null hypothesis in three dimensions, as the observed frequencies deviate sharply from expected uniform distribution due to the overconcentration on planes.[20] This non-randomness not only invalidates higher-dimensional empirical tests but also compromises the reliability of Monte Carlo simulations, where such artifacts can propagate errors in physics, finance, and engineering computations historically reliant on RANDU.[20]
Examples and Demonstrations
Sample Sequences
To illustrate the output of the RANDU generator, consider a sequence starting with an initial seed of 12345. The terms, normalized to the unit interval, can be computed using the multiplicative congruential formula with multiplier 65539 and modulus $2^{31}. These values appear scattered and uniform when viewed individually in one dimension.
In higher dimensions, patterns emerge in consecutive terms. For example, consider the triple (x, y, z) = (0.1, 0.1, 0.7), where z ≈ 6x - 9y (mod 1), demonstrating the linear dependency characteristic of RANDU outputs in three dimensions. Similar relations hold for other triples generated by the algorithm, such as (0.2, 0.1, 0.3) satisfying the same equation modulo 1.
While the one-dimensional sequence may seem random at a glance, inspecting pairs or triples reveals non-random correlations, such as alignment on specific planes.
The following Python code snippet can be used to reproduce such sequences (non-executable illustration; requires a Python environment for execution):
python
m = 2**31 # Modulus
a = 65539 # Multiplier
seed = 12345 # Initial seed
def randu(seed, n):
sequence = []
x = seed
for _ in range(n):
x = (a * x) % m
u = x / m
sequence.append(round(u, 6))
return sequence
# Generate first 10 terms
print(randu(seed, 10))
m = 2**31 # Modulus
a = 65539 # Multiplier
seed = 12345 # Initial seed
def randu(seed, n):
sequence = []
x = seed
for _ in range(n):
x = (a * x) % m
u = x / m
sequence.append(round(u, 6))
return sequence
# Generate first 10 terms
print(randu(seed, 10))
This code implements the core RANDU recurrence and normalizes the results.
Visual Representations
One of the most striking visual demonstrations of RANDU's deficiencies involves 3D scatter plots of consecutive triples generated by the algorithm, where points (X_i, X_{i+1}, X_{i+2}) are plotted within the unit cube [0,1]^3. For sequences of approximately 1000 points, these plots reveal that all points cluster onto exactly 15 distinct parallel planes, rather than distributing uniformly throughout the volume, highlighting severe correlations in three dimensions. This phenomenon was first graphically illustrated by Marsaglia, who used such plots to expose how RANDU's output falls predominantly in low-dimensional subspaces.[21][22]
When the 3D scatter plot is rotated to align with the planes' orientation, the structure manifests as alternating dense bands resembling "zebra stripes," underscoring the generator's failure to fill space isotropically. Modern computational recreations, such as those implemented in R using the built-in randu dataset or in Julia, reproduce this striped pattern faithfully, confirming the persistence of the flaw across implementations.[23][24]
In 2D projections of these triples—such as marginal scatter plots of (X_i, X_{i+1}) or histograms of individual components—RANDU appears more benign, with histograms displaying approximate uniformity across bins, yet subtle lattice-like alignments emerge in the scatter plots, foreshadowing the higher-dimensional issues. These projections illustrate how RANDU can pass basic one- and two-dimensional uniformity tests while concealing deeper structural biases.[15]
The spectral test provides another graphical lens on RANDU's lattice structure, plotting the successive minima (d_k) of the generator's point set in k dimensions; for RANDU in three dimensions, the plot shows unusually large gaps and poor spacing between lattice points compared to well-designed generators, quantifying the visual clustering observed in scatter plots. This test's results, visualized as decreasing d_k values with prominent discontinuities, emphasize RANDU's inadequacy for multidimensional applications.[25]
Legacy and Impact
Historical Usage in Software
RANDU was prominently integrated into IBM's computing ecosystem during the 1960s and 1970s, serving as a core component of the Scientific Subroutine Package (SSP) for the System/360 mainframes and appearing as a standard subroutine in FORTRAN IV libraries for these systems.[26][27] This implementation made it readily accessible for scientific computing tasks, where its simplicity and speed on 32-bit architecture facilitated rapid adoption in early simulation programs.[27] Additionally, RANDU formed the basis for the RNUN subroutine in the International Mathematical and Statistical Libraries (IMSL) package, a widely distributed tool for numerical computations until at least the mid-1970s.[10][4]
In academic and research environments, RANDU saw extensive use in physics simulations, particularly Monte Carlo methods for modeling phenomena such as particle interactions and statistical mechanics, where its generator was embedded in custom FORTRAN codes and early statistical software packages.[28][29] These applications leveraged RANDU's availability in IBM environments, but its three-dimensional correlations later revealed biases in such simulations, as exemplified in Monte Carlo studies of the 3D Ising model where points clustered on hyperplanes, distorting spatial distributions.[28] The generator persisted in legacy codebases through the 1980s, often in unmodified research programs reliant on older IBM-compatible systems.[28]
A notable remnant of RANDU's legacy appears in the R programming language's built-in "randu" dataset, which consists of 400 triples of numbers generated by the VAX FORTRAN implementation of RANDU under VMS 1.5, specifically curated to teach the pitfalls of flawed pseudorandom number generators.[30] This dataset, sourced from sequences spaced five numbers apart and rounded to six decimal places, highlights RANDU's planar artifacts in multivariate outputs, serving as an educational tool in statistics and computing courses.[30]
Influence on RNG Design
The flaws of RANDU, particularly its tendency for generated points to lie on a limited number of hyperplanes in higher dimensions, were first systematically exposed by George Marsaglia in 1968, who demonstrated that such linear congruential generators (LCGs) exhibit severe correlations unsuitable for multidimensional simulations.[21] This critique was reinforced by Donald Knuth in 1969, who developed the spectral test to quantify lattice structure defects in LCGs, revealing RANDU's poor performance with only 15 hyperplanes in three dimensions.[9] These analyses prompted immediate efforts to refine LCG parameters, emphasizing the need for multipliers that maximize the period and minimize correlations.
A direct outcome was the adoption of superior multipliers, such as 16807 paired with a prime modulus of $2^{31} - 1, as proposed by Lewis, Goodman, and Miller in 1969 and later endorsed as the "minimal standard" by Park and Miller in 1988, which passed rigorous spectral and statistical tests while avoiding RANDU's "really horrible" defects.[8][9] This shift extended beyond isolated improvements, influencing the development of combined generators like the Mersenne Twister in 1998, which addressed persistent LCG limitations in period length and equidistribution through a matrix-based linear recurrence.[9]
RANDU's shortcomings also shaped standardization efforts, notably informing the parameters of the ANSI C rand() function, which uses a modulus of $2^{31} but selects a multiplier (1103515245) designed to evade RANDU-like planar concentrations.[9] Contemporary libraries, such as those in Numerical Recipes and Java's java.util.Random, further reflect this legacy by prioritizing prime or specialized moduli over pure powers of 2, which inherently restrict lower-bit periods and amplify correlations.[9]
Today, RANDU endures as a pedagogical exemplar in computer science curricula and texts, illustrating the perils of untested pseudorandom generators and underscoring the value of empirical validation through tests like the spectral analysis.[9]