Xorshift

Xorshift random number generators (RNGs) are a class of pseudorandom number generators that produce sequences of apparently random numbers through simple bitwise operations, specifically a series of left and right shifts combined with exclusive or (XOR) operations applied to the bits of one or more words in a shift register. Invented by George Marsaglia and published in 2003, these generators are designed for high speed and long periods, such as $2^{32} - 1, $2^{64} - 1, up to $2^{192} - 1, depending on the state size and chosen parameters, making them suitable for non-cryptographic applications requiring efficient random number generation. The core mechanism of an xorshift RNG involves initializing a state vector of k bits and iteratively updating it using a composition of shift and XOR operations, typically three such steps for basic variants, to ensure the sequence cycles through all non-zero states for the full period. Marsaglia provided specific parameter sets—such as shift amounts of 13, 17, and 5 for a 32-bit generator—that yield generators passing most of the Diehard statistical test suite, though failing the binary rank test, emphasizing their simplicity with minimal computational overhead compared to other linear feedback shift register (LFSR)-based methods. These RNGs operate over finite fields analogous to GF(2), where the operations correspond to linear transformations that can be analyzed for period length and spectral properties.^[1] Despite their advantages, subsequent empirical evaluations in 2005 revealed that many of Marsaglia's proposed xorshift configurations, particularly those with three xorshift steps, exhibit poor equidistribution and fail modern statistical tests like those in TestU01, due to detectable linear structures in low dimensions.^[2] To address these shortcomings, improvements such as the xorshift* family, introduced by Sebastiano Vigna in 2014, incorporate a multiplication by a large odd constant after the xorshift operations to scramble the output and enhance spectral test performance while preserving speed.^[3] Further evolutions include the xoshiro and xoroshiro generators (2018–2019), which build on xorshift principles with additional rotations for better statistical quality and jump functions for parallelization, widely adopted in software libraries for their balance of performance and reliability.^[4]

Introduction

Definition and History

Xorshift generators are a class of pseudorandom number generators (PRNGs) based on linear feedback shift registers (LFSRs) that update their internal state using bitwise exclusive-or (XOR) operations combined with left and right bit shifts.^[5] These operations allow for extremely fast generation of pseudorandom numbers while maintaining long periods characteristic of LFSR-based designs.^[5] George Marsaglia introduced xorshift generators in 2003 through his paper "Xorshift RNGs," published in the Journal of Statistical Software.^[5] In this work, Marsaglia detailed implementations for 32-bit and 64-bit states, achieving periods up to $2^{1024} - 1, and provided theoretical background on parameter selection to ensure maximal periods.^[5] The generators were designed to pass rigorous statistical tests like Diehard, emphasizing their suitability for simulations requiring high-speed randomness.^[5] Xorshift builds on earlier shift-register generators, such as traditional LFSRs, as well as other PRNG families like linear congruential generators, to address the demand for simple, efficient algorithms in statistical computing and Monte Carlo simulations.^[2] Marsaglia's innovation simplified the feedback mechanism by relying solely on XOR and shifts, eliminating the need for more complex operations like addition or multiplication found in prior designs.^[5] The evolution of xorshift includes nonlinear extensions proposed by Marsaglia in the same 2003 paper, incorporating multiplication to enhance output quality without sacrificing speed.^[5] In 2014, Sebastiano Vigna introduced the xorshift* family, incorporating multiplication to improve statistical properties, followed by xorshift+ variants using addition.^[3] Building on these, in 2018, David Blackman and Sebastiano Vigna developed the xoshiro family as an advancement, scrambling the linear structure of xorshift to improve statistical properties and reduce detectable patterns in test suites like TestU01.^[4]

Key Advantages and Limitations

Xorshift generators are prized for their exceptional speed, achieved through reliance on simple bitwise operations—shifts and exclusive-or (XOR)—without the need for multiplication, division, or other arithmetic that slows down computation. This design enables them to produce random numbers at rates far exceeding more complex alternatives; for instance, benchmarks on modern x86 processors show xorshift128+ generating output at approximately 0.91 cycles per byte, making them suitable for high-throughput applications in embedded systems and large-scale simulations.^[6] Their compact state size, typically 128 to 256 bits, further enhances efficiency by minimizing memory footprint while supporting extremely long periods, such as $2^{256} - 1 for well-chosen parameters, ensuring ample sequence length for most non-cryptographic needs.^[2] In comparisons, xorshift variants often outperform established generators like the Mersenne Twister in speed, with reports indicating up to 10 times faster generation in software implementations due to fewer operations per output. They also edge out PCG in raw throughput under certain conditions, such as 0.91 cycles per byte for xorshift128+ versus 1.27 for pcg64, though PCG may achieve parity or better in optimized scenarios.^[6] These advantages position xorshift as ideal for use cases demanding rapid generation without stringent security requirements, including Monte Carlo simulations, procedural content in games, and general-purpose randomization in non-sensitive algorithms.^[7] Despite these strengths, xorshift generators suffer from significant limitations in statistical quality. Their purely linear structure manifests as a detectable lattice structure in the output space, causing failures in spectral tests and poor equidistribution in low dimensions—for example, many variants exhibit Δ₁ values as low as 1 to 153 for 32-bit words, far worse than competitors like the Twisted GFSR (Δ₁=261).^[2] This leads to spectacular failures in empirical test batteries, such as SmallCrush and Crush from TestU01, with p-values dropping below 10^{-300} for common parameter choices.^[2] Moreover, their predictability from linear operations renders them unsuitable for cryptographic purposes, where even partial state exposure could compromise security. While nonlinear variants mitigate some issues, basic xorshift remains inferior in equidistribution compared to modern alternatives like PCG, which better handle multi-dimensional uniformity.^[7]^[2]

Core Mechanism

Fundamental Operations

Xorshift generators are defined by a set of fundamental bitwise operations that update the internal state to produce pseudorandom numbers. The core operations consist solely of the bitwise exclusive-or (XOR, denoted ⊕) combined with left shifts (<<) and right shifts (>>), eschewing any multiplication, addition, or other arithmetic beyond these bit manipulations. This simplicity enables extremely fast computation, as each update requires only a few shift and XOR instructions on modern processors.^[2] The general state update rule applies a sequence of these xorshift operations, typically three steps, to the words in the state. For a single-word state, this is done sequentially on the same word, e.g., s \leftarrow s \oplus (s \ll a); s \leftarrow s \oplus (s \gg b); s \leftarrow s \oplus (s \ll c), where a, b, c are chosen shift amounts. For multi-word states, the update follows a linear recurrence v_i = \sum_{j=1}^r A_j v_{i-j} \mod 2, where each A_j is a matrix composed of products of shift and identity matrices corresponding to xorshift operations, propagating changes across the state vector.^[2] Output extraction from the updated state typically uses one of the state words—often the most significant one—or a simple combination thereof, interpreted directly as the pseudorandom integer. These operations are performed on fixed-width words, commonly 32-bit or 64-bit unsigned integers, with multi-word states (e.g., 2 to 6 words) employed to achieve longer periods without increasing per-step computational cost.^[2] Mathematically, xorshift generators implement linear transformations over the finite field GF(2, where each bit is treated as an element of {0, 1} under modulo-2 arithmetic. The state can be represented as a vector in [\text{GF}(2)]^{nw}, and the update corresponds to multiplication by a companion matrix whose characteristic polynomial determines the sequence period; period analysis thus reduces to checking the matrix's order via exponentiation in the matrix algebra over GF(2.^[2]

State and Period Characteristics

The state of an xorshift generator is represented as an array of k words, where each word is typically 32-bit or 64-bit, resulting in a total state size of $32k or $64k bits; for example, a 128-bit state uses four 32-bit words.^[2] The initial state must consist of nonzero values to avoid degenerate cycles.^[8] The period of an xorshift generator, which is the length of the cycle before the state repeats, can reach a maximum of $2^{32k} - 1 for a state of $32k bits when the underlying characteristic polynomial is primitive over the finite field GF(2. However, not all parameter choices yield the full period; careful selection is required to ensure the characteristic polynomial is primitive.^[2] George Marsaglia selected parameters corresponding to such primitive polynomials to ensure this maximal period in his original designs. In contrast, multi-word configurations enable much larger periods; for instance, a 128-bit state (k=4, 32-bit words) can attain $2^{128} - 1.^[8] The cycle properties of xorshift generators are maximal when the characteristic polynomial is primitive, ensuring that the state sequence visits every nonzero state exactly once before repeating.^[2] The output bits produced by the generator form de Bruijn sequences of order 1 when the polynomial is primitive, providing optimal coverage of bit patterns over the full period.^[9] To support parallelization, xorshift implementations include jump functions that advance the state by $2^m steps efficiently, often by computing powers of the transition matrix modulo the characteristic polynomial; for example, in larger states like 1024 bits, jumps of $2^{512} steps allow independent streams for multiple threads.^[9]

Linear Variants

Standard Xorshift Algorithm

The standard linear xorshift algorithm for larger periods, such as the "xor128" variant introduced by George Marsaglia, uses a 128-bit state composed of four 32-bit words, denoted as

s{{grok:render&&&type=render_inline_citation&&&citation_id=0&&&citation_type=wikipedia}}, s{{grok:render&&&type=render_inline_citation&&&citation_id=1&&&citation_type=wikipedia}}, s{{grok:render&&&type=render_inline_citation&&&citation_id=2&&&citation_type=wikipedia}}, s{{grok:render&&&type=render_inline_citation&&&citation_id=3&&&citation_type=wikipedia}}

. This pseudorandom number generator updates the state by rotating the words circularly and applying XOR operations with left- and right-shifted versions to a temporary copy of one word, using shift parameters 11 (left), 8 (right), and 19 (right). These parameters were selected to achieve good statistical properties and a full period of $2^{128} - 1.^[5] The pseudocode for the state update and output generation is as follows:

[function](/page/Function) xorshift_next(s[0..3]):
    t = s[3]
    s[3] = s[2]
    s[2] = s[1]
    s[1] = s[0]
    t ^= (t << 11)
    t ^= (t >> 8)
    s[0] = t ^ s[0] ^ (t >> 19)
    return s[0]  // Output the updated low word (s[0])
[function](/page/Function) xorshift_next(s[0..3]):
    t = s[3]
    s[3] = s[2]
    s[2] = s[1]
    s[1] = s[0]
    t ^= (t << 11)
    t ^= (t >> 8)
    s[0] = t ^ s[0] ^ (t >> 19)
    return s[0]  // Output the updated low word (s[0])

In this form, the output function returns the updated low word of the state (

s{{grok:render&&&type=render_inline_citation&&&citation_id=0&&&citation_type=wikipedia}}

) as a 32-bit integer. Marsaglia noted that this yields fast generation with minimal overhead. Optional tempering can be applied post-update to improve equidistribution, but the core relies on the linear operations for simplicity.^[5]

Parameter Selection and Periods

The selection of shift parameters in linear xorshift generators requires the associated characteristic polynomial over GF(2) to be primitive, ensuring a maximal period of $2^n - 1, where n is the total state size in bits. For single-word configurations of size w, the polynomial is of the form x^w + x^{w-a} + x^{w-b} + x^{w-c} + 1 (equivalent to x^w + x^a + x^b + x^c + 1 up to reversal), and parameters a, b, c (with $0 < c < b < a < w) must yield a primitive trinomial. For multi-word states, the overall linear transformation (including the feedback and rotation) must have a primitive characteristic polynomial of degree n. These are identified through exhaustive searches using finite field tools, as primitive polynomials are sparse for high degrees.^[2] George Marsaglia provided tables of parameters for various implementations. For single 32-bit words, shifts 13, 17, 5 (left, right, left) yield period $2^{32} - 1. For the 128-bit state with four 32-bit words (xor128), the implementation uses shifts 11, 8, 19 to achieve $2^{128} - 1. Similarly, for single 64-bit words, shifts 12, 25, 27 achieve $2^{64} - 1; an analogous multi-word setup for 256 bits (four 64-bit words) can use parameters from Marsaglia's tables to reach $2^{256} - 1. These were chosen for full-period properties and performance in randomness tests like Diehard.^[5] To verify primitivity and period, methods like the Berlekamp-Massey algorithm can find the minimal polynomial of the linear recurrence, or the order of the companion matrix can be computed in GF(2). Such checks confirm no short cycles.^[2] While suitable shift values improve hardware efficiency, poor choices lead to non-primitive polynomials and shorter periods, reducing uniformity.^[2]

Nonlinear Variants

Xorwow Generator

The xorwow generator is a nonlinear extension of the xorshift family, introduced by George Marsaglia to enhance the statistical quality of the generated numbers through an additional scrambling mechanism. It uses a state composed of five 32-bit words (denoted as s to s^[10]) and a separate 32-bit counter, providing a total state size of 192 bits. The name "xorwow" derives from the combination of xorshift operations with a "Weyl sequence," a linear congruential component that scrambles the output to improve equidistribution and reduce visible patterns in lower bits. This design addresses limitations in pure linear xorshift generators by introducing nonlinearity via the counter update and output mixing.^[11] The state update follows a sequence of bitwise shifts and XORs to advance the generator. A temporary value is computed as t = s XOR (s right-shifted by 2 bits). The words are then cycled: s ← s^[12], s^[12] ← s^[13], s^[13] ← s^[14], s^[14] ← s^[10], and s^[10] ← s^[10] XOR (s^[10] left-shifted by 4 bits) XOR t XOR (t left-shifted by 1 bit). Subsequently, the counter is updated using counter ← (counter + 362437) bitwise-OR 1, ensuring the counter remains odd and cycles through all possible 32-bit values except zero. This update rule maintains the linear feedback shift register (LFSR) structure of the core xorshift while incorporating the counter for nonlinearity.^[11] The output is generated by scrambling the updated state to produce a 32-bit pseudorandom integer. In the base design, this involves adding the counter to s^[10] modulo 2^{32}. To further enhance randomness and decorrelate bits, particularly improving the quality of the least significant bits, an optional multiplication-based scrambling is applied: compute temp = s × 18782 (using 64-bit arithmetic), then return the low 32 bits of temp XOR the high 32 bits of temp. This step introduces additional mixing, making the output less predictable and better suited for applications requiring high-quality randomness across all bits. The xorwow generator is used in the NVIDIA cuRAND library for GPU-accelerated random number generation, where its efficiency and performance on parallel hardware are leveraged.^[11]^[15] The period of the xorwow generator is 2^{192} - 2^{32}, achieved by the full traversal of the 160-bit xorshift state (with period 2^{160} - 1 under primitive polynomial conditions) multiplied by the counter's effective cycle length, adjusted for the increment ensuring coprimality with 2^{32}. Compared to linear xorshift variants, xorwow exhibits superior low-bit randomness due to the scrambling, avoiding lattice structures that can cause failures in statistical tests focused on bit independence. It successfully passes the Diehard test suite, a comprehensive battery of randomness assessments, demonstrating its practical reliability for simulations and Monte Carlo methods.^[11]

Xorshift* with Multiplication

The xorshift* generators represent a nonlinear extension of the original linear xorshift family, introduced by Sebastiano Vigna in 2014 to address limitations in the output quality of pure xorshift operations. By applying a multiplication step after the linear shift and XOR transformations, xorshift* introduces nonlinearity that effectively scrambles the bits, mitigating linear artifacts inherent in generators based on operations over \mathbb{Z}/2\mathbb{Z}. This approach preserves the speed of xorshift while enhancing statistical properties, making it suitable for applications requiring high-quality pseudorandom numbers without the overhead of more complex scrambling techniques.^[16] The core algorithm begins with the standard linear xorshift state update on a word array s, typically using a sequence of left and right shifts combined with XORs. For a 64-bit single-word implementation, the state update might follow parameters like (12, 25, 27):

s ^= s << 12;
s ^= s >> 25;
s ^= s << 27;
s ^= s << 12;
s ^= s >> 25;
s ^= s << 27;

Following this update, the output is generated by multiplying the updated s (specifically

s{{grok:render&&&type=render_inline_citation&&&citation_id=0&&&citation_type=wikipedia}}

in multi-word cases) by a carefully chosen odd 64-bit constant, such as 2685821657736338717, and taking the lower 64 bits of the result. This constant is selected for its ability to mix high and low bits effectively, as determined by figures of merit in the spectral test. The multiplication ensures that linear dependencies in the xorshift output are diffused across the word.^[16] The period of an xorshift* generator matches that of its underlying linear xorshift counterpart, as the multiplication does not alter the state transition cycle; for a 64-bit word, this yields a full period of $2^{64} - 1. Multi-word variants, such as xorshift128* or xorshift1024*, achieve correspondingly larger periods like $2^{128} - 1 or $2^{1024} - 1. These periods are attained provided the shift parameters form a primitive polynomial over \mathbb{Z}/2\mathbb{Z}, ensuring the generator's maximal cycle length.^[16] A key improvement of xorshift* over plain linear xorshift is its superior performance in spectral tests, which evaluate the distribution of points generated by the RNG in higher dimensions. Linear xorshift often fails these tests due to visible lattice structures from unscrambled output, but the multiplication step hides such artifacts, enabling xorshift* variants to pass the stringent BigCrush battery in TestU01 with minimal failures. Additionally, xorshift* is computationally efficient, requiring only one 64-bit multiplication per output—far less costly than full tempering sequences in generators like Mersenne Twister—while delivering comparable or better randomness quality in practice. For instance, a 64-bit xorshift* implementation generates output at approximately 1.58 ns per 64 bits on modern hardware, underscoring its balance of speed and reliability.^[16]

Xorshift+ Additions

The xorshift+ generators introduce nonlinearity to the base xorshift algorithm through addition, enhancing statistical properties while maintaining computational efficiency. In this variant, the state consists of two 64-bit words, denoted as

s{{grok:render&&&type=render_inline_citation&&&citation_id=0&&&citation_type=wikipedia}}

and

s{{grok:render&&&type=render_inline_citation&&&citation_id=1&&&citation_type=wikipedia}}

, forming a 128-bit internal state. The update proceeds by first computing the output as the sum

s{{grok:render&&&type=render_inline_citation&&&citation_id=0&&&citation_type=wikipedia}} + s{{grok:render&&&type=render_inline_citation&&&citation_id=1&&&citation_type=wikipedia}}

in \mathbb{Z}/2^{64}\mathbb{Z}, followed by a linear xorshift transformation on the state using shifts of 23 bits left, 18 bits right, and 5 bits right. Specifically, the state advancement can be expressed as:

\begin{align*} s_1 &= s{{grok:render&&&type=render_inline_citation&&&citation_id=0&&&citation_type=wikipedia}}, \\ s_0 &= s{{grok:render&&&type=render_inline_citation&&&citation_id=1&&&citation_type=wikipedia}}, \\ \text{output} &= s_0 + s_1, \\ s{{grok:render&&&type=render_inline_citation&&&citation_id=0&&&citation_type=wikipedia}} &= s_0, \\ s{{grok:render&&&type=render_inline_citation&&&citation_id=1&&&citation_type=wikipedia}} &= (s_1 \oplus (s_1 \ll 23)) \oplus s_0 \oplus ((s_1 \oplus (s_1 \ll 23)) \gg 18) \oplus (s_0 \gg 5). \end{align*}

This produces a 64-bit output per invocation. The shift parameters (23, 18, 5) are selected from established xorshift configurations inspired by Marsaglia's original designs to ensure a maximal period of $2^{128} - 1.^[3] The addition step breaks the linearity inherent in pure xorshift operations, which can lead to detectable artifacts in low-dimensional projections such as 2D scatter plots. By summing state components before outputting, xorshift+ disrupts linear dependencies, resulting in improved equidistribution in projections up to dimension 2, as verified by passing the BigCrush battery of the TestU01 suite.^[3] However, while the forward generator performs well, its reverse (running backward) exhibits weaknesses in tests like LinearComp and MatrixRank due to insufficient mixing in lower bits.^[3] xorshift+ has seen adoption in performance-critical software, such as the V8 JavaScript engine in Chrome, where its speed (approximately 1.06 ns per 64 bits) and full-period properties suit parallel or embedded use cases. Further analysis confirms its suitability for scenarios demanding fast, non-cryptographic randomness.^[3]

Xoshiro Family

Overview and Design Principles

The xoshiro family of pseudorandom number generators (PRNGs) was developed in 2018 by David Blackman and Sebastiano Vigna as an advancement in linear feedback shift register-based designs, building briefly on earlier nonlinear variants of the xorshift family.^[17] The name "xoshiro" derives from the core operations it employs: XOR, shift, and rotate, distinguishing it from the related xoroshiro subfamily which emphasizes rotate, shift, rotate sequences.^[18] This family prioritizes high performance and statistical quality, targeting applications in simulations, cryptography, and parallel computing where fast, high-quality randomness is essential.^[17] At its core, xoshiro uses a 256-bit internal state composed of four 64-bit words, denoted typically as s_0, s_1, s_2, s_3. The design combines traditional xorshift updates—linear operations involving XORs and shifts—with 64-bit left rotations (denoted as rol) to introduce controlled nonlinearity. This approach avoids the computational overhead of multiplication operations found in some prior designs, preserving speed while enhancing output independence and reducing detectable linear artifacts. The base state update involves the following operations: compute t = s_1 \ll 17; s_2 \oplus= s_0; s_3 \oplus= s_1; s_1 \oplus= s_2; s_0 \oplus= s_3; s_2 \oplus= t; s_3 = \operatorname{rotl}(s_3, 45).^[17] These rotations, applied post-XOR, scramble the linear structure effectively, aiming for a full period of $2^{256} - 1 in key variants.^[18] The primary design principles emphasize achieving excellent statistical properties without sacrificing throughput, with all prominent xoshiro generators passing the rigorous BigCrush test suite from the TestU01 library, which detects subtle biases in PRNG outputs.^[17] Performance benchmarks on modern hardware, such as an Intel i7-8700B, show xoshiro variants generating 64-bit outputs in under 0.9 nanoseconds, outperforming the PCG family in certain metrics like single-threaded speed while maintaining comparable or superior quality.^[17] Additionally, the generators are designed to be jumpable, incorporating efficient functions to advance the state by large increments (e.g., the square root of the period), enabling the creation of non-overlapping streams for parallel processing without sequence correlation.^[18] This makes xoshiro particularly suitable for multi-threaded environments and large-scale simulations.^[17]

Xoshiro256 Variants

The xoshiro256 family consists of three primary variants—xoshiro256+, xoshiro256++, and xoshiro256**—each sharing a common 256-bit state composed of four 64-bit words, denoted as s_0, s_1, s_2, s_3. The state update mechanism is identical across these variants and involves a linear transformation using XOR operations, a left shift, and a rotation to ensure a full period of $2^{256} - 1, provided the initial state is not all zeros. Specifically, the update proceeds as follows: compute t = s_1 \ll 17; then s_2 \oplus= s_0, s_3 \oplus= s_1, s_1 \oplus= s_2, s_0 \oplus= s_3; followed by s_2 \oplus= t and s_3 = \operatorname{rotl}(s_3, 45), where \ll denotes a left shift, \oplus is bitwise XOR, and \operatorname{rotl}(x, k) rotates the 64-bit word x left by k bits.^[19]^[20]^[21]^[17] The xoshiro256+ variant produces output via simple addition of two state words: \operatorname{next}() = s_0 + s_3, where addition is modulo $2^{64}. This approach provides a straightforward, fast extraction suitable for applications requiring uniform 64-bit integers, though it may exhibit some linear dependencies in lower bits.^[19]^[17] In contrast, xoshiro256++ enhances mixing through a folded sum that simulates a 128-bit result compressed to 64 bits: \operatorname{next}() = \operatorname{rotl}(s_0 + s_3, 23) + s_0. The initial addition s_0 + s_3 combines distant state elements, followed by a 23-bit left rotation and a second addition with s_0 to further scramble bits and reduce visible structure in the output. This double-addition technique improves statistical properties over the plain sum, making it effective for general-purpose use while maintaining high speed.^[20]^[17] The xoshiro256** variant introduces nonlinear scrambling via multiplications to break linear artifacts: \operatorname{next}() = \operatorname{rotl}(s_1 \times 5, 7) \times 9, where \times denotes 64-bit unsigned multiplication (using the high 64 bits of the 128-bit product). The multipliers 5 and 9, combined with rotations by 7 and an implicit 13-bit effective mix from the second multiplication, ensure excellent diffusion and equidistribution. Despite the added computational cost of multiplications, this variant is recommended for its superior quality, passing rigorous statistical tests and exhibiting minimal Hamming-weight dependencies.^[21]^[17]

Xoroshiro Subfamily

The xoroshiro subfamily consists of pseudorandom number generators that employ a linear feedback shift register mechanism similar to the xoshiro family, but generate output from a reduced subset of the internal state rather than the full state, enabling lighter implementations with smaller memory footprints.^[22] These generators update the state by cyclically processing pairs of words using XOR, rotation, and shift operations, followed by scrambling the output for improved statistical properties. Developed by David Blackman and Sebastiano Vigna, the design prioritizes speed on modern hardware while maintaining long periods and passing rigorous statistical tests like TestU01's BigCrush suite.^[17] A key variant is xoroshiro128+, which uses a 128-bit state divided into two 64-bit words, s_0 and s_1. The output is produced as the sum s_0 + s_1 (modulo $2^{64}) after each state update, achieving a full period of $2^{128} - 1.^[22] These "+" variants are optimized for generating uniform 64-bit integers or floating-point numbers, with the addition serving as a simple scrambler to decorrelate the output from the linear state transitions.^[17] For enhanced scrambling, the "" variants introduce multiplicative operations. Xoroshiro128, for instance, produces output as \operatorname{rotl}(s_0 \times 5, 7) \times 9, where \times denotes 64-bit unsigned multiplication using the high 64 bits of the 128-bit product. This mirrors the scrambling in xoshiro256** but on a halved state size, preserving the period $2^{128} - 1 while improving low-bit independence.^[22] Such scrambled designs are particularly effective for applications requiring higher-quality randomness without increasing state overhead. The xoroshiro generators are well-suited for resource-constrained environments, such as embedded systems or lightweight simulations, due to their minimal state requirements and execution speeds around 0.7–1.0 nanoseconds per 64-bit output on typical CPUs.^[17] They have been adopted in production software, including Java's random number facilities and the Lua interpreter, demonstrating their balance of efficiency and reliability.^[22]

Statistical Evaluation

The xoshiro family of pseudorandom number generators has undergone extensive statistical evaluation using the TestU01 suite, particularly its BigCrush battery comprising approximately 1.5 × 10^9 stringent tests designed to detect deviations from uniformity and independence. The scrambled variants, such as xoshiro256** and xoshiro256++, pass BigCrush without systematic failures when generating 64-bit outputs, confirming their suitability for demanding applications. In contrast, unscrambled or simply additive variants like xoshiro128+ may fail linearity-related tests such as MatrixRank and LinearComp due to detectable F₂-linear structures.^[22] Compared to the widely used Mersenne Twister (MT19937), xoshiro generators exhibit superior performance in low- to mid-dimensional equidistribution tests (dimensions 2 through 16), with empirical analyses showing reduced biases in binary matrix rank distributions and linear complexity metrics. MT19937, while passing many tests, fails LinearComp and displays lattice artifacts in higher dimensions, whereas recommended xoshiro configurations avoid such issues in practical upper-bit extractions.^[22] A notable weakness in the additive (+) variants of xoshiro and xoroshiro is the presence of lattice structure problems in high dimensions, manifesting as low linear complexity in the lowest bits (e.g., below 100 bits for some configurations), which can lead to failures in spectral or matrix-based tests. These issues arise from the inherent linearity of shift-register operations but are effectively mitigated in the multiplicative () variants through odd constant multiplication, elevating the linear complexity of output bits to extraordinarily high values, such as approximately 3 × 10^{37} for the least significant bit of xoshiro256. No linear complexities below the 256-bit state size are observed in these optimized setups.^[22] In dependency and collision-related assessments, xoshiro variants outperform their predecessors like xorshift+; for instance, the Hamming-weight dependency test detects biases in xorshift128+ after roughly 6 × 10^9 bytes, whereas xoroshiro128+ withstands up to 5 × 10^{12} bytes before transitional failures. This enhanced resilience, combined with full BigCrush compliance, positions xoshiro as preferable to older xorshift designs for general-purpose use.^[23]^[22] As of 2023, minor refinements to parameters in extensions like the LXM family—building on xoshiro principles—have further improved equidistribution in splittable, multithreaded scenarios, with ongoing validations confirming no regressions in statistical quality.^[24]

Practical Implementation

Code Examples

Xorshift generators are typically implemented using simple bit operations for efficiency across programming languages. The following examples illustrate key variants, drawing from original implementations to ensure correctness and performance. These codes assume standard integer types and focus on generating unsigned 32-bit or 64-bit outputs. A standard 128-bit xorshift generator uses a state array of four 32-bit words and applies shifts of 11, 19, and 8 positions, achieving a period of $2^{128} - 1. The C implementation below updates the state in place and returns the least significant word as the output.^[11]

c
#include <stdint.h>

uint32_t xorshift128(uint32_t state[4]) {
    uint32_t t = state[0] ^ (state[0] << 11);
    state[0] = state[1];
    state[1] = state[2];
    state[2] = state[3];
    state[3] = t ^ (state[3] >> 19) ^ (t << 8);  // Note: Original paper uses unsigned long (32-bit on typical systems)
    return state[3];
}
#include <stdint.h>

uint32_t xorshift128(uint32_t state[4]) {
    uint32_t t = state[0] ^ (state[0] << 11);
    state[0] = state[1];
    state[1] = state[2];
    state[2] = state[3];
    state[3] = t ^ (state[3] >> 19) ^ (t << 8);  // Note: Original paper uses unsigned long (32-bit on typical systems)
    return state[3];
}

For the xoshiro256** variant, a 256-bit state is maintained as four 64-bit integers, with output produced via multiplication and rotation for improved statistical properties. The Python class below mirrors the official C reference, using 64-bit integers and bitwise operations for portability. Rotations are implemented explicitly to avoid language-specific intrinsics.^[22]

python
class Xoshiro256StarStar:
    def __init__(self, seed):
        self.s = [0] * 4
        # [Seeding](/page/Seeding) omitted here; use a non-zero splitmix64 or similar
        self.s = [seed & 0xFFFFFFFFFFFFFFFF, (seed >> 64) & 0xFFFFFFFFFFFFFFFF, 0, 0]  # Placeholder; proper [seeding](/page/Seeding) required

    def next(self):
        result = ((self.s[1] * 5) & 0xFFFFFFFFFFFFFFFF) << 7 | ((self.s[1] * 5) & 0xFFFFFFFFFFFFFFFF) >> 57
        result = (result * 9) & 0xFFFFFFFFFFFFFFFF

        t = self.s[1] << 17
        self.s[2] ^= self.s[0]
        self.s[3] ^= self.s[1]
        self.s[1] ^= self.s[2]
        self.s[0] ^= self.s[3]

        self.s[2] ^= t
        self.s[3] = (self.s[3] << 45) | (self.s[3] >> 19)

        return result
class Xoshiro256StarStar:
    def __init__(self, seed):
        self.s = [0] * 4
        # [Seeding](/page/Seeding) omitted here; use a non-zero splitmix64 or similar
        self.s = [seed & 0xFFFFFFFFFFFFFFFF, (seed >> 64) & 0xFFFFFFFFFFFFFFFF, 0, 0]  # Placeholder; proper [seeding](/page/Seeding) required

    def next(self):
        result = ((self.s[1] * 5) & 0xFFFFFFFFFFFFFFFF) << 7 | ((self.s[1] * 5) & 0xFFFFFFFFFFFFFFFF) >> 57
        result = (result * 9) & 0xFFFFFFFFFFFFFFFF

        t = self.s[1] << 17
        self.s[2] ^= self.s[0]
        self.s[3] ^= self.s[1]
        self.s[1] ^= self.s[2]
        self.s[0] ^= self.s[3]

        self.s[2] ^= t
        self.s[3] = (self.s[3] << 45) | (self.s[3] >> 19)

        return result

The 64-bit xorshift* variant applies three shifts (12 right, 25 left, 27 right) followed by multiplication with a fixed 64-bit constant to scramble the output, yielding a period of $2^{64} - 1. This portable C snippet uses standard shifts without inline assembly.

c
#include <stdint.h>

uint64_t xorshift64star(uint64_t *state) {
    uint64_t x = *state;
    x ^= x >> 12;
    x ^= x << 25;
    x ^= x >> 27;
    *state = x;
    return x * UINT64_C(2685821657736338717);
}
#include <stdint.h>

uint64_t xorshift64star(uint64_t *state) {
    uint64_t x = *state;
    x ^= x >> 12;
    x ^= x << 25;
    x ^= x >> 27;
    *state = x;
    return x * UINT64_C(2685821657736338717);
}

Implementations must avoid initializing the state to all zeros, as this leads to a fixed zero output sequence, violating the full period. For portability, use fixed-width types like uint32_t or uint64_t and implement rotations as (x << k) | (x >> (64 - k)) to handle varying word sizes. Bit operations should employ unsigned types to prevent sign extension issues on two's-complement systems. To test these generators, a simple C main function can initialize a state (e.g., with nonzero values like 123456789, 362436069, 521288629, 88675123 for 128-bit) and print the first 10 outputs:

c
#include <stdio.h>
#include <stdint.h>

int main() {
    uint32_t state[4] = {123456789, 362436069, 521288629, 88675123};
    for (int i = 0; i < 10; ++i) {
        [printf](/page/Printf)("%u\n", xorshift128(state));
    }
    [return 0](/page/Return_0);
}
#include <stdio.h>
#include <stdint.h>

int main() {
    uint32_t state[4] = {123456789, 362436069, 521288629, 88675123};
    for (int i = 0; i < 10; ++i) {
        [printf](/page/Printf)("%u\n", xorshift128(state));
    }
    [return 0](/page/Return_0);
}

This produces a sequence verifiable against reference implementations, confirming correct state advancement.^[11]

Initialization Methods

Initialization of xorshift generators involves setting the initial state of the shift register to ensure the sequence begins with good statistical properties and avoids degenerate cases. The state, typically consisting of one or more words (e.g., 32-bit or 64-bit integers), must be initialized such that it is not all zeros, as this would cause the generator to output zeros indefinitely, violating the desired period and randomness properties. Basic seeding techniques fill the state using a provided seed value, often expanded via a simple mixing function or external entropy source. For a single-word state like xorshift32 or xorshift64, the seed can be directly assigned, provided it is non-zero. For multi-word states, such as in xorshift128 or larger variants, a common approach is to use a linear congruential mixer to generate distinct values for each word from a smaller seed, ensuring uniformity across the state space. One widely recommended mixer is SplitMix64, which applies a series of shifts, multiplications, and XORs to scramble the seed and produce successive 64-bit outputs for filling the state; this method allows initialization from a single 64-bit seed while covering a broad range of possible states. Alternatively, system entropy from sources like /dev/urandom on Unix-like systems can be read to directly populate the state, providing high-quality randomness for applications requiring unpredictability. Time-based seeds, such as the current timestamp, offer a practical but less secure option for non-cryptographic uses, though they should be combined with additional mixing to avoid predictability.^[22] Marsaglia's original xorshift designs for multi-word states employed the multiply-with-carry (MWC) method to generate the initial words, leveraging its ability to produce long-period sequences from a small seed set. In this approach, an MWC generator—itself a fast PRNG using multiplication and carry propagation—is run to fill each word of the xorshift state sequentially, ensuring the initial configuration avoids low-entropy patterns and achieves the full period when the primitive polynomial conditions are met. This technique was particularly useful in Marsaglia's combined generators like KISS, where MWC complements the xorshift component for robust initialization. To support parallel computations or multiple independent streams from a single seed, jump seeding applies a jump function that advances the state by a large number of steps, effectively deriving disjoint subsequences. For xorshift variants with large states, such as xorshift1024*, the jump function is implemented using polynomial arithmetic over the generator's linear feedback shift register representation, enabling advances of up to 2^512 steps in constant time via precomputed coefficients. This allows, for example, seeding parallel instances by applying successive jumps of 2^64 steps from a base state, ensuring non-overlapping periods without recomputing the entire sequence.^[16] Best practices emphasize using at least 64-bit seeds for states of 256 bits or larger to adequately explore the state space, avoiding trivial initializations like all-ones or sequential values that could lead to correlated outputs in early sequence portions. In long-running simulations, periodic reseeding with fresh entropy is advised to mitigate potential correlations from extended use, though for pure pseudorandom applications, a single well-chosen seed suffices for the generator's full period. These methods ensure xorshift generators deliver reliable performance across diverse applications, from simulations to procedural generation.^[25]