
Bernoulli process

In probability theory, a Bernoulli process is a discrete-time stochastic process consisting of a sequence of independent Bernoulli trials, where each trial has exactly two possible outcomes, typically termed "success" with probability p (where 0 \leq p \leq 1) and "failure" with probability 1 - p, and the success probability remains constant across all trials. The outcomes can be represented by indicator random variables X_i taking value 1 for success and 0 for failure, with \mathbb{E}[X_i] = p and \mathrm{Var}(X_i) = p(1-p). This process models fundamental random experiments such as repeated coin flips or other binary events, and serves as the foundation for more complex distributions.

Named after the Swiss mathematician Jacob Bernoulli (1654–1705), the concept emerged from his pioneering work on probability, particularly in his posthumously published book Ars Conjectandi (1713), where he formalized the binomial expansion for general success probabilities and proved an early version of the law of large numbers applicable to sequences of such trials. Prior contributions, such as those by Blaise Pascal and Pierre de Fermat in the 1650s, had explored fair coin scenarios (p = 1/2) leading to the binomial coefficients via Pascal's triangle, but Bernoulli extended this to arbitrary p, establishing the process's generality. His theorem asserts that, as the number of trials increases, the sample proportion of successes converges to p with high probability, underscoring the process's role in empirical estimation and statistical inference.

The Bernoulli process underpins several key probability distributions derived from partial sums or waiting times within the sequence. For instance, the total number of successes in n fixed trials follows a binomial distribution with parameters n and p, characterized by probability mass function P(K = k) = \binom{n}{k} p^k (1-p)^{n-k}. The number of trials until the first success adheres to a geometric distribution, with P(X = k) = (1-p)^{k-1} p for k = 1, 2, \dots, while the number of trials until the r-th success follows a negative binomial distribution, P(Y = y) = \binom{y-1}{r-1} p^r (1-p)^{y-r} for y = r, r+1, \dots. These distributions are essential in applied fields such as quality control and queueing, where binary events recur independently.

Beyond basic applications, the Bernoulli process connects to broader modeling, such as the simple random walk (where position is the cumulative sum of outcomes) and extensions like the Beta-Bernoulli process for Bayesian inference with unknown p. Its independence assumption distinguishes it from dependent binary sequences, such as Markov chains with memory of past outcomes, ensuring tractable computations via the product of individual probabilities. Violations of independence or constant probability lead to alternative models, but the Bernoulli framework remains a cornerstone for approximating real-world phenomena under ideal conditions.

Definition and Interpretation

Basic Definition

A Bernoulli process is a discrete-time stochastic process consisting of a sequence of independent and identically distributed random variables \{X_n : n = 1, 2, \dots \}, where each X_n follows a Bernoulli distribution with success probability p (where 0 \leq p \leq 1), taking the value 1 with probability p and 0 with probability 1-p. The parameter p represents the fixed probability of success in each trial, remaining constant across the sequence. In a more general formulation, the Bernoulli process can be viewed as a measurable map from a probability space to the sequence space \{0,1\}^\mathbb{N}, where each coordinate corresponds to the outcome of a single trial. The concept is named after Jacob Bernoulli, who introduced the foundational ideas of repeated independent trials in the context of games of chance in his posthumously published work Ars Conjectandi (1713). The sum of the first n variables in the process follows a binomial distribution with parameters n and p.
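
As a concrete illustration, the following Python sketch (not from the original text; function names and parameter values are illustrative choices) simulates sample paths of a Bernoulli(p) process with NumPy and checks that the partial sum S_n has the Binomial(n, p) mean np and variance np(1-p).

```python
# A minimal simulation sketch, assuming NumPy is available.
import numpy as np

def simulate_bernoulli(n_trials: int, p: float, rng: np.random.Generator) -> np.ndarray:
    """Return one sample path (X_1, ..., X_n) of a Bernoulli(p) process."""
    return (rng.random(n_trials) < p).astype(int)

rng = np.random.default_rng(seed=0)
n, p, reps = 100, 0.3, 10_000   # illustrative parameters

# S_n for many independent sample paths.
sums = np.array([simulate_bernoulli(n, p, rng).sum() for _ in range(reps)])

print(f"empirical mean of S_n: {sums.mean():.2f}  (theory: np = {n * p})")
print(f"empirical var  of S_n: {sums.var():.2f}  (theory: np(1-p) = {n * p * (1 - p):.2f})")
```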

Probabilistic Interpretation

The Bernoulli process can be intuitively understood as a sequence of repeated independent trials, each resulting in one of two possible outcomes: success or failure. A classic example is the repeated flipping of a coin, where "success" might correspond to heads and "failure" to tails. For a fair coin, the probability of success is p = 0.5, making each flip equally likely to yield either outcome; in contrast, a biased coin has p \neq 0.5, such as p = 0.7 for heads, reflecting a consistent but uneven bias across all flips. In real-world applications, the Bernoulli process models sequences of binary yes/no events, such as detecting defects in manufactured items during inspections, where each item is independently defective with fixed probability p or non-defective with probability 1 - p. Another example arises in discrete-time arrival modeling, where each time slot (e.g., a second or minute) has a probability p of an arrival occurring, independently of previous slots, to approximate sparse arrival patterns in systems like queueing at a service counter. A key feature of the Bernoulli process is the independence of trials, meaning the outcome of any single trial does not influence or depend on the results of others, ensuring that the probability p remains constant regardless of prior events. This independence underpins the process's reliability in modeling scenarios where past occurrences provide no predictive information about future ones. The process exhibits stationarity, as the marginal distribution of each trial is identical (always Bernoulli with parameter p), implying that the probabilistic structure does not shift over time or across trials. This stationary property makes the Bernoulli process suitable for long sequences where consistent behavior is expected.

Formal Definition

Probability Space

The probability space underlying the Bernoulli process, in a measure-theoretic treatment, consists of a sample space \Omega = \{0,1\}^{\mathbb{N}}, which is the set of all infinite sequences of binary outcomes, where each coordinate represents the result of a single trial (success or failure). This space models the infinite sequence of independent trials characteristic of the process. The measurable structure is provided by the product \sigma-algebra \mathcal{F} on \Omega, generated by the cylinder sets, which are events depending on only finitely many coordinates. Specifically, for any finite n \in \mathbb{N} and subset A \subseteq \{0,1\}^n, the cylinder set C_{n,A} = \{\omega \in \Omega : (\omega_1, \dots, \omega_n) \in A\} belongs to the generating algebra, and \mathcal{F} = \sigma(\{C_{n,A} : n \in \mathbb{N}, A \subseteq \{0,1\}^n\}). This \sigma-algebra ensures that events of interest, such as those based on partial observations, are measurable. The product \sigma-algebra coincides with the Borel \sigma-algebra generated by the product topology on \Omega. The natural filtration \{\mathcal{F}_n\}_{n \in \mathbb{N}} is defined by \mathcal{F}_n = \sigma(X_1, \dots, X_n), where X_k: \Omega \to \{0,1\} is the coordinate projection X_k(\omega) = \omega_k for each k \in \mathbb{N}, representing the outcome of the k-th trial as a random variable. These \sigma-algebras form an increasing sequence \mathcal{F}_1 \subseteq \mathcal{F}_2 \subseteq \cdots \subseteq \mathcal{F}, capturing the information revealed progressively by the trials.

Bernoulli Measure

The Bernoulli measure \mu_p, for 0 \leq p \leq 1, is the infinite product probability measure on the product space \{0,1\}^{\mathbb{N}} equipped with the product \sigma-algebra, where each marginal measure on the n-th coordinate is the Bernoulli measure \nu_p satisfying \nu_p(\{1\}) = p and \nu_p(\{0\}) = 1-p. This construction ensures that the coordinate projections X_n: \{0,1\}^{\mathbb{N}} \to \{0,1\} form a sequence of independent and identically distributed Bernoulli(p) random variables under \mu_p. The existence of \mu_p follows from the Kolmogorov extension theorem applied to the consistent family of finite-dimensional distributions given by the product of \nu_p on finite cylinders. For a cylinder set C defined by fixing the first n coordinates to a specific string (x_1, \dots, x_n) \in \{0,1\}^n with exactly k ones (successes), the measure is \mu_p(C) = p^k (1-p)^{n-k}. More generally, for a cylinder set depending on any finite collection of coordinates \{i_1, \dots, i_m\} fixed to values (a_1, \dots, a_m) with \sum_j a_j = k' ones, \mu_p(C) = p^{k'} (1-p)^{m - k'}. This measure extends uniquely to the full \sigma-algebra by Carathéodory's extension theorem, as the finite-dimensional distributions are consistent under the independent, identical-distribution assumption. The Bernoulli measure \mu_p is a probability measure, satisfying \mu_p(\{0,1\}^{\mathbb{N}}) = 1, since each \nu_p is a probability measure and the infinite product preserves this property. It is atomless when 0 < p < 1, as every singleton \{x\} for x \in \{0,1\}^{\mathbb{N}} has measure zero, being the intersection of nested cylinders whose measures approach zero (product of infinitely many factors each at most \max(p, 1-p) < 1). Additionally, \mu_p is invariant under the one-sided shift transformation T: \{0,1\}^{\mathbb{N}} \to \{0,1\}^{\mathbb{N}} defined by T(x_1, x_2, \dots) = (x_2, x_3, \dots), meaning \mu_p(T^{-1}(A)) = \mu_p(A) for all measurable A, due to the identical distribution of coordinates.
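
The cylinder-set formula is easy to evaluate directly. The sketch below (illustrative, not from the original text; it uses Python's fractions module for exact arithmetic) computes \mu_p(C) for a fixed initial pattern and verifies that the cylinders of a given length partition the space, so their measures sum to 1.

```python
# A minimal sketch of the cylinder-set measure mu_p(C) = p^k (1-p)^(n-k).
from fractions import Fraction
from itertools import product

def cylinder_measure(pattern, p):
    """mu_p of the cylinder fixing the first len(pattern) coordinates,
    where k = number of 1s in the pattern."""
    k = sum(pattern)
    n = len(pattern)
    return p**k * (1 - p)**(n - k)

p = Fraction(1, 3)
print(cylinder_measure((1, 0, 1, 1), p))   # p^3 (1-p)^1 = 2/81

# The 2^4 cylinders of length 4 partition {0,1}^N, so measures sum to 1.
total = sum(cylinder_measure(w, p) for w in product((0, 1), repeat=4))
print(total)                               # 1
```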

Key Properties

Law of Large Numbers

In the context of a Bernoulli process \{X_i\}_{i=1}^\infty consisting of independent Bernoulli random variables with success probability p \in (0,1), the law of large numbers describes the convergence of the sample mean \bar{X}_n = \frac{1}{n} \sum_{i=1}^n X_i to the expected value E[X_i] = p. The weak law of large numbers states that for every \epsilon > 0, P(|\bar{X}_n - p| \geq \epsilon) \to 0 as n \to \infty. This result was originally established by Jacob Bernoulli specifically for sequences of Bernoulli trials in his seminal 1713 treatise Ars Conjectandi. A straightforward proof of the weak law relies on Chebyshev's inequality and the independence of the X_i. Since \mathrm{Var}(X_i) = p(1-p) for each i, it follows that \mathrm{Var}(\bar{X}_n) = p(1-p)/n, so P(|\bar{X}_n - p| \geq \epsilon) \leq \frac{\mathrm{Var}(\bar{X}_n)}{\epsilon^2} = \frac{p(1-p)}{n \epsilon^2} \to 0 as n \to \infty. The strong law of large numbers provides a sharper almost-sure convergence: \bar{X}_n \to p almost surely as n \to \infty. This general form for independent identically distributed random variables with finite mean was proved by Andrey Kolmogorov in 1930. For the Bernoulli process, the proof exploits the independence, identical distribution, and boundedness (0 \leq X_i \leq 1) of the variables to apply Kolmogorov's theorem directly; a common sketch uses truncation of the variables to bound moments and invokes the Borel–Cantelli lemma to show that large deviations occur only finitely often with probability 1. This almost-sure convergence implies that the long-run proportion of successes in the process equals p with probability 1, justifying the intuitive notion of the law of averages in repeated Bernoulli trials. The diminishing variance \mathrm{Var}(\bar{X}_n) = p(1-p)/n \to 0 as n \to \infty further underscores the increasing concentration of \bar{X}_n around p.
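
To see Chebyshev's bound from the proof sketch at work, the following illustrative Python experiment (parameter values are arbitrary choices, not from the original text) estimates P(|\bar{X}_n - p| \geq \epsilon) by simulation and compares it with the bound p(1-p)/(n\epsilon^2).

```python
# A minimal sketch comparing the empirical deviation probability
# with the Chebyshev upper bound, assuming NumPy is available.
import numpy as np

rng = np.random.default_rng(1)
p, eps, reps = 0.25, 0.05, 5_000

for n in (100, 1_000, 10_000):
    xbar = rng.binomial(n, p, size=reps) / n      # sample means over n trials
    empirical = np.mean(np.abs(xbar - p) >= eps)  # estimate of P(|Xbar_n - p| >= eps)
    chebyshev = p * (1 - p) / (n * eps**2)        # Chebyshev upper bound
    print(f"n={n:6d}  empirical={empirical:.4f}  Chebyshev bound={min(chebyshev, 1):.4f}")
```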

Binomial Distribution

In the context of a Bernoulli process, which consists of a sequence of independent Bernoulli trials each with success probability p, the cumulative number of successes after n trials is a key random variable. Let X_i denote the outcome of the i-th trial, where X_i = 1 for success and X_i = 0 for failure. The sum S_n = \sum_{i=1}^n X_i represents the total number of successes in these n trials and follows a binomial distribution, denoted S_n \sim \operatorname{Bin}(n, p). This distribution was first systematically derived by Jacob Bernoulli in his seminal work Ars Conjectandi. The probability mass function of the binomial distribution gives the probability of exactly k successes in n trials as P(S_n = k) = \binom{n}{k} p^k (1-p)^{n-k}, \quad k = 0, 1, \dots, n, where \binom{n}{k} = \frac{n!}{k!(n-k)!} is the binomial coefficient, counting the number of ways to choose k successes out of n trials. This formula arises directly from the independence of the trials and the multiplicative probability structure of the process. The binomial random variable S_n has expected value E[S_n] = np and variance \operatorname{Var}(S_n) = np(1-p), which follow from the linearity of expectation and the independence of the X_i, since E[X_i] = p and \operatorname{Var}(X_i) = p(1-p). These moments quantify the average number of successes and the spread around that average, scaling linearly with the number of trials n.
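
A minimal Python check of these formulas (illustrative, standard library only) computes the pmf with math.comb and confirms that it sums to 1 and reproduces the moments np and np(1-p).

```python
# A minimal sketch of the binomial pmf and its moments.
from math import comb

def binom_pmf(k: int, n: int, p: float) -> float:
    """P(S_n = k) = C(n, k) p^k (1-p)^(n-k)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 10, 0.4   # illustrative parameters
pmf = [binom_pmf(k, n, p) for k in range(n + 1)]
mean = sum(k * q for k, q in enumerate(pmf))
var = sum((k - mean) ** 2 * q for k, q in enumerate(pmf))

print(f"sum of pmf = {sum(pmf):.6f}")                           # 1.000000
print(f"mean = {mean:.4f}  (np = {n * p})")                     # 4.0
print(f"var  = {var:.4f}  (np(1-p) = {n * p * (1 - p):.2f})")   # 2.40
```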

Central Limit Theorem

In the Bernoulli process, consisting of independent Bernoulli trials with success probability p \in (0,1), the partial sum S_n = \sum_{i=1}^n X_i follows a binomial distribution with parameters n and p. The central limit theorem (CLT) asserts that the normalized version of this sum converges in distribution to a standard normal as the number of trials grows large. Specifically, \frac{S_n - n p}{\sqrt{n p (1-p)}} \xrightarrow{d} \mathcal{N}(0,1) as n \to \infty, where \xrightarrow{d} denotes convergence in distribution. This result follows from the more general CLT for independent, identically distributed random variables with finite variance, since each X_i has mean p and variance p(1-p). A standard proof employs characteristic functions, exploiting the independence of the trials. Consider the centered and scaled variables Y_i = \frac{X_i - p}{\sqrt{p(1-p)}}, which are i.i.d. with mean 0 and variance 1. The characteristic function of Y_i is \phi_Y(t) = \exp\left( -i t \frac{p}{\sqrt{p(1-p)}} \right) \phi\left( \frac{t}{\sqrt{p(1-p)}} \right), where \phi(t) = (1-p) + p e^{it} is the characteristic function of X_i. The characteristic function of the normalized sum Z_n = \frac{1}{\sqrt{n}} \sum_{i=1}^n Y_i is \left[ \phi_Y\left( \frac{t}{\sqrt{n}} \right) \right]^n. For small u = t / \sqrt{n}, the cumulant expansion gives \log \phi_Y(u) = -\frac{1}{2} u^2 + o(u^2), since the first cumulant (the mean) is zero and the second is 1. Thus, n \log \phi_Y\left( \frac{t}{\sqrt{n}} \right) \to -\frac{t^2}{2}, and by Lévy's continuity theorem for characteristic functions, Z_n \xrightarrow{d} \mathcal{N}(0,1). Alternatively, the Lindeberg condition can be verified directly, as the variables have bounded support and thus satisfy the necessary uniformity in their contributions to the sum's variance. This asymptotic normality has significant implications for statistical inference in the Bernoulli process. For large n, the distribution of S_n can be approximated by a normal distribution with mean n p and variance n p (1-p), enabling the construction of approximate confidence intervals for the unknown parameter p. A common (1-\alpha)-level interval is given by \hat{p} \pm z_{\alpha/2} \sqrt{ \frac{\hat{p}(1-\hat{p})}{n} }, where \hat{p} = S_n / n is the sample proportion and z_{\alpha/2} is the upper \alpha/2-quantile of the standard normal distribution; this approximation improves as n increases, provided n p (1-p) is sufficiently large (typically at least 5 or 10). The rate of convergence in the CLT for the Bernoulli process is quantified by the Berry–Esseen theorem, which provides an explicit bound on the Kolmogorov distance between the distribution of the normalized sum and the standard normal. For the i.i.d. Bernoulli case, the error is bounded by C \frac{p^2 + (1-p)^2}{\sqrt{n p (1-p)}}, where C is an absolute constant, approximately 0.4097 at best, yielding an overall error of order O(1/\sqrt{n}). This bound confirms that the normal approximation becomes reliable for moderately large n, with the precise rate depending on p; it is tightest when p = 1/2 and worsens near the boundaries p \approx 0 or p \approx 1.
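
The confidence-interval recipe can be tested empirically. The sketch below (a simulation under assumed parameters p = 0.3 and n = 400, chosen for illustration) estimates the coverage of the nominal 95% interval \hat{p} \pm z_{\alpha/2} \sqrt{\hat{p}(1-\hat{p})/n} over many replications.

```python
# A minimal coverage check for the normal-approximation interval,
# assuming NumPy is available.
import numpy as np

rng = np.random.default_rng(2)
p_true, n, reps, z = 0.3, 400, 10_000, 1.959964   # z_{0.025} for a 95% interval

s = rng.binomial(n, p_true, size=reps)            # S_n for many experiments
p_hat = s / n
half = z * np.sqrt(p_hat * (1 - p_hat) / n)       # interval half-width
covered = np.mean((p_hat - half <= p_true) & (p_true <= p_hat + half))
print(f"empirical coverage of nominal 95% interval: {covered:.3f}")
# Here n p (1-p) = 84, well above the rule-of-thumb threshold, so coverage
# should be close to 0.95.
```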

Dynamical Systems Connections

Bernoulli Shift

The Bernoulli shift, also known as the Bernoulli map or shift transformation, is a fundamental example of a measure-preserving dynamical system arising from the Bernoulli process. It is defined on the sequence space \{0,1\}^\mathbb{N}, consisting of all infinite sequences \omega = (\omega_1, \omega_2, \dots) where each \omega_i \in \{0,1\}. The shift map \sigma: \{0,1\}^\mathbb{N} \to \{0,1\}^\mathbb{N} acts by \sigma(\omega_1, \omega_2, \dots) = (\omega_2, \omega_3, \dots), effectively discarding the first coordinate and shifting the sequence to the left. This transformation preserves the Bernoulli measure \mu_p, which is the infinite product measure on \{0,1\}^\mathbb{N} with marginal probability p for the coordinate being 1 (and 1-p for 0), assuming 0 < p < 1. Specifically, for any measurable set A, \mu_p(\sigma^{-1}(A)) = \mu_p(A), as the shift aligns with the independent structure of the product measure, mapping cylinder sets to equivalent-measure sets. The Bernoulli shift is ergodic with respect to \mu_p, meaning that the only invariant sets under \sigma have measure 0 or 1. Consequently, for any integrable function f on \{0,1\}^\mathbb{N}, the time average \lim_{n \to \infty} \frac{1}{n} \sum_{k=0}^{n-1} f(\sigma^k \omega) equals the space average \int f \, d\mu_p for \mu_p-almost every \omega. This ergodicity reflects the irreducibility of the dynamics, where orbits densely explore the space without fixed subsystems of positive measure. Furthermore, the Bernoulli shift exhibits the strong mixing property, a stronger form of ergodicity where \lim_{n \to \infty} \mu_p(\sigma^{-n}(A) \cap B) = \mu_p(A) \mu_p(B) for all measurable sets A, B. For cylinder sets the decorrelation is immediate: once n exceeds the number of coordinates on which A depends, \sigma^{-n}(A) and B depend on disjoint coordinates, so \mu_p(\sigma^{-n}(A) \cap B) = \mu_p(A) \mu_p(B) exactly. This rapid decorrelation underscores the chaotic, independent nature of the underlying Bernoulli trials.
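
The mixing identity can be checked numerically for cylinder sets. In the illustrative sketch below (not from the original text), A = B = \{\omega : \omega_1 = 1\}; for n \geq 1 the events \sigma^{-n}(A) and B depend on disjoint coordinates, so the joint measure equals the product \mu_p(A)\mu_p(B).

```python
# A minimal Monte Carlo check of shift mixing on cylinder sets,
# assuming NumPy is available.
import numpy as np

rng = np.random.default_rng(3)
p, length, reps = 0.5, 60, 200_000
omega = rng.random((reps, length)) < p     # sample paths; column j is coordinate j+1

def A_shifted(n):
    """Indicator of sigma^{-n}(A) = {omega_{n+1} = 1}, as a boolean column."""
    return omega[:, n]

B = omega[:, 0]                            # indicator of B = {omega_1 = 1}

for n in (0, 1, 5, 20):
    joint = np.mean(A_shifted(n) & B)
    print(f"n={n:2d}  mu(shift^-n A and B) ~ {joint:.4f}  vs  mu(A)mu(B) = {p * p:.4f}")
# n = 0 gives mu(A) = 0.5; for n >= 1 the estimate is near 0.25, as predicted.
```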

Symbolic Dynamics Examples

One prominent example of a Bernoulli shift in symbolic dynamics is the doubling map on the unit interval, defined by T(x) = 2x \mod 1 for x \in [0,1). This map is topologically conjugate to the one-sided Bernoulli shift on the space of binary sequences via the binary expansion representation x = 0.\omega_1 \omega_2 \omega_3 \dots, where each \omega_i \in \{0,1\}, and the conjugacy is given by the map h: \{0,1\}^\mathbb{N} \to [0,1) that sends a sequence to its corresponding real number, with the shift \sigma on sequences corresponding to the action of T on expansions (noting that expansions are unique except on a countable set of dyadic rationals). The Lebesgue measure on [0,1) serves as the invariant measure for the doubling map T, and it corresponds under this conjugacy to the Bernoulli measure \mu_{1/2}, which is the infinite product of uniform Bernoulli trials with probability 1/2 on \{0,1\}. This correspondence preserves the ergodic properties of the system, such as mixing, under the conjugacy. Another key example arises with the middle-thirds Cantor set, which serves as the support for a Bernoulli measure with equal probabilities p = 1/2. Points in the Cantor set \mathcal{C} \subset [0,1] admit ternary expansions using only digits 0 and 2, i.e., x = \sum_{k=1}^\infty a_k / 3^k where each a_k \in \{0,2\}, and the shift map acts by deleting the first digit and shifting the remaining sequence, inducing a homeomorphism between \mathcal{C} and the symbolic space \{0,2\}^\mathbb{N}. The corresponding measure on \mathcal{C} is the pushforward of \mu_{1/2} under this representation, making the system measure-theoretically isomorphic to the Bernoulli shift.
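
The conjugacy between the doubling map and the binary shift can be verified on finite expansions. The following sketch (illustrative; exact for dyadic rationals up to floating-point rounding) checks that T \circ h agrees with h \circ \sigma on a sample bit string.

```python
# A minimal sketch of the conjugacy T(h(omega)) = h(sigma(omega)).
def doubling(x: float) -> float:
    """The doubling map T(x) = 2x mod 1 on [0, 1)."""
    return (2 * x) % 1.0

def h(bits) -> float:
    """Conjugacy: send a binary sequence to x = 0.b1 b2 b3 ... in [0, 1)."""
    return sum(b / 2 ** (k + 1) for k, b in enumerate(bits))

bits = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1]
x = h(bits)
# Applying T to x matches dropping the first bit (the shift sigma).
print(abs(doubling(x) - h(bits[1:])) < 1e-12)   # True
```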

Odometer Transformation

The odometer transformation, also known as the adding machine, is defined on the p-adic integers X = \mathbb{Z}_p = \lim_{\leftarrow k} \mathbb{Z}/p^k \mathbb{Z} for a fixed prime p, a compact topological group under p-adic addition. Equivalently, elements of X can be represented as x = \sum_{k=0}^\infty d_k p^k with digits d_k \in \{0, 1, \dots, p-1\}, topologized as the product space \{0,1,\dots,p-1\}^\mathbb{N}. The transformation T: X \to X is given by adding 1 (translation by the generator 1), which in the digit representation increments the least significant digit d_0 and propagates the carry to higher digits as needed (e.g., if d_0 = p-1, set d_0 = 0 and add 1 to d_1, etc.), mimicking the rollover of a mechanical odometer. This map preserves the normalized Haar measure \mu on X, the unique translation-invariant probability measure, which in the digit representation corresponds to the infinite product of uniform measures on \{0,1,\dots,p-1\} (each digit value with probability 1/p), i.e., the i.i.d. uniform measure on the digits. Unlike the one-sided Bernoulli shift on (\mathbb{Z}/p\mathbb{Z})^\mathbb{N}, which is a non-invertible left-shift map, the odometer T is bijective because addition by 1 on the group \mathbb{Z}_p admits an inverse (addition by -1), making it an automorphism of the measure space (X, \mu). The system (X, \mu, T) is a rank-one transformation, constructed via cutting and stacking with constant cutting parameter r_n = p and no spacers; as a rank-one system with discrete spectrum, it is ergodic with zero entropy but not mixing. Topologically, T is minimal on X, meaning every orbit is dense, and it admits a unique invariant probability measure \mu, rendering the system strictly ergodic. In ergodic theory, the odometer serves as a canonical example of a strictly ergodic \mathbb{Z}-action on a compact group via translation, with discrete rational spectrum consisting of the p-power roots of unity (eigenvalues e^{2\pi i m / p^k} for integers m, k) and zero entropy, contrasting with the positive-entropy, Lebesgue-spectrum Bernoulli shifts while sharing the underlying symbolic digit structure. This makes it useful for studying spectral invariants and factors in classifications of measure-preserving transformations, such as in constructions of prime systems or extensions with odometer factors.
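
The digit-level description of the odometer translates directly into code. The sketch below (an illustrative finite truncation; the true odometer acts on infinite digit sequences) implements add-with-carry in base p, with the least significant digit first.

```python
# A minimal sketch of the odometer (adding-machine) step on base-p digits.
def odometer_step(digits, p=2):
    """Add 1 to a base-p digit sequence (least significant digit first),
    propagating the carry: a finite truncation of T(x) = x + 1 on Z_p."""
    out = list(digits)
    for i, d in enumerate(out):
        if d < p - 1:
            out[i] = d + 1       # no carry needed past this digit
            return out
        out[i] = 0               # digit rolls over; carry moves to position i+1
    out.append(1)                # overflow only happens for finite truncations
    return out

x = [1, 1, 0, 1]                 # 1 + 1*2 + 0*4 + 1*8 = 11 in base 2
print(odometer_step(x))          # [0, 0, 1, 1] = 12: two digits roll over
```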

Applications and Extensions

Bernoulli Sequences

A Bernoulli sequence is defined as a realization or sample path \omega \in \{0,1\}^\mathbb{N} drawn from the Bernoulli measure \mu_p on the space of infinite sequences, where each coordinate \omega_n is independently 1 with probability p \in (0,1) and 0 with probability 1-p. This measure assigns to the cylinder set determined by an initial segment \sigma of length k the probability \mu_p([\sigma]) = p^{|\sigma|_1} (1-p)^{|\sigma|_0}, where |\sigma|_1 and |\sigma|_0 denote the number of 1s and 0s in \sigma, respectively. Almost all Bernoulli sequences with respect to \mu_p exhibit strong randomness properties. By the strong law of large numbers, the empirical frequency of 1s converges to p, implying that for p = 1/2 the sequences are balanced in the sense of having asymptotically equal frequencies of 0s and 1s. Moreover, these sequences are almost surely non-periodic, as any fixed periodic structure has \mu_p-measure zero under the independence and identical distribution of the coordinates. They are also incompressible, meaning their initial segments cannot be significantly shortened by any algorithmic description, a consequence of satisfying all \mu_p-Martin-Löf tests for randomness. In contrast to deterministic sequences, Bernoulli sequences lack the structured repetition found in periodic binary strings or de Bruijn sequences, which are finite or cyclic constructs designed to include every possible substring of a given length exactly once. While de Bruijn sequences serve applications in coding and testing due to their exhaustive coverage, Bernoulli sequences, being probabilistic realizations, avoid such predictability and instead embody the inherent irregularity of independent trials. The term "Bernoulli sequence" is sometimes used interchangeably with the Bernoulli process itself, referring to the abstract sequence of random variables, but in this context, it specifically emphasizes the pathwise properties of individual realizations \omega.
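
One pathwise property, the equidistribution of finite blocks for p = 1/2, can be observed empirically. The illustrative sketch below (not from the original text) counts length-3 blocks along one simulated path; each of the 8 patterns should appear with frequency near 2^{-3} = 0.125.

```python
# A minimal block-frequency check along a single sample path,
# assuming NumPy is available.
import numpy as np
from collections import Counter

rng = np.random.default_rng(4)
omega = (rng.random(200_000) < 0.5).astype(int)   # one path with p = 1/2

k = 3
blocks = Counter(tuple(map(int, omega[i:i + k])) for i in range(len(omega) - k + 1))
for pattern, count in sorted(blocks.items()):
    print(pattern, f"{count / (len(omega) - k + 1):.4f}")   # each ~ 0.125
```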

Randomness Extraction

In biased Bernoulli processes, where the success probability p \neq 0.5, the generated bits deviate from uniformity, posing challenges for applications that demand high-quality randomness, such as cryptographic key generation and simulations. Randomness extraction techniques address this by transforming sequences of biased bits into nearly uniform random bits, leveraging the underlying independence of the trials while mitigating the bias. The seminal von Neumann extractor, introduced in 1951, provides a simple debiasing method for independent Bernoulli trials. It pairs consecutive bits (X_{2i-1}, X_{2i}): if the pair is (1,0), it outputs 1; if (0,1), it outputs 0; pairs (0,0) and (1,1) are discarded. This yields independent uniform bits (each 0 or 1 with probability 1/2), regardless of p (assuming 0 < p < 1), because the conditional probabilities balance out: P(\text{output 1} \mid \text{output}) = \frac{p(1-p)}{2p(1-p)} = \frac{1}{2}. The probability of producing an output per pair is 2p(1-p), yielding an efficiency of p(1-p) output bits per input bit. To enhance efficiency, especially for long sequences, iterated extensions apply the extractor recursively to the discarded information. Elias's 1972 block-based approach processes variable-length runs of identical bits, encoding them to extract more bits per block and approaching the source entropy h(p) = -p \log_2 p - (1-p) \log_2 (1-p) for large blocks. Peres's 1992 method builds on this by iteratively applying extraction in a hierarchical manner, reusing bits from previous discards to achieve asymptotically optimal rates near h(p) as iterations increase. These iterations improve output yield, for instance approaching 1 bit per input bit when p = 0.5 asymptotically with increasing iterations, compared to the basic extractor's 0.25. The basic von Neumann extractor's efficiency p(1-p) \leq 1/4 limits its practicality when bias is strong (p near 0 or 1), as most pairs are discarded. More advanced alternatives, such as those that explicitly model the source distribution, can extract closer to the full entropy but require knowledge of p or adaptive estimation. In modern cryptographic contexts post-2000, such as quantum random number generators and privacy amplification, sophisticated extractors (e.g., seeded variants) have superseded basic methods for robust, high-entropy output in adversarial settings.
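
The pairing rule is straightforward to implement. The following Python sketch (illustrative; the seed and bias are arbitrary choices) debiases a Bernoulli(0.8) stream and reports the output frequency and yield, which should land near 1/2 and p(1-p) = 0.16 respectively.

```python
# A minimal sketch of the von Neumann debiasing procedure.
import random

def von_neumann_extract(bits):
    """Map pairs (1,0) -> 1 and (0,1) -> 0; discard (0,0) and (1,1).
    Valid for i.i.d. Bernoulli(p) input with 0 < p < 1."""
    out = []
    for i in range(0, len(bits) - 1, 2):
        a, b = bits[i], bits[i + 1]
        if a != b:
            out.append(a)        # (1,0) -> 1, (0,1) -> 0
    return out

random.seed(0)
p = 0.8                          # heavily biased source
raw = [1 if random.random() < p else 0 for _ in range(100_000)]
ext = von_neumann_extract(raw)

print(f"input frequency of 1s:  {sum(raw) / len(raw):.4f}")    # ~0.80
print(f"output frequency of 1s: {sum(ext) / len(ext):.4f}")    # ~0.50
print(f"yield: {len(ext) / len(raw):.4f} bits/input bit (theory: {p * (1 - p):.2f})")
```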
