Random sequence
A random sequence, in the context of probability theory, is a discrete-time stochastic process consisting of a countable collection of random variables \{X_n\}_{n \in J}, where J is a countable index set such as the natural numbers \mathbb{N} or the integers \mathbb{Z}, each defined on a common probability space.[1] Such sequences model phenomena in which outcomes at discrete points (e.g., time steps) are subject to probabilistic variation, such as daily stock returns or successive measurements in an experiment.[2] In contrast, within algorithmic information theory, a random sequence is an individual string or infinite sequence that lacks compressible patterns: its Kolmogorov complexity, the length of the shortest computer program capable of producing it, is approximately equal to the length of the sequence itself, indicating incompressibility.[3]

In probability theory, random sequences are foundational to the analysis of asymptotic behavior and to statistical inference. Key properties include stationarity, under which the joint distribution of the variables is invariant under time shifts, enabling prediction in time-series models such as ARIMA processes. Random sequences also exhibit various modes of convergence, such as convergence in probability (P(|X_n - X| > \epsilon) \to 0 as n \to \infty) and almost sure convergence, which underpin results like the law of large numbers (sample averages converge to the expected value) and the central limit theorem (normalized sums approach a normal distribution). Examples include independent and identically distributed (i.i.d.) sequences, such as coin toss outcomes, and dependent ones such as Markov chains, in which the next state depends only on the current one.

Kolmogorov complexity was introduced by Ray Solomonoff in 1960 and developed independently by Andrey Kolmogorov and Gregory Chaitin in the mid-1960s; it formalizes the idea that a random sequence passes all effective statistical tests, such as exhibiting uniform symbol frequencies and no simple patterns in its subsequences.[4] In Per Martin-Löf's 1966 refinement, an infinite binary sequence is random if it avoids every effectively null set, that is, every set of probability zero under the uniform distribution that can be covered by a computable sequence of statistical tests; such a sequence automatically satisfies laws like the law of large numbers.[5] This notion contrasts with pseudorandom sequences in computing, which are generated deterministically to mimic randomness for applications such as cryptography but are compressible by design.[6]

The two interpretations intersect in applications: probabilistic random sequences inform simulations that approximate algorithmic randomness, while incompressibility tests validate the pseudorandom generators used in Monte Carlo methods. The study of random sequences bridges pure mathematics, statistics, and computer science, influencing fields from quantum mechanics to machine learning.
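The incompressibility idea can be illustrated numerically. Kolmogorov complexity itself is uncomputable, so the sketch below uses zlib compression as a crude upper-bound proxy (an illustrative choice, not part of any formal definition): a highly patterned string compresses to a small fraction of its length, while pseudorandom bytes barely compress at all, even though pseudorandom output is, in principle, compressible down to the size of its generator and seed.

```python
import random
import zlib

def compressed_length(data: bytes) -> int:
    """Length after zlib compression: a crude upper-bound proxy for
    Kolmogorov complexity (which is uncomputable)."""
    return len(zlib.compress(data, 9))

n = 100_000
patterned = b"01" * (n // 2)                                   # highly regular sequence
pseudorandom = bytes(random.getrandbits(8) for _ in range(n))  # PRNG output

print(compressed_length(patterned))     # tiny fraction of n
print(compressed_length(pseudorandom))  # close to n: looks incompressible to zlib,
                                        # though PRNG output is compressible in principle
```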
Fundamentals
Definition
A random sequence is a sequence \{X_n\}_{n \in \mathbb{N}} (or, more generally, \{X_n\}_{n \in \mathbb{Z}}), where each X_n is a random variable defined on a common probability space (\Omega, \mathcal{F}, P).[1] This setup models uncertainty in a sequence of outcomes, where the value of each X_n is not fixed but follows a probabilistic law.[7]

The underlying probability space consists of three components: the sample space \Omega, the set of all possible outcomes of the underlying experiment; the \sigma-algebra \mathcal{F}, a collection of subsets of \Omega (called events) that is closed under complementation and countable unions; and the probability measure P, a function from \mathcal{F} to [0,1] satisfying P(\Omega) = 1 and countable additivity over disjoint events.[8] Each random variable X_n: \Omega \to \mathbb{R} (or to another measurable space) is measurable with respect to \mathcal{F}, ensuring that events like \{X_n \leq x\} belong to \mathcal{F} for all real x.[7]

Random sequences are a special case of stochastic processes: discrete-time processes whose index set is the natural numbers or the integers, as opposed to continuous-time processes indexed by the real numbers.[1] Unlike deterministic sequences, in which each term is a fixed value with no associated uncertainty, random sequences incorporate probabilistic variability, making outcomes unpredictable without knowledge of the realization drawn from \Omega.[9]

Examples of random sequences include the Bernoulli sequence, where each X_n is a Bernoulli random variable taking value 1 with probability p \in (0,1) and 0 otherwise, often modeling repeated independent trials such as coin flips. Another example is the uniform random sequence over [0,1], where each X_n follows a uniform distribution with probability density function f(x) = 1 for x \in [0,1] and 0 elsewhere, useful for generating continuous uniform variates.[10]
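As a minimal sketch of the two examples above (assuming NumPy; the seed and sequence length are arbitrary choices), each run of the script draws one realization of the sequence, corresponding to a single outcome \omega \in \Omega:

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # fixed seed for reproducibility (arbitrary)
length = 10

# Bernoulli sequence: each X_n takes value 1 with probability p, else 0.
p = 0.5
bernoulli_seq = rng.binomial(n=1, p=p, size=length)

# Uniform random sequence: each X_n has density f(x) = 1 on [0, 1].
uniform_seq = rng.uniform(0.0, 1.0, size=length)

print(bernoulli_seq)  # first `length` terms of one realization
print(uniform_seq)
```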
Basic Properties
A random sequence \{X_n\}_{n=1}^\infty consists of random variables, each with its own expectation \mathbb{E}[X_n] = \mu_n and variance \mathrm{Var}(X_n) = \sigma_n^2, assuming finite second moments. These moments characterize the central tendency and dispersion of each term in the sequence. For sequences whose expectations converge, \lim_{n \to \infty} \mathbb{E}[X_n] = \mu exists and provides a long-run average behavior, while the variances indicate the variability at each step.[11]

For independent and identically distributed (i.i.d.) sequences with finite mean \mu, the law of large numbers (LLN) asserts that the sample average converges to the expected value. The weak LLN states that \frac{1}{n} \sum_{i=1}^n X_i \to \mu in probability as n \to \infty, a result originally formulated by Bernoulli in 1713 and since generalized to its modern form. The strong LLN, established by Kolmogorov in 1933, strengthens this to almost sure convergence: \frac{1}{n} \sum_{i=1}^n X_i \to \mu with probability 1. These laws underpin the reliability of empirical averages in probabilistic modeling.

Under similar i.i.d. conditions with finite variance \sigma^2 > 0, the central limit theorem (CLT) describes the asymptotic normality of the standardized sum. Specifically,

\sqrt{n} \left( \frac{1}{n} \sum_{i=1}^n X_i - \mu \right) \to \mathcal{N}(0, \sigma^2)

in distribution as n \to \infty, a result due to Lindeberg in 1922 for general cases and refined by Lévy. This convergence enables approximations of sums by normal distributions, facilitating inference in large samples.[12]

Stationarity is a key property ensuring temporal invariance of statistical features. A sequence is strictly stationary if the joint distribution of (X_{t_1}, \dots, X_{t_k}) equals that of (X_{t_1 + h}, \dots, X_{t_k + h}) for any shift h and times t_1, \dots, t_k, as defined by Khintchine in 1934. Weak stationarity, or second-order stationarity, is a milder condition requiring constant mean \mathbb{E}[X_t] = \mu, finite constant variance \mathrm{Var}(X_t) = \sigma^2, and covariance depending only on the lag: \mathrm{Cov}(X_t, X_{t+h}) = \gamma(h). Strict stationarity implies weak stationarity when the moments exist, but the converse does not hold.[13]

Ergodicity extends stationarity by equating time averages with ensemble averages. A stationary sequence is ergodic if, for integrable functions f, the time average \frac{1}{n} \sum_{k=0}^{n-1} f(T^k X) converges almost surely to \mathbb{E}[f(X)], where T is the shift operator, per Birkhoff's 1931 theorem. This property ensures that a single realization suffices to estimate population parameters.
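The law of large numbers and the central limit theorem are easy to check by simulation. The sketch below (assuming NumPy; Bernoulli(1/2) variables, the seed, and the sample sizes are arbitrary choices) shows the sample average drifting toward \mu = 1/2 and the standardized averages having roughly zero mean and unit standard deviation, consistent with \mathcal{N}(0,1):

```python
import numpy as np

rng = np.random.default_rng(1)  # arbitrary seed
mu, sigma = 0.5, 0.5            # mean and standard deviation of Bernoulli(1/2)

# Law of large numbers: sample averages approach mu as n grows.
for n in (10, 1_000, 100_000):
    x = rng.binomial(1, 0.5, size=n)
    print(n, x.mean())          # drifts toward 0.5

# Central limit theorem: sqrt(n) * (X_bar - mu) / sigma is approximately N(0, 1).
n, reps = 1_000, 20_000
means = rng.binomial(1, 0.5, size=(reps, n)).mean(axis=1)
z = np.sqrt(n) * (means - mu) / sigma
print(z.mean(), z.std())        # close to 0 and 1, respectively
```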
Types of Random Sequences
Independent and Identically Distributed Sequences
An independent and identically distributed (i.i.d.) sequence is a sequence of random variables \{X_n\}_{n=1}^\infty in which X_i and X_j are independent for all i \neq j and each X_i shares the same probability distribution F.[14] This structure implies that the joint cumulative distribution function of the first n variables takes product form:

P(X_1 \leq x_1, \dots, X_n \leq x_n) = \prod_{i=1}^n F(x_i).[15]

Common examples of i.i.d. sequences include the outcomes of successive fair coin flips, which follow a Bernoulli distribution with parameter p = 1/2, and sequences of standard normal random variables, each distributed as \mathcal{N}(0,1).[16]

A key asymptotic property of i.i.d. sequences with finite expectation \mu = E[X_i] is Kolmogorov's strong law of large numbers, which states that the sample average converges almost surely to the mean:

\frac{1}{n} \sum_{i=1}^n X_i \to \mu \quad \text{almost surely as } n \to \infty.

This result holds under the assumptions of independence, identical distribution, and a finite first moment alone, without requiring higher-moment conditions.[17]

For the central limit theorem applied to i.i.d. sequences with finite variance \sigma^2 > 0, the Berry–Esseen theorem gives a quantitative bound on the rate at which the standardized sum converges to the standard normal distribution in the Kolmogorov distance:

\sup_x \left| P\left( \frac{\sum_{i=1}^n X_i - n\mu}{\sigma \sqrt{n}} \leq x \right) - \Phi(x) \right| \leq C \frac{\rho}{\sqrt{n}},

where \Phi is the standard normal CDF, \rho = E[|X_1 - \mu|^3]/\sigma^3 < \infty, and C is a universal constant (e.g., C \approx 0.4785). This bound makes explicit the \mathcal{O}(1/\sqrt{n}) convergence rate, sharpening the purely qualitative statement of the CLT.[18]
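The Berry–Esseen bound can be checked by Monte Carlo. In the sketch below (assuming NumPy and SciPy; the seed, n, and replication count are arbitrary), i.i.d. Bernoulli(1/2) variables give \mu = 1/2, \sigma = 1/2, and \rho = 1, and the estimated Kolmogorov distance between the standardized sum and \Phi stays below C\rho/\sqrt{n} with C \approx 0.4785:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)   # arbitrary seed
n, reps = 100, 200_000
mu, sigma, rho = 0.5, 0.5, 1.0   # moments of Bernoulli(1/2); rho = E|X - mu|^3 / sigma^3

# Standardized sums of n i.i.d. Bernoulli(1/2) variables.
z = (rng.binomial(1, 0.5, size=(reps, n)).sum(axis=1) - n * mu) / (sigma * np.sqrt(n))
z.sort()

# Monte Carlo estimate of sup_x |F_n(x) - Phi(x)| from the empirical CDF:
# the supremum is attained just before or after one of the sample points.
phi = norm.cdf(z)
ecdf_hi = np.arange(1, reps + 1) / reps
ecdf_lo = np.arange(0, reps) / reps
distance = max(np.abs(ecdf_hi - phi).max(), np.abs(ecdf_lo - phi).max())

print("estimated Kolmogorov distance:", distance)
print("Berry-Esseen bound:", 0.4785 * rho / np.sqrt(n))  # should dominate the estimate
```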
Dependent Sequences
In probability theory, a random sequence exhibits dependence if the joint distribution of its random variables cannot be factored into the product of their marginal distributions, meaning that the occurrence of one variable affects the probabilities of others.[19] This contrasts with independent sequences, where knowledge of past values provides no information about future ones.

A prominent class of dependent sequences is Markov sequences, also known as Markov chains: discrete-time stochastic processes \{X_n\}_{n=0}^\infty taking values in a state space S such that the conditional distribution of X_{n+1} given the entire history X_0, \dots, X_n depends only on the most recent state X_n:

P(X_{n+1} = j \mid X_0 = i_0, \dots, X_n = i_n) = P(X_{n+1} = j \mid X_n = i_n)

for all n \geq 0 and states i_0, \dots, i_n, j \in S.[20] For a time-homogeneous chain, the one-step transition probabilities p_{ij} = P(X_{n+1} = j \mid X_n = i) form the entries of the transition matrix P = (p_{ij}), a stochastic matrix in which each row sums to 1, capturing the memory-one structure of the dependence.[21]

Classic examples illustrate this dependence. The simple random walk on the integers \mathbb{Z} is a Markov chain with X_{n+1} = X_n + Z_{n+1}, where the steps \{Z_i\} are independent and identically distributed (e.g., Z_i = +1 or -1, each with probability 1/2), so the position depends solely on the previous position and an independent increment.[22] Another example is the autoregressive process of order 1 (AR(1)), defined by X_n = \phi X_{n-1} + \varepsilon_n with |\phi| < 1 and \{\varepsilon_n\} independent white noise with mean zero; each term depends linearly on its immediate predecessor, inducing short-term memory in the sequence.[23]

Stationarity in dependent sequences such as Markov chains refers to a probability distribution \pi = (\pi_i)_{i \in S} over the states that remains invariant under the transition dynamics, satisfying the balance equation \pi = \pi P, or componentwise \pi_j = \sum_{i \in S} \pi_i p_{ij} for all j.[24] For irreducible, aperiodic chains that are positive recurrent (as any irreducible chain on a finite state space is), such a stationary distribution exists, is unique, and gives the long-run proportion of time spent in each state.[25]

To ensure that asymptotic results such as central limit theorems continue to hold under dependence, mixing conditions quantify how quickly correlations decay across time lags. A sequence is \alpha-mixing (strong mixing) if the supremum, over events A in the past \sigma-field and B in the future \sigma-field, of |\mathbb{P}(A \cap B) - \mathbb{P}(A)\mathbb{P}(B)| tends to zero as the lag increases, measuring the maximal dependence between past and future.[26] A stronger condition is \beta-mixing (absolute regularity), under which the total variation distance between the joint law of past and future and the product of their marginal laws vanishes as the lag grows; \beta-mixing implies \alpha-mixing and yields rates of convergence for statistical inference with dependent data.[27]
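A short simulation ties these ideas together. The sketch below (assuming NumPy; the two-state transition matrix is a hypothetical example) computes the stationary distribution as a normalized left eigenvector of P for eigenvalue 1 and then verifies, from a single long trajectory, that the empirical occupation frequencies approach \pi, as guaranteed for an irreducible aperiodic chain:

```python
import numpy as np

rng = np.random.default_rng(3)  # arbitrary seed

# Hypothetical two-state transition matrix: P[i, j] = p_ij, each row sums to 1.
P = np.array([[0.9, 0.1],
              [0.4, 0.6]])

# Stationary distribution: solve pi = pi P, i.e. a left eigenvector of P
# for eigenvalue 1, normalized to sum to 1. Here pi = (0.8, 0.2).
eigvals, eigvecs = np.linalg.eig(P.T)
pi = np.real(eigvecs[:, np.argmax(np.isclose(eigvals, 1.0))])
pi /= pi.sum()

# Simulate one long trajectory; the long-run fraction of time spent in each
# state estimates pi from a single realization.
steps, state = 200_000, 0
visits = np.zeros(2)
for _ in range(steps):
    visits[state] += 1
    state = rng.choice(2, p=P[state])

print("stationary pi:", pi)
print("empirical    :", visits / steps)
```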