Geometric distribution
The geometric distribution is a discrete probability distribution that models the number of independent Bernoulli trials required to achieve the first success, where each trial has a constant success probability p (with 0 < p \leq 1).[1] It arises in scenarios involving repeated independent experiments with binary outcomes, such as success or failure, and is fundamental in probability theory for analyzing waiting times until an event occurs.[2] There are two common parameterizations of the geometric distribution, differing in whether the random variable counts the total number of trials until the first success (with support X = 1, 2, 3, \dots) or the number of failures preceding the first success (with support Y = 0, 1, 2, \dots).[3] For the trials-until-success version, the probability mass function is given by P(X = k) = (1 - p)^{k-1} p, \quad k = 1, 2, 3, \dots
while for the failures-before-success version, it is
P(Y = k) = (1 - p)^k p, \quad k = 0, 1, 2, \dots. [2][3] The expected value is E[X] = \frac{1}{p} for the former and E[Y] = \frac{1-p}{p} for the latter, with both sharing the variance \frac{1-p}{p^2}.[2][3] A defining feature of the geometric distribution is its memoryless property, which states that the probability of requiring additional trials beyond a certain point is independent of the trials already conducted: P(X > s + t \mid X > s) = P(X > t) for non-negative integers s and t.[3] This property uniquely characterizes the geometric distribution among discrete distributions on the non-negative integers and parallels the exponential distribution in continuous settings.[3] The distribution also serves as the special case r = 1 of the negative binomial distribution, which generalizes it to the number of trials until the r-th success.[4] Applications of the geometric distribution are widespread in fields requiring modeling of waiting times or trial counts until an event, such as reliability engineering (e.g., time until an engine failure in independent tests), quality control (e.g., inspections until a defective item is found), and telecommunications (e.g., packet retransmissions until successful delivery).[5][6][7] It is also used in ecology for modeling runs of species occurrences and in computer science for analyzing algorithm performance in randomized settings.[7][6]
Definition
Parameterizations
The geometric distribution arises in the context of independent Bernoulli trials, each with success probability p where 0 < p \leq 1.[8] One standard parameterization defines the random variable X as the number of failures preceding the first success, with possible values in the set \{0, 1, 2, \dots \}.[8] This formulation, often considered primary in mathematical probability, directly corresponds to the terms of a geometric series indexed from 0.[8] An alternative parameterization specifies the random variable Y as the total number of trials required to achieve the first success, taking values in \{1, 2, 3, \dots \}.[9] Here, Y = X + 1, linking the two definitions.[10] Both parameterizations remain in common use due to contextual preferences: the failures version suits derivations involving infinite series and theoretical probability, while the trials version is favored in applied statistics for representing waiting times or experiment counts.[8][9] The probability of failure on each trial is conventionally denoted q = 1 - p.[8]
Probability Mass Function
The geometric distribution can be parameterized in terms of the number of failures X before the first success in a sequence of independent Bernoulli trials, each with success probability p where 0 < p \leq 1. The probability mass function (PMF) for this parameterization is given by P(X = k) = (1 - p)^k p, \quad k = 0, 1, 2, \dots [8] This PMF is verified to be a valid probability distribution, as the infinite sum over the support equals 1: \sum_{k=0}^{\infty} P(X = k) = p \sum_{k=0}^{\infty} (1 - p)^k = p \cdot \frac{1}{1 - (1 - p)} = 1, using the formula for the sum of an infinite geometric series with common ratio |1 - p| < 1.[8] An alternative parameterization models the number of trials Y until the first success, also based on independent Bernoulli trials with success probability p. The corresponding PMF is P(Y = k) = (1 - p)^{k-1} p, \quad k = 1, 2, 3, \dots [11] This PMF similarly sums to 1 over its support: \sum_{k=1}^{\infty} P(Y = k) = p \sum_{k=1}^{\infty} (1 - p)^{k-1} = p \sum_{j=0}^{\infty} (1 - p)^j = p \cdot \frac{1}{1 - (1 - p)} = 1, again applying the infinite geometric series sum.[11] The random variables X and Y are related by Y = X + 1, which induces a probability shift such that P(Y = k) = P(X = k - 1) for k = 1, 2, \dots, accounting for the difference in their supports and the exponent adjustment in the PMF.[11]
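The two PMFs can be checked numerically. The following sketch (assuming NumPy and SciPy are available; the value p = 0.3 is chosen only for illustration) evaluates both forms and confirms that SciPy's geom, which uses the trials convention, reproduces the failures form after a shift of the support.

```python
import numpy as np
from scipy import stats

p = 0.3
k_trials = np.arange(1, 11)        # support of Y (trials until first success)
k_failures = np.arange(0, 10)      # support of X (failures before first success)

# Trials parameterization: P(Y = k) = (1-p)^(k-1) * p, matches scipy.stats.geom
pmf_trials = (1 - p) ** (k_trials - 1) * p
assert np.allclose(pmf_trials, stats.geom.pmf(k_trials, p))

# Failures parameterization: P(X = k) = (1-p)^k * p, i.e. geom shifted with loc=-1
pmf_failures = (1 - p) ** k_failures * p
assert np.allclose(pmf_failures, stats.geom.pmf(k_failures, p, loc=-1))

# The PMF sums to 1 as the truncation point grows
print(stats.geom.pmf(np.arange(1, 200), p).sum())  # ~ 1.0
```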
Cumulative Distribution Function
The cumulative distribution function (CDF) of a geometric random variable provides the probability that the number of failures or trials until the first success is at most a given value. There are two common parameterizations of the geometric distribution, which differ in whether the random variable counts the number of failures before the first success (starting from 0) or the number of trials until the first success (starting from 1).[3] Consider first the parameterization where X denotes the number of failures before the first success in a sequence of independent Bernoulli trials, each with success probability p where 0 < p < 1. The CDF is given by F_X(k) = P(X \leq k) = 1 - (1 - p)^{k+1}, \quad k = 0, 1, 2, \dots This closed-form expression is obtained by summing the probability mass function (PMF) from 0 to k, which forms a finite geometric series.[3][12] For the alternative parameterization where Y represents the trial number on which the first success occurs, the support begins at 1, and the CDF is F_Y(k) = P(Y \leq k) = 1 - (1 - p)^k, \quad k = 1, 2, 3, \dots Similarly, this follows from summing the corresponding PMF from 1 to k.[12][13] In both cases, the CDF approaches 1 as k \to \infty, since |1 - p| < 1, ensuring that success eventually occurs with probability 1. The survival function, S(k) = 1 - F(k), is then S_X(k) = (1 - p)^{k+1} for the failures parameterization and S_Y(k) = (1 - p)^k for the trials parameterization, representing the probability that no success has occurred by trial k.[12][3] These CDFs give the cumulative probability of the first success happening by the k-th trial (or after at most k failures), which is fundamental for modeling waiting times in discrete processes.[13]
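As a brief numerical check (parameter values below are arbitrary), the closed-form CDF and survival function can be compared against SciPy's cdf and sf methods, again recalling that geom uses the trials convention and loc=-1 gives the failures form.

```python
import numpy as np
from scipy import stats

p = 0.25
k = np.arange(1, 15)  # trial indices for the trials parameterization Y

# Closed-form CDF and survival function for Y (trials until first success)
cdf_closed = 1 - (1 - p) ** k
sf_closed = (1 - p) ** k
assert np.allclose(cdf_closed, stats.geom.cdf(k, p))
assert np.allclose(sf_closed, stats.geom.sf(k, p))

# Failures parameterization X = Y - 1: F_X(k) = 1 - (1-p)^(k+1)
kx = np.arange(0, 14)
assert np.allclose(1 - (1 - p) ** (kx + 1), stats.geom.cdf(kx, p, loc=-1))
```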
Properties
Memorylessness
The memoryless property of the geometric distribution states that, for a random variable X representing the number of trials until the first success with success probability p (where 0 < p < 1), the conditional probability P(X > s + t \mid X > s) = P(X > t) holds for all nonnegative integers s and t.[14] This means that the probability of requiring more than t additional trials after already observing s failures remains unchanged from the original probability of exceeding t trials from the start. To prove this, consider the survival function derived from the cumulative distribution function (CDF). For this parameterization, P(X > k) = (1 - p)^k for k = 0, 1, 2, \dots. Thus, P(X > s + t \mid X > s) = \frac{P(X > s + t)}{P(X > s)} = \frac{(1 - p)^{s + t}}{(1 - p)^s} = (1 - p)^t = P(X > t). [15] This equality demonstrates that past outcomes do not influence future probabilities. The property implies that the distribution of the remaining number of trials (or failures) is identical to the original distribution, regardless of the number of prior failures observed, reflecting a lack of "aging" or dependence on history in the process. As the sole discrete distribution exhibiting memorylessness, the geometric distribution serves as the discrete analogue to the exponential distribution's continuous memoryless property.[8]
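The identity can be verified directly from the survival function; the sketch below (with an arbitrary p and a small grid of s, t values) checks it using SciPy's sf.

```python
import numpy as np
from scipy import stats

p = 0.2
dist = stats.geom(p)  # trials-until-first-success parameterization

# Check P(X > s + t | X > s) == P(X > t) on a grid of s, t values
for s in range(0, 6):
    for t in range(0, 6):
        lhs = dist.sf(s + t) / dist.sf(s)   # conditional survival probability
        rhs = dist.sf(t)
        assert np.isclose(lhs, rhs)
print("memoryless identity holds on the tested grid")
```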
Moments and Cumulants
The expected value of the geometric random variable X, representing the number of failures before the first success in a sequence of independent Bernoulli trials with success probability p, is given by E[X] = \frac{1-p}{p}.[8] This can be derived directly from the probability mass function P(X = k) = p (1-p)^k for k = 0, 1, 2, \dots, yielding E[X] = \sum_{k=0}^{\infty} k \, p (1-p)^k = p (1-p) \sum_{k=1}^{\infty} k (1-p)^{k-1} = \frac{1-p}{p}, where the sum is evaluated using the identity \sum_{k=1}^{\infty} k x^{k-1} = \frac{1}{(1-x)^2} for |x| < 1.[8] Alternatively, the tail-sum formula for non-negative integer random variables gives E[X] = \sum_{k=0}^{\infty} P(X > k), where P(X > k) = (1-p)^{k+1}, so E[X] = \sum_{k=0}^{\infty} (1-p)^{k+1} = \frac{1-p}{p}.[16] The variance is \operatorname{Var}(X) = \frac{1-p}{p^2}.[8] To derive this, first compute the second factorial moment E[X(X-1)] = \sum_{k=0}^{\infty} k(k-1) \, p (1-p)^k = p (1-p)^2 \sum_{k=2}^{\infty} k(k-1) (1-p)^{k-2} = \frac{2(1-p)^2}{p^2}, using \sum_{k=2}^{\infty} k(k-1) x^{k-2} = \frac{2}{(1-x)^3}.[8] Then, \operatorname{Var}(X) = E[X(X-1)] + E[X] - (E[X])^2 = \frac{2(1-p)^2}{p^2} + \frac{1-p}{p} - \left( \frac{1-p}{p} \right)^2 = \frac{1-p}{p^2}.[8] In the alternative parameterization where Y = X + 1 denotes the number of trials until the first success, the expected value is E[Y] = \frac{1}{p} and the variance is \operatorname{Var}(Y) = \frac{1-p}{p^2}.[12] The skewness of the geometric distribution (the same in both parameterizations) is \frac{2 - p}{\sqrt{1 - p}}, measuring the asymmetry toward positive values; it grows without bound as p \to 1. The excess kurtosis is 6 + \frac{p^2}{1 - p}, which is at least 6 for every p, indicating tails heavier than those of the normal distribution, and it likewise grows without bound as p \to 1. The first cumulants are \kappa_1 = \frac{1-p}{p} and \kappa_2 = \frac{1-p}{p^2}, and higher-order cumulants satisfy the recursion \kappa_{n+1} = q \, \frac{d\kappa_n}{dq} with q = 1 - p, giving for example \kappa_3 = \frac{q(1+q)}{p^3}.[8] These follow from the cumulant-generating function K(t) = \log \left( \frac{p}{1 - (1-p) e^t} \right), obtained as the logarithm of the moment-generating function.[8]
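The mean and variance formulas can be confirmed by truncated summation of the PMF and by the tail-sum identity; the following sketch uses an arbitrary p = 0.3 and a truncation point chosen so the neglected tail is negligible.

```python
import numpy as np

p, q = 0.3, 0.7
k = np.arange(0, 1000)          # truncation is ample: q**1000 is negligible
pmf = p * q ** k                # failures parameterization

mean_direct = np.sum(k * pmf)               # E[X] from the PMF
mean_tail = np.sum(q ** (k + 1))            # E[X] = sum_k P(X > k)
var_direct = np.sum(k**2 * pmf) - mean_direct**2

print(mean_direct, mean_tail, q / p)        # all ~ 2.3333
print(var_direct, q / p**2)                 # both ~ 7.7778
```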
Summary Statistics
The geometric distribution is characterized by its success probability parameter p \in (0,1), with two common parameterizations: one counting the number of failures before the first success (X = 0, 1, 2, \dots, PMF P(X = k) = p (1-p)^k) and the other counting the number of trials until the first success (Y = 1, 2, 3, \dots, PMF P(Y = k) = p (1-p)^{k-1}). Note that Y = X + 1, so the distributions share the same variance, skewness, and excess kurtosis, but differ in mean, mode, and median by a shift of 1.[17][18] The following table summarizes key statistics for both parameterizations, where q = 1 - p; a numerical check using SciPy is given after the table.
| Statistic | Failures (X) | Trials (Y) |
|---|---|---|
| Mean | \frac{q}{p} | \frac{1}{p} |
| Variance | \frac{q}{p^2} | \frac{q}{p^2} |
| Standard Deviation | \frac{\sqrt{q}}{p} | \frac{\sqrt{q}}{p} |
| Skewness | \frac{2 - p}{\sqrt{q}} | \frac{2 - p}{\sqrt{q}} |
| Excess Kurtosis | 6 + \frac{p^2}{q} | 6 + \frac{p^2}{q} |
| Mode | 0 | 1 |
| Median | \left\lceil \frac{\ln 0.5}{\ln q} \right\rceil - 1 | \left\lceil \frac{\ln 0.5}{\ln q} \right\rceil |
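The table entries can be reproduced with SciPy; the sketch below (with an arbitrary p = 0.4) compares the library's moments and median for the trials parameterization, using loc=-1 for the failures form.

```python
import numpy as np
from scipy import stats

p, q = 0.4, 0.6

# SciPy's geom is the trials parameterization Y; loc=-1 shifts it to failures X
mean_y, var_y, skew_y, kurt_y = stats.geom.stats(p, moments='mvsk')
mean_x, var_x, skew_x, kurt_x = stats.geom.stats(p, loc=-1, moments='mvsk')

assert np.isclose(mean_y, 1 / p) and np.isclose(mean_x, q / p)
assert np.isclose(var_y, q / p**2) and np.isclose(var_x, q / p**2)
assert np.isclose(skew_y, (2 - p) / np.sqrt(q))
assert np.isclose(kurt_y, 6 + p**2 / q)     # 'k' in mvsk is the excess kurtosis

# Median of the trials parameterization matches the ceiling formula
assert stats.geom.median(p) == np.ceil(np.log(0.5) / np.log(q))
```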
Information Measures
Entropy
The Shannon entropy of a discrete random variable quantifies the average uncertainty or information content in its outcomes, defined as H(X) = -\sum_k P(X=k) \log_2 P(X=k), measured in bits. Because the geometric distribution is discrete, the relevant quantity is this Shannon entropy rather than a differential entropy.[19] Consider the parameterization where X counts the number of failures before the first success, so X = 0, 1, 2, \dots with probability mass function P(X = k) = p (1-p)^k, where p is the success probability (0 < p \leq 1) and q = 1 - p. The entropy is H(X) = -\log_2 p - \frac{q \log_2 q}{p}, or equivalently H(X) = \frac{h_2(p)}{p}, where h_2(p) = -p \log_2 p - q \log_2 q is the binary entropy function.[19] For the alternative parameterization where Y counts the number of trials until the first success, so Y = 1, 2, 3, \dots with P(Y = k) = p q^{k-1}, note that Y = X + 1. Since adding a constant does not change the entropy, H(Y) = H(X) = \frac{h_2(p)}{p}. The binary entropy h_2(p) arises from the memoryless property of the geometric distribution, reflecting the uncertainty in each Bernoulli trial scaled by the expected number of trials 1/p.[19] The entropy H(X) (or H(Y)) is minimized at p = 1, where H = 0 bits, corresponding to certain success on the first trial with no uncertainty. As p \to 0^+, the entropy diverges to infinity, reflecting unbounded uncertainty due to the potentially infinite sequence of failures. For example, at p = 0.5, H = h_2(0.5)/0.5 = 2 bits. If natural logarithms are used instead, the entropy is measured in nats by replacing \log_2 with \ln.[19]
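As a sketch (p = 0.5 chosen for illustration), the closed form h_2(p)/p can be compared with a direct truncated summation and with SciPy's entropy method, which returns the value in nats and therefore needs a conversion to bits.

```python
import numpy as np
from scipy import stats

p = 0.5
q = 1 - p

# Closed form in bits: H = h2(p) / p
h2 = -p * np.log2(p) - q * np.log2(q)
H_bits = h2 / p

# Direct summation over a truncated support (failures parameterization)
k = np.arange(0, 500)
pmf = p * q ** k
H_direct = -np.sum(pmf * np.log2(pmf))

# SciPy reports entropy in nats; convert to bits for comparison
H_scipy = stats.geom(p).entropy() / np.log(2)

print(H_bits, H_direct, H_scipy)   # all ~ 2.0 bits at p = 0.5
```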
Fisher Information
The Fisher information quantifies the amount of information that an observable random variable carries about an unknown parameter in a statistical model. For the geometric distribution, where X denotes the number of failures before the first success with success probability p \in (0,1), and probability mass function f(k;p) = p (1-p)^k for k = 0,1,2,\dots, the Fisher information I(p) is a scalar measuring the sensitivity of the distribution to changes in p.[20] To derive I(p), consider the log-probability mass function: \log f(k;p) = \log p + k \log(1-p). The score function, or first derivative with respect to p, is \frac{\partial}{\partial p} \log f(k;p) = \frac{1}{p} - \frac{k}{1-p}. The second derivative is \frac{\partial^2}{\partial p^2} \log f(k;p) = -\frac{1}{p^2} - \frac{k}{(1-p)^2}. The Fisher information is then the negative expected value of this second derivative: I(p) = -E\left[ \frac{\partial^2}{\partial p^2} \log f(k;p) \right] = E\left[ \frac{1}{p^2} + \frac{k}{(1-p)^2} \right] = \frac{1}{p^2} + \frac{E[X]}{(1-p)^2}. Since E[X] = (1-p)/p for the geometric distribution, substitution yields I(p) = \frac{1}{p^2} + \frac{(1-p)/p}{(1-p)^2} = \frac{1}{p^2} + \frac{1}{p(1-p)} = \frac{1}{p^2(1-p)}. This expression holds equivalently when computed as the variance of the score function, confirming the result.[21][20] For a sample of n independent observations, the total Fisher information is n I(p). This directly informs the asymptotic efficiency of estimators, such as the maximum likelihood estimator \hat{p}. Specifically, the asymptotic variance of \sqrt{n} (\hat{p} - p) is 1/I(p) = p^2 (1-p), implying that \hat{p} is asymptotically normal with mean p and variance p^2 (1-p)/n. Thus, estimation precision is highest near the boundaries p \to 0 or p \to 1, where the variance approaches zero, and lowest around p = 2/3, where I(p) achieves its minimum value of 27/4.[20]
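Since the Fisher information equals the variance of the score, it can be checked by Monte Carlo; the sketch below (arbitrary p and seed) simulates failure counts and compares the empirical variance of the score with 1/(p^2(1-p)).

```python
import numpy as np

rng = np.random.default_rng(0)
p = 0.3

# Sample X = failures before first success (numpy's geometric counts trials)
x = rng.geometric(p, size=200_000) - 1

# Score for one observation: d/dp log f(x; p) = 1/p - x/(1-p)
score = 1 / p - x / (1 - p)

fisher_mc = score.var()            # Fisher information = Var(score)
fisher_exact = 1 / (p**2 * (1 - p))
print(fisher_mc, fisher_exact)     # both ~ 15.87 for p = 0.3
```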
Related Distributions
Bernoulli Distribution
The Bernoulli distribution serves as the foundational building block for the geometric distribution, representing the outcome of a single trial in a sequence of independent experiments. A Bernoulli trial is a random experiment with exactly two possible outcomes: success, occurring with probability p where 0 < p < 1, or failure, occurring with probability 1 - p.[1] These trials are assumed to be independent, meaning the outcome of any one trial does not influence the others, and identically distributed, with the same success probability p for each. This setup forms a Bernoulli process, which underpins many discrete probability models. The geometric distribution emerges directly from a sequence of such independent and identically distributed (i.i.d.) Bernoulli trials as the distribution of the waiting time until the first success. Specifically, if trials continue until the initial success is observed, the number of trials required—denoted X—follows a geometric distribution with parameter p. This waiting time interpretation highlights the geometric as a natural extension of repeated Bernoulli experiments, where each preceding failure simply delays the eventual success without altering future probabilities.[22] The independence of the Bernoulli trials ensures that the probability of success remains constant across attempts, making the geometric distribution memoryless in its progression.[1] In a limiting case, the geometric distribution reduces to the Bernoulli distribution when the process is restricted to a single trial. Under the convention where the geometric random variable X counts the number of failures before the first success (so X = 0, 1, 2, \dots), the probability P(X = 0) = p exactly matches the Bernoulli probability of success on that solitary trial, with all higher values becoming impossible.[23] This adjustment underscores the geometric's role as a generalization of the Bernoulli, where expanding to multiple potential trials accommodates waiting beyond the immediate outcome. All key properties of the geometric distribution, such as its memorylessness and moment-generating function, inherit directly from the independence and identical distribution of the underlying Bernoulli trials. Without this prerequisite structure, the geometric could not maintain its characteristic lack of dependence on prior failures, emphasizing the Bernoulli's essential preparatory role in deriving and understanding the geometric model.
Binomial Distribution
The binomial distribution and the geometric distribution are both derived from sequences of independent Bernoulli trials, each with success probability p and failure probability q = 1 - p, but they model different aspects of the process. The binomial distribution describes the number of successes in a fixed number n of trials, with probability mass function P(X = k) = \binom{n}{k} p^k q^{n-k} for k = 0, 1, \dots, n. In contrast, the geometric distribution describes the number of trials until the first success (equivalently, one fixed success with a variable number of trials), emphasizing the waiting time rather than a predetermined trial count.[1] This distinction highlights how the binomial fixes the sample size while allowing variable outcomes, whereas the geometric fixes the outcome (first success) while allowing variable sample size.[24] A key probabilistic relation connects the two: the probability mass function of the geometric distribution, P(X = k) = q^{k-1} p for k = 1, 2, \dots, can be expressed as the product of the success probability and the binomial probability of zero successes in the preceding k-1 trials, i.e., P(X = k) = p \cdot P(Y = 0), where Y \sim \text{Binomial}(k-1, p).[23] Since P(Y = 0) = q^{k-1}, this directly yields the geometric form and illustrates how the waiting time to first success builds on the absence of successes, a core component of binomial probabilities.[25] This link underscores the geometric as a foundational waiting-time model within the broader framework of Bernoulli trial accumulations. Conditioning on the binomial outcome provides another perspective on their interplay. Given exactly one success in n trials (i.e., S_n = 1 where S_n \sim \text{Binomial}(n, p)), the position X of that single success follows a discrete uniform distribution on \{1, 2, \dots, n\}, with P(X = k \mid S_n = 1) = 1/n for each k.[23] This uniform conditioning contrasts with the unconditional geometric distribution, which decreases with k, and reflects an "inverse" waiting-time view: instead of waiting forward until success, the position is retrospectively uniform given the total constraint of one success.[23] Notably, this conditional uniformity holds independently of p, as the events of failures before and after the success balance out.[23] The probability generating functions further tie the distributions together, with the binomial's G(s) = (q + p s)^n reducing to the Bernoulli case (n=1) as G(s) = q + p s, the single-trial building block shared with the geometric.[24] The geometric extends this via its generating function G(s) = \frac{p s}{1 - q s} for |s| < 1/q, which arises from summing the infinite series of potential waiting times and can be seen as generalizing the fixed-n structure to unbounded trials until success.[24] This functional form facilitates derivations of moments and connections to other waiting-time models.[1]
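Both relations can be checked directly; in the sketch below (arbitrary values of p, k, and n), the geometric PMF is recovered as p times the binomial probability of zero successes, and the conditional position of a single success is seen to be uniform.

```python
import numpy as np
from scipy import stats

p, k, n = 0.35, 6, 10

# Geometric (trials) PMF as p times the binomial probability of zero successes
lhs = stats.geom.pmf(k, p)
rhs = p * stats.binom.pmf(0, k - 1, p)
assert np.isclose(lhs, rhs)

# Given exactly one success in n trials, its position is uniform on {1, ..., n}:
# P(position = k, S_n = 1) = p * (1-p)^(n-1) for every k, so the ratio is 1/n.
joint = p * (1 - p) ** (n - 1)
cond = joint / stats.binom.pmf(1, n, p)
assert np.isclose(cond, 1 / n)
```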
Negative Binomial Distribution
The negative binomial distribution arises as the distribution of the total number of failures occurring before the r-th success in a sequence of independent Bernoulli trials, each with success probability p. This can be viewed as the sum of r independent and identically distributed geometric random variables, where each geometric random variable counts the number of failures before a single success.[26][27] If X ~ Geometric(p) represents the number of failures before the first success, with probability mass function (PMF) P(X = k) = (1-p)^k p for k = 0, 1, 2, ..., then the negative binomial random variable Y = X_1 + X_2 + \dots + X_r, where each X_i ~ Geometric(p), follows a negative binomial distribution NB(r, p).[26][28] The PMF of the negative binomial distribution NB(r, p) for the number of failures y before the r-th success is given by P(Y = y) = \binom{y + r - 1}{r - 1} p^r (1-p)^y, \quad y = 0, 1, 2, \dots This form incorporates binomial coefficients to account for the number of ways to arrange y failures and r successes in a sequence ending with the r-th success.[29] The expected value (mean) of Y is E[Y] = r (1-p)/p, and the variance is Var(Y) = r (1-p)/p^2. These moments generalize those of the geometric distribution, which correspond to the special case r = 1, where E[Y] = (1-p)/p and Var(Y) = (1-p)/p^2.[29][27] The PMF of the negative binomial can be derived through convolution of the individual geometric PMFs, reflecting the additive nature of the failures across the r stages. Specifically, the probability P(Y = y) is the sum over all non-negative integers k_1, k_2, \dots, k_r such that k_1 + \dots + k_r = y of the product p^r (1-p)^{k_1 + \dots + k_r}, which simplifies to the binomial coefficient expression due to the identical distributions. Alternatively, this convolution result can be obtained using moment-generating functions: the MGF of a geometric random variable is p / (1 - (1-p) e^t) for t < -\ln(1-p), and raising it to the r-th power yields the MGF of the negative binomial, confirming the distribution.[30][28]
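The additive relationship can be illustrated by simulation; the sketch below (arbitrary p, r, seed, and sample size) sums r geometric failure counts and compares the empirical mean, variance, and a point probability against the negative binomial formulas and SciPy's nbinom, which counts failures before the r-th success.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
p, r, n = 0.4, 5, 100_000

# Failures before first success for each of r stages, summed (numpy counts trials)
samples = (rng.geometric(p, size=(n, r)) - 1).sum(axis=1)

print(samples.mean(), r * (1 - p) / p)                    # both ~ 7.5
print(samples.var(), r * (1 - p) / p**2)                  # both ~ 18.75
print((samples == 3).mean(), stats.nbinom.pmf(3, r, p))   # empirical vs exact PMF
```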
Exponential Distribution
The exponential distribution arises as the continuous counterpart to the geometric distribution, modeling the waiting time until the first event in a continuous-time setting, much as the geometric distribution counts the number of discrete trials until the first success. In a Poisson process with rate parameter \lambda > 0, the interarrival time between events follows an exponential distribution with probability density function f(x) = \lambda e^{-\lambda x} for x \geq 0, paralleling the role of the geometric distribution in discrete-time Bernoulli processes.[31][32] This analogy is precise through parameter correspondence: for a geometric distribution with success probability p, the associated exponential rate is \lambda = -\log(1 - p), ensuring the survival functions align exactly, P(X > k) = (1 - p)^k = e^{-\lambda k} for integer k. When p is small, \lambda \approx p, reflecting the low-probability regime where discrete and continuous models converge.[31] Both distributions exhibit the memoryless property, where the conditional probability of waiting beyond an additional unit is independent of elapsed time or trials. In the continuous limit, the geometric probability mass function approximates the exponential density: consider time discretized into intervals of length \Delta t \to 0, with success probability p = \lambda \Delta t; then, for a waiting time scaled by \Delta t, the geometric PMF P(K = k) = (1 - p)^{k-1} p yields P(T \in [t, t + \Delta t)) \approx \lambda e^{-\lambda t} \Delta t, matching the exponential PDF.[31][17] This limit interprets the geometric as a discretized exponential, with the waiting time for the first Poisson event in continuous time corresponding to the trial count in discrete time. A time-rescaled geometric random variable, where the number of trials K is mapped to time T = K \Delta t as \Delta t \to 0 and p = \lambda \Delta t, converges in distribution to an exponential random variable with rate \lambda. This transformation highlights the geometric as the discrete embedding of the exponential, unifying their interpretations in renewal theory.[31][32]
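The two correspondences can be illustrated numerically; in the sketch below (arbitrary p, \lambda, \Delta t, and evaluation points), the first block checks the exact survival-function match under \lambda = -\log(1-p), and the second shows the discretization p = \lambda \Delta t approximating the exponential tail.

```python
import numpy as np
from scipy import stats

# Exact matching of survival functions: lambda = -log(1 - p)
p = 0.2
lam = -np.log(1 - p)
k = np.arange(0, 10)
assert np.allclose((1 - p) ** k, np.exp(-lam * k))

# Discretization limit: p = lam * dt with small dt approximates Exponential(lam)
lam, dt = 1.5, 1e-4
p_small = lam * dt
t = np.array([0.5, 1.0, 2.0])
geom_sf = (1.0 - p_small) ** (t / dt)        # ~ P(K * dt > t) for K ~ Geometric
expon_sf = stats.expon(scale=1 / lam).sf(t)
print(geom_sf, expon_sf)                     # nearly identical
```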
Statistical Inference
Method of Moments
The method of moments provides a straightforward approach to estimate the success probability p in the geometric distribution by equating the first population moment to the corresponding sample moment. There are two common parameterizations of the geometric distribution: one counting the number of failures X before the first success, with probability mass function P(X = k) = p (1-p)^k for k = 0, 1, 2, \dots and population mean E[X] = (1-p)/p; the other counting the number of trials Y until the first success, with P(Y = k) = p (1-p)^{k-1} for k = 1, 2, \dots and E[Y] = 1/p. For a random sample of size n, let \bar{X} and \bar{Y} denote the sample means from the failures and trials parameterizations, respectively. The method of moments estimator for p in the failures case is \hat{p} = 1 / (\bar{X} + 1), obtained by solving \bar{X} = (1 - \hat{p})/\hat{p}. In the trials case, the estimator simplifies to \hat{p} = 1 / \bar{Y}, by solving \bar{Y} = 1 / \hat{p}. These estimators arise directly from matching the unbiased sample mean to the population mean in each parameterization.[20] The trials version estimator is unbiased in the sense that the underlying sample mean \bar{Y} is unbiased for E[Y], though the nonlinear transformation to \hat{p} introduces finite-sample bias; regardless, both estimators are asymptotically consistent as n \to \infty, converging in probability to the true p. The asymptotic variance of \hat{p} is approximately p^2 (1-p)/n, derived from the Fisher information and applicable to large samples via the delta method; this can be estimated by substituting \hat{p} for p.[20] While higher-order moments could be matched for added robustness in cases of model misspecification, the first moment is typically sufficient for the single-parameter geometric distribution, yielding a simple and efficient estimator.
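A minimal sketch of the estimator on simulated data (true p, sample size, and seed chosen arbitrarily) shows that the failures-based and trials-based versions give the identical numerical estimate.

```python
import numpy as np

rng = np.random.default_rng(2)
p_true, n = 0.25, 5_000

y = rng.geometric(p_true, size=n)   # trials parameterization (support 1, 2, ...)
x = y - 1                           # failures parameterization

p_hat_trials = 1 / y.mean()         # solves y_bar = 1 / p_hat
p_hat_failures = 1 / (x.mean() + 1) # solves x_bar = (1 - p_hat) / p_hat; same value

print(p_hat_trials, p_hat_failures, p_true)
```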
Maximum Likelihood Estimation
Consider a sample of n independent observations X_1, \dots, X_n from the geometric distribution, where X_i represents the number of failures before the first success, with probability mass function P(X_i = x_i) = p (1-p)^{x_i} for x_i = 0, 1, 2, \dots and parameter 0 < p < 1.[33] The likelihood function is then L(p) = \prod_{i=1}^n p (1-p)^{x_i} = p^n (1-p)^{\sum_{i=1}^n x_i}. [20] To find the maximum likelihood estimator (MLE), take the natural logarithm: \log L(p) = n \log p + \left( \sum_{i=1}^n x_i \right) \log(1-p). Differentiating with respect to p and setting the result to zero yields \frac{\partial \log L(p)}{\partial p} = \frac{n}{p} - \frac{\sum_{i=1}^n x_i}{1-p} = 0, which simplifies to \hat{p}_{\text{MLE}} = \frac{n}{n + \sum_{i=1}^n x_i} = \frac{1}{\bar{x} + 1}, where \bar{x} = \frac{1}{n} \sum_{i=1}^n x_i.[20] The second derivative \frac{\partial^2 \log L(p)}{\partial p^2} = -\frac{n}{p^2} - \frac{\sum_{i=1}^n x_i}{(1-p)^2} < 0 confirms this is a maximum.[33] Notably, this MLE coincides with the method of moments estimator for the geometric distribution.[7] Under standard regularity conditions, the MLE \hat{p}_{\text{MLE}} is asymptotically normal as n \to \infty: \sqrt{n} (\hat{p}_{\text{MLE}} - p) \xrightarrow{d} N\left(0, p^2 (1-p)\right), so that \hat{p}_{\text{MLE}} has approximate variance \frac{p^2 (1-p)}{n} in large samples, derived from the inverse of the Fisher information I(p) = \frac{1}{p^2 (1-p)}.[20] The Fisher information for a single observation is I(p) = -\mathbb{E}\left[ \frac{\partial^2 \log f(X;p)}{\partial p^2} \right] = \frac{1}{p^2 (1-p)}, where f(x;p) is the probability mass function.[34] For the full sample, it scales to n I(p).[20] The form of the MLE is invariant under the alternative parameterization of the geometric distribution as the number of trials until the first success, Y_i = X_i + 1, with P(Y_i = y_i) = p (1-p)^{y_i - 1} for y_i = 1, 2, \dots. In this case, the MLE is \hat{p}_{\text{MLE}} = \frac{n}{\sum_{i=1}^n y_i}, which equals \frac{1}{\bar{y}} and matches the failures-based estimator since \sum y_i = \sum x_i + n.[33] The asymptotic properties remain the same under this reparameterization.[20]
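The closed-form MLE and a Wald interval built from the asymptotic variance can be computed in a few lines; the sketch below uses an arbitrary true p, sample size, and seed, and the 95% interval is only an asymptotic approximation.

```python
import numpy as np

rng = np.random.default_rng(3)
p_true, n = 0.3, 2_000

x = rng.geometric(p_true, size=n) - 1        # failures before first success
p_mle = n / (n + x.sum())                    # = 1 / (x_bar + 1)

# Approximate 95% Wald interval from the asymptotic variance p^2 (1-p) / n
se = np.sqrt(p_mle**2 * (1 - p_mle) / n)
print(p_mle, (p_mle - 1.96 * se, p_mle + 1.96 * se))
```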
Bayesian Inference
In Bayesian inference for the geometric distribution, the success probability p is assigned a Beta prior distribution with shape parameters \alpha > 0 and \beta > 0, which is the conjugate prior due to its compatibility with the likelihood form. For n independent observations y_1, \dots, y_n from Geometric(p), where P(Y = y) = p (1-p)^{y-1} for y = 1, 2, \dots, the likelihood is proportional to p^n (1-p)^{\sum (y_i - 1)}. The resulting posterior distribution is Beta(\alpha + n, \beta + \sum (y_i - 1)), providing a closed-form update that incorporates both prior beliefs and observed data. The posterior mean serves as a Bayesian point estimate for p: \hat{p} = \frac{\alpha + n}{\alpha + \beta + \sum y_i}, which shrinks the maximum likelihood estimate toward the prior mean \alpha / (\alpha + \beta) and converges to it as n increases. Credible intervals for p are obtained from the quantiles of this Beta posterior, offering probabilistic bounds that account for parameter uncertainty. The maximum a posteriori (MAP) estimate, corresponding to the posterior mode, is \hat{p}_{\text{MAP}} = \frac{\alpha + n - 1}{\alpha + \beta + \sum y_i - 2} when \alpha + n > 1 and \beta + \sum (y_i - 1) > 1, identifying the most probable value of p under the posterior. For non-informative priors, the Jeffreys prior \pi(p) \propto 1 / [p \sqrt{1-p}], derived from the square root of the Fisher information I(p) = 1 / [p^2 (1-p)], is improper but yields a proper posterior for observed data.[35] This prior leads to a posterior that approximates the maximum likelihood estimate in large samples, while incorporating parameter invariance properties. Bayesian approaches with conjugate priors like Beta handle small samples more effectively than maximum likelihood estimation by leveraging prior information to stabilize estimates and quantify uncertainty.[36]
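A minimal sketch of the conjugate update follows (the uniform Beta(1, 1) prior, the true p, the sample size, and the seed are all arbitrary choices for illustration); it reports the posterior mean, the MAP estimate, and an equal-tailed 95% credible interval.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
p_true, n = 0.2, 50
y = rng.geometric(p_true, size=n)             # trials parameterization

alpha, beta = 1.0, 1.0                        # assumed uniform Beta(1, 1) prior
a_post = alpha + n
b_post = beta + (y - 1).sum()                 # Beta(alpha + n, beta + sum(y_i - 1))

posterior = stats.beta(a_post, b_post)
post_mean = a_post / (a_post + b_post)        # = (alpha + n) / (alpha + beta + sum(y))
post_map = (a_post - 1) / (a_post + b_post - 2)
ci_95 = posterior.ppf([0.025, 0.975])         # equal-tailed 95% credible interval

print(post_mean, post_map, ci_95)
```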
Computation
Random Variate Generation
Generating random variates from the geometric distribution can be accomplished using several methods, with the choice depending on computational efficiency and the specific parameterization (number of trials until first success or number of failures before first success). The direct simulation method involves repeatedly generating independent Bernoulli random variables with success probability p until the first success occurs. For the number of trials Y (support \{1, 2, \dots\}), this counts the trials including the success; for the number of failures X (support \{0, 1, 2, \dots\}), it counts only the preceding failures. This approach is straightforward but inefficient for small p, as it requires an expected 1/p uniform random variates per sample, leading to high computational cost when the expected value is large.[37] A more efficient alternative is the inverse transform sampling method, which leverages the closed-form inverse of the cumulative distribution function (CDF). Generate U \sim \text{Uniform}(0,1). For the number of trials Y, compute Y = \left\lceil \frac{\log U}{\log (1-p)} \right\rceil; for the number of failures X, X = \left\lfloor \frac{\log U}{\log (1-p)} \right\rfloor. This relies on the CDF inversion F^{-1}(u) = \frac{\log(1-u)}{\log(1-p)}, adjusted for discreteness with ceiling or floor functions; since 1-U \sim \text{Uniform}(0,1), the form using U is equivalent. The method requires only one uniform variate and logarithmic operations per sample, making it suitable for all p > 0.[3][38] The inverse transform method is preferred in computational practice due to its efficiency, particularly for small p, and is implemented in many simulation libraries for generating geometric variates.[39]
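A minimal implementation of the inverse transform is shown below (arbitrary p, sample size, and seed); the floor-plus-one form is the same inversion as the ceiling formula above, written with log1p so that small p and the edge case U = 0 are handled safely, and it is compared against NumPy's built-in generator.

```python
import numpy as np

rng = np.random.default_rng(5)
p, n = 0.15, 200_000

# Inverse transform: one uniform per variate (trials parameterization).
# floor(log(1-u)/log(1-p)) + 1 equals the ceiling form almost surely; log1p
# keeps the computation accurate for small p and avoids log(0).
u = rng.random(n)
y_inverse = np.floor(np.log1p(-u) / np.log1p(-p)).astype(np.int64) + 1

# Built-in generator for comparison (also the trials parameterization)
y_builtin = rng.geometric(p, size=n)

print(y_inverse.mean(), y_builtin.mean(), 1 / p)          # all ~ 6.67
print(y_inverse.var(), y_builtin.var(), (1 - p) / p**2)   # all ~ 37.8
```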
The probability mass function (PMF) and cumulative distribution function (CDF) of the geometric distribution, modeling the number of failures before the first success with success probability p (0 < p \leq 1), are given by P(X = k) = (1 - p)^k p, \quad k = 0, 1, 2, \dots and F(k) = P(X \leq k) = 1 - (1 - p)^{k+1}, \quad k = 0, 1, 2, \dots, respectively.[8] For large k, direct evaluation of (1 - p)^k risks numerical underflow in floating-point systems, as the value approaches zero exponentially. To mitigate this, computations are performed in log-space: \log P(X = k) = k \log(1 - p) + \log p, with the result exponentiated only if the unlogged probability is required; otherwise, log-probabilities are retained for stability in subsequent operations like summation or ratio calculations.[40] The CDF can similarly be evaluated as \log F(k) = \log\left(1 - \exp((k+1) \log(1 - p))\right), using stable primitives such as log1p and expm1 for accuracy when (1 - p)^{k+1} is close to 1.[41] The quantile function, which inverts the CDF to find the smallest k such that F(k) \geq u for u \in (0, 1], is derived from solving 1 - (1 - p)^{k+1} = u, yielding the real-valued form q(u) = \frac{\log(1 - u)}{\log(1 - p)} - 1. For the discrete case, this is rounded to the appropriate integer via ceiling to ensure the CDF condition holds: q(u) = \left\lceil \frac{\log(1 - u)}{\log(1 - p)} - 1 \right\rceil. This logarithmic formulation avoids overflow and underflow issues inherent in iterative summation methods, providing efficient evaluation even for extreme quantiles.[41] Tail probabilities, such as P(X > k) = (1 - p)^{k+1}, follow directly from the CDF as the survival function and are computed analogously in log-space as (k+1) \log(1 - p) to prevent underflow for large k. Recursive evaluation, where P(X > k) = (1 - p) \cdot P(X > k-1), can be carried out in log-space by adding \log(1 - p) at each step, avoiding repeated exponentiation. For very large means \mu = (1 - p)/p, where the distribution becomes heavy-tailed, a normal approximation may be applied to estimate central tail probabilities using mean \mu and variance \mu(1 + \mu), though this is less accurate for extreme tails due to skewness.[41] In software implementations, such as Python's SciPy library, the geometric distribution is handled via scipy.stats.geom, which computes PMF, CDF, and quantile functions with built-in safeguards against overflow; for instance, when p < 10^{-17}, results may clip to the integer maximum due to dtype limits, and log-scale methods are implicitly used for stability. For scenarios requiring arbitrary precision to fully avoid overflow, libraries like SymPy or mpmath enable exact or high-precision evaluations of the formulas above.[42]
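The sketch below illustrates the log-space evaluations discussed above (parameter values are arbitrary, and it assumes SciPy's geom exposes logpmf, logsf, and ppf, as recent versions do): the plain probability underflows for a deep tail point, while the logarithmic forms remain finite, and the quantile agrees with the closed-form inversion.

```python
import numpy as np
from scipy import stats

p = 1e-4
k = 20_000_000           # deep in the tail: (1-p)^k is far below the double minimum

dist = stats.geom(p)     # trials parameterization; loc=-1 would give the failures form

print(dist.pmf(k))       # 0.0 -- the plain probability underflows
print(dist.logpmf(k))    # about -2009: (k-1)*log(1-p) + log(p), kept in log-space
print(dist.logsf(k))     # about -2000: log P(Y > k) = k*log(1-p)

# Quantile (inverse CDF) far in the tail, compared with the closed-form inversion
u = 1 - 1e-12
print(dist.ppf(u), np.ceil(np.log1p(-u) / np.log1p(-p)))
```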