
Boole's inequality

Boole's inequality, also known as the union bound, is a fundamental result in probability theory that provides an upper bound on the probability that at least one event occurs in a countable collection of events. Formally, for any countable collection of events A_1, A_2, \dots, it states that P\left( \bigcup_{i=1}^\infty A_i \right) \leq \sum_{i=1}^\infty P(A_i). Named after the mathematician George Boole, the inequality originates from his seminal 1854 work An Investigation of the Laws of Thought, on Which Are Founded the Mathematical Theories of Logic and Probabilities, where he developed conditions of possible experience involving probabilistic inequalities for relative frequencies of events. In modern terms, it follows directly from the countable subadditivity of probability measures, as the union of events can be expressed as a disjoint union after removing overlaps, leading to P\left( \bigcup A_i \right) = \sum P(A_i^*) \leq \sum P(A_i), where the A_i^* \subseteq A_i are disjoint. The result holds for finite collections via induction, starting from the trivial case P(A_1) \leq P(A_1) and extending by adding P(A_{n+1}) while subtracting non-negative intersection terms; it extends to countable cases by monotone convergence.

Boole's inequality is a building block for more advanced probabilistic tools, serving as the first-order case of the Bonferroni inequalities, which alternate between upper and lower bounds obtained from inclusion-exclusion truncations. Applying it to the complementary events yields P\left( \bigcup_{i=1}^n A_i^c \right) \leq \sum_{i=1}^n P(A_i^c), or equivalently P\left( \bigcap_{i=1}^n A_i \right) \geq 1 - \sum_{i=1}^n P(A_i^c). Its simplicity makes it indispensable in applications, such as estimating the probability of collisions in hashing (e.g., the birthday problem, where the union bound gives an upper bound of approximately 0.69 on the probability of at least one shared birthday among 23 people), analyzing randomized algorithms for tail bounds on error rates, and approximating empty bins in balls-and-bins models via Poisson approximation combined with the union bound. Despite its looseness when events overlap significantly, Boole's inequality provides a quick, non-asymptotic tool for proving existence results and bounding error probabilities in fields like statistics, computer science, and combinatorics.
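As a concrete check of the birthday figure quoted above, the following Python sketch (function names are illustrative) compares the union bound \binom{23}{2}/365 \approx 0.69 with the exact collision probability of about 0.51:

```python
from math import comb, prod

# Union bound for the birthday problem: sum P(pair shares a birthday) = 1/365
# over all C(n, 2) pairs of people.
def birthday_union_bound(n: int, days: int = 365) -> float:
    return comb(n, 2) / days

# Exact probability via the complement (probability that all birthdays differ).
def birthday_exact(n: int, days: int = 365) -> float:
    return 1.0 - prod((days - i) / days for i in range(n))

print(birthday_union_bound(23))  # ~0.693, the union-bound estimate
print(birthday_exact(23))        # ~0.507, the true probability
```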

Introduction

Statement of the Inequality

Boole's inequality, also known as the union bound, states that for any finite collection of events A_1, A_2, \dots, A_n in a probability space, the probability of their union satisfies P\left( \bigcup_{i=1}^n A_i \right) \leq \sum_{i=1}^n P(A_i). This formulation provides an upper bound on the probability that at least one of the events occurs. Equality holds if and only if all pairwise intersections are null events, that is, P(A_i \cap A_j) = 0 for all i \neq j; in particular, equality holds when the events are pairwise disjoint, meaning A_i \cap A_j = \emptyset for all i \neq j.
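A minimal numerical check of both the inequality and its equality condition, using a uniform two-dice sample space (the particular events are arbitrary illustrative choices):

```python
from itertools import product

# Uniform sample space of two fair dice.
space = [(a, b) for a, b in product(range(1, 7), repeat=2)]
P = lambda event: sum(1 for w in space if event(w)) / len(space)

A = lambda w: w[0] == 6          # first die shows 6
B = lambda w: w[0] + w[1] == 7   # dice sum to 7; overlaps A at (6, 1)

# Strict inequality for overlapping events: 11/36 < 12/36.
print(P(lambda w: A(w) or B(w)), "<=", P(A) + P(B))

# Equality for pairwise disjoint events: 7/36 == 6/36 + 1/36.
C = lambda w: w[0] + w[1] == 2   # dice sum to 2; disjoint from A
print(P(lambda w: A(w) or C(w)), "==", P(A) + P(C))
```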

Historical Context

George Boole introduced the inequality that bears his name in his seminal 1854 work, An Investigation of the Laws of Thought, on Which Are Founded the Mathematical Theories of Logic and Probabilities, specifically in Chapter XIX, titled "Of Statistical Conditions." Motivated by his broader project to formalize the laws governing human thought through symbolic logic, Boole sought to establish conditions under which numerical probabilities in probabilistic problems could be deemed consistent, akin to those derived from actual statistical observations. This effort was part of his exploration of "perfect induction," where logical principles could quantify the likelihood of conclusions drawn from premises, bridging symbolic reasoning with probabilistic inference.

In its original form, Boole expressed the inequality in terms of logical classes and their numerical extents, rather than the modern framework of probability events. He considered the "extent" of classes satisfying certain disjunctive conditions, deriving a bound on the measure of their union based on the extents of the individual classes, without assuming probabilistic interpretations such as independence. This logical framing emphasized constraints on possible values for class sizes to ensure interpretability, reflecting Boole's view of probability as an extension of logical necessity.

The inequality's transition into mainstream probability theory occurred in the early twentieth century, notably through its integration into proofs of limit theorems, such as the strong law of large numbers, where it provided essential bounds on the probability of unions of rare events. Mathematicians like A. A. Markov employed similar bounding techniques in studies of dependent trials, highlighting the inequality's utility in approximating union probabilities under limited information. By the mid-twentieth century, it had solidified as a core tool in the study of stochastic processes, appearing prominently in foundational texts analyzing random walks and Markov chains, underscoring its enduring role in rigorous probabilistic analysis.

Proofs

Proof Using Mathematical Induction

Boole's inequality states that for any finite collection of events A_1, A_2, \dots, A_n in a probability space, the probability of their union satisfies P\left(\bigcup_{i=1}^n A_i\right) \leq \sum_{i=1}^n P(A_i). The proof by induction on n relies on the subadditivity property of probability measures, which asserts that for any two events X and Y, P(X \cup Y) \leq P(X) + P(Y); this in turn follows from additivity, since P(X \cup Y) = P(X) + P(Y) - P(X \cap Y) \leq P(X) + P(Y).

Base case: For n=1, the inequality reduces to P(A_1) \leq P(A_1), which holds with equality.

Inductive hypothesis: Assume the inequality holds for n = k, that is, P\left(\bigcup_{i=1}^k A_i\right) \leq \sum_{i=1}^k P(A_i) for some positive integer k.

Inductive step: Consider n = k+1. Then P\left(\bigcup_{i=1}^{k+1} A_i\right) = P\left( \left( \bigcup_{i=1}^k A_i \right) \cup A_{k+1} \right) \leq P\left( \bigcup_{i=1}^k A_i \right) + P(A_{k+1}) by the two-event subadditivity of probability. Applying the inductive hypothesis to the first term on the right yields P\left( \bigcup_{i=1}^k A_i \right) + P(A_{k+1}) \leq \sum_{i=1}^k P(A_i) + P(A_{k+1}) = \sum_{i=1}^{k+1} P(A_i). Thus, the inequality holds for n = k+1, and by the principle of mathematical induction it holds for all positive integers n.

This inductive approach is particularly natural for establishing the bound on finite unions, as it recursively builds from the two-event subadditivity property. However, the proof does not directly extend to infinite unions; for a countable collection \{A_i\}_{i=1}^\infty, the bound P\left(\bigcup_{i=1}^\infty A_i\right) \leq \sum_{i=1}^\infty P(A_i) still holds, but establishing it requires taking the limit of the finite partial unions and invoking the continuity of probability measures from below. An alternative proof uses the linearity of expectation applied to indicator random variables for the events.
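The inductive argument can be mirrored numerically by accumulating events one at a time and checking the two-event subadditivity step at each stage; this sketch uses arbitrary random subsets of a 100-point uniform sample space:

```python
import random

random.seed(0)

# Events are random subsets of a uniform sample space {0, ..., 99}.
space = range(100)
events = [set(random.sample(space, random.randint(5, 20))) for _ in range(6)]
P = lambda s: len(s) / 100

union, bound = set(), 0.0
for A in events:
    # Two-event subadditivity step: P(union ∪ A) <= P(union) + P(A).
    assert P(union | A) <= P(union) + P(A) + 1e-12
    union |= A
    bound += P(A)
    # Accumulated Boole bound from the inductive hypothesis.
    assert P(union) <= bound + 1e-12

print(f"P(union) = {P(union):.2f} <= {bound:.2f} = sum of P(A_i)")
```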

Direct Combinatorial Proof

A direct combinatorial proof of Boole's inequality utilizes indicator random variables to bound the probability of the union of events through the linearity of expectation. Consider a probability space (\Omega, \mathcal{F}, P) and events A_1, A_2, \dots, A_n \in \mathcal{F}. For each i = 1, \dots, n, define the indicator random variable I_{A_i}: \Omega \to \{0,1\} by I_{A_i}(\omega) = 1 if \omega \in A_i and I_{A_i}(\omega) = 0 otherwise. The event \bigcup_{i=1}^n A_i can be represented as the set \{\omega \in \Omega : \sum_{i=1}^n I_{A_i}(\omega) \geq 1\}. Let X = \sum_{i=1}^n I_{A_i}, which is a non-negative integer-valued random variable counting the number of events A_i that occur for a given \omega. The indicator for the union satisfies I_{\bigcup A_i}(\omega) \leq X(\omega) for all \omega \in \Omega, since the left side is 1 only if at least one I_{A_i}(\omega) = 1, while the right side is at least that value. Taking expectations yields P\left(\bigcup_{i=1}^n A_i\right) = E[I_{\bigcup A_i}] \leq E[X]. By linearity of expectation, E[X] = E\left[\sum_{i=1}^n I_{A_i}\right] = \sum_{i=1}^n E[I_{A_i}]. Moreover, E[I_{A_i}] = P(A_i) for each i, so E[X] = \sum_{i=1}^n P(A_i). Thus, P\left(\bigcup_{i=1}^n A_i\right) \leq \sum_{i=1}^n P(A_i). This argument extends to countably infinite collections of events as well. This proof underscores Boole's inequality's foundations in measure theory, where it arises as a consequence of the σ-subadditivity of measures: for any measure \mu and measurable sets A_i, \mu\left(\bigcup A_i\right) \leq \sum \mu(A_i). The approach naturally generalizes to arbitrary measures by replacing expectations with integrals, highlighting the inequality's broad applicability beyond probability spaces.
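The pointwise domination I_{\bigcup A_i} \leq X and the expectation step can be observed empirically; in this sketch the events are the four bits of a uniform random integer in \{0, \dots, 15\}, an arbitrary choice for illustration:

```python
import random

random.seed(1)

n_samples, n_events = 100_000, 4
# A_t is the event "bit t of w is set"; each has probability 1/2.
events = [lambda w, t=t: (w >> t) & 1 == 1 for t in range(n_events)]

count_union, count_X = 0, 0
for _ in range(n_samples):
    w = random.randrange(16)                # uniform sample point
    indicators = [int(A(w)) for A in events]
    count_union += int(any(indicators))     # I_{union of A_i}(w)
    count_X += sum(indicators)              # X(w) = sum_i I_{A_i}(w)

print(count_union / n_samples)  # ~ P(union) = 15/16 ~ 0.94
print(count_X / n_samples)      # ~ E[X] = sum_i P(A_i) = 2.0
```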

Generalizations

Bonferroni Inequalities

The Bonferroni inequalities generalize Boole's inequality by providing a sequence of increasingly refined upper and lower bounds on the probability of the union of events, obtained by truncating the inclusion-exclusion expansion after a finite number of terms. These inequalities arise in probability theory as practical approximations when computing the exact inclusion-exclusion formula is infeasible due to the complexity of higher-order intersections. Named after Carlo Emilio Bonferroni, who formalized them in 1936, they extend the union bound (Boole's inequality) to higher orders while preserving the alternating nature of the inclusion-exclusion series.

Formally, for events A_1, \dots, A_n in a probability space and any positive integer k \leq n, define the partial sums S_m = \sum_{1 \leq i_1 < \cdots < i_m \leq n} P\left( \bigcap_{j=1}^m A_{i_j} \right) for m = 1, \dots, k, and the truncated inclusion-exclusion sum T_k = \sum_{m=1}^k (-1)^{m+1} S_m. The Bonferroni inequalities state that if k is odd, then P\left( \bigcup_{i=1}^n A_i \right) \leq T_k, providing an upper bound, while if k is even, then P\left( \bigcup_{i=1}^n A_i \right) \geq T_k, providing a lower bound. Boole's inequality corresponds to the case k=1, where T_1 = S_1 = \sum_{i=1}^n P(A_i) yields the upper bound. As k increases, the bounds tighten because the truncation error alternates in sign and generally decreases in magnitude for typical applications, with the full inclusion-exclusion principle recovered exactly when k = n.

The alternating bounds reflect the structure of the inclusion-exclusion principle, where positive terms (odd m) overestimate the union and negative terms (even m) correct for overcounting. For instance, when k=2 (even), T_2 = \sum P(A_i) - \sum_{i<j} P(A_i \cap A_j) serves as a lower bound. Consider three events A, B, C: the second-order Bonferroni inequality gives P(A \cup B \cup C) \geq P(A) + P(B) + P(C) - P(A \cap B) - P(A \cap C) - P(B \cap C), which complements the looser upper bound from Boole's inequality by subtracting pairwise overlaps, though it may still underestimate if triple intersections are significant. These inequalities are particularly valuable in scenarios where only low-order intersection probabilities are easily computable, offering bounds that converge to the exact probability as more terms are included.
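A worked example with three overlapping sets in a uniform 20-point space (the sets are arbitrary) exhibits the alternating bounds: T_1 = 1.5 is an upper bound, T_2 = 0.8 a lower bound, and T_3 recovers the exact union probability 0.9:

```python
from itertools import combinations

A, B, C = set(range(0, 10)), set(range(5, 15)), set(range(8, 18))
events, N = [A, B, C], 20
P = lambda s: len(s) / N

def T(k: int) -> float:
    """Truncated inclusion-exclusion sum T_k = sum_{m<=k} (-1)^(m+1) S_m."""
    total = 0.0
    for m in range(1, k + 1):
        S_m = sum(P(set.intersection(*combo))
                  for combo in combinations(events, m))
        total += (-1) ** (m + 1) * S_m
    return total

print(P(A | B | C))       # exact: 0.9
print(T(1), T(2), T(3))   # 1.5 (upper), 0.8 (lower), 0.9 (exact)
```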

Relation to Inclusion-Exclusion Principle

Boole's inequality is a special case of the more general Bonferroni inequalities, which are derived by truncating the inclusion-exclusion expansion for the probability of a union of events. The inclusion-exclusion principle expresses this probability exactly as P\left( \bigcup_{i=1}^n A_i \right) = \sum_i P(A_i) - \sum_{i < j} P(A_i \cap A_j) + \sum_{i < j < k} P(A_i \cap A_j \cap A_k) - \cdots + (-1)^{n+1} P\left( \bigcap_{i=1}^n A_i \right), where the sums are over intersections of 1, 2, 3, \dots, n events, respectively. Truncating this alternating series after an odd number of terms k = 2m + 1 (ending with a positive term) yields an upper bound on the union probability, as the remainder is non-positive. Similarly, truncation after an even number of terms yields a lower bound, as the remainder is non-negative. This follows from the structure of the series and the positivity of probabilities, with proofs typically using mathematical induction on the number of events. The error magnitude satisfies |R| \leq S_{2m+2}, the magnitude of the first omitted term, due to the properties of the alternating series in this context. Higher-order truncations (larger m) yield tighter upper bounds, with Boole's inequality representing the coarsest case at m=0 (k=1), where the bound is simply \sum_i P(A_i) and the remainder is at most S_2 in magnitude and non-positive in sign. This truncation framework links Boole's inequality directly to the inclusion-exclusion principle, highlighting how partial expansions provide successively refined approximations.
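The remainder bound |R| \leq S_{k+1} can be verified directly; the following sketch checks it for five random 15-element subsets of a 50-point uniform space (all parameters are arbitrary):

```python
import random
from itertools import combinations

random.seed(2)

events = [set(random.sample(range(50), 15)) for _ in range(5)]
P = lambda s: len(s) / 50

# S[m-1] holds S_m, the sum of P over all m-wise intersections.
S = [sum(P(set.intersection(*c)) for c in combinations(events, m))
     for m in range(1, len(events) + 1)]
exact = P(set.union(*events))

T = 0.0
for m, S_m in enumerate(S, start=1):
    T += (-1) ** (m + 1) * S_m
    if m < len(S):
        # The truncation error is dominated by the first omitted term.
        assert abs(exact - T) <= S[m] + 1e-12
        print(f"k={m}: T_k={T:.3f}, |R|={abs(exact - T):.3f} <= S_{m+1}={S[m]:.3f}")
```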

Applications

Union Bounds in Probability

Boole's inequality, commonly referred to as the union bound in probability theory, provides a fundamental tool for upper-bounding the probability that at least one rare event occurs among a collection of possibly dependent events. For events A_1, \dots, A_n in a probability space, it states that P\left(\bigcup_{i=1}^n A_i\right) \leq \sum_{i=1}^n P(A_i). This bound is especially useful in stochastic processes and concentration phenomena, where it helps control the likelihood of deviations, such as tails of sums of random variables, without needing full joint distribution information. By focusing on marginal probabilities, it simplifies analysis in settings with rare events, though it can be loose when events overlap significantly.

The union bound derives naturally as a generalization of Markov's inequality applied to indicator variables, linking expectation-based tail bounds to union probabilities. Specifically, let I_i = \mathbf{1}_{A_i} be the indicator for A_i, so \sum I_i counts the number of occurring events. Then P\left(\bigcup A_i\right) = P\left(\sum I_i \geq 1\right) \leq \frac{E\left[\sum I_i\right]}{1} = \sum P(A_i) by Markov's inequality. This perspective extends Markov's one-variable bound to multiple events, enabling probabilistic upper bounds on the occurrence of any event in the collection and forming the basis for more advanced concentration results.

In concentration inequalities for binomial distributions, the union bound acts as an initial step toward deriving exponential tail estimates like Chernoff bounds. For a binomial random variable X \sim \operatorname{Bin}(n, p), a crude union bound over indicator variables for individual successes provides only a linear upper tail estimate, but Chernoff methods refine this by using moment-generating functions to achieve exponentially small probabilities, such as P(X \geq (1+\delta) np) \leq e^{-\delta^2 np/3} for 0 < \delta \leq 1. This progression highlights how Boole's inequality sets the stage for tighter controls on deviations in sums of independent trials, crucial for analyzing algorithms and sampling processes.

A prominent application arises in random graph theory within the Erdős–Rényi model G(n, p), where the union bound estimates the probability of isolated vertices, a rare event in dense graphs. For each vertex v, let A_v be the event that v is isolated, so P(A_v) = (1-p)^{n-1}. By Boole's inequality, P(\exists \text{ isolated vertex}) = P\left(\bigcup_{v=1}^n A_v\right) \leq n (1-p)^{n-1}. When p exceeds \frac{\ln n}{n} by a constant factor, this bound tends to 0, indicating that the graph is free of isolated vertices with high probability, aiding in threshold analysis for connectivity properties (see the simulation sketch below).

Boole's inequality is also central to the Lovász local lemma (LLL), which extends union-bound reasoning to handle limited dependencies among bad events in probabilistic existence proofs. Introduced by Erdős and Lovász, the symmetric LLL considers bad events E_1, \dots, E_m where each P(E_i) \leq q and each E_i depends on at most d others; if e q (d+1) \leq 1, then P\left(\bigcap_i \overline{E_i}\right) > 0. The proof employs the bound iteratively on dependency neighborhoods, treating locally independent subsets to show positive probability of avoiding all bad events, even when a global union bound fails due to dependencies. This technique has broad impact in combinatorics and algorithm design for rare-event avoidance.
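As a sketch of the isolated-vertex bound (graph size, edge probability, and trial count are arbitrary choices), the following simulation compares the empirical frequency of isolated vertices in G(n, p) with n(1-p)^{n-1}:

```python
import math
import random

random.seed(3)

n, trials = 60, 2000
p = 2 * math.log(n) / n   # a constant factor above the ln(n)/n threshold

def has_isolated_vertex() -> bool:
    """Sample G(n, p) and report whether any vertex has degree zero."""
    degree = [0] * n
    for i in range(n):
        for j in range(i + 1, n):
            if random.random() < p:
                degree[i] += 1
                degree[j] += 1
    return any(d == 0 for d in degree)

empirical = sum(has_isolated_vertex() for _ in range(trials)) / trials
print(f"empirical {empirical:.4f} <= union bound {n * (1 - p) ** (n - 1):.4f}")
```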

Error Estimation in Algorithms

Boole's inequality serves as a fundamental tool for deriving worst-case guarantees in randomized algorithms, particularly by providing upper bounds on the probability of failure events that arise from unions of independent or dependent bad outcomes. In computational settings, it enables designers to quantify the risk of estimation errors or inconsistencies without requiring tight interdependence analysis, often leading to efficient bounds that ensure algorithmic reliability with high probability. This application is especially prevalent in scenarios where multiple probabilistic components must collectively perform well, such as in approximation algorithms and Monte Carlo procedures.

In Monte Carlo simulations, Boole's inequality is employed to bound the probability that an estimator deviates significantly from its true value by considering the union of failure events across multiple samples or iterations. For instance, to estimate a parameter \theta, the simulation might rely on a set of estimators \hat{\theta}_n(X, t) for parameters t in a set T; the event that the minimum over t exceeds the true minimum by more than \varepsilon can be bounded as P\left( \min_{t \in T} \hat{\theta}_n(X, t) > \min_{s \in T} \theta(s) + \varepsilon \right) \leq \sum_{t \in T} P\left( \hat{\theta}_n(X, t) > \theta(t) + \varepsilon \right), directly applying the union bound to cap the overall estimation error probability. This approach is particularly useful in imprecise probability frameworks, where it facilitates iterative importance sampling to refine lower previsions while controlling cumulative error risks.

A key application arises in hashing and load balancing, where Boole's inequality bounds collision probabilities in universal hash families to ensure balanced distribution of items across bins. In a universal hash family H mapping keys to m slots, for distinct keys x and y, the probability \Pr_h[h(x) = h(y)] \leq 1/m holds by definition; extending this to a set of n keys, the probability of any collision is \Pr[\exists\, i < j : h(x_i) = h(x_j)] \leq \binom{n}{2}/m via the union bound over all pairs, providing a guarantee that grows quadratically with n but shrinks linearly with m. This bound underpins load balancing in randomized algorithms, such as cuckoo hashing or balls-and-bins models.

The Flajolet-Martin algorithm for approximate counting of distinct elements exemplifies this in streaming data contexts, using the union bound to cap failure probabilities across multiple independent estimators. The algorithm hashes elements to bitmaps and estimates cardinality via the position of the lowest unset bit, and its variance is reduced by running k parallel estimators and taking their median or average; each estimator deviates by more than a constant factor \varphi (e.g., 2) with constant probability (\approx 0.32), but using the median, the overall failure probability can be driven below \delta by setting k = O(\log(1/\delta)). This allows space-efficient estimation (O(\log \log n + \log(1/\delta)) space) with relative error \varepsilon and success probability 1 - \delta, forming the basis for advanced sketches like HyperLogLog.

In probably approximately correct (PAC) learning, Boole's inequality underlies bounds on hypothesis errors over a finite concept class H, providing sample complexity guarantees for empirical risk minimization. For a finite hypothesis set H, the probability that any hypothesis h \in H has empirical error differing from its true error by more than \varepsilon is \Pr[\exists\, h \in H : |\operatorname{err}(h) - \widehat{\operatorname{err}}(h)| > \varepsilon] \leq 2|H| \exp(-2 m \varepsilon^2) by Hoeffding's inequality combined with the union bound, implying that m = O\left((1/\varepsilon^2)(\log |H| + \log(1/\delta))\right) samples suffice to PAC-learn H with error \varepsilon and confidence 1 - \delta. This establishes that all finite classes are PAC-learnable and highlights the role of class size in controlling generalization error.
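Under the Hoeffding-plus-union-bound argument above, solving 2|H|\exp(-2m\varepsilon^2) \leq \delta for m gives an explicit sample size; a minimal sketch (the function name is hypothetical):

```python
import math

# Smallest m with 2 * |H| * exp(-2 m eps^2) <= delta, i.e.
# m >= (ln|H| + ln(2/delta)) / (2 eps^2).
def pac_sample_size(h_size: int, eps: float, delta: float) -> int:
    return math.ceil((math.log(h_size) + math.log(2 / delta)) / (2 * eps ** 2))

# e.g. one million hypotheses, eps = 0.05, delta = 0.01 -> 3823 samples
print(pac_sample_size(10**6, eps=0.05, delta=0.01))
```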
