
Boole's inequality

Boole's inequality, also known as the union bound, is a fundamental result in probability theory that provides an upper bound on the probability that at least one event occurs in a countable collection of events. Formally, for any countable collection of events A_1, A_2, \dots, it states that P\left( \bigcup_{i=1}^\infty A_i \right) \leq \sum_{i=1}^\infty P(A_i). Named after the mathematician George Boole, the inequality originates from his seminal 1854 work An Investigation of the Laws of Thought, on Which Are Founded the Mathematical Theories of Logic and Probabilities, where he developed conditions of possible experience involving probabilistic inequalities for relative frequencies of events. In modern terms, it follows directly from the countable subadditivity of probability measures, as the union of events can be expressed as a disjoint union after removing overlaps, leading to P\left( \bigcup A_i \right) = \sum P(A_i^*) \leq \sum P(A_i), where the A_i^* \subseteq A_i are disjoint. The result holds for finite collections via induction, starting from the trivial case P(A_1) \leq P(A_1) and extending by adding P(A_{n+1}) while subtracting non-negative intersection terms; it extends to countable cases by monotone convergence.

Boole's inequality is a building block for more advanced probabilistic tools, serving as the first-order case of the Bonferroni inequalities, which alternate between upper and lower bounds obtained from inclusion-exclusion truncations. Applying it to the complementary events yields P\left( \bigcup_{i=1}^n A_i^c \right) \leq \sum_{i=1}^n P(A_i^c), or equivalently P\left( \bigcap_{i=1}^n A_i \right) \geq 1 - \sum_{i=1}^n P(A_i^c). Its simplicity makes it indispensable in applications, such as estimating the probability of collisions in hashing (e.g., the birthday problem, where the union bound gives an upper bound of approximately 0.69 on the probability of at least one shared birthday among 23 people), analyzing randomized algorithms for tail bounds on error rates, and approximating empty bins in balls-and-bins models via Poisson approximation combined with the union bound. Despite its looseness when events overlap significantly, Boole's inequality provides a quick, non-asymptotic tool for proving existence results and bounding error probabilities in fields like statistics, computer science, and combinatorics.
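As a concrete check of the birthday figure quoted above, the following Python sketch (function names are illustrative) compares the union bound \binom{23}{2}/365 \approx 0.69 with the exact collision probability of about 0.51:

```python
from math import comb, prod

# Union bound for the birthday problem: sum P(pair shares a birthday) = 1/365
# over all C(n, 2) pairs of people.
def birthday_union_bound(n: int, days: int = 365) -> float:
    return comb(n, 2) / days

# Exact probability via the complement (probability that all birthdays differ).
def birthday_exact(n: int, days: int = 365) -> float:
    return 1.0 - prod((days - i) / days for i in range(n))

print(birthday_union_bound(23))  # ~0.693, the union-bound estimate
print(birthday_exact(23))        # ~0.507, the true probability
```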

Introduction

Statement of the Inequality

Boole's inequality, also known as the union bound, states that for any finite collection of events A_1, A_2, \dots, A_n in a probability space, the probability of their union satisfies P\left( \bigcup_{i=1}^n A_i \right) \leq \sum_{i=1}^n P(A_i). This formulation provides an upper bound on the probability that at least one of the events occurs. Equality holds if and only if all pairwise intersections are null events, that is, P(A_i \cap A_j) = 0 for all i \neq j; in particular, equality holds when the events are pairwise disjoint, meaning A_i \cap A_j = \emptyset for all i \neq j.
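A minimal numerical check of both the inequality and its equality condition, using a uniform two-dice sample space (the particular events are arbitrary illustrative choices):

```python
from itertools import product

# Uniform sample space of two fair dice.
space = [(a, b) for a, b in product(range(1, 7), repeat=2)]
P = lambda event: sum(1 for w in space if event(w)) / len(space)

A = lambda w: w[0] == 6          # first die shows 6
B = lambda w: w[0] + w[1] == 7   # dice sum to 7; overlaps A at (6, 1)

# Strict inequality for overlapping events: 11/36 < 12/36.
print(P(lambda w: A(w) or B(w)), "<=", P(A) + P(B))

# Equality for pairwise disjoint events: 7/36 == 6/36 + 1/36.
C = lambda w: w[0] + w[1] == 2   # dice sum to 2; disjoint from A
print(P(lambda w: A(w) or C(w)), "==", P(A) + P(C))
```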

Historical Context

George Boole introduced the inequality that bears his name in his seminal 1854 work, An Investigation of the Laws of Thought, on Which Are Founded the Mathematical Theories of Logic and Probabilities, specifically in Chapter XIX, titled "Of Statistical Conditions." Motivated by his broader project to formalize the laws governing human thought through symbolic logic, Boole sought to establish conditions under which numerical probabilities in probabilistic problems could be deemed consistent, akin to those derived from actual statistical observations. This effort was part of his exploration of "perfect induction," where logical principles could quantify the likelihood of conclusions drawn from premises, bridging symbolic reasoning with probabilistic inference.

In its original form, Boole expressed the inequality in terms of logical classes and their numerical extents, rather than the modern framework of probability events. He considered the "extent" of classes satisfying certain disjunctive conditions, deriving a bound on the measure of their union based on the extents of the individual classes, without assuming probabilistic interpretations such as independence. This logical framing emphasized constraints on possible values for class sizes to ensure interpretability, reflecting Boole's view of probability as an extension of logical necessity.

The inequality's transition into mainstream probability theory occurred in the early twentieth century, notably through its integration into proofs of limit theorems, such as the strong law of large numbers, where it provided essential bounds on the probability of unions of rare events. Mathematicians like A. A. Markov employed similar bounding techniques in studies of dependent trials, highlighting the inequality's utility in approximating union probabilities under limited information. By the mid-twentieth century, it had solidified as a core tool in the study of stochastic processes, appearing prominently in foundational texts analyzing random walks and Markov chains, underscoring its enduring role in rigorous probabilistic analysis.

Proofs

Proof Using Mathematical Induction

Boole's inequality states that for any finite collection of events A_1, A_2, \dots, A_n in a probability space, the probability of their union satisfies P\left(\bigcup_{i=1}^n A_i\right) \leq \sum_{i=1}^n P(A_i). The proof by induction on n relies on the subadditivity property of probability measures, which asserts that for any two events X and Y, P(X \cup Y) \leq P(X) + P(Y); this in turn follows from additivity, since P(X \cup Y) = P(X) + P(Y) - P(X \cap Y) \leq P(X) + P(Y).

Base case: For n=1, the inequality reduces to P(A_1) \leq P(A_1), which holds with equality.

Inductive hypothesis: Assume the inequality holds for n = k, that is, P\left(\bigcup_{i=1}^k A_i\right) \leq \sum_{i=1}^k P(A_i) for some positive integer k.

Inductive step: Consider n = k+1. Then P\left(\bigcup_{i=1}^{k+1} A_i\right) = P\left( \left( \bigcup_{i=1}^k A_i \right) \cup A_{k+1} \right) \leq P\left( \bigcup_{i=1}^k A_i \right) + P(A_{k+1}) by the two-event subadditivity of probability. Applying the inductive hypothesis to the first term on the right yields P\left( \bigcup_{i=1}^k A_i \right) + P(A_{k+1}) \leq \sum_{i=1}^k P(A_i) + P(A_{k+1}) = \sum_{i=1}^{k+1} P(A_i). Thus, the inequality holds for n = k+1, and by the principle of mathematical induction it holds for all positive integers n.

This inductive approach is particularly natural for establishing the bound on finite unions, as it recursively builds from the two-event subadditivity property. However, the proof does not directly extend to infinite unions; for a countable collection \{A_i\}_{i=1}^\infty, the bound P\left(\bigcup_{i=1}^\infty A_i\right) \leq \sum_{i=1}^\infty P(A_i) still holds, but establishing it requires taking the limit of the finite partial unions and invoking the continuity of probability measures from below. An alternative proof uses the linearity of expectation applied to indicator random variables for the events.
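The inductive argument can be mirrored numerically by accumulating events one at a time and checking the two-event subadditivity step at each stage; this sketch uses arbitrary random subsets of a 100-point uniform sample space:

```python
import random

random.seed(0)

# Events are random subsets of a uniform sample space {0, ..., 99}.
space = range(100)
events = [set(random.sample(space, random.randint(5, 20))) for _ in range(6)]
P = lambda s: len(s) / 100

union, bound = set(), 0.0
for A in events:
    # Two-event subadditivity step: P(union ∪ A) <= P(union) + P(A).
    assert P(union | A) <= P(union) + P(A) + 1e-12
    union |= A
    bound += P(A)
    # Accumulated Boole bound from the inductive hypothesis.
    assert P(union) <= bound + 1e-12

print(f"P(union) = {P(union):.2f} <= {bound:.2f} = sum of P(A_i)")
```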

Direct Combinatorial Proof

A direct combinatorial proof of Boole's inequality utilizes indicator random variables to bound the probability of the union of events through the linearity of expectation. Consider a probability space (\Omega, \mathcal{F}, P) and events A_1, A_2, \dots, A_n \in \mathcal{F}. For each i = 1, \dots, n, define the indicator random variable I_{A_i}: \Omega \to \{0,1\} by I_{A_i}(\omega) = 1 if \omega \in A_i and I_{A_i}(\omega) = 0 otherwise. The event \bigcup_{i=1}^n A_i can be represented as the set \{\omega \in \Omega : \sum_{i=1}^n I_{A_i}(\omega) \geq 1\}. Let X = \sum_{i=1}^n I_{A_i}, which is a non-negative integer-valued random variable counting the number of events A_i that occur for a given \omega. The indicator for the union satisfies I_{\bigcup A_i}(\omega) \leq X(\omega) for all \omega \in \Omega, since the left side is 1 only if at least one I_{A_i}(\omega) = 1, while the right side is at least that value. Taking expectations yields P\left(\bigcup_{i=1}^n A_i\right) = E[I_{\bigcup A_i}] \leq E[X]. By linearity of expectation, E[X] = E\left[\sum_{i=1}^n I_{A_i}\right] = \sum_{i=1}^n E[I_{A_i}]. Moreover, E[I_{A_i}] = P(A_i) for each i, so E[X] = \sum_{i=1}^n P(A_i). Thus, P\left(\bigcup_{i=1}^n A_i\right) \leq \sum_{i=1}^n P(A_i). This argument extends to countably infinite collections of events as well. This proof underscores Boole's inequality's foundations in measure theory, where it arises as a consequence of the σ-subadditivity of measures: for any measure \mu and measurable sets A_i, \mu\left(\bigcup A_i\right) \leq \sum \mu(A_i). The approach naturally generalizes to arbitrary measures by replacing expectations with integrals, highlighting the inequality's broad applicability beyond probability spaces.
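The pointwise domination I_{\bigcup A_i} \leq X and the expectation step can be observed empirically; in this sketch the events are the four bits of a uniform random integer in \{0, \dots, 15\}, an arbitrary choice for illustration:

```python
import random

random.seed(1)

n_samples, n_events = 100_000, 4
# A_t is the event "bit t of w is set"; each has probability 1/2.
events = [lambda w, t=t: (w >> t) & 1 == 1 for t in range(n_events)]

count_union, count_X = 0, 0
for _ in range(n_samples):
    w = random.randrange(16)                # uniform sample point
    indicators = [int(A(w)) for A in events]
    count_union += int(any(indicators))     # I_{union of A_i}(w)
    count_X += sum(indicators)              # X(w) = sum_i I_{A_i}(w)

print(count_union / n_samples)  # ~ P(union) = 15/16 ~ 0.94
print(count_X / n_samples)      # ~ E[X] = sum_i P(A_i) = 2.0
```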

Generalizations

Bonferroni Inequalities

The Bonferroni inequalities generalize Boole's inequality by providing a sequence of increasingly refined upper and lower bounds on the probability of the union of events, obtained by truncating the inclusion-exclusion expansion after a finite number of terms. These inequalities arise in probability theory as practical approximations when computing the exact inclusion-exclusion formula is infeasible due to the complexity of higher-order intersections. Named after Carlo Emilio Bonferroni, who formalized them in 1936, they extend the union bound (Boole's inequality) to higher orders while preserving the alternating nature of the inclusion-exclusion series.

Formally, for events A_1, \dots, A_n in a probability space and any positive integer k \leq n, define the partial sums S_m = \sum_{1 \leq i_1 < \cdots < i_m \leq n} P\left( \bigcap_{j=1}^m A_{i_j} \right) for m = 1, \dots, k, and the truncated inclusion-exclusion sum T_k = \sum_{m=1}^k (-1)^{m+1} S_m. The Bonferroni inequalities state that if k is odd, then P\left( \bigcup_{i=1}^n A_i \right) \leq T_k, providing an upper bound, while if k is even, then P\left( \bigcup_{i=1}^n A_i \right) \geq T_k, providing a lower bound. Boole's inequality corresponds to the case k=1, where T_1 = S_1 = \sum_{i=1}^n P(A_i) yields the upper bound. As k increases, the bounds tighten because the truncation error alternates in sign and generally decreases in magnitude for typical applications, with the full inclusion-exclusion principle recovered exactly when k = n.

The alternating bounds reflect the structure of the inclusion-exclusion principle, where positive terms (odd m) overestimate the union and negative terms (even m) correct for overcounting. For instance, when k=2 (even), T_2 = \sum P(A_i) - \sum_{i<j} P(A_i \cap A_j) serves as a lower bound. Consider three events A, B, C: the second-order Bonferroni inequality gives P(A \cup B \cup C) \geq P(A) + P(B) + P(C) - P(A \cap B) - P(A \cap C) - P(B \cap C), which complements the looser upper bound from Boole's inequality by subtracting pairwise overlaps, though it may still underestimate if triple intersections are significant. These inequalities are particularly valuable in scenarios where only low-order intersection probabilities are easily computable, offering bounds that converge to the exact probability as more terms are included.
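A worked example with three overlapping sets in a uniform 20-point space (the sets are arbitrary) exhibits the alternating bounds: T_1 = 1.5 is an upper bound, T_2 = 0.8 a lower bound, and T_3 recovers the exact union probability 0.9:

```python
from itertools import combinations

A, B, C = set(range(0, 10)), set(range(5, 15)), set(range(8, 18))
events, N = [A, B, C], 20
P = lambda s: len(s) / N

def T(k: int) -> float:
    """Truncated inclusion-exclusion sum T_k = sum_{m<=k} (-1)^(m+1) S_m."""
    total = 0.0
    for m in range(1, k + 1):
        S_m = sum(P(set.intersection(*combo))
                  for combo in combinations(events, m))
        total += (-1) ** (m + 1) * S_m
    return total

print(P(A | B | C))       # exact: 0.9
print(T(1), T(2), T(3))   # 1.5 (upper), 0.8 (lower), 0.9 (exact)
```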

Relation to Inclusion-Exclusion Principle

Boole's inequality is a special case of the more general Bonferroni inequalities, which are derived by truncating the inclusion-exclusion expansion for the probability of a union of events. The inclusion-exclusion principle expresses this probability exactly as P\left( \bigcup_{i=1}^n A_i \right) = \sum_i P(A_i) - \sum_{i < j} P(A_i \cap A_j) + \sum_{i < j < k} P(A_i \cap A_j \cap A_k) - \cdots + (-1)^{n+1} P\left( \bigcap_{i=1}^n A_i \right), where the sums are over intersections of 1, 2, 3, \dots, n events, respectively. Truncating this alternating series after an odd number of terms k = 2m + 1 (ending with a positive term) yields an upper bound on the union probability, as the remainder is non-positive. Similarly, truncation after an even number of terms yields a lower bound, as the remainder is non-negative. This follows from the structure of the series and the positivity of probabilities, with proofs typically using mathematical induction on the number of events. The error magnitude satisfies |R| \leq S_{2m+2}, the magnitude of the first omitted term, due to the properties of the alternating series in this context. Higher-order truncations (larger m) yield tighter upper bounds, with Boole's inequality representing the coarsest case at m=0 (k=1), where the bound is simply \sum_i P(A_i) and the remainder is at most S_2 in magnitude and non-positive in sign. This truncation framework links Boole's inequality directly to the inclusion-exclusion principle, highlighting how partial expansions provide successively refined approximations.
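The remainder bound |R| \leq S_{k+1} can be verified directly; the following sketch checks it for five random 15-element subsets of a 50-point uniform space (all parameters are arbitrary):

```python
import random
from itertools import combinations

random.seed(2)

events = [set(random.sample(range(50), 15)) for _ in range(5)]
P = lambda s: len(s) / 50

# S[m-1] holds S_m, the sum of P over all m-wise intersections.
S = [sum(P(set.intersection(*c)) for c in combinations(events, m))
     for m in range(1, len(events) + 1)]
exact = P(set.union(*events))

T = 0.0
for m, S_m in enumerate(S, start=1):
    T += (-1) ** (m + 1) * S_m
    if m < len(S):
        # The truncation error is dominated by the first omitted term.
        assert abs(exact - T) <= S[m] + 1e-12
        print(f"k={m}: T_k={T:.3f}, |R|={abs(exact - T):.3f} <= S_{m+1}={S[m]:.3f}")
```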

Applications

Union Bounds in Probability

Boole's inequality, commonly referred to as the union bound in probability theory, provides a fundamental tool for upper-bounding the probability that at least one rare event occurs among a collection of possibly dependent events. For events A_1, \dots, A_n in a probability space, it states that P\left(\bigcup_{i=1}^n A_i\right) \leq \sum_{i=1}^n P(A_i). This bound is especially useful in stochastic processes and concentration phenomena, where it helps control the likelihood of deviations, such as tails of sums of random variables, without needing full joint distribution information. By focusing on marginal probabilities, it simplifies analysis in settings with rare events, though it can be loose when events overlap significantly.

The union bound derives naturally as a generalization of Markov's inequality applied to indicator variables, linking expectation-based tail bounds to union probabilities. Specifically, let I_i = \mathbf{1}_{A_i} be the indicator for A_i, so \sum I_i counts the number of occurring events. Then P\left(\bigcup A_i\right) = P\left(\sum I_i \geq 1\right) \leq \frac{E\left[\sum I_i\right]}{1} = \sum P(A_i) by Markov's inequality. This perspective extends Markov's one-variable bound to multiple events, enabling probabilistic upper bounds on the occurrence of any event in the collection and forming the basis for more advanced concentration results.

In concentration inequalities for binomial distributions, the union bound acts as an initial step toward deriving exponential tail estimates like Chernoff bounds. For a binomial random variable X \sim \operatorname{Bin}(n, p), a crude union bound over indicator variables for individual successes provides only a linear upper tail estimate, but Chernoff methods refine this by using moment-generating functions to achieve exponentially small probabilities, such as P(X \geq (1+\delta) np) \leq e^{-\delta^2 np/3} for 0 < \delta \leq 1. This progression highlights how Boole's inequality sets the stage for tighter controls on deviations in sums of independent trials, crucial for analyzing algorithms and sampling processes.

A prominent application arises in random graph theory within the Erdős–Rényi model G(n, p), where the union bound estimates the probability of isolated vertices, a rare event in dense graphs. For each vertex v, let A_v be the event that v is isolated, so P(A_v) = (1-p)^{n-1}. By Boole's inequality, P(\exists \text{ isolated vertex}) = P\left(\bigcup_{v=1}^n A_v\right) \leq n (1-p)^{n-1}. When p exceeds \frac{\ln n}{n} by a constant factor, this bound tends to 0, indicating that the graph is free of isolated vertices with high probability, aiding in threshold analysis for connectivity properties (see the simulation sketch below).

Boole's inequality is also central to the Lovász local lemma (LLL), which extends union-bound reasoning to handle limited dependencies among bad events in probabilistic existence proofs. Introduced by Erdős and Lovász, the symmetric LLL considers bad events E_1, \dots, E_m where each P(E_i) \leq q and each E_i depends on at most d others; if e q (d+1) \leq 1, then P\left(\bigcap_i \overline{E_i}\right) > 0. The proof employs the bound iteratively on dependency neighborhoods, treating locally independent subsets to show positive probability of avoiding all bad events, even when a global union bound fails due to dependencies. This technique has broad impact in combinatorics and algorithm design for rare-event avoidance.
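As a sketch of the isolated-vertex bound (graph size, edge probability, and trial count are arbitrary choices), the following simulation compares the empirical frequency of isolated vertices in G(n, p) with n(1-p)^{n-1}:

```python
import math
import random

random.seed(3)

n, trials = 60, 2000
p = 2 * math.log(n) / n   # a constant factor above the ln(n)/n threshold

def has_isolated_vertex() -> bool:
    """Sample G(n, p) and report whether any vertex has degree zero."""
    degree = [0] * n
    for i in range(n):
        for j in range(i + 1, n):
            if random.random() < p:
                degree[i] += 1
                degree[j] += 1
    return any(d == 0 for d in degree)

empirical = sum(has_isolated_vertex() for _ in range(trials)) / trials
print(f"empirical {empirical:.4f} <= union bound {n * (1 - p) ** (n - 1):.4f}")
```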

Error Estimation in Algorithms

Boole's inequality serves as a fundamental tool for deriving worst-case guarantees in randomized algorithms, particularly by providing upper bounds on the probability of failure events that arise from unions of independent or dependent bad outcomes. In computational settings, it enables designers to quantify the risk of estimation errors or inconsistencies without requiring tight interdependence analysis, often leading to efficient bounds that ensure algorithmic reliability with high probability. This application is especially prevalent in scenarios where multiple probabilistic components must collectively perform well, such as in approximation algorithms and Monte Carlo procedures.

In Monte Carlo simulations, Boole's inequality is employed to bound the probability that an estimator deviates significantly from its true value by considering the union of failure events across multiple samples or iterations. For instance, to estimate a parameter \theta, the simulation might rely on a set of estimators \hat{\theta}_n(X, t) for parameters t in a set T; the event that the minimum over t exceeds the true minimum by more than \varepsilon can be bounded as P\left( \min_{t \in T} \hat{\theta}_n(X, t) > \min_{s \in T} \theta(s) + \varepsilon \right) \leq \sum_{t \in T} P\left( \hat{\theta}_n(X, t) > \theta(t) + \varepsilon \right), directly applying the union bound to cap the overall estimation error probability. This approach is particularly useful in imprecise probability frameworks, where it facilitates iterative importance sampling to refine lower previsions while controlling cumulative error risks.

A key application arises in hashing and load balancing, where Boole's inequality bounds collision probabilities in universal hash families to ensure balanced distribution of items across bins. In a universal hash family H mapping keys to m slots, for distinct keys x and y, the probability \Pr_h[h(x) = h(y)] \leq 1/m holds by definition; extending this to a set of n keys, the probability of any collision is \Pr[\exists\, i < j : h(x_i) = h(x_j)] \leq \binom{n}{2}/m via the union bound over all pairs, providing a guarantee that grows quadratically with n but shrinks linearly with m. This bound underpins load balancing in randomized algorithms, such as cuckoo hashing or balls-and-bins models.

The Flajolet-Martin algorithm for approximate counting of distinct elements exemplifies this in streaming data contexts, using the union bound to cap failure probabilities across multiple independent estimators. The algorithm hashes elements to bitmaps and estimates cardinality via the position of the lowest unset bit, and its variance is reduced by running k parallel estimators and taking their median or average; each estimator deviates by more than a constant factor \varphi (e.g., 2) with constant probability (\approx 0.32), but using the median, the overall failure probability can be driven below \delta by setting k = O(\log(1/\delta)). This allows space-efficient estimation (O(\log \log n + \log(1/\delta)) space) with relative error \varepsilon and success probability 1 - \delta, forming the basis for advanced sketches like HyperLogLog.

In probably approximately correct (PAC) learning, Boole's inequality underlies bounds on hypothesis errors over a finite concept class H, providing sample complexity guarantees for empirical risk minimization. For a finite hypothesis set H, the probability that any hypothesis h \in H has empirical error differing from its true error by more than \varepsilon is \Pr[\exists\, h \in H : |\operatorname{err}(h) - \widehat{\operatorname{err}}(h)| > \varepsilon] \leq 2|H| \exp(-2 m \varepsilon^2) by Hoeffding's inequality combined with the union bound, implying that m = O\left((1/\varepsilon^2)(\log |H| + \log(1/\delta))\right) samples suffice to PAC-learn H with error \varepsilon and confidence 1 - \delta. This establishes that all finite classes are PAC-learnable and highlights the role of class size in controlling generalization error.
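Under the Hoeffding-plus-union-bound argument above, solving 2|H|\exp(-2m\varepsilon^2) \leq \delta for m gives an explicit sample size; a minimal sketch (the function name is hypothetical):

```python
import math

# Smallest m with 2 * |H| * exp(-2 m eps^2) <= delta, i.e.
# m >= (ln|H| + ln(2/delta)) / (2 eps^2).
def pac_sample_size(h_size: int, eps: float, delta: float) -> int:
    return math.ceil((math.log(h_size) + math.log(2 / delta)) / (2 * eps ** 2))

# e.g. one million hypotheses, eps = 0.05, delta = 0.01 -> 3823 samples
print(pac_sample_size(10**6, eps=0.05, delta=0.01))
```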
