Birthday problem
The birthday problem, also known as the birthday paradox, is a probability puzzle that examines the likelihood that at least two people in a group share the same birthday, assuming birthdays are uniformly distributed across 365 days of the year and ignoring leap years and year differences.[1][2] The counterintuitive core result is that, in a group of just 23 people, the probability of at least one shared birthday exceeds 50%, far lower than the intuitive expectation of needing roughly half the number of days (around 183) for such an outcome.[1][2] The probability is calculated as 1 minus the likelihood that all birthdays are distinct, given by the formula $1 - \frac{365! / (365 - n)!}{365^n} for n people, where the numerator represents the number of ways to assign unique birthdays and the denominator the total possible assignments.[1][2] This yields approximately 50.7% for n = 23 and approaches certainty (over 99%) for groups larger than 60.[2] An approximation for the threshold where the probability reaches 50% is n \approx \sqrt{2 \times 365 \times \ln 2}, highlighting the quadratic growth in pairwise comparisons (about n(n-1)/2) that drives the result.[2] The problem's origins trace to Richard von Mises, who introduced it in a 1939 lecture, though it gained prominence in the 1950s through works like William Feller's probability textbook.[3][4][5] Beyond its pedagogical value in illustrating probabilistic intuition and the pigeonhole principle, the birthday problem has significant applications in fields like cryptography, where it underpins "birthday attacks" on hash functions—exploiting collisions in output spaces of size m with roughly \sqrt{m} trials rather than m/2, as seen in analyses of functions like MD5.[6][7] Generalizations extend to non-uniform birthday distributions, multiple matches, or larger calendars, and it informs algorithms in computer science for collision detection and randomized data structures.[3][2]Core Problem
Problem Statement
The birthday problem addresses the probability that at least two individuals in a randomly selected group of n people share the same birthday. This classic question in probability theory assumes a non-leap year with exactly 365 possible birthdays, uniform distribution of birthdays across all days, and independence among the individuals' birth dates, meaning no correlations such as familial or cultural patterns influence the outcomes.[8] The counterintuitive nature of the problem arises from how quickly the probability of a shared birthday rises with group size; for instance, in a gathering of just 23 people, the chance exceeds 50%, far higher than most initial intuitions might suggest.[9] This surprise stems from the combinatorial explosion of possible pairwise comparisons, which amplifies the likelihood of at least one match even in modestly sized groups. Introduced by Austrian mathematician Richard von Mises in 1939, the problem originated as an illustration in probability theory.[10] Von Mises posed it to demonstrate practical applications of probabilistic reasoning in everyday scenarios. These foundational assumptions simplify the model by disregarding real-world factors, such as uneven birthday distributions due to seasonal birth variations or the slight increase from leap years, to focus on the core mathematical insight.[11]Exact Probability Calculation
The exact probability of at least one shared birthday among n people, assuming 365 days and uniform, independent birthdays, is derived using the complementary probability that all birthdays are distinct.[12] This complementary probability q is the product \prod_{k=1}^{n-1} \left(1 - \frac{k}{365}\right), which represents the sequential likelihood that each additional person's birthday differs from all previous ones: the second person avoids the first's birthday with probability $364/365, the third avoids the first two with $363/365, and so on up to the nth person avoiding the prior n-1 with (365 - n + 1)/365.[12] The desired probability p is then p = 1 - q.[12] Equivalently, the complementary probability can be expressed in factorial form as q = \frac{365!}{365^n (365 - n)!} for n \leq 365, reflecting the number of ways to assign distinct birthdays (a permutation of 365 days taken n at a time) divided by the total number of possible birthday assignments $365^n.[12] Thus, p = 1 - \frac{365!}{365^n (365 - n)!}.[12] This form highlights the combinatorial nature of the problem but poses significant computational challenges for even moderate n, as the factorials grow enormously large, leading to overflow in standard arithmetic computations.[13] To compute the exact value practically, the product form is preferred, often evaluated using logarithms to avoid overflow: \ln q = \sum_{k=1}^{n-1} \ln\left(1 - \frac{k}{365}\right), followed by q = e^{\ln q} and p = 1 - q.[13] For large n, while Stirling's approximation can estimate the factorials, it introduces error and is not used for exact computation; instead, high-precision libraries or iterative multiplication with scaling handle the product directly.[13] A classic example is n = 23: the complementary probability q is computed as the product from k=1 to $22 of (365 - k)/365, yielding q \approx 0.492703 (or exactly \frac{75091883268515350125426207425223147563269805908203125 - 38093904702297390785243708291056390518886454060947061}{75091883268515350125426207425223147563269805908203125}), so p \approx 0.507297.[12] This step-by-step multiplication—starting with 1, multiplying by $364/365 \approx 0.997260, then $363/365 \approx 0.994521, and continuing through 22 terms—demonstrates how the accumulating reductions lead to just over a 50% chance of a shared birthday.[12]Approximations and Bounds
Common Approximations
One common approximation for the probability P(n) that at least two people share a birthday in a group of n individuals, assuming 365 days and uniform distribution, derives from the Taylor expansion of the logarithm in the exact complementary probability. The exact probability of no shared birthdays is \prod_{k=1}^{n-1} \left(1 - \frac{k}{365}\right). Taking the natural logarithm yields \ln\left(\prod_{k=1}^{n-1} \left(1 - \frac{k}{365}\right)\right) = \sum_{k=1}^{n-1} \ln\left(1 - \frac{k}{365}\right). For small k/365, \ln(1 - x) \approx -x, so the sum approximates -\sum_{k=1}^{n-1} \frac{k}{365} = -\frac{n(n-1)}{2 \times 365}. Thus, the product approximates \exp\left(-\frac{n(n-1)}{2 \times 365}\right), and P(n) \approx 1 - \exp\left(-\frac{n(n-1)}{2 \times 365}\right).[14] This exponential approximation is equivalent to the Poisson paradigm, which models the number of shared birthday pairs as a Poisson random variable with parameter \lambda = \frac{n(n-1)}{2 \times 365}, treating pairs as rare, nearly independent events. The probability of at least one pair is then $1 - e^{-\lambda}, matching the exponential form. This Poisson approach leverages the fact that the sum of indicator variables for each possible pair having a match converges in distribution to Poisson(\lambda) under suitable conditions.[3] A useful special case is the square-root approximation for the group size n yielding P(n) \approx 0.5. Setting $1 - e^{-\lambda} = 0.5 gives \lambda = \ln 2 \approx 0.693, so \frac{n(n-1)}{2 \times 365} \approx 0.693, and solving the quadratic yields n \approx \sqrt{2 \times 365 \times \ln 2} \approx 22.99, or roughly 23. This heuristic extends generally as n \approx \sqrt{2 c \ln(1/(1-p))} for probability p and c days.[4] The exponential (or Poisson) approximation performs well for moderate n, with absolute errors typically under 0.01 for n \leq 100 in the uniform case, though it slightly underestimates the true probability as n increases due to growing dependencies among pairs. For n \leq 50, the maximum error is about 0.007 (e.g., at n=23). The Poisson distribution itself provides a slightly less precise fit for the full count of pairs at larger n, where higher moments deviate more, but remains effective for the at-least-one probability.[15] The following table compares exact probabilities P(n) (computed via the product formula) against the approximation $1 - e^{-\lambda} for selected n, illustrating the accuracy:| n | Exact P(n) | Approx. P(n) | Absolute Error |
|---|---|---|---|
| 10 | 0.11695 | 0.1164 | 0.00055 |
| 20 | 0.41144 | 0.4061 | 0.00534 |
| 23 | 0.50730 | 0.5000 | 0.00730 |
| 30 | 0.70632 | 0.6963 | 0.01002 |
| 50 | 0.97045 | 0.9652 | 0.00525 |