Skip to Content

Drawing the Same Item More Than Once is Surprisingly Likely

When drawing items from a pool with replacement, the probability that you’ll draw the same item more than once is much higher than intuition would suggest.

Often called the Birthday Paradox and framed thus: in a group of just 23 people, there is a greater than 50% chance that at least two people will share a birthday.11 Assuming birthdays are evenly distributed, which they’re not.

1 Assuming birthdays are evenly distributed, which they’re not.

Math

Let kk be the number of items we draw with replacement from the pool with nn items. P(n,k)P(n, k) is the probability that at least one item is drawn more than once.

P(n,k)={1nPknkkn1k>nP(n, k)=\begin{cases} 1 - \frac{_{n}P_{k}}{n^{k}} &k \leq n \\ 1 &k \gt n \end{cases}

In maxima:

birthday(n, k) := 1 - ( k! * binomial(n, k) / n ^ k);

Approximations

The numbers get unweildy if we’re trying to work out collision probabilities for, say, a random identifier with 80 bits. We can approximate P(n,k)P(n,k) with:

P(n,k)1en22dP(n, k) \approx 1 - e^{ \frac{ -n^2 } { 2d } }
birthday_approx(n, k) := 1 - %e ^ (k ^ 2 / (2 * n));

To approximate k(p,n)k(p,n), how many items need to be drawn so that the probability of a collision is pp:

k(p,n)2nln11pk(p, n) \approx \sqrt{ 2n \ln{ \frac{ 1 }{ 1 - p } } }
collision_k(p, n) := sqrt( 2 * n * log( 1 / ( 1 - p) ) );