When drawing items from a pool with replacement, the probability that you’ll draw the same item more than once is much higher than intuition would suggest.
Often called the Birthday Paradox and framed thus: in a group of just 23 people, there is a greater than 50% chance that at least two people will share a birthday.11 Assuming birthdays are evenly distributed, which they’re not.
Let be the number of items we draw with replacement from the pool with items. is the probability that at least one item is drawn more than once.
birthday(n, k) := 1 - ( k! * binomial(n, k) / n ^ k);
The numbers get unweildy if we’re trying to work out collision probabilities for, say, a random identifier with 80 bits. We can approximate with:
birthday_approx(n, k) := 1 - %e ^ (k ^ 2 / (2 * n));
To approximate , how many items need to be drawn so that the probability of a collision is :
collision_k(p, n) := sqrt( 2 * n * log( 1 / ( 1 - p) ) );