Introductory Data Science using R
R Exercise: The birthday problem
In a room of 23 people, what is the probability that at least two people share the same birthday?
Let’s count
First, some assumptions:
- There are only 365 days in a year
- Every day is equally likely to be a birthday
- Everyone’s birthday is independent of each other
Strategy: It’s easier to figure out the probability of the complementary event.
$$P(A) = 1 - P(A^c)$$
What’s the complement?
- Let $A$ = At least two people share the same birthday
- Then $A^c$ = Nobody shares any birthday (all birthdays are different)
- Label the individuals from $1,\dots,23$
- How many possible birthdays can person 1 have? 365 out of 365
- How many possible birthdays can person 2 have? 364 out of 365
- …
- Since birthdays are independent and equally likely, multiply the probability that each person avoids every earlier birthday:
$$P(A^c) = \frac{365}{365} \times \frac{364}{365} \times \cdots \times \frac{365-23+1}{365}$$
$$= \frac{365!}{(365-23)!365^{23}}$$
- Thus,
$$P(A) = 1 - \frac{365!}{(365-23)!365^{23}}$$
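As a quick numerical check of the formula above, the product can be evaluated term by term, which avoids factorials entirely. This is only a minimal sketch; the names `n_days`, `n_people`, `p_no_match`, and `p_match` are illustrative and not part of the exercise.

```r
n_days <- 365
n_people <- 23

# Numerators 365, 364, ..., 365 - 23 + 1, each divided by 365
p_no_match <- prod((n_days - 0:(n_people - 1)) / n_days)

# Complement: at least two people share a birthday
p_match <- 1 - p_no_match
p_match  # about 0.507
```

So with only 23 people, a shared birthday is already slightly more likely than not.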
Logarithms
Factorials are often too large to represent as floating-point numbers and overflow to infinity: in R, factorial(365) returns Inf.
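The overflow can be seen directly in the R console. The sketch below assumes standard double-precision arithmetic; the variable names are illustrative only.

```r
largest_ok <- factorial(170)                   # still finite, roughly 7.26e306
too_big    <- suppressWarnings(factorial(171)) # Inf: exceeds the largest double

# The naive ratio 365! / (365 - 23)! becomes Inf / Inf, which is NaN
naive_ratio <- suppressWarnings(factorial(365) / factorial(365 - 23))

# The log of the factorial stays comfortably within range
log_fact <- lfactorial(365)

c(largest_ok, too_big, naive_ratio, log_fact)
```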
Adopt the alternative formula
$$P(A) = 1 - \exp \big\{ \log(365!) - \log((365-23)!) - 23 \log 365 \big\}$$
Write this in R
Functions that you need:
- factorial() to compute factorials
- lfactorial() to compute log factorials
- exp() to compute exponentials
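One possible way to put these functions together for the 23-person case is sketched below. It is not the only valid answer, and the variable names are illustrative assumptions.

```r
n_days <- 365
n_people <- 23

# log P(A^c) = log(365!) - log((365 - 23)!) - 23 * log(365)
log_p_no_match <- lfactorial(n_days) -
  lfactorial(n_days - n_people) -
  n_people * log(n_days)

# P(A) = 1 - exp(log P(A^c))
p_match <- 1 - exp(log_p_no_match)
p_match  # about 0.507
```

Working on the log scale keeps every intermediate value finite, so no call to factorial() is needed here.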
New question
In a room of $x$ people, what is the probability that at least two people share the same birthday?
Write this in R
Write a function that takes a positive integer x and returns the probability that at least two people share the same birthday.
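A minimal sketch of such a function follows. The name `birthday_prob`, the `n_days` argument, and the input checks are assumptions made for illustration; the exercise only asks for a function of x.

```r
birthday_prob <- function(x, n_days = 365) {
  # x must be a single positive whole number
  stopifnot(is.numeric(x), length(x) == 1, x >= 1, x == round(x))

  # With more people than days, a shared birthday is certain
  if (x > n_days) return(1)

  # Log-scale version of 1 - n_days! / ((n_days - x)! * n_days^x)
  1 - exp(lfactorial(n_days) - lfactorial(n_days - x) - x * log(n_days))
}

birthday_prob(23)  # about 0.507
birthday_prob(50)  # about 0.970
```

The early return for x greater than n_days is a design choice: it avoids calling lfactorial() on a negative number, where the formula no longer applies.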
BONUS: Plot it!
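One way to approach the bonus is to evaluate the probability over a range of room sizes and draw it with base R graphics. The sketch below defines its own vectorised helper, here called `birthday_prob`, so that it runs on its own; the range 1 to 100 and the dashed reference line at 0.5 are illustrative choices.

```r
# Vectorised helper: valid for room sizes from 1 up to n_days
birthday_prob <- function(x, n_days = 365) {
  1 - exp(lfactorial(n_days) - lfactorial(n_days - x) - x * log(n_days))
}

room_sizes <- 1:100
probs <- birthday_prob(room_sizes)

plot(room_sizes, probs, type = "l",
     xlab = "Number of people in the room",
     ylab = "P(at least two share a birthday)")
abline(h = 0.5, lty = 2)  # reference line: the curve crosses it at 23 people
```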