You work and live in London. What’s the probability that you’ll run into someone you know by chance?
Let’s assume that you know 500 people in London and that the population of London is 8 million. The probability of running into someone you know on any given day is then 500 / 8,000,000 = 0.0000625, or roughly 0.006%, which is calculated by simply dividing the two numbers.
In this example, running into someone can be modelled as a Bernoulli trial where Success means you did run into someone on a given day, and Failure means that you didn’t. This also means that we can use the Binomial distribution to estimate the probability of running into someone in London in 1, 2 and 5 years.
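To make the link between the daily Bernoulli trial and the Binomial distribution concrete, here is a minimal sketch of the Binomial probability mass function written from first principles. The function name binomial_pmf and its argument names are illustrative choices rather than anything from the original article; the inputs are the 500 acquaintances and 8 million residents assumed above.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent Bernoulli(p) trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Daily probability from the assumptions above: 500 acquaintances, 8 million residents.
p = 500 / 8_000_000

# Probability of exactly one chance encounter in 365 days.
one_year = binomial_pmf(1, 365, p)
print(f"{one_year:.4f}")
```

Each day counts as one independent trial with the same p, which is the assumption that lets the daily Success/Failure outcomes add up to a Binomial count.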
Probability of running into someone once in London in 1, 2 and 5 years
We can use the binom.pmf function from the scipy library (scipy.stats) to calculate the probability of running into someone exactly N times in a certain number of trials. In this case, we have 3 parameters for the binomial distribution:
- p = probability of running into someone you know on any given day
- n = number of days, which will be 365 for 1 year, 730 for 2 years, etc. This is also called the number of trials
- n_successes = number of times you run into someone. It is set to 1 in this example
We can plug all these numbers into the binom.pmf function, which returns the value of the probability mass function for the given parameters. It turns out that there is only a 2.2% chance of running into someone you know exactly once in London in a year, and about a 4.4% chance in 2 years. This only increases to 10.2% over 5 years. London is indeed a big city!
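The calculation described above can be sketched as follows. This is a minimal reconstruction under the stated assumptions (500 acquaintances, 8 million residents, 365 days per year with leap days ignored), not the article's original code; the results dictionary and the loop are illustrative.

```python
from scipy.stats import binom

# Assumptions taken from the text: you know 500 of London's 8 million residents.
p = 500 / 8_000_000
n_successes = 1  # run into someone exactly once

results = {}
for years in (1, 2, 5):
    n = 365 * years  # number of trials (days), ignoring leap days
    results[years] = float(binom.pmf(n_successes, n, p))
    print(f"{years} year(s): {results[years]:.1%}")
```

Note that binom.pmf(1, n, p) is the probability of exactly one encounter. If you care about at least one encounter instead, 1 - binom.pmf(0, n, p) gives that figure.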
For those of you looking for a shortcut, you can also plug the same parameters into this Binomial Distribution Calculator and get the same results.
But wait, I’ve run into people way more often than that…
Some of you may be wondering why the probability is so low, given that you feel you run into people more often than that. There are two main reasons, both having to do with p, the probability of running into someone on any given day.
1. You know more than 500 people in London. This means that you have a higher probability p because the numerator is bigger.
2. The people you know mostly live in a few areas of London. The denominator for p is then the population of those specific areas rather than all of London, which gives a higher value of p.
East London
Let’s take an example: imagine that you grew up in East London, where most of your friends and family live. Out of the 500 people you know, 350 live in East London. How likely are you to run into someone you know in East London?
Using the same code as before, we can see that the probability of running into someone exactly once in a year is now 5.9%, which is much higher than the 2.2% that we got for all of London. This shows that you are much more likely to run into someone in an area that you’ve lived in, which is anecdotally true for most people 😃
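The comparison between all of London and a smaller area can be sketched with a small helper. The function prob_exactly_once is a hypothetical name introduced here for illustration. The East London population is not stated in the text, so the value below is a placeholder assumed purely for the example, and its output will not necessarily match the 5.9% quoted above.

```python
from scipy.stats import binom


def prob_exactly_once(known_people: int, population: int, days: int = 365) -> float:
    """Chance of exactly one encounter in `days` daily trials, with p = known / population."""
    p = known_people / population
    return float(binom.pmf(1, days, p))


# From the text: 500 people known across London's 8 million residents.
all_london = prob_exactly_once(500, 8_000_000)

# From the text: 350 of those people live in East London.
# ASSUMPTION: the population below is a placeholder, not a figure from the article.
EAST_LONDON_POPULATION = 2_000_000
east_london = prob_exactly_once(350, EAST_LONDON_POPULATION)

print(f"All of London: {all_london:.1%}")
print(f"East London (assumed population): {east_london:.1%}")
```

Swapping in a different population or number of acquaintances shows how sensitive the result is to both the numerator and the denominator of p.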
If you want to learn more about the Binomial distribution, check out this article: Bernoulli vs. Binomial: What’s the difference?
If you are interested in learning more about maths and statistics, you can check out the other posts in this series:
- Variance, Covariance and Correlation: What’s the difference?
- Geometric Mean vs. Arithmetic Mean: What’s the difference?
- Linear Regression vs. Generalized Linear Models (GLM): What’s the difference?
- Linear, Exponential vs. Quadratic Functions: What’s the difference?
- Pearson vs. Spearman Correlation: What’s the difference?
- Probability vs. Likelihood: What’s the difference?