What’s the probability of running into someone by chance in London?

Anyi Guo
4 min read · Jun 22, 2022


Learn to use the Binomial distribution to model real-life events, with worked examples in Python 💻

Mr. Bayes during his commute in the London tube

You work and live in London. What’s the probability that you’ll run into someone you know by chance?

Let’s assume that you know 500 people in London and the population of London is 8 million people. The probability of running into someone on any day is 0.006%, which is calculated by simply dividing the two numbers:

p = 500 / 8,000,000 = 0.0000625 (about 0.006%), where p is the probability of running into someone you know on any given day

In this example, each day can be modelled as a Bernoulli trial, where Success means you ran into someone you know that day and Failure means you didn’t. If we treat the days as independent and p as the same every day, the number of days on which you run into someone follows a Binomial distribution. We can therefore use it to estimate the probability of running into someone in London in 1, 2 and 5 years.
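For reference, here is the standard binomial probability mass function written out. The symbols X (the number of days on which you run into someone) and k (the number of successes, called n_successes later on) are notation introduced here for illustration; n and p are as defined in the text. The second expression is the special case k = 1, which is the one used throughout this article.

```latex
P(X = k) = \binom{n}{k}\, p^{k}\, (1 - p)^{n - k},
\qquad
P(X = 1) = n\, p\, (1 - p)^{n - 1}
```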

Probability of running into someone once in London in 1, 2 and 5 years

We can use the binom.pmf function from the SciPy library (scipy.stats) to calculate the probability of running into someone an exact number of times in a given number of trials. In this case, the binomial distribution has 3 parameters:

  • p = probability of running into someone you know on any given day
  • n = number of days, which will be 365 for 1 year, 730 for 2 years etc. This is also called number of trials
  • n_successes = number of times running into someone. It is set to 1 in this example

We can plug these numbers into binom.pmf, which returns the value of the probability mass function for the given parameters. There is only about a 2.2% chance of running into someone you know exactly once in London in a year, and about 4.4% in 2 years. Even over 5 years this rises only to 10.2%. London is indeed a big city!

Python code for calculating the probability of running into someone once in London in 1–5 years
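A minimal sketch of that calculation, reconstructed from the description above rather than copied from the original listing. The variable names (people_known, population, results) are illustrative, and the small fallback function is an addition so the snippet still runs where SciPy is not installed.

```python
from math import comb

try:
    # The function the article refers to: scipy.stats.binom.pmf(k, n, p)
    from scipy.stats import binom
    pmf = binom.pmf
except ImportError:
    # Fallback (not part of the article): the same PMF from its definition
    def pmf(k, n, p):
        return comb(n, k) * p**k * (1 - p) ** (n - k)

people_known = 500        # people you know in London
population = 8_000_000    # population of London
p = people_known / population  # daily probability, 0.0000625

n_successes = 1  # running into someone exactly once

results = {}
for years in (1, 2, 5):
    n = 365 * years  # number of trials (days)
    results[years] = float(pmf(n_successes, n, p))
    print(f"{years} year(s): {results[years]:.1%}")
```

The printed values should land close to the percentages quoted above.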

For those of you looking for a shortcut, you can also use this Binomial Distribution Calculator: plug in the parameters and you will get the same results.

But wait, I’ve run into people way more often than that…

Some of you may be wondering why the probability is so low, given that you feel you run into people more often than that. There are 2 main reasons, both having to do with p, the probability of running into someone on any day.

The formula for p again, for reference: p = (number of people you know) / (population), where p is the probability of running into someone you know on any given day.
  1. You know more than 500 people in London. This gives a higher probability p because the numerator is bigger.
  2. The people you know mostly live in a few areas of London. Using the population of those specific areas, rather than all of London, as the denominator gives a higher value of p.
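The effect of both reasons on p can be sketched in a few lines. The helper daily_probability and the alternative figures (1,000 acquaintances, an area of 2 million people) are hypothetical values chosen only to illustrate the direction of the change; they are not taken from the article.

```python
def daily_probability(people_known: int, population: int) -> float:
    """p = people you know / population, as in the formula above."""
    return people_known / population

# Baseline from the article: 500 people known, 8 million Londoners
baseline = daily_probability(500, 8_000_000)

# Reason 1 (hypothetical figure): you know 1,000 people instead of 500
more_friends = daily_probability(1_000, 8_000_000)

# Reason 2 (hypothetical figure): the relevant area has 2 million people
smaller_area = daily_probability(500, 2_000_000)

print(f"baseline:     {baseline:.6f}")
print(f"more friends: {more_friends:.6f}")
print(f"smaller area: {smaller_area:.6f}")
```

Doubling the numerator doubles p, and shrinking the denominator to a quarter of its size quadruples it.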

East London

Let’s take an example: imagine that you grew up in East London, where most of your friends and family live. Out of the 500 people you know, 350 live in East London. How likely are you to run into someone you know in East London?

Example of running into someone in East London.

Using the same code as before, we can see that the probability of running into someone once in a year is now 5.9%, which is much higher than the 2.2% that we got for all of London. This shows that you are much more likely to run into someone in an area where you’ve lived, which is anecdotally true for most people 😃
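A rough sketch of this comparison follows. The population of East London used in the original calculation is not stated in the text, so EAST_LONDON_POPULATION below is a placeholder assumption of 2 million. With that value the result comes out near, but not necessarily equal to, the 5.9% quoted above. The helper prob_exactly_once is also an illustrative name; it is simply the binomial PMF for exactly one success.

```python
def prob_exactly_once(p: float, n_days: int) -> float:
    """Binomial probability of exactly one success in n_days trials."""
    return n_days * p * (1 - p) ** (n_days - 1)

# All of London, as in the first example
p_london = 500 / 8_000_000

# ASSUMPTION: placeholder population; the article's figure is not given
EAST_LONDON_POPULATION = 2_000_000
p_east = 350 / EAST_LONDON_POPULATION  # 350 of your 500 contacts live there

once_london = prob_exactly_once(p_london, 365)
once_east = prob_exactly_once(p_east, 365)

print(f"All of London, 1 year: {once_london:.1%}")
print(f"East London, 1 year:   {once_east:.1%}")
```

Swapping in a different population estimate only requires changing the one constant.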

If you want to learn more about the Binomial distribution, check out this article: Bernoulli vs. Binomial: What’s the difference?


Written by Anyi Guo

Head of Data Science @ UW. This is my notepad for thoughts on data science, machine learning & AI.
