Variance, Covariance and Correlation: What’s the difference?

Anyi Guo
6 min read · Apr 5, 2022

A guide to understanding these statistical measures, with worked examples using cats 😺

Measuring Mr. Bayes for this post. It was difficult to get him to sit still.

Imagine that you are working on a study that looks at the relationship between cats’ breeds and their physiques. You collect data from 50 cats, and save their weight, body length, gender and breed info in a spreadsheet. Now you’d like to summarise the average weight and body length of the cats, as well as how much these measurements differ across breeds. In statistics, the latter is called spread or dispersion. Variance is the most commonly used metric to quantify the spread of a single variable, while covariance and correlation describe how two variables vary together.

Let’s start with the easiest one: variance.

Variance

Variance measures how far individual data points are from the mean (average). In our example, we can use variance to describe how much cats’ weights vary, for instance within each breed or gender. A high variance tells us that the values in our sample are spread far from their mean, while a low variance indicates that the values are closely clustered around the mean.

Variance is never negative (it is zero only when every value is identical), and it describes the dispersion of a single variable.
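To make the idea concrete, here is a minimal Python sketch of variance as the average squared distance from the mean. The weights are made-up numbers for illustration, not data from the study, and the helper names `population_variance` and `sample_variance` are hypothetical. The sketch shows both the version that divides by n and the version that divides by n - 1, which is the usual choice when the data is a sample rather than the whole population.

```python
from statistics import mean

# Hypothetical cat weights in kg (invented for illustration only)
weights_kg = [3.0, 4.0, 5.0, 4.0, 4.0]


def population_variance(values):
    """Average squared distance from the mean (divides by n)."""
    m = mean(values)
    return sum((x - m) ** 2 for x in values) / len(values)


def sample_variance(values):
    """Same sum of squared distances, but divided by n - 1."""
    m = mean(values)
    return sum((x - m) ** 2 for x in values) / (len(values) - 1)


pop_var = population_variance(weights_kg)
samp_var = sample_variance(weights_kg)

# Identical values have no spread at all, so their variance is zero
no_spread = population_variance([4.0, 4.0, 4.0])

print(pop_var, samp_var, no_spread)
```

Python's standard library offers the same two calculations as `statistics.pvariance` and `statistics.variance`, so the hand-written helpers are only there to show what happens under the hood.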

How to calculate Variance

Imagine that you collected the weight data from 5 male and 5 female cats. You can…
