Stats 101: Mean versus Median

As we are exposed to more and more data and statistics on a daily basis, it has become even more important to understand how to interpret those numbers and how to spot when they are being manipulated.  This series explains the common ways that data and statistics are used and arms you with the information you need to ask the right questions and understand what the numbers really mean.

This post explains the difference between the terms average, mean and median.

What is average?

The word 'average' is used so commonly that we rarely question what it means when we see it.  But taking averages at face value can be a mistake.  Generally, when we hear the word 'average' we associate it with the term 'mean': the result of adding up all of the values in a data set and dividing by the number of values.  Often, our assumption will be correct. However, the terms 'mean' and 'average' are not necessarily interchangeable.

The term 'average' is generally used to express that something is statistically the norm: the value we expect, the one in the middle, or the most common one.  Technically, 'average' covers several ways to measure which value best characterizes a particular sample, including the mean and the median.  The common assumption that 'average' always means 'mean' leaves us vulnerable to manipulation by those who would pick whichever measure makes a data set look favorable to their cause. So, it is important to ask what someone really means when they use this term.

Mean versus Median

The term 'median' is also commonly used when describing a data set, especially among number crunchers.  To calculate a median value, we look at the midpoint of a data set, or the point in a sorted set of values where the number of values above equals the number of values below.  For example, in Data Set A, the fourth value is the midpoint, with 3 values above and 3 values below.

We use both median and mean to give us a sense of what the 'average' or middle of a group looks like.  It may be the case that the median and mean of a given set of data are very similar.  Take the data set above, for example.  The median value is 12 and the mean is 13 – not a huge difference.  In other cases, however, the difference between the two calculations can be significant.  It all comes down to how evenly the values in your dataset are distributed.
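To make the two calculations concrete, here is a minimal sketch in Python. The seven values are made up for illustration and are not the post's Data Set A; the variable names are likewise just placeholders.

```python
from statistics import mean, median

# Hypothetical values for illustration only (not the post's Data Set A).
values = [2, 4, 5, 7, 9, 10, 19]

# Mean: add up all of the values, then divide by how many there are.
manual_mean = sum(values) / len(values)  # 56 / 7 = 8.0

# Median: sort the values and take the one in the middle.
# Indexing the middle element directly only works for an odd count.
ordered = sorted(values)
manual_median = ordered[len(ordered) // 2]  # fourth of seven values: 7

# The standard library gives the same answers.
library_mean = mean(values)
library_median = median(values)

print(manual_mean, manual_median)
print(library_mean, library_median)
```

In this made-up set the mean and median differ by only 1, because no single value is far away from the rest.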

When values are evenly distributed, the mean will be similar to the median.  Say, for example, your town has a population of 1,000 and everyone makes exactly $50,000 a year.  The midpoint is the same as the mean: $50,000.  Bill Gates hears how nice your town is and decides to move there.  The next year, Mr. Gates sees some tough times and only earns a billion dollars.  Suddenly, your little town has a mean income of over $1 million. The median remains $50,000.  If you only looked at the mean income of your town, your first thought might be: “Wow!  My town is full of millionaires!” But that figure is clearly not representative.  Having one billionaire in the town has skewed the distribution of incomes, making the mean far less useful as a summary.
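The town example can be checked with a few lines of Python. This is only a sketch of the arithmetic described above, using the numbers from the story (1,000 residents at $50,000 each, plus one newcomer earning $1 billion); the variable names are illustrative.

```python
from statistics import mean, median

# 1,000 residents, each earning exactly $50,000 a year.
incomes = [50_000] * 1_000
mean_before = mean(incomes)
median_before = median(incomes)

# One new resident earns $1 billion.
incomes.append(1_000_000_000)
mean_after = mean(incomes)      # 1,050,000,000 / 1,001, a bit over $1 million
median_after = median(incomes)  # the middle resident still earns $50,000

print(f"before: mean={mean_before:,.0f} median={median_before:,.0f}")
print(f"after:  mean={mean_after:,.0f} median={median_after:,.0f}")
```

One extra value out of 1,001 moves the mean by a factor of about 21, while the median does not move at all.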

Real world example:

According to the Congressional Budget Office, mean real household income[1] grew 62 percent between 1979 and 2007.

Sounds good, right?  However, note that mean is being used here. 

When we use median (where half of all households have income below the median, and half have income above it) to measure income growth, the picture changes. Between 1979 and 2007, median real household income grew by only 35 percent.  The difference between 62 percent growth and 35 percent growth is substantial.

The reason for the big difference? Incomes are not evenly distributed across the population.  Typically, an income distribution has a few people at the bottom, a lot of people in the middle and a few people at the top. This is where we get the term “middle class.”

Over the past thirty years, the same has been true of income growth: it has not been evenly distributed.  Because mean income grew so much more (62 percent) than median income (35 percent), we can tell that income growth was skewed significantly toward the higher end.  That is, a small portion of the population has seen disproportionately larger gains in their income than everyone else.
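A small sketch shows how this can happen. The five household incomes below are invented for illustration and have nothing to do with the actual Congressional Budget Office data; the point is only that when the top household gains far more than the rest, mean growth outruns median growth.

```python
from statistics import mean, median

# Hypothetical household incomes at two points in time (NOT CBO data).
# Every household gains, but the top household gains by far the most.
before = [20_000, 35_000, 50_000, 80_000, 200_000]
after = [22_000, 40_000, 60_000, 100_000, 500_000]

def growth(old, new):
    """Fractional growth from old to new, e.g. 0.2 means 20 percent."""
    return (new - old) / old

mean_growth = growth(mean(before), mean(after))        # 77,000 -> 144,400
median_growth = growth(median(before), median(after))  # 50,000 -> 60,000

print(f"mean growth:   {mean_growth:.1%}")
print(f"median growth: {median_growth:.1%}")
```

Here the middle household's income grows by 20 percent, yet mean income grows by well over 80 percent, almost entirely because of the single household at the top.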

QUESTIONS TO ASK:

  • If someone uses the term 'average,' what calculation are they actually referring to?
  • If someone uses the term 'mean' when describing a data set, how evenly distributed is the data set? That is, does the data set have a few extremely high or extremely low values?
  • If the answer is “yes” or “I don’t know” to the question above, would median be a more appropriate choice?

When in doubt, ask for both median and mean – if they are significantly different, that tells you that your data set is probably skewed in one direction or another and could affect your decisions. 
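If you have the raw numbers yourself, the comparison is easy to automate. The helper below is a hypothetical sketch: the function name and the idea of expressing the gap relative to the median are choices made here for illustration, not a standard statistical test, and how large a gap counts as "significantly different" is left to your judgment.

```python
from statistics import mean, median

def mean_median_gap(values):
    """Return (mean, median, gap), where gap is |mean - median| / |median|.

    Hypothetical helper for illustration; a large gap hints at skew.
    """
    m = mean(values)
    med = median(values)
    if med == 0:
        # Avoid dividing by zero; treat any nonzero mean as an unbounded gap.
        gap = 0.0 if m == 0 else float("inf")
    else:
        gap = abs(m - med) / abs(med)
    return m, med, gap

# Made-up example data: one balanced set, one with a single extreme value.
balanced = [48_000, 49_000, 50_000, 51_000, 52_000]
lopsided = [30_000, 40_000, 50_000, 60_000, 1_000_000]

balanced_result = mean_median_gap(balanced)
lopsided_result = mean_median_gap(lopsided)
print(balanced_result)
print(lopsided_result)
```

Both sets have the same median of 50,000, but the second set's mean is several times larger, which is the signal to look more closely at how the values are spread.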




[1] Inflation-adjusted, after-tax household income (measured after government transfers and federal taxes).
