Where would you expect to see the mean in a box and whisker plot? In what quartile would it most likely be?


adam, Data Scientist

Well, like many things in statistics, it's all going to depend on your distribution.

If your distribution is symmetric, like the normal distribution, the true mean equals the true median, and the sample mean and sample median converge to the same value as the sample grows. The median is marked in a box and whisker plot by a black line near the center of the box (the box is the interquartile range, or IQR: the values between the 25th and 75th percentiles). The median separates your 2nd quartile from your 3rd. If your distribution is symmetric, the mean will also fall on this line.

However, this is not always the case! With a skewed sample, you'll see that the mean can be a long way from the median, even outside the IQR (and therefore outside the box)!

So, for example, suppose you have access to the statistical package R. Let's first make a sample of 1000 draws from a uniform distribution (a symmetric distribution):

    mydist <- runif(1000)
    quantile(mydist, c(1/4, 3/4))
          25%       75%
    0.2642149 0.7674930
    mean(mydist)
    [1] 0.5109853
    median(mydist)
    [1] 0.5074757

The mean and median are pretty close. (The difference is from sampling, and if we generated an infinite number of samples, rather than a thousand, they'd converge to the same number.)
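If you want to see this on the plot itself, here is a minimal sketch. Base R's boxplot() draws the median but not the mean, so the mean is added by hand with points(). The seed, the plotting symbol, and the name gap are my own choices for illustration and were not part of the session above, so your numbers will differ slightly.

```r
set.seed(1)                 # assumed seed, only so the sketch is repeatable
mydist <- runif(1000)       # symmetric sample, as above

boxplot(mydist)             # box = IQR, thick line = median
points(1, mean(mydist), pch = 4, col = "red")  # mark the mean with a red cross

# For a symmetric sample the cross should sit almost on the median line
gap <- abs(mean(mydist) - median(mydist))
gap
```

A single vector is drawn at x = 1, which is why the cross is placed there.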

Now, let's add some skew by inserting a single extreme value wayyyyy outside the distribution!

    skewed <- append(mydist, 100000)  # make another observation of 100,000 when the rest are no larger than 1
    quantile(skewed, c(1/4, 3/4))
          25%       75%
    0.2644769 0.7681941
    mean(skewed)  # Whoa!
    [1] 100.4106
    median(skewed)
    [1] 0.5078758

So, you can see that even with a single outlier here, our median and IQR have stayed pretty much the same, which means the box looks basically the same, but now the mean is /far/ away. The mean in this case is even outside the whiskers: with the usual 1.5 x IQR rule the upper whisker ends below 1, while the mean is around 100.
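You can check the whisker claim numerically rather than by eye. This is a rough sketch, assuming the default whisker rule (1.5 x IQR) that boxplot() uses; boxplot.stats() returns the five numbers the plot is drawn from. The seed and the variable names upper_whisker and mean_outside are mine.

```r
set.seed(1)                          # assumed seed, for repeatability only
mydist <- runif(1000)
skewed <- append(mydist, 100000)     # same single outlier as above

# $stats holds: lower whisker, lower hinge, median, upper hinge, upper whisker
s <- boxplot.stats(skewed)$stats
upper_whisker <- s[5]

# TRUE when the mean lies beyond the end of the upper whisker
mean_outside <- mean(skewed) > upper_whisker
mean_outside
```

Note that boxplot.stats() uses hinges, which can differ very slightly from the quantile() values shown earlier, but not enough to matter here.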

You can generally identify skew in a box and whisker plot by asking how asymmetric it looks. A long tail on the large side means positive or right skew, in which the mean is typically larger than the median. A long tail on the small side means left or negative skew, in which the mean is typically smaller than the median.
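To see both directions side by side, here is a small sketch. The exponential distribution is just my pick of a convenient right-skewed example, and negating it gives a mirror-image left-skewed sample; neither is from the session above.

```r
set.seed(1)           # assumed seed
right <- rexp(1000)   # exponential: long tail on the large side
left  <- -right       # mirror image: long tail on the small side

c(mean = mean(right), median = median(right))  # mean above median
c(mean = mean(left),  median = median(left))   # mean below median

# Draw both boxes and mark each mean with a red cross
boxplot(right, left, names = c("right skew", "left skew"))
points(1:2, c(mean(right), mean(left)), pch = 4, col = "red")
```

In each box the cross is pulled away from the median line toward the long tail.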

So in what quartile is the mean most likely to be? For symmetric distributions, it will sit at the boundary between the 2nd and 3rd quartiles, the same as the median. For skewed distributions, you're going to have to ask a lot more questions.
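For any particular sample you can simply ask R. The helper below is hypothetical (mean_quartile is my own name, not a built-in); it cuts the sample at its 25th, 50th and 75th percentiles and reports which of the four pieces the mean lands in.

```r
# Hypothetical helper: which quartile (1 to 4) contains the sample mean?
mean_quartile <- function(x) {
  cuts <- quantile(x, c(0.25, 0.5, 0.75))
  findInterval(mean(x), cuts) + 1
}

set.seed(1)                       # assumed seed
mydist <- runif(1000)
skewed <- append(mydist, 100000)

mean_quartile(mydist)   # 2 or 3: right next to the median
mean_quartile(skewed)   # 4: the outlier drags the mean past the 75th percentile
```

For the symmetric sample the answer flips between 2 and 3 depending on sampling noise, which is the "boundary" behaviour described above.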
