Right Statistics accompanied by Wrong Charts

The only statistics you can trust are those you falsified yourself.

Today I stumbled upon an interesting post on Slashfilm that looks at the box office results of every Harry Potter movie from 1 to 5. While the ups and downs of the individual results are cool to look at, there’s one problem in that post that urges me to write about it in more detail, since I often use statistics and hence feel some sort of obligation: the proper way to create charts.

When you head over to the post on Slashfilm and look at the first chart, what do you see? You see five bars with drastically different heights.

When I looked at it for the first time, I was astonished by the vast box office differences between those five movies. I really thought The Prisoner Of Azkaban had made only a fraction of the money the other movies made.

Then I continued reading and took a closer look at the posted B.O. numbers below the chart.

As it turned out, every movie made tons of money, albeit not the exact same amount: the difference from lowest to highest is around 200 million dollars. That difference is large, but not nearly as drastic as the chart suggested.

So I took another good look at the chart, inspected the vertical axis and finally spotted the big nasty problem: the Y values don’t start at zero but at 700 million.

And that’s a bad choice.

In fact, I consider this one of the worst problems regarding statistics and charts, which is why I never start the axis at any value other than zero. The reason is simple: the bar heights stop being proportional to the numbers they stand for, so you give a wrong visual representation of them, even if unintentionally. You visually distort the results.

Here’s my take on the same numbers and you immediately see the differences:

Worldwide Box Offices of the Harry Potter Franchise

While the original chart dramatizes the difference, my take puts everything into perspective: there are differences, but they are relatively small compared to the overall values.
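To put a number on that distortion, here is a minimal Python sketch that compares how tall the bars appear relative to each other for two axis baselines. The 795 million figure and the 700 million baseline come from the post above; the 995 million figure is a hypothetical value chosen only to match the "around 200 million" gap, and `bar_height_ratio` is a helper name made up for this illustration.

```python
def bar_height_ratio(high, low, baseline=0.0):
    """Return how many times taller the `high` bar is drawn than the `low` bar
    when the vertical axis starts at `baseline` (all values in millions)."""
    if low <= baseline:
        raise ValueError("baseline must be below the smallest value")
    return (high - baseline) / (low - baseline)


# 795 is taken from the post; 995 is an ASSUMED value (795 plus the
# "around 200 million" gap), not an actual box office figure.
low = 795.0
high = 995.0

honest = bar_height_ratio(high, low, baseline=0.0)
truncated = bar_height_ratio(high, low, baseline=700.0)

print(f"axis from 0:   tallest bar is {honest:.2f}x the shortest")
print(f"axis from 700: tallest bar is {truncated:.2f}x the shortest")
```

Under these assumed numbers the real gap is roughly a quarter, while the axis starting at 700 million draws the tallest bar about three times as high as the shortest one.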

Moreover, my version highlights another problem with the chart on Slashfilm, and it is the reason you almost never encounter this effect in more scientific papers: 3D.

3D might be a cool-looking visual effect, but it adds another level of subtle visual distortion. When you look closely at the third bar in the middle of the Slashfilm chart, you’ll estimate that the B.O. number must be somewhere near 790 million dollars. From the looks of it I’d say 785 million. But in fact it’s 795 million. So going by looks alone, a normal reader is off by 10 million.

The last bar of that chart has the same problem, only the other way around: there you instinctively add 10 million dollars. The problem here is the borders. We look at the border in the front; the program instead uses the border in the back.

3D is one big trap you can fall into when putting visuals over content, so it’s something you should avoid at all costs.

So, in short, there are two things you should always do when working with charts:

  1. Always start your vertical axis at zero when you deal with numbers
  2. Don’t do 3D

Call me a nitpicker but in this context I gladly am. (But that’s only me…)

Comments

  • Jason

    Awesome graph, thank you.

    • Gunther Heinrich

      Thanks and you’re welcome :)
