The Laws of Averages: Part 2, A Beam of Darkness

By Kip Hansen – Re-Blogged From http://www.WattsUpWithThat.com

This essay is second in a series of essays about Averages — their use and misuse.  My interest is in the logical and scientific errors, the informational errors, that can result from what I have playfully coined “The Laws of Averages”.

Averages

Both the word and the concept “average” are subject to a great deal of confusion and misunderstanding among the general public, and both have seen an overwhelming amount of “loose usage” even in scientific circles, not excluding peer-reviewed journal articles and scientific press releases.  For that reason I gave a refresher on Averages in Part 1 of this series.  If your maths or science background is near the great American average, I suggest you take a quick look at the primer in Part 1 before reading on.

A Beam of Darkness Into the Light

The purpose of presenting different views of any data set — any collection of information or measurements about a thing, a class of things, or a physical phenomenon — is to allow us to see that information from different intellectual and scientific angles — to give us better insight into the subject of our studies, hopefully leading to a better understanding.

Modern statistical [software] packages allow even high school students to perform sophisticated statistical tests of data sets and to manipulate and view the data in myriad ways.  In a broad general sense, the availability of these software packages now allows students and researchers to make [often unfounded] claims for their data by using statistical methods to arrive at numerical results, all without understanding either the methods or the true significance or meaning of the results.  I learned this by judging High School Science Fairs and later reading the claims made in many peer-reviewed journals.  One of the currently hotly discussed controversies is the prevalence of using “P-values” to prove that trivial results are somehow significant because “that’s what P-values less than 0.05 do”.  At the High School Science Fair, students were including ANOVA test results for their data, yet none of them could explain what ANOVA was or how it applied to their experiments.

Modern graphics tools allow all sorts of graphical methods of displaying numbers and their relationships.   The US Census Bureau has a whole section of visualizations and visualization tools. An online commercial service, Plotly,  can create a very impressive array of visualizations of your data in seconds.  They have a level of free service that has been more than adequate for almost all of my uses [and a truly incredible collection of possibilities for businesses and professionals at a rate of about a dollar a day].  RAWGraphs has a similar free service.

The complex computer programs used to create metrics like Global Average Land and Sea Temperature or Global Average Sea Level are believed by their creators and promoters to actually produce a single-number answer, an average, accurate to hundredths or thousandths of a degree or fractional millimeters.  Or, if not actual quantitatively accurate values,  at least accurate anomalies or valid trends are claimed.  Opinions vary wildly on the value, validity, accuracy and precision of these global averages.

Averages are just one of a vast array of different ways to look at the values in a data set.  As I have shown in the primer on averages, there are three primary types of averages  — Mean, Median, and Mode — as well as a number of more exotic types.

In Part 1 of this series, I explained the pitfalls of averages of heterogeneous, incommensurable objects or data about objects.  Such attempts end up with Fruit Salad, an average of Apples-and-Oranges:  illogical or unscientific results, with meanings that are illusive, imaginary, or so narrow as not to be very useful.  Such averages are often imbued by their creators with significance — meaning — that they do not have.

As the purpose of looking at data in different ways — such as looking at a Mean, a Median, or a Mode of the numerical data set — is to lead to a better understanding, it is important to understand what actually happens when numerical results are averaged and in what ways they lead to improved understanding and in what ways they lead to reduced understanding.

A Simple Example:

Let’s consider the height of the boys in Mrs. Larsen’s hypothetical 6th Grade class at an all boys school.  We want to know their heights in order to place a horizontal chin-up bar between two strong upright beams for them to exercise on (or as mild constructive punishment — “Jonny — ten chin-ups, if you please!”).  The boys should be able to reach it easily by jumping up a bit so that when hanging by their hands their feet don’t touch the floor.

The Nurse’s Office supplies the heights of the boys, which are averaged to get the arithmetical mean of 65 inches.

Using the generally accepted body part ratios we do quick math to approximate the needed bar height in inches:

Height/2.3 = Arm length (shoulder to fingertips)

65/2.3 = 28 (approximate arm length)

65 + 28 = 93 inches = 7.75 feet or 236 cm
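The quick math above can be written as a small sketch. The function name bar_height_inches and its arm_ratio parameter are made up for this illustration; the 2.3 ratio and the rounding of the arm length to the nearest inch simply follow the text.

```python
def bar_height_inches(mean_height_in, arm_ratio=2.3):
    """Bar height = standing height + approximate arm length (shoulder to fingertips)."""
    # 65 / 2.3 is about 28.3, which the text rounds to 28 inches.
    arm_length = round(mean_height_in / arm_ratio)
    return mean_height_in + arm_length


height_in = bar_height_inches(65)
print(height_in, "inches")
print(height_in / 12, "feet")
print(round(height_in * 2.54), "cm")
```

Note that the only input is the Mean Height, so the result says nothing yet about how well the bar suits any individual boy.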

Our calculated bar height fits nicely in a classroom with 8.5 foot ceilings, so we are good.   Or are we?  Do we have enough information from our calculation of the Mean Height?

Let’s check by looking at a bar graph of all the heights of all the boys:

This visualization, like our calculated average, gives us another way to look at the information, the data on the heights of boys in the class.  Because the boys range from just five feet tall (60 inches) all the way to almost 6 feet (71 inches), we realize that we will not be able to make one bar height that is ideal for all.  However, we see now that 82% of the boys are within 3 inches either way of the Mean Height and our calculated bar height will do fine for them.  The 3 shortest boys may need a little step to stand on to reach the bar, and the 5 tallest boys may have to bend their legs a bit to do chin-ups.  So we are good to go.

But when they tried the same approach in Mr. Jones’ class, they had a problem.

There are 66 boys in this class and their Average Height (mean) is also 65 inches, but the heights had a different distribution:

Mr. Jones’ class has a different ethnic mix which results in an uneven distribution, much less centered around the mean.  Using the same Mean +/- 3 inches (light blue) used in our previous example, we capture only 60% of the boys instead of 82%.  In Mr. Jones’ class, 26 of the 66 boys would not find the horizontal bar set at 93 inches convenient.  For this class, the solution was a variable height bar with two settings:  one for the boys 60-65 inches tall (32 boys), one for the boys 66-72 inches tall (34 boys).
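A minimal sketch of the "Mean +/- 3 inches" check follows. The two height lists are invented for illustration and are not the data from either class; they were chosen only so that both have a mean of exactly 65 inches. The helper share_within is likewise a hypothetical name.

```python
from statistics import mean

# Invented heights in inches, for illustration only (not the classes' actual data).
humped_class = [60, 63, 64, 64, 65, 65, 66, 66, 67, 70]  # clustered around the mean
split_class = [60, 61, 61, 62, 63, 67, 68, 69, 69, 70]   # two groups, few boys near the mean


def share_within(heights, tolerance=3):
    """Fraction of heights lying within `tolerance` inches of the mean."""
    centre = mean(heights)
    inside = [h for h in heights if abs(h - centre) <= tolerance]
    return len(inside) / len(heights)


for name, heights in [("humped", humped_class), ("split", split_class)]:
    print(f"{name}: mean = {mean(heights)} in, "
          f"within 3 in of mean = {share_within(heights):.0%}")
```

Both invented lists report the same mean, yet the share of boys served by a bar set for that mean is very different, which is the situation described for the two classes.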

For Mr. Jones’ class, the average height, the Mean Height, did not serve to illuminate the information about boys’ height to allow us to have a better understanding.   We needed a closer look at the information to see our way through to the better solution.  The variable height bar works well for Mrs. Larsen’s class as well, with the lower setting good for 25 boys and the higher setting good for 21 boys.

Combining the data from both classes gives us this chart:

This little example is meant to illustrate that while averages, like our Mean Height, serve well in some circumstances, they do not do so in others.

In Mr. Jones’ class, the larger number of shorter boys was obscured, hidden, covered-up, averaged-out by relying on the Mean Height to inform us of the best solutions for the horizontal chin-up bar.

It is worth noting that Mrs. Larsen’s class, shown in the first bar chart above, has a distribution of heights that more closely mirrors what is called a Normal Distribution, a graph of which looks like this:

Most of the values are creating a hump in the middle and falling off evenly, more or less, in both directions.    Averages are good estimations of data sets that look like this if one is careful to use a range on either side of the Mean.    Means are not so good for data sets like Mr. Jones’ class, or for the combination of the two classes.  Note that the Arithmetical Mean is exactly the same for all three data sets of height of boys  — the two classes and the combined — but the distributions are quite different and lead to different conclusions.
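To make the point about identical means concrete, here is a small sketch using invented height lists (not the classes' real data) that share a mean of 65 inches. It prints the mean, the median, the population standard deviation, and a rough text histogram for each list and for the two combined. The function name describe is hypothetical.

```python
from collections import Counter
from statistics import mean, median, pstdev

# Invented heights in inches, for illustration only.
class_a = [60, 63, 64, 64, 65, 65, 66, 66, 67, 70]  # one hump in the middle
class_b = [60, 61, 61, 62, 63, 67, 68, 69, 69, 70]  # two separate clusters
combined = class_a + class_b


def describe(name, heights):
    """Print summary numbers and a text histogram for one list of heights."""
    print(f"{name}: mean={mean(heights)}, median={median(heights)}, "
          f"std dev={pstdev(heights):.2f}")
    counts = Counter(heights)
    for h in range(min(heights), max(heights) + 1):
        print(f"  {h} in | {'#' * counts[h]}")


for name, heights in [("class A", class_a), ("class B", class_b), ("combined", combined)]:
    describe(name, heights)
```

In this made-up example the mean cannot tell the three lists apart; only the spread and the histogram show that the shapes differ.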

US Median Household Income

A very common measure of economic well-being in the United States is the US Census Bureau’s annual US Median Household Income.

First note that it is given as a MEDIAN, which means that there should be an equal number of families above this income level as below it.  Here is the chart that the political party currently in power (regardless of whether it is the Democrats or the Republicans), with the Oval Office (US President) and both houses of Congress in their pocket, will trot out:

That’s the Good News! graph.   Median Family Income is on a nice steady rise through the years, and we’re all singing along with the Fab Four: “I’ve got to admit it’s getting better, a little better all the time…”

This next graph is the Not So Good News graph:

The time axis is shortened to 1985 to 2015, but we see that families have not been gaining much, if at all, in Real Dollars, adjusted for inflation, since about 1998.

And then there is the Reality graph:

Despite the Good News! appeal of the first graph, and the so-so news of the second, if we dig below the surface and look at more than just the single-number Median Household Income by year, we see a different story, one obscured by both the Good News and the Not So Good News.  This graph is MEAN Household Income of the five quintiles of income, plus the Top 5%, so the numbers are a bit different.

The graph breaks the population into five parts (quintiles), shown as the five brightly colored lines.  The bottom-earning 60% of families (the green, brown and red lines) have made virtually no improvement in real dollars since 1967.  The second-highest quintile, the middle/upper middle classes in purple, has seen a moderate increase.  Only the top 20% of families (blue line) have made solid, steady improvement, and when we break out the Top 5% (the dashed black line), we see that not only do they earn the lion’s share of the dollars, but they have also benefited from the lion’s share of the percentage gains.
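The difference between a single median and a set of quintile means can be sketched as follows. The income figures are invented for illustration and are not Census data, and quintile_means is a hypothetical helper that assumes the number of households divides evenly by five.

```python
from statistics import mean, median

# Invented annual household incomes in thousands of dollars (illustration only).
incomes = [15, 22, 30, 38, 47, 56, 70, 90, 130, 400]


def quintile_means(values):
    """Sort the values, split them into five equal groups, and average each group."""
    if not values or len(values) % 5 != 0:
        raise ValueError("this sketch assumes a non-empty list whose length is a multiple of 5")
    ordered = sorted(values)
    size = len(ordered) // 5
    return [mean(ordered[i * size:(i + 1) * size]) for i in range(5)]


print("median:", median(incomes))
print("overall mean:", mean(incomes))
for rank, value in enumerate(quintile_means(incomes), start=1):
    print(f"quintile {rank} mean: {value}")
```

With these made-up numbers the median sits in the middle of the list, while the quintile means show how far the top group pulls away from the rest, detail that the single number cannot carry.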

Where are the benefits felt?

Above is what the national average, the US Median Household Income metric, tells us.  Looking a bit closer we see:

Besides some surprises, like Minnesota and North Dakota, it is what we might expect.  The NE US: NY, Massachusetts, Connecticut, NJ, Maryland, Virginia, Delaware, all coming in at the highest levels, along with California and Washington.  Utah has always had the more affluent Latter-Day Saints and along with Wyoming and Colorado has become a retirement destination for the wealthy.  The states whose abbreviations are circled have state medians very near the national median.

Let’s zoom in:

The darker green counties have the highest Median Household Incomes.  It is easy to see San Francisco/Silicon Valley in the west and the Washington DC-to-NYC-to-Boston megalopolis in the east.

This map answered my big question:  How does North Dakota have such a high Median Income?  Answer:  It is one area, circled and marked “?”, centered on Williams County, with Williston as the main city.  The area has fewer than 10,000 families.  And “Williston sits atop the Bakken formation, which by the end of 2012 was predicted to be producing more oil than any other site in the United States”; it is the site of America’s latest oil boom.

Where is the big money?  Mostly in the big cities:

And where is it not?  All those light yellow counties  are areas in which many to most of the families live at or below the federal Poverty Line for families of four.

An overlay of US Indian reservations reveals that they are, in the west particularly, in the lowest and second lowest income brackets. (An interest of mine, as my father and his 10 brothers and sisters were born on the Pine Ridge in southwestern South Dakota, the red oval.)   One finds much of the old south in the lowest bracket (light yellow), and the deserts of New Mexico and West Texas and the hills of West Virginia and Kentucky.

One more graphic:

What does this tell us?

It tells us that looking at the National Median Household Income as a single number, especially in dollars unadjusted for inflation, presents a picture that obscures, hides, whitewashes over the inequalities and disparities that are the important facts of this metric.   The single National Average (Median) Household Income number tells us only one very narrow bit of information; it does not tell us how American families are doing income-wise.  It does not inform us of the economic well-being of American families; rather, it hides the true state of affairs.

Thus, I say that the publicly offered Average Household Income, rather than shedding light on the economic well-being of American families, shines a Beam of Darkness that hides the really significant data about the income of America’s households.   If we allow ourselves to be blinded by the Beam of Darkness that this sort of truth-hiding average represents, then we are failing in our duty as critical thinkers.

Does this all mean that averages are bad?

No, of course not.  They are just one way of looking at a batch of numerical data.  They are not, however, always the best way.  In fact, unless the data one is considering is very nearly normally distributed and changes are caused by known and understood mechanisms, averages of all kinds more often lead us astray and obscure the data we should really be looking at.   Averages are a lazy man’s shortcut and seldom lead to a better understanding.

The major logical and cognitive fault is allowing one’s understanding to be swayed, one’s mind to be made up, by looking at just this one very narrow view of the data — one absolutely must recognize that the view offered by any type of average is hiding and obscuring all the other information available, and may not be truly representative of the overall, big picture.

Many better methods of data analysis exist, like the simple bar chart used in the school boys’ example above.  For simple numerical data sets, charts and graphs, if used to reveal (instead of hide) information, are often appropriate.

Like averages, visualizations of data sets can be used for good or ill  — the propaganda uses of data visualizations, which now include PowerPoints and videos, are legion.

Beware of those wielding averages like clubs or truncheons to form public opinion.

And climate?

The very definition of climate is that it is an average — “the weather conditions prevailing in an area in general or over a long period.”  There is no single “climate metric” — no single metric that tells us what “climate” is doing.

By the definition above, pulled at random from the internet via Google, there is no Earth Climate: climate is always “the weather conditions prevailing in an area in general or over a long period”.   The Earth is not a climatic area or climate region; the Earth has climate regions but is not one itself.

As discussed in Part 1 — the objects in sets to be averaged must be homogeneous and not so heterogeneous as to be incommensurable.  Thus, when discussing the climate of a four-season region, generalities are made about the seasons to represent the climatic conditions in that region during the summer, winter, spring and fall, separately.  A single average daytime temperature is not a useful piece of information to summertime tourists if the average is taken for the whole year including the winter days — such an average temperature is foolishness from a pragmatic point of view.
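The tourist example can be sketched with a few lines of code. The monthly daytime temperatures below are invented for a hypothetical four-season location, and the sketch simply averages the twelve monthly values without weighting by days per month.

```python
from statistics import mean

# Invented average daytime temperatures in degrees Fahrenheit (illustration only).
monthly_temps = {
    "Jan": 30, "Feb": 34, "Mar": 45, "Apr": 58, "May": 69, "Jun": 78,
    "Jul": 83, "Aug": 81, "Sep": 73, "Oct": 61, "Nov": 48, "Dec": 36,
}

seasons = {
    "winter": ["Dec", "Jan", "Feb"],
    "spring": ["Mar", "Apr", "May"],
    "summer": ["Jun", "Jul", "Aug"],
    "fall": ["Sep", "Oct", "Nov"],
}

# One number for the whole year versus one number per season.
annual = mean(list(monthly_temps.values()))
seasonal = {name: mean([monthly_temps[m] for m in months])
            for name, months in seasons.items()}

print(f"annual average: {annual:.1f} F")
for name, value in seasonal.items():
    print(f"{name} average: {value:.1f} F")
```

In this made-up case the annual figure resembles a mild spring or fall day and describes neither the summer a tourist would actually meet nor the winter.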

Is it also foolishness from a Climate Science point of view?  This topic will be covered in Part 3 of this series.   I’ll read your comments below — let me know what you think.

Bottom Line:

It is not enough to correctly mathematically calculate the average of a data set.

It is not enough to be able to defend the methods your Team uses to calculate the [more-often-abused-than-not] Global Averages of data sets.

Even if these averages are of homogeneous data and objects, physically and logically correct, averages return a single number and can incorrectly be assumed to be a summary or fair representation of the whole set.

Averages, in any and all cases, by their very nature, give only a very narrow view of the information in a data set.  If accepted as representative of the whole, they will act as a Beam of Darkness, hiding and obscuring the bulk of the information; thus, instead of leading us to a better understanding, they can act to reduce our understanding of the subject under study.

Averages are good tools but, like hammers or saws, must be used correctly to produce beneficial and useful results. The misuse of averages reduces rather than betters understanding.