Measures of center

Summary: Mode, median, and various averages (including the arithmetic mean and weighted average) are all examples of deriving a central value.

Mathematician and anthropometry pioneer Adolphe Quetelet is sometimes called the “father of the average man.” His nineteenth-century work A Treatise on Man and the Development of His Faculties outlined theories regarding distributions of human traits. Whereas others before him had applied the normal distribution to describe measurement errors, Quetelet asserted that human traits, both physical and intellectual, were normally distributed around some central value. In his later work, the “average man” was sometimes presented as an ideal human, a concept that mathematicians such as Antoine Cournot disputed. Nonetheless, the notion of a central value representing a typical case within a set of observations became very influential in research and statistical data analysis. There are many ways to think about the center, or typical value, of a set of data.

One of the most common measures is the arithmetic mean, often simply known as “average,” which some trace back to Pythagorean writings on properties of music. Other types of averages include harmonic, geometric, trimmed, weighted, and moving or rolling means. Measures of center besides the mean include median and mode. Statistician Frank Zizek stated in his 1913 book Statistical Averages:

An average may be computed for its own sake, merely to obtain a comprehensive characteristic expression for a series of divergent values, but it is often found as a means to another end, mainly for purposes of comparison.…
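
The averages named before Zizek's remark differ mainly in how they weight or transform the observations. The following is a minimal sketch in Python, not a definitive implementation: the data values and the helper names trimmed_mean, weighted_mean, and moving_mean are hypothetical and chosen only for illustration, while fmean, geometric_mean, and harmonic_mean come from Python's standard statistics module.

```python
from statistics import fmean, geometric_mean, harmonic_mean


def trimmed_mean(values, k):
    """Arithmetic mean after dropping the k smallest and k largest values."""
    ordered = sorted(values)
    return fmean(ordered[k:len(ordered) - k])


def weighted_mean(values, weights):
    """Each value counts in proportion to its weight."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def moving_mean(values, window):
    """Arithmetic mean of each run of `window` consecutive values."""
    return [fmean(values[i:i + window])
            for i in range(len(values) - window + 1)]


data = [2, 4, 4, 8, 100]  # hypothetical data with one large value

arithmetic = fmean(data)            # 118 / 5
geometric = geometric_mean(data)    # fifth root of the product
harmonic = harmonic_mean(data)      # 5 / (sum of reciprocals)
trimmed = trimmed_mean(data, 1)     # mean of [4, 4, 8]
weighted = weighted_mean([80, 90], [1, 3])
rolling = moving_mean([1, 2, 3, 4, 5], 3)
```

With these made-up numbers, the single large observation pulls the arithmetic mean well above the geometric, harmonic, and trimmed means, which is one reason the choice of average matters when averages are used for comparison.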

Students in twenty-first-century classrooms may use measures of center in the primary grades, focusing on mode and median, while mean is introduced in the middle grades. Expected value, which is the long-term average for a random variable or process, is a probability concept most commonly addressed in high school or college.
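
The idea of expected value as a long-term average can be illustrated with a fair six-sided die. The sketch below is an assumption-laden example rather than anything taken from the sources discussed here: the function name expected_value, the fixed random seed, and the number of simulated rolls are all hypothetical choices.

```python
import random
from fractions import Fraction


def expected_value(distribution):
    """Probability-weighted sum of outcomes.

    `distribution` maps each outcome to its probability.
    """
    return sum(outcome * p for outcome, p in distribution.items())


# A fair die: each face 1..6 has probability 1/6.
die = {face: Fraction(1, 6) for face in range(1, 7)}
ev = expected_value(die)  # exactly 7/2

# Simulate many rolls; the average of the rolls should settle near 7/2.
rng = random.Random(0)  # fixed seed so the run is repeatable
rolls = [rng.randint(1, 6) for _ in range(100_000)]
long_run_average = sum(rolls) / len(rolls)
```

No single roll can equal 3.5, yet the average of many rolls comes close to it, which is the sense in which expected value describes a process rather than any one observation.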

Mode

In the nineteenth century, psychologist and physicist Gustav Fechner studied the nonlinear relationships between subjective psychological sensations and the actual physical intensity of different stimuli, a field now known as “psychophysics.” Fechner’s work contains what appears to be the first mention in print of the mode as a measure of center. He defined it as the value “around which the items… collect most densely, so that equal intervals contain more items the nearer the intervals lie to this value.” Later in the same century, Karl Pearson would use a probabilistic and graphical approach to the definition, stating that the mode was the “abscissa corresponding to the ordinate of maximum frequency.” Consistent with the probabilistic approaches used by both Fechner and Pearson, mode has come to be defined as the most frequently occurring value in a probability distribution or set of data. It is the only measure of center that is appropriate for both categorical and numerical variables because it does not require the data to be ordered in any way.
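
Because the mode depends only on counting, the same calculation works for categories and for numbers. The following short sketch uses multimode from Python's standard statistics module; the data sets are hypothetical.

```python
from statistics import multimode

# Categorical data: no ordering is needed, only counts.
colors = ["red", "blue", "red", "green", "blue", "red"]
color_modes = multimode(colors)

# Numerical data in which two values tie for most frequent.
scores = [3, 5, 5, 7, 7, 9]
score_modes = multimode(scores)

# Data with no repeated values: every value is equally frequent.
distinct = [1, 2, 3]
distinct_modes = multimode(distinct)
```

Here color_modes holds the single value "red", score_modes holds both 5 and 7, and distinct_modes returns every value, which shows why the mode can be ambiguous when values tie or never repeat.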

Median

Fechner’s work also contained reference to medians, which he called the “middlemost ordinate” or “centralwerth” of an ordered series of values or data points. Some credit Carl Friedrich Gauss for “inventing” the median earlier as part of his work on the normal distribution. The name “median” is attributed to Francis Galton in the late nineteenth century. Inspired by Quetelet, Galton researched ways to measure and express center and variation in data, both numerically and graphically. He devised the “ogive graph,” named after a curve common in architecture and ballistics, which graphed data versus ranks. His method of “statistics by intercomparison” used quantiles and percentiles, including the median, to consider deviations. Galton’s median represented a typical value, which he termed “mediocrity,” often assigning it a standardized value of zero as a point of reference for comparisons. Subsequently, mathematicians and statisticians developed many nonparametric (also called “distribution-free”) statistical methods based on medians. Some of these procedures are named for their developers, including Henry Mann, William Kruskal, W. Allen Wallis, Donald Whitney, and Frank Wilcoxon.
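
The median as the middle of an ordered series, and its use as a zero reference point, can be sketched briefly. This is only an illustration with hypothetical measurements, not a reconstruction of Galton's actual procedure; median comes from Python's standard statistics module.

```python
from statistics import median

heights = [170, 150, 190, 165, 160]  # hypothetical measurements
ordered = sorted(heights)            # [150, 160, 165, 170, 190]

# With an odd number of points the median is the middle ordered value.
center = median(heights)

# With an even number of points it is the mean of the two middle values.
even_center = median([1, 2, 3, 4])

# Expressing each value as a deviation from the median places the
# median itself at zero, so other values are read relative to it.
deviations = [h - center for h in ordered]
```

The deviations show at a glance which observations fall below or above the typical value and by how much.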

Mean

Though the exact age of either mode or median is unknown, available evidence suggests that the mean may be older. In the Pythagorean treatise On Music, from the school named for Pythagoras of Samos, there is some discussion of finding the middle value of two data points, such that the middle value exceeds the lower value by the same amount that the upper value exceeds the middle value. While this basic description could be either the mean or the median for the case of two points, some historians consider this to be evidence of Pythagorean use of the mean. Statistician and historian Robin Plackett examined evidence from Babylonia, Egypt, and Greece and concluded that, while the mean may have been used in selected cases, it did not appear to be standard practice among astronomers and others who were typically collecting data. He credits sixteenth-century astronomer Tycho Brahe with introducing the mean into scientific methods of the times.

Mathematician Thomas Simpson showed in the eighteenth century that an average was a better measure than a single observation in a very limited set of cases, and astronomers often used probability and means to quantify errors of deviations in observations. Other mathematicians, such as Joseph Lagrange, Abraham de Moivre, Pierre-Simon Laplace, and Carl Friedrich Gauss, contributed to mathematical developments that addressed the mean of a probability distribution or data set in the eighteenth and nineteenth centuries, while Quetelet, Galton, and others sought novel applications of measures of center. Twentieth-century statisticians continued work on means, including George Box and Gwilym Jenkins, whose research about moving averages is the basis for many time-series forecasting models. New research is ongoing into the twenty-first century. The mean has many mathematical properties that make it more desirable for widespread use than the median, such as connections to the least squares criterion and the method of moments. Many statistical techniques are concerned with estimating and comparing means.

Rules and generalizations have been devised and taught over the years regarding which measure of center is best to use for any given set of data, particularly with regard to choosing between the mean and median. Mathematically, the arithmetic mean minimizes the sum of squared distances of all points from the center, while the median minimizes the sum of absolute distances. For data with perfect symmetry, the mean and median coincide. For data with skew or outliers, the two measures may differ greatly. The mode is less clear as a measure of center if there are no repeated values or if there are two or more values that occur most frequently. In the twenty-first century, educators like sociologist and statistician Paul von Hippel continue to investigate methods to teach concepts and relationships between mean, median, mode, and skew.
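
The two minimization properties can be checked numerically. The sketch below is a hedged illustration: it searches a coarse grid of candidate centers rather than solving the problem analytically, and the data, the grid spacing, and the function names sum_squared and sum_absolute are hypothetical.

```python
from statistics import fmean, median


def sum_squared(data, center):
    """Sum of squared distances from every point to `center`."""
    return sum((x - center) ** 2 for x in data)


def sum_absolute(data, center):
    """Sum of absolute distances from every point to `center`."""
    return sum(abs(x - center) for x in data)


skewed = [1, 2, 3, 4, 100]  # one outlier pulls the mean upward
candidates = [i / 10 for i in range(0, 1001)]  # 0.0, 0.1, ..., 100.0

# Grid point with the smallest squared-distance total.
best_squared = min(candidates, key=lambda c: sum_squared(skewed, c))
# Grid point with the smallest absolute-distance total.
best_absolute = min(candidates, key=lambda c: sum_absolute(skewed, c))

symmetric = [1, 2, 3, 4, 5]  # perfectly symmetric about 3
```

For the skewed data the squared-distance search lands on the mean (22) and the absolute-distance search lands on the median (3), while for the symmetric data fmean and median both return 3.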

Bibliography

Stigler, Stephen. The History of Statistics: The Measurement of Uncertainty Before 1900. Cambridge, MA: Belknap Press of Harvard University Press, 1990.

Von Hippel, Paul. “Mean, Median, and Skew: Correcting a Textbook Rule.” Journal of Statistics Education 13, no. 2 (2005). http://www.amstat.org/publications/jse/v13n2/vonhippel.html.