Box Plots

A box plot is a type of graph commonly used to explore important features of numerical data. It is based on a set of statistical measures known as the five-number summary: the minimum, the first quartile, the median (second quartile), the third quartile, and the maximum. These five values divide the data into four groups, each containing about 25% of the data. A box plot displays the locations of the five-number summary values and the relative distances between them. This helps identify the center of the data, the overall spread (range), the spread of the middle 50% of the values (the interquartile range, or IQR, which equals the third quartile minus the first quartile), and whether the data are symmetric or skewed to the left or right. Another important function of box plots is to help identify outliers, which are values that are unusually large or small compared to the rest of the data set. The most common rule uses the IQR to compute fences that separate outliers from the rest of the data: a value more than 1.5 times the IQR below the first quartile or above the third quartile is flagged as an outlier. Summarizing center, spread, and symmetry, and finding outliers, are important both for describing data and for determining whether certain other statistical procedures are appropriate.
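To make the five-number summary and the fence rule concrete, here is a minimal Python sketch. It assumes one particular quartile convention: the first and third quartiles are the medians of the lower and upper halves of the sorted data, with the overall median excluded from both halves when the count is odd. Statistical software and calculators use several different conventions, so their quartiles can differ slightly. The function names `five_number_summary` and `outlier_fences` are illustrative only and do not come from any library.

```python
from statistics import median


def five_number_summary(data):
    """Return (min, Q1, median, Q3, max).

    Assumed convention: Q1 and Q3 are the medians of the lower and upper
    halves of the sorted data; when n is odd, the middle value is left out
    of both halves.
    """
    values = sorted(data)
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    half = n // 2
    lower = values[:half]              # values below the median position
    upper = values[half + (n % 2):]    # values above the median position
    return (values[0], median(lower), median(values), median(upper), values[-1])


def outlier_fences(data, k=1.5):
    """Return (lower_fence, upper_fence, outliers) using the k * IQR rule."""
    _, q1, _, q3, _ = five_number_summary(data)
    iqr = q3 - q1                      # interquartile range
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    outliers = [x for x in data if x < lower_fence or x > upper_fence]
    return lower_fence, upper_fence, outliers


data = [2, 4, 5, 7, 8, 9, 11, 12, 30]
print(five_number_summary(data))  # (2, 4.5, 8, 11.5, 30)
print(outlier_fences(data))       # (-6.0, 22.0, [30])
```

For this sample, the IQR is 11.5 − 4.5 = 7, so the fences are 4.5 − 10.5 = −6 and 11.5 + 10.5 = 22, and only the value 30 falls outside them.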

Overview

Mathematician John Tukey is typically credited with inventing box plots as we know them today, though prototypes can be found earlier in the twentieth century. Box plots became widely known when Tukey included them in his 1977 book Exploratory Data Analysis. In that text, he introduced both skeletal plots, which show only the five-number summary, and schematic plots, which also display outliers. Today, the two are often not distinguished. The term box-and-whisker plot is another name for a box plot. The name reflects the fact that the middle 50% of the data in a typical box plot is represented by a rectangle, or "box," while the lowest and highest quarters of the data are represented by lines, or "whiskers," extending from the box. In a schematic plot, the whiskers stop at the most extreme values that are not outliers, and each outlier is marked individually with a separate symbol, such as an asterisk.

Before computers were widely available, Tukey and others were motivated by the need for useful graphics that were easy to draw by hand. That way, data analysts could quickly visualize and explore important features of their data and make comparisons, such as between the results for two or more groups in an experiment. Today, statistical software and even graphing calculators can produce box plots, and they are used by researchers in many fields as well as by students from middle school through college. For students, the emphasis is often on making comparisons and recognizing variability, and some argue that box plots should be used more often for this purpose, both in research and in the popular media. Over time, people have proposed box plots that show more than the five-number summary. Different group sample sizes can be shown by varying the box widths, and statistical uncertainty about the center of the data can be indicated by notches in the boxes around the medians.
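The two extensions mentioned above can be sketched in Python as follows. This is an illustration under stated assumptions, not a standard definition. The notch half-width uses the convention 1.58 × IQR / √n (the constant used by R's `boxplot.stats`), the quartiles come from the standard library's `statistics.quantiles` with its default method, and box widths are scaled in proportion to the square root of each group's sample size, as R's `varwidth` option does. The function names `notch_limits` and `relative_box_widths` are hypothetical.

```python
import math
from statistics import quantiles


def notch_limits(data, c=1.58):
    """Return (low, high) notch limits: median +/- c * IQR / sqrt(n).

    Assumptions: c = 1.58 follows R's boxplot.stats convention; quartiles
    use statistics.quantiles with its default ("exclusive") method.
    """
    n = len(data)
    q1, med, q3 = quantiles(data, n=4)   # Q1, median, Q3
    half_width = c * (q3 - q1) / math.sqrt(n)
    return med - half_width, med + half_width


def relative_box_widths(groups):
    """Scale box widths in proportion to sqrt(sample size).

    The largest group gets width 1.0; the others are scaled relative to it.
    """
    roots = {name: math.sqrt(len(values)) for name, values in groups.items()}
    largest = max(roots.values())
    return {name: r / largest for name, r in roots.items()}


data = [2, 4, 5, 7, 8, 9, 11, 12, 30]
print(notch_limits(data))  # roughly (4.31, 11.69)
print(relative_box_widths({"A": list(range(16)), "B": list(range(4))}))  # {'A': 1.0, 'B': 0.5}
```

Under this convention, notches that do not overlap between two boxes are usually read as informal evidence that the group medians differ, and a group with four times as many observations gets a box twice as wide.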

Bibliography

Dawson, Robert. "How Significant Is a Boxplot Outlier?" Journal of Statistics Education 19.2 (2011): n.p. Web. 11 November 2014. <http://www.amstat.org/publications/jse/v19n2/dawson.pdf/>.

Lewandowski, Sara C., and Sara E. Bolt. "Box-and-Whisker Plot." Encyclopedia of Research Design. Ed. Neil J. Salkind. New York: Sage, 2010.

Tukey, John W. Exploratory Data Analysis. Reading, MA: Addison, 1977. Print.

Utts, Jessica M. Seeing Through Statistics. Stamford, CT: Cengage, 2015. Print.

Utts, Jessica M., and Robert F. Heckard. Mind on Statistics. Stamford, CT: Cengage, 2015. Print.

Wall, Jennifer J., and Christine C. Benson. "So Many Graphs, So Little Time." Mathematics Teaching in the Middle School 15.2 (2009): 82-91. Print.