Stem-and-Leaf Plots
A stem-and-leaf plot is a graphical representation of the distribution of a data set and is an effective visualization tool in exploratory data analysis. The method separates each value into a "stem" (the leading digit or digits) and a "leaf" (the final digit), which makes patterns and trends in the data easier to identify. The concept originated with British statistician Sir Arthur Lyon Bowley and later gained prominence through John Wilder Tukey’s work on exploratory data analysis.
Stem-and-leaf plots are commonly introduced alongside other statistical tools such as histograms, pie charts, and dot diagrams, and are widely featured in introductory statistics textbooks. For example, if we analyze gasoline prices from different locations, a stem-and-leaf plot can help us compare price distributions visually. By focusing on the last two digits of the prices, users can see how data from different sets, such as east-side and north-side gasoline prices, are distributed, and whether each set is evenly spread or skewed.
This visualization technique is particularly useful for small to moderate-sized data sets, as it retains the original data values while summarizing the distribution characteristics.
A stem-and-leaf plot is a diagram portraying the distribution of a set of data. It is an effective tool for visualizing data in exploratory data analysis.
Overview
Historically, the stem-and-leaf plot originated from the ideas of the British statistician Sir Arthur Lyon Bowley (1869–1957). It gained popularity in applications after the work of John Wilder Tukey (1915–2000) on exploratory data analysis. Today, the stem-and-leaf plot, typically introduced with histograms, pie charts, and dot diagrams, is documented in most introductory statistics textbooks.
Consider an example analyzing gasoline prices. Suppose that the prices at nine gasoline stations on the east side of a city are {$3.81, $3.72, $3.71, $3.62, $3.59, $3.74, $3.53, $3.81, $3.63} and the prices at seven stations on the north side (on the same day) are {$3.73, $3.78, $3.82, $3.54, $3.89, $3.84, $3.86}. At first glance, there seems to be little difference between the two data sets. Comparing their price distributions, however, reveals more. See Figure 1.
Notice that all the prices in both data sets fall in the three-dollar range. Comparing the two sets of prices therefore amounts to comparing the final two digits of each price, so we turn our attention to the cent values. For the east side, the data become {81, 72, 71, 62, 59, 74, 53, 81, 63} and the data for the north side become {73, 78, 82, 54, 89, 84, 86}. A stem-and-leaf plot presents these data graphically, using the tens digit of each cent value as the stem and the ones digit as the leaf.
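The construction just described can be sketched in a few lines of Python. This is a minimal illustration rather than a reproduction of the article's figure: the function names stem_and_leaf and render are hypothetical helpers written for this example, and the choice to list empty stems (so that gaps in the data stay visible) is an assumption about layout.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Map each stem (tens digit) to its sorted leaves (ones digits)."""
    groups = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        groups[stem].append(leaf)
    # Assumption: list every stem from smallest to largest, even if empty,
    # so that gaps in the distribution remain visible.
    return {s: groups.get(s, []) for s in range(min(groups), max(groups) + 1)}

def render(plot):
    """Format the plot as text, one 'stem | leaves' row per stem."""
    rows = []
    for stem, leaves in plot.items():
        rows.append(f"{stem} | {' '.join(map(str, leaves))}".rstrip())
    return "\n".join(rows)

# Prices from the example; keep only the cent values (final two digits).
east_prices = [3.81, 3.72, 3.71, 3.62, 3.59, 3.74, 3.53, 3.81, 3.63]
north_prices = [3.73, 3.78, 3.82, 3.54, 3.89, 3.84, 3.86]
east = [round(p * 100) % 100 for p in east_prices]
north = [round(p * 100) % 100 for p in north_prices]

print("East side")
print(render(stem_and_leaf(east)))
# 5 | 3 9
# 6 | 2 3
# 7 | 1 2 4
# 8 | 1 1

print("North side")
print(render(stem_and_leaf(north)))
# 5 | 4
# 6 |
# 7 | 3 8
# 8 | 2 4 6 9
```

Each row reads as a stem followed by its leaves, so the row "7 | 1 2 4" stands for the prices $3.71, $3.72, and $3.74. Because every leaf is kept, the original values can be recovered from the plot.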
The two plots show that the east-side gasoline prices are almost evenly distributed, while the north-side prices are skewed toward the higher values, with six of the seven prices falling in the $3.70s and $3.80s.
Bibliography
Tukey, J. W. Exploratory Data Analysis: Past, Present, and Future. Technical Report No. 302 (Series 2), Department of Statistics. Princeton: Princeton UP, 1993.
Utts, Jessica M., and Robert F. Heckard. Mind on Statistics. Stamford, CT: Cengage, 2015.
Wall, Jennifer J., and Christine C. Benson. "So Many Graphs, So Little Time." Mathematics Teaching in the Middle School 15.2 (2009): 82-91.