Scatterplots
A scatterplot is a graphical representation used in data analysis to illustrate the relationship between two paired variables on a Cartesian plane. By plotting individual data points, scatterplots help identify patterns, correlations, and potential outliers within the data. They are foundational in exploratory data analysis and provide a visual basis for fitting linear and nonlinear functions through techniques such as regression analysis. Historically, scatterplots build on the Cartesian plane, developed by René Descartes in the seventeenth century, while the term "scatter diagram" emerged in the early twentieth century and "scatterplot" later still. Notable early contributors include Francis Galton, who studied anthropometric data and introduced the concept of regression toward the mean. In contemporary practice, computer software extends scatterplots to three-dimensional visualizations and matrix plots for comprehensive data exploration. Scatterplots must be interpreted with caution, however, because an observed correlation does not by itself establish causation. Overall, scatterplots are vital tools in fields ranging from mathematics and science to economics and social research, facilitating a deeper understanding of complex data relationships.
Summary: Scatterplots are useful tools for mathematicians and statisticians to graph and present data.
Human beings are constantly exploring the world around them to discover relationships that can be used to explain past and current events or phenomena and perhaps to predict future occurrences.
![This scatterplot displays a correlation of r = -0.76. By Online Statistics Education: A Multimedia Course of Study, Project Leader: David M. Lane, Rice University (onlinestatbook.com guessing correlations demo) [Public domain], via Wikimedia Commons](https://imageserver.ebscohost.com/img/embimages/ers/sp/embedded/94982042-91572.jpg?ephost1=dGJyMNHX8kSepq84xNvgOLCmsE2epq5Srqa4SK6WxWXS)
The colloquial expression “a picture is worth a thousand words” has been attributed to many possible historical sources, including French leader and noted student of mathematics Napoleon Bonaparte, who purportedly said, “A good sketch is better than a long speech.” In the twenty-first century, graphing is a fundamental first step in any exploratory data analysis, and graphical representations are common in the media. Scatterplots, which most often represent values of paired variables in a Cartesian plane, help data investigators identify relationships, describe patterns and correlation, fit linear and nonlinear functions using techniques like regression analysis, and locate points known as “outliers” that deviate from the predominant pattern. In the primary grades, students often use line graphs, which some consider to be a special case of scatterplots, while scatterplots of data may be explored beginning in the middle grades in both mathematics and science classes.
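The three tasks named above (describing correlation, fitting a line, and locating outliers) can be illustrated with a minimal sketch in standard-library Python. The data, the helper names, and the outlier rule (a residual larger than twice the root-mean-square residual) are all assumptions chosen for illustration; analysts use a variety of outlier criteria in practice.

```python
from math import sqrt

def _centered_sums(xs, ys):
    """Return (Sxx, Syy, Sxy, mean_x, mean_y) for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxx, syy, sxy, mean_x, mean_y

def pearson_r(xs, ys):
    """Pearson correlation coefficient of the paired values."""
    sxx, syy, sxy, _, _ = _centered_sums(xs, ys)
    return sxy / sqrt(sxx * syy)

def least_squares_line(xs, ys):
    """Slope and intercept of the ordinary least-squares line y = slope*x + intercept."""
    sxx, _, sxy, mean_x, mean_y = _centered_sums(xs, ys)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def flag_outliers(xs, ys, threshold=2.0):
    """Points whose residual exceeds `threshold` times the RMS residual.

    This rule is an illustrative assumption, not a standard definition.
    """
    slope, intercept = least_squares_line(xs, ys)
    residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
    rms = sqrt(sum(r * r for r in residuals) / len(residuals))
    return [(x, y) for x, y, r in zip(xs, ys, residuals) if abs(r) > threshold * rms]

# Hypothetical data: points on the line y = 2x + 1, except the point at x = 5.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [3, 5, 7, 9, 25, 13, 15, 17, 19, 21]

print("r =", round(pearson_r(xs, ys), 3))
print("slope, intercept =", least_squares_line(xs, ys))
print("flagged outliers =", flag_outliers(xs, ys))
```

With this made-up data set, the only point flagged is (5, 25), the one that departs from the pattern followed by the other nine points.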
Early History
Mathematicians and others have long sought alternative methods of representation for researching, presenting, and connecting the mathematical concepts they studied. The Cartesian plane, named for René Descartes, facilitated graphing of algebraic equations and data beginning in the seventeenth century. Historians have traced scatterplots to 1686, though the term “scatter diagram” is attributed to early twentieth-century researchers such as statistician Karl Pearson, and “scatterplot” seems to have first appeared in a 1939 dictionary.
Examples of early pioneers of data graphing include “political arithmetician” Augustus Crome, who studied the relationships between nations’ population sizes, land areas, and wealth; mathematician and sociologist Adolphe Quetelet, who conducted studies of body measurements that helped contribute to the measure now known as the Body Mass Index, which relates height and weight; and engineer and political scientist William Playfair, who called himself the “inventor of linear arithmetic,” a term he used for graphs. Playfair said: “. . . it gives a simple, accurate, and permanent idea, by giving form and shape to a number of separate ideas, which are otherwise abstract and unconnected.” His eighteenth-century graphical summaries of British trade across various years are perhaps the earliest example of what would now be referred to as “time series plots” (or in some cases “line graphs”), which may be considered a special case of scatterplots.

While Playfair plotted many economic variables as functions of time, the most extensive early use of scatterplots to relate two observed variables is probably the anthropometric and genetic research of Francis Galton, a cousin of scientist Charles Darwin. After studying medicine and mathematics in college, Galton became interested in the investigation and characterization of variability and deviations in many natural phenomena. He established a laboratory for the measurement and study of human mental and physical traits, focusing on empirical and statistical studies of heredity in the latter half of the nineteenth century. Many of Galton’s scatterplots involved graphing parental characteristics on one axis, usually the X, and offspring characteristics on the other. Like scientist Gregor Mendel, he conducted some of his initial genetic experiments on peas; later, he investigated measurements of people. Scatterplots of height appeared in his 1886 publication “Regression Towards Mediocrity in Hereditary Stature,” which is the origin of the name for the statistical technique of regression analysis. The word “mediocrity” in this context was a reference to the mean or average height (not a qualitative judgment) and was used to describe a pattern observed in the data: very short parents tend to have children who are taller than they are, and very tall parents tend to have children who are shorter than they are, in both cases closer to the mean.
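The pattern Galton described can be reproduced in a small simulation. The sketch below does not use Galton's data: the population mean of 68 inches, the standard deviation of 2.5 inches, the parent-child correlation of 0.5, and the function names are all assumptions chosen for illustration. Each simulated child inherits only a fraction of the parent's deviation from the mean, plus independent random variation.

```python
import random

MEAN_HEIGHT = 68.0   # assumed population mean, in inches (illustrative only)
SD_HEIGHT = 2.5      # assumed population standard deviation (illustrative only)

def simulate_families(n, r=0.5, seed=0):
    """Return n (parent, child) height pairs with assumed correlation r."""
    rng = random.Random(seed)
    noise_sd = SD_HEIGHT * (1 - r ** 2) ** 0.5  # keeps child SD equal to parent SD
    families = []
    for _ in range(n):
        parent = rng.gauss(MEAN_HEIGHT, SD_HEIGHT)
        child = MEAN_HEIGHT + r * (parent - MEAN_HEIGHT) + rng.gauss(0, noise_sd)
        families.append((parent, child))
    return families

def group_means(families, keep):
    """Mean parent and mean child height among families whose parent passes `keep`."""
    selected = [(p, c) for p, c in families if keep(p)]
    mean_parent = sum(p for p, _ in selected) / len(selected)
    mean_child = sum(c for _, c in selected) / len(selected)
    return mean_parent, mean_child

families = simulate_families(20000)

# Parents more than one standard deviation above or below the mean.
tall_parent, tall_child = group_means(families, lambda p: p > MEAN_HEIGHT + SD_HEIGHT)
short_parent, short_child = group_means(families, lambda p: p < MEAN_HEIGHT - SD_HEIGHT)

print("tall parents:", round(tall_parent, 2), "their children:", round(tall_child, 2))
print("short parents:", round(short_parent, 2), "their children:", round(short_child, 2))
```

In both groups the children's average falls between the parents' average and the population mean, which is the regression toward the mean that a scatterplot of the pairs makes visible.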
Recent Developments
Prior to the development of computers and data analytic software, data had to be graphed by hand. In the twenty-first century, computers facilitate many types of scatterplots. In addition to the standard plots of two variables in the Cartesian plane, there are three-dimensional scatterplots that display point clouds to explore the ways in which three variables relate and interact. Symbols used to represent points on a two- or three-dimensional scatterplot may also be coded using different colors or shapes to indicate additional variables and uncover patterns. Matrix plots are square grids of scatterplots that show every pairwise combination of a set of variables, usually arranged such that all of the plots in the same row share the same Y variable and all plots in the same column share the same X variable. Mathematicians, statisticians, computer scientists, and other types of researchers have explored the theoretical and methodological links between scatterplots and map surfaces for use in applications such as data mining and spatial analysis of geographic information system (GIS) data.
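The row and column convention for matrix plots can be made concrete with a short sketch that builds only the layout, not the drawings. The function name and the variable names are hypothetical. Panels on the diagonal pair a variable with itself; plotting software commonly replaces them with a label or a histogram.

```python
def matrix_plot_layout(variables):
    """Return a square grid of (x_variable, y_variable) panel assignments.

    Every panel in a given row shares its Y variable, and every panel in a
    given column shares its X variable.
    """
    return [[(x_var, y_var) for x_var in variables] for y_var in variables]

# Hypothetical variable names, for illustration only.
variables = ["height", "weight", "age"]
grid = matrix_plot_layout(variables)

for row in grid:
    print("   ".join(f"y={y} vs x={x}" for x, y in row))
```

For k variables the grid holds k × k panels, of which k lie on the diagonal; the panels above the diagonal show the same pairs as those below it with the axes swapped.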
While they are useful tools for exploration and representation, scatterplots are often subject to misinterpretation. For example, relationships or correlations shown in scatterplots are sometimes mistakenly taken as evidence of cause and effect. Whether a causal conclusion is justified depends on the way in which the data were collected, such as through a randomized experiment, rather than on the strength of the association.
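A simulation can show why the method of data collection matters more than the strength of the association. In this sketch, which rests entirely on assumed, made-up quantities, a hidden variable `z` drives both `x` and `y`, while `y` never depends on `x`. When `x` is merely observed, the two variables are strongly correlated. When `x` is instead assigned at random, as in an experiment, the correlation largely disappears.

```python
import random
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of the paired values."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def simulate(n=5000, randomized=False, seed=1):
    """Generate paired (x, y) data in which y is never caused by x."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        z = rng.gauss(0, 1)                # hidden common cause (assumed)
        if randomized:
            x = rng.gauss(0, 1)            # x assigned independently of z
        else:
            x = z + rng.gauss(0, 0.5)      # x observed, and influenced by z
        y = z + rng.gauss(0, 0.5)          # y depends on z only, not on x
        xs.append(x)
        ys.append(y)
    return xs, ys

observed_r = pearson_r(*simulate(randomized=False))
experiment_r = pearson_r(*simulate(randomized=True))

print("observational data: r =", round(observed_r, 2))
print("randomized assignment: r =", round(experiment_r, 2))
```

A scatterplot of the observational data would show a clear upward trend even though changing `x` has no effect on `y` in this model.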
Bibliography
Few, Stephen. Now You See It: Simple Visualization Techniques for Quantitative Analysis. Oakland, CA: Analytics Press, 2009.
Friendly, M., and D. Denis. “The Early Origins and Development of the Scatterplot.” Journal of the History of the Behavioral Sciences 41, no. 2 (2005).
Stigler, Stephen. The History of Statistics: The Measurement of Uncertainty Before 1900. Cambridge, MA: Belknap Press of Harvard University Press, 1990.