Misuse of Statistics

Abstract

Without an understanding of the purpose and limitations of statistical tools, even the most well-intentioned person can easily misuse statistics to support a conclusion that is not valid. Both descriptive and inferential statistics are open to misuse if one is not careful. However, an understanding of what various statistical tools can and cannot do, what assumptions need to be met when using them, and how to appropriately interpret the results of statistical tests can enable one to learn what questions to ask when presented with statistical findings, become a better consumer of statistical information, and be less prone to succumb to the allure of misused statistics.

Overview

By definition, science requires the application of the scientific method, in which observations of the real world are turned into testable hypotheses, data are collected and analyzed, and conclusions are drawn based on these results. Hypothesis testing and the concomitant use of statistical tools are the means by which any science is advanced and theories are validated or changed. However, the presentation of graphs, charts, or numbers derived from arcane formulae alone is not enough to "prove" whether a hypothesis is correct. Unless one understands the limitations of such statistical tools and how to interpret them, it can be easy for even the most well-intentioned person to misuse statistics to support a conclusion that is not valid. At best, statistics give estimates: scientific gambles, as it were, that one's interpretation of observed behavior approximates the actual underlying causes.

Unfortunately, for many people, the use of statistics seems to throw an aura of arcane acceptability over whatever conclusion they are attached to. We are much more likely to believe a conclusion supported by charts, graphs, or numbers than we are to believe the same conclusion if it is unsupported. "Our company has a combined experience of 112 years" sounds so much more venerable than "We have lots of experience," and "80% of students fear taking a statistics course" is more scientific than "Lots of students hate statistics." But the truth is, unless we know where these numbers come from, we do not know what they really mean. The 112 years of experience may actually be the combined ages of the president, vice president, and treasurer of the organization; the 80% of students may refer to a sample drawn from a group of art majors rather than math majors.

Admittedly, the proper use of inferential statistical tools requires training. However, even deceptively simple descriptive statistical techniques can be misused. In most cases, such situations arise due to a lack of understanding of the nature and limitations of the various statistical tools on the part of the person presenting the statistics. In a few cases, however, the person reporting the statistics may actually be trying to mislead the reader. Fortunately, even a little understanding about the nature of statistics can go a long way in helping one be a better informed reader of scientific reports, research studies, and even the daily newspaper. When armed with an understanding of what various statistical tools can and cannot do, what assumptions need to be met when using them, and how to appropriately interpret the results, one can learn what questions to ask when presented with statistical findings, become a better consumer of statistical information, and be less prone to succumb to the allure of misused statistics.

Applications

Misuse of Descriptive Statistics. Descriptive statistics can appear to be deceptively simple. Most people learn the basics of calculating a mean and preparing graphs and charts before they reach high school. Newspaper articles, television advertisements, and professional journals all present data summarized by descriptive statistics. However, descriptive statistics cannot be used to draw inferences about a population or make predictions beyond the data at hand. The purpose of descriptive statistical techniques is merely to organize and summarize data. Further, one must be careful about how data are displayed using graphical methods so that the data are not misrepresented. One commonly seen type of misuse is shown in the two graphs in Figure 1. Both graphs present the same data. However, the graph on top is designed so that it unfairly magnifies the differences in quarterly income for the four quarters, while the graph on the bottom is drawn to scale, showing that in actuality there is little difference between the quarterly earnings for the four quarters.

[Figure 1. The same quarterly earnings data plotted twice: the top graph's truncated vertical scale magnifies the differences between quarters, while the bottom graph, drawn to scale, shows that the quarterly earnings differ very little.]
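
To make the effect concrete, the short Python sketch below (using the matplotlib library with hypothetical quarterly earnings figures, since the article's original data are not given) plots the same four numbers twice: once with a truncated vertical axis and once with an axis that starts at zero.

```python
# A minimal sketch of the axis-truncation effect described in Figure 1.
# The quarterly earnings values are hypothetical, for illustration only.
import matplotlib.pyplot as plt

quarters = ["Q1", "Q2", "Q3", "Q4"]
earnings = [10.2, 10.4, 10.1, 10.5]  # millions of dollars (hypothetical)

fig, (top, bottom) = plt.subplots(2, 1, figsize=(5, 6))

# Top: the y-axis starts just below the smallest value, magnifying differences.
top.bar(quarters, earnings)
top.set_ylim(10.0, 10.6)
top.set_title("Truncated axis: differences look dramatic")

# Bottom: the y-axis starts at zero, showing the differences to scale.
bottom.bar(quarters, earnings)
bottom.set_ylim(0, 12)
bottom.set_title("Zero-based axis: differences are modest")

plt.tight_layout()
plt.show()
```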

Advertisements and articles in news media and other publications frequently are illustrated with graphs and statistics from which conclusions are drawn. However, as illustrated above, these data can be misleading, due either to a poor understanding of descriptive statistics or to an intentional attempt to mislead. Therefore, one needs to take into account the type of descriptive statistics used and understand how the shape of a distribution can distort its meaning.

Descriptive Statistics. Descriptive statistics is a subset of mathematical statistics that describes and summarizes data. Included under this umbrella are various methods for summarizing and presenting data so that they are more easily understood, such as graphs, charts, and frequency distributions; measures of central tendency that estimate the midpoint of a distribution, such as mean, median, and mode; and measures of variability that summarize how widely dispersed data are in a distribution, such as range, semi-interquartile deviation, and standard deviation. These tools are deceptively simple; in truth, descriptive statistics are misused every day. For example, the three measures of central tendency are all ways to determine the "average" of a distribution of scores. It would be easy to assume that since they are all methods for finding the average, they must be interchangeable. This, however, is not true. Each approach to determining central tendency has different characteristics from the others and is influenced by different features of the data.

If the underlying distribution were a perfect normal distribution, these three techniques would all yield the same result. However, real-world data are messy, and underlying distributions are virtually never a perfect bell-shaped curve. Yet often only the "average" is reported, with no indication as to whether it is the mean, median, or mode, so the reader has no idea how the measure may have been affected. For example, in a skewed distribution, where one end has extreme outliers but the data are otherwise normally distributed, the mean is pulled toward the skew (i.e., toward the end with the outliers). Because of this, when the ends are not balanced and data are clustered toward one end of the distribution, the mean may disproportionately reflect the outlying data points. If the extreme ends are balanced (i.e., the distribution is not skewed), the mean is not distorted. The median is also pulled toward the skew, but far less than the mean, because it depends only on the rank order of the scores rather than on their actual values. These tendencies can make significant differences in the resulting values of central tendency. For example, if the mode were used to report the "average" salary for a given career and it was found that most of the people in that occupation made only $30,000 per year, it would give a very different impression than if the statistic reported were the mean, which is pulled in the direction of the skew. As shown in Figure 2, the median salary for this hypothetical distribution is much closer to the mode than it is to the mean because of the small proportion of people who earn significantly more than the rest.

[Figure 2. A right-skewed distribution of salaries: the mode and median sit near the bulk of the earners, while the mean is pulled toward the small proportion of very high earners.]
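
The gap between these measures is easy to demonstrate. The following sketch uses Python's standard statistics module with a hypothetical, right-skewed set of salaries in the spirit of Figure 2; the salary figures are illustrative assumptions, not data from the article.

```python
# A small sketch of Figure 2's point: in a right-skewed salary distribution,
# the mean is pulled toward a few very high earners, while the mode and
# median stay near where most people actually are. All salaries hypothetical.
from statistics import mean, median, mode

salaries = [30_000] * 12 + [35_000] * 6 + [45_000] * 3 + [250_000, 900_000]

print(f"mode:   {mode(salaries):,}")         # 30,000 (most common salary)
print(f"median: {median(salaries):,}")       # 30,000 (middle of the 23 values)
print(f"mean:   {round(mean(salaries)):,}")  # about 80,652, pulled up by two outliers
```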

These are not the only differences between these three measures of central tendency. Although the mode is quick and easy to calculate, it also has the disadvantages of lacking stability (i.e., a small change in the data can lead to a great change in the mode), not taking the score values into account, and being of little use beyond stating which value occurs most frequently. The median is more stable and is the preferred measure of central tendency for non-symmetrical distributions because it is less affected by extreme scores than the mean. The mean has advantages in most situations over the other two measures of central tendency: it is the most stable of the three from sample to sample (its value fluctuates less across repeated samples from the same population than the median or mode does), and it is the basis of many inferential statistical techniques.
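
The stability claim can be checked by simulation. The sketch below is a rough illustration, not from the article: the normal population parameters and the binning rule for the mode are arbitrary assumptions. It draws many samples from the same population and measures how much each statistic varies from sample to sample.

```python
# Simulate sampling variability of the mean, median, and mode.
# A smaller spread across samples indicates a more stable measure.
import random
from statistics import mean, median, mode, stdev

random.seed(1)
# Hypothetical population: roughly normal, mean 100, standard deviation 15.
population = [random.gauss(100, 15) for _ in range(100_000)]

means, medians, modes = [], [], []
for _ in range(1_000):
    sample = random.sample(population, 50)
    means.append(mean(sample))
    medians.append(median(sample))
    # The mode of continuous data requires binning; nearest 10 is arbitrary.
    modes.append(mode(round(x, -1) for x in sample))

print("spread of sample means:  ", round(stdev(means), 2))
print("spread of sample medians:", round(stdev(medians), 2))
print("spread of sample modes:  ", round(stdev(modes), 2))
```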

Misuse of Inferential Statistics. Inferential statistics are equally easy to misuse without a proper understanding of their limitations. Every semester, at least one eager student in one of my classes presents his or her research findings and proudly declares that the statistical results "prove" that the original hypothesis was correct. However, the truth is that statistics do not prove anything. Rather, they merely express probabilities and the degree of confidence with which one can say that the hypothesis being tested is more likely to be true than the alternative hypothesis. This fact is frequently demonstrated in the literature, when one set of scientists attempts to replicate the research of another and finds to their dismay (or, in some cases, delight) that the original results cannot be replicated.

To understand why this occurs, one needs to understand the influence of probability on statistics. In general, statistics are used to test the null hypothesis (H0) by asking how probable the observed results would be if it were true. The null hypothesis is the statement that there is no statistical difference between the status quo and the experimental condition. If the null hypothesis is true, then the treatment or characteristic being studied made no difference to the end result. For example, a null hypothesis might state that people are treated no differently in the workplace when they wear a business suit than when they wear casual clothing. The alternative hypothesis (H1) would state that the way people dress actually does have an effect on the way they are treated in the workplace.

Accepting the null hypothesis (more precisely, failing to reject it) means that the observed results could plausibly have occurred by chance. This is represented in Figure 3 as the unshaded portion of the distribution. By accepting the null hypothesis, the analyst concludes that it is likely that people do not react any differently to people wearing business suits than they do to those wearing casual clothing. For the null hypothesis to be rejected and the alternative hypothesis to be accepted, the results must lie in the shaded portion of the graph. This would mean that the observed difference between the way the two groups are treated is statistically significant: it is probably due not to chance but to a real underlying difference in people's reactions to how others dress in the workplace. Statistical significance is the degree to which an observed outcome is unlikely to have occurred due merely to chance.

[Figure 3. A normal distribution with the critical region shaded in one tail: results falling in the unshaded region are consistent with the null hypothesis, while results falling in the shaded region lead to its rejection.]
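
As a concrete illustration of this logic, the sketch below casts the dress example as a two-sample t-test using the scipy library. The "courtesy ratings" for the two conditions and the 0.05 significance level are illustrative assumptions, not data from an actual study.

```python
# A sketch of null hypothesis testing on made-up data: do people wearing
# business suits receive different courtesy ratings than people in casual
# clothing? H0 says any observed difference is due to chance.
from scipy import stats

suit_ratings = [7.1, 6.8, 7.4, 7.0, 6.9, 7.3, 7.2, 6.7, 7.5, 7.0]
casual_ratings = [6.6, 6.9, 6.4, 6.8, 6.5, 6.7, 6.3, 6.9, 6.6, 6.5]

t_stat, p_value = stats.ttest_ind(suit_ratings, casual_ratings)
alpha = 0.05  # conventional significance level (an assumption here)

print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
if p_value < alpha:
    print("Reject H0: the difference is unlikely to be due to chance alone.")
else:
    print("Fail to reject H0: the difference could plausibly be chance.")
```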

Another reason that statistics are sometimes misused is that not every statistical technique is appropriate for every situation. For example, some techniques assume that the samples being analyzed are independent of one another, whereas other techniques do not make this assumption. A researcher needs to be careful to pick the technique that is most appropriate for the data being analyzed. In addition, researchers with less expertise in statistics sometimes prefer to run multiple simple statistical tests rather than a single more comprehensive but complicated test: multiple t-tests rather than an analysis of variance, for example, or multiple analyses of variance rather than one multivariate analysis of variance. This approach is often referred to as "shotgunning." However, one of the implications of the laws of probability is that the more tests that are run on a single set of data, the more probable it is that spuriously significant results will occur merely by chance. Conclusions drawn from such analyses are suspect at best, because this approach compounds the error inherent in the data and can lead to false results.
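
The arithmetic behind this inflation is straightforward: if each individual test has a 5 percent false-positive rate, the chance of at least one spurious "significant" result grows rapidly with the number of tests run. The short sketch below computes this familywise error rate, under the simplifying assumption that the tests are independent.

```python
# Why "shotgunning" is dangerous: the probability of at least one false
# positive across k independent tests is 1 - (1 - alpha) ** k.
alpha = 0.05  # per-test false-positive rate

for k in (1, 5, 10, 20):
    familywise = 1 - (1 - alpha) ** k
    print(f"{k:>2} tests: P(at least one spurious result) = {familywise:.2f}")
```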

In addition, no matter what type of inferential statistical technique is being used, one needs to look at the underlying assumptions of that technique to determine whether it is appropriate for what one is trying to do. Many commonly used inferential statistics (e.g., t-tests, analyses of variance, Pearson product-moment correlation coefficients) are parametric and make certain assumptions about the parameters of the data being analyzed and the distribution of the underlying population from which a sample is drawn, including the assumption that the data have been randomly selected from a population with a normal distribution. Further, parametric statistics require data that are interval or ratio in nature. This means that the rank orders of the data have meaning (e.g., a value of 6 is greater than a value of 5), as do the intervals between the values. However, real-world data do not always meet these assumptions. For example, although one knows exactly what the difference is between 96 grams of a chemical compound and 95 grams of the same compound, it is less clear what the difference between a score of 96 and a score of 95 on an attitude survey means. To attempt to use parametric statistics in a nonparametric situation is to run the risk of producing misleading results.
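
One practical way to examine such assumptions is to test them directly. The sketch below applies the Shapiro-Wilk normality test from scipy to a small, hypothetical set of survey scores; a low p-value would cast doubt on the normality assumption that many parametric tests require.

```python
# Checking one parametric assumption before testing: the Shapiro-Wilk test
# asks whether a sample is consistent with having been drawn from a normal
# distribution. The scores below are hypothetical.
from scipy import stats

scores = [96, 95, 88, 91, 99, 85, 94, 90, 87, 93, 89, 97]

w_stat, p_value = stats.shapiro(scores)
print(f"W = {w_stat:.3f}, p = {p_value:.3f}")
if p_value < 0.05:
    print("Normality is doubtful; consider a nonparametric test.")
else:
    print("No evidence against normality; parametric tests may be reasonable.")
```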

Fortunately, in situations where data do not meet the assumptions of parametric statistics, one need not either misuse parametric statistics or forgo statistical analysis completely. A number of nonparametric procedures are available that correspond to the common tests used when the shape and parameters of a distribution are known. Nonparametric tests make few or no assumptions about the shape of the underlying distribution. Although they are generally less powerful than parametric statistics when the parametric assumptions are met, they do allow the analyst to derive meaningful information from a less-than-perfect data set. Similarly, in cases where a target population has not been fully represented or subjects have dropped out of an experiment, statistical adjustments such as weighting can be used to retain validity by compensating for such imbalances (Mercer, Kreuter, Keeter, & Stuart, 2017).
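
For instance, the Mann-Whitney U test is a widely used distribution-free counterpart to the two-sample t-test. The sketch below applies scipy's implementation to hypothetical ordinal survey ratings, the kind of data for which a t-test's interval-level assumption is questionable.

```python
# A nonparametric fallback: the Mann-Whitney U test compares two groups
# using only the rank order of the scores, with no normality assumption.
# The 1-5 attitude ratings below are hypothetical ordinal data.
from scipy import stats

group_a = [4, 5, 3, 5, 4, 5, 2, 4, 5, 3]
group_b = [2, 3, 1, 3, 2, 4, 2, 3, 1, 2]

u_stat, p_value = stats.mannwhitneyu(group_a, group_b)
print(f"U = {u_stat}, p = {p_value:.4f}")
```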

Finally, statistics advance the state of science slowly; not all answers can be found in one research study. One statistic in particular that is frequently misinterpreted is the coefficient of correlation. The purpose of this inferential statistic is to determine the degree to which values of one variable are associated with values of another variable. For example, one could generally say with assurance that weight in the first year of life is positively correlated with age (i.e., the older the baby, the more it is likely to weigh). However, this same correlation would not apply to most adults, as heavier adults are not necessarily older than lighter adults. Correlation only shows the relationship between the two variables; it does not explain why the relationship occurs or what caused it. Two events may be highly correlated yet both caused by a third factor. For example, two clocks that keep perfect time always chime at the same time. Neither causes the other to chime; rather, it is the movement of their mechanisms over time that causes both to chime.
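
In code, the correlation coefficient is equally easy to compute, and equally silent about causation. The sketch below computes Pearson's r for hypothetical infant ages and weights; a strong positive value describes the association but says nothing about why it exists.

```python
# Pearson's r measures how strongly two variables move together, not why.
# The infant growth data below are hypothetical.
from scipy import stats

age_months = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
weight_kg = [4.5, 5.6, 6.4, 7.0, 7.5, 7.9, 8.3, 8.6, 8.9, 9.2, 9.4, 9.6]

r, p_value = stats.pearsonr(age_months, weight_kg)
print(f"r = {r:.2f}")  # a strong positive association, nothing more
```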

In a classic example of the misuse of correlation, Neyman (1952) gave an illustration of the correlation between the number of storks and the number of human births in various European countries. Someone not understanding how to interpret the correlation coefficient might conclude from this evidence that storks bring babies. The truth, however, was that the original calculation did not take into account the size of the countries in the data set. Larger countries tend to have both more women and more storks. The storks did not bring the babies, just as living in a larger country does not increase an individual's probability of having a baby. The correlation was incidental, not causal.
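
The stork example can be reproduced with a small simulation. In the sketch below (all numbers are simulated, not Neyman's data), stork counts and birth counts are both driven by a third variable, country size, so the raw counts correlate strongly; once size is controlled for by converting counts to rates, the association largely disappears.

```python
# A Monte Carlo sketch of a confounded correlation: storks and births both
# scale with country size, so they correlate even though neither causes
# the other. All values are simulated.
import random
from statistics import correlation  # requires Python 3.10+

random.seed(42)
sizes = [random.uniform(10, 100) for _ in range(30)]      # country sizes
storks = [2 * s + random.gauss(0, 5) for s in sizes]      # scales with size
births = [50 * s + random.gauss(0, 200) for s in sizes]   # scales with size

print("correlation of raw counts:", round(correlation(storks, births), 2))

# Controlling for country size removes the apparent association.
stork_rate = [st / s for st, s in zip(storks, sizes)]
birth_rate = [b / s for b, s in zip(births, sizes)]
print("correlation of rates:     ", round(correlation(stork_rate, birth_rate), 2))
```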

Conclusion

The use of statistics is an important part of any science. A wide variety of techniques are available to those who desire to summarize large amounts of data or to make inferences and predictions about a larger underlying population based on observations of a sample. However, in order for the statistics to be meaningful, care must be taken to understand both their potential and their limitations. Both descriptive statistics and inferential statistics are open to abuse and misuse, with the result that the user may reach a conclusion unsupported by the data. By understanding what various statistical tools can and cannot do, what assumptions need to be met when using them, and how to appropriately interpret the results, one can learn what questions to ask when presented with statistical findings. Such knowledge helps both professionals and interested laypeople alike become better consumers of statistical information.

Terms & Concepts

Descriptive Statistics: A subset of mathematical statistics that describes and summarizes data.

Distribution: A set of numbers collected from data and their associated frequencies.

Inferential Statistics: A subset of mathematical statistics used in the analysis and interpretation of data. Inferential statistics are used to make inferences, such as drawing conclusions about a population from a sample, as well as in decision making.

Measures of Central Tendency: Descriptive statistics that are used to estimate the midpoint of a distribution. Measures of central tendency include the median (the number in the middle of the distribution), the mode (the number occurring most often in the distribution), and the mean (a mathematically derived measure in which the sum of all data in the distribution is divided by the number of data points).

Measures of Variability: Descriptive statistics that summarize how widely dispersed the data are over the distribution. Measures of variability include the range (the difference between the highest and lowest data points) and the standard deviation (a mathematically derived index of the degree to which data points differ from the mean of the distribution).

Nonparametric Statistics: A class of statistical procedures that are used when it is not possible to estimate or test the values of the parameters of the distribution or when the shape of the underlying distribution is unknown.

Normal Distribution: A continuous distribution that is symmetrical about its mean and asymptotic to the horizontal axis. The area under the normal distribution is 1. The normal distribution is also called the Gaussian distribution or the normal curve.

Null Hypothesis: The statement that the findings of the experiment will show no statistical difference between the control condition and the experimental condition.

Parametric Statistics: A class of statistical procedures that are used when it is reasonable to make certain assumptions about the underlying distribution of the data and the values to be analyzed are either interval- or ratio-level data.

Population: The entire group of subjects belonging to a certain category, such as all women between the ages of eighteen and twenty-seven, all dry-cleaning businesses, or all college students.

Quartile: Any of three points that divide an ordered distribution into four equal parts, each of which contains one quarter of the data points.

Sample: A subset of a population. A random sample is a sample that is chosen at random from the larger population with the assumption that it will reflect the characteristics of the larger population.

Skewed Distribution: A distribution that is not symmetrical around the mean (i.e., there are more data points on one side of the mean than there are on the other).

Statistics: A branch of mathematics that deals with the analysis and interpretation of data. Mathematical statistics provides the theoretical underpinnings for various applied statistical disciplines, including business statistics, in which data are analyzed to find answers to quantifiable questions. Applied statistics uses these techniques to solve real-world problems.

Bibliography

Armore, S. J. (1966). Introduction to statistical analysis and inferences for psychology and education. New York: John Wiley & Sons.

Hollander, M., & Wolfe, D. A. (1973). Nonparametric statistical methods. New York: John Wiley & Sons.

Huff, D. (1954). How to lie with statistics. New York: W. W. Norton & Company.

Mansell, W. (2013). Misleading the public understanding of assessment: Wilful or wrongful interpretation by government and media. Oxford Review of Education, 39(1), 128–138. Retrieved November 8, 2013, from EBSCO Online Database Academic Search Complete. http://search.ebscohost.com/login.aspx?direct=true&db=a9h&AN=85751059&site=ehost-live

Mercer, A. W., Kreuter, F., Keeter, S., & Stuart, E. A. (2017). Theory and practice in nonprobability surveys: Parallels between causal inference and survey inference. Public Opinion Quarterly, 81, 250–271. http://search.ebscohost.com/login.aspx?direct=true&db=sxi&AN=123082527&site=ehost-live&scope=site

Neyman, J. (1952). Lectures and conferences on mathematical statistics and probability (2nd ed.). Washington, DC: US Department of Agriculture.

Prewitt, K. (2012). When you have a hammer.... Du Bois Review: Social Science Research on Race, 9(2), 281–301. Retrieved November 8, 2013, from EBSCO Online Database SocINDEX with Full Text. http://search.ebscohost.com/login.aspx?direct=true&db=sih&AN=84359609&site=ehost-live

Procheş, Ş. (2016). Descriptive statistics in research and teaching: Are we losing the middle ground? Quality & Quantity, 50(5), 2165–2174. Retrieved October 24, 2018, from EBSCO Online Database Sociology Source Ultimate. http://search.ebscohost.com/login.aspx?direct=true&db=sxi&AN=117379765&site=ehost-live&scope=site

Scheff, T. (2011). The catastrophe of scientism in social/behavioral science. Contemporary Sociology, 40(3), 264–268. Retrieved November 8, 2013, from EBSCO Online Database SocINDEX with Full Text. http://search.ebscohost.com/login.aspx?direct=true&db=sih&AN=69671015&site=ehost-live

Witte, R. S. (1980). Statistics. New York: Holt, Rinehart and Winston.

Suggested Reading

Adams, A., & Lawrence, E. K. (2019). Research methods, statistics, and applications (2nd ed.). Thousand Oaks, CA: SAGE.

Gardenier, J. (2012). Recommendations for describing statistical studies and results in general readership science and engineering journals. Science & Engineering Ethics, 18(4), 651–662. Retrieved November 8, 2013, from EBSCO Online Database Academic Search Complete. http://search.ebscohost.com/login.aspx?direct=true&db=a9h&AN=83839288&site=ehost-live

Gravetter, F. J., & Wallnau, L. B. (2006). Statistics for the behavioral sciences. Belmont, CA: Wadsworth/Thomson Learning.

Hogben, L. (1957). Statistical theory: The relationship of probability, credibility and error. London: George Allen & Unwin.

Keller, D. K. (2006). The Tao of statistics: A path to understanding (with no math). Thousand Oaks, CA: Sage Publications.

Young, R. K., & Veldman, D. J. (1977). Introductory statistics for the behavioral sciences (3rd ed.). New York: Holt, Rinehart and Winston.

Essay by Ruth A. Wienclaw, PhD

Dr. Ruth A. Wienclaw holds a PhD in industrial/organizational psychology with a specialization in organization development from the University of Memphis. She is the owner of a small business that works with organizations in both the public and private sectors, consulting on matters of strategic planning, training, and human-systems integration.