Chi-squared Distribution
The chi-squared distribution is a family of continuous probability distributions widely used in statistical hypothesis testing across fields including mathematics, science, and social science. It plays a crucial role in tests such as the chi-square test of independence, which assesses the association between two categorical variables, and the chi-square goodness-of-fit test, used to determine whether a chosen population model fits observed data. The distribution arises from squaring and summing independent standard normal random variables, and its shape is governed by a parameter known as degrees of freedom (df). As the degrees of freedom increase, the distribution approaches a bell-shaped curve.
Originating from the need to evaluate the fit of statistical models to data, the chi-squared distribution was shaped by contributions from several researchers, notably Karl Pearson, who formalized its application in 1900. This distribution is also related to the gamma distribution and has variations like the generalized and noncentral chi-squared distributions, which find uses in fields such as medical imaging and wireless communication. The mean and variance of the chi-squared distribution are directly linked to its degrees of freedom, aiding researchers in validating models and testing hypotheses effectively.
Researchers in mathematics, science, social science, and a variety of other fields use the chi-squared distribution for statistical hypothesis tests, such as the chi-square test of independence, which determines whether two categorical variables are associated, and the chi-square goodness-of-fit test, which helps determine whether a chosen population model is a good fit for an observed set of data. It is also used in applications such as signal processing. The chi-squared distribution is a family of continuous probability distribution functions (pdfs), created by squaring and adding together one or more independent standard normal random variables, each drawn from a particular variety of bell curve. The function’s shape is determined by a parameter called its degrees of freedom (df). For large df values (approximately fifty and above), the chi-squared distribution itself is almost bell-shaped.
The number of degrees of freedom equals the number of squared standard normal variables that were added together. Standard statistical notation uses the Greek letter chi: χ²(df). For example, χ²(7) is a chi-squared distribution with seven degrees of freedom. The mean of any chi-squared distribution is equal to its number of degrees of freedom, and its variance is two times its number of degrees of freedom.
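This construction can be illustrated with a short simulation (a minimal sketch, assuming Python with the NumPy library; the variable names are illustrative). The code draws seven independent standard normal values per trial, squares and sums them, and confirms that the resulting sample has a mean near 7 and a variance near 14, as χ²(7) predicts:

    import numpy as np

    rng = np.random.default_rng(seed=0)
    df = 7              # degrees of freedom
    trials = 100_000    # number of simulated chi-squared values

    # Each simulated value is the sum of df squared standard normals.
    z = rng.standard_normal((trials, df))
    chi2_samples = (z ** 2).sum(axis=1)

    print(chi2_samples.mean())  # approximately df = 7
    print(chi2_samples.var())   # approximately 2 * df = 14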
Overview
The development of the chi-squared distribution resulted from mathematicians and scientists trying to solve the problem of whether a particular model, such as the normal distribution, was a good fit for their observed data. For example, were measurement errors made by astronomers really bell-shaped? This was important to assess the accuracy of measurements taken by different scientists, especially when data were combined to locate objects or derive astronomical constants.
Mathematician Pierre-Simon Laplace and physicist Auguste Bravais were among those whose research on errors in the nineteenth century contributed to the chi-squared distribution. Geodesist Friedrich Helmert derived the distribution in 1876, as part of his research on the distribution of gravity on the surface of the earth. However, as was often the case with people working independently, Helmert’s contribution was not well-known until later. Instead, the distribution was named by English statistician Karl Pearson, who published an article on his chi-squared test of goodness of fit in 1900. He used data from experiments on flipping coins and spinning roulette wheels to determine whether they produced the expected distributions of heads and roulette numbers. Although his paper contained errors, some of which were later corrected by statistician Ronald Fisher, it was a significant achievement in statistics and motivated research on goodness-of-fit testing and chi-squared properties.
Chi-square goodness-of-fit testing and other chi-square tests are used to validate models and test hypotheses. The F distribution, commonly used by researchers in many disciplines for hypothesis testing, is the ratio of two independent chi-squared variables, each divided by its degrees of freedom. The chi-squared distribution is a special case of the gamma distribution, which is used in applications like modeling insurance claims, rainfall, and genomics. The generalized and noncentral chi-squared distributions are variations of the chi-squared distribution that are useful in applications such as medical imaging, multi-antenna wireless communication, and power analysis of statistical tests.
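The gamma relationship can be checked numerically (a minimal sketch, assuming Python with SciPy installed; the chosen degrees of freedom are arbitrary). A chi-squared distribution with df degrees of freedom matches a gamma distribution with shape df/2 and scale 2, and the two densities agree to floating-point precision:

    import numpy as np
    from scipy import stats

    df = 7
    x = np.linspace(0.5, 20, 5)

    # Chi-squared with df degrees of freedom is gamma with shape df/2, scale 2.
    chi2_pdf = stats.chi2.pdf(x, df)
    gamma_pdf = stats.gamma.pdf(x, a=df / 2, scale=2)

    print(np.allclose(chi2_pdf, gamma_pdf))  # True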