Statistical Dispersion
Statistical dispersion is a key concept in statistics that measures the extent to which data points in a dataset differ from one another. It provides insight into the variability or spread of values, and several measures can quantify this dispersion, including variance, standard deviation, and interquartile range. Variance assesses the average of the squared differences between each data point and the mean, while the standard deviation, the square root of variance, conveys how much individual values deviate from the mean in the same units as the original data. When calculating these measures, particularly for sample data, adjustments like using \( n-1 \) instead of \( n \) in the denominator help achieve more accurate estimates of population parameters.
Statistical dispersion is particularly relevant when analyzing skewed data; in such cases, the interquartile range—which focuses on the middle 50% of data points—may serve as a more reliable measure than standard deviation. Understanding these concepts is essential for interpreting data distributions in various fields, allowing for better decision-making based on statistical analysis.
Statistical Dispersion
Statistical dispersion is a term referring to how large or small the range of values is for a particular variable; there are many different measures of dispersion, including variance, standard deviation, and interquartile range.
The concept of dispersion includes that of scatter, or variability. It could be measured by the extent to which each datum value differs from every other value, but in any data set of reasonable size this would require a great many differences to be calculated, so it is more straightforward to calculate the difference of each value from a central value, namely the mean. Say that the sample data in hand are \( x_1, x_2, \ldots, x_n \) and that the mean of all these data is \( \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i \). It is then possible to calculate what is called the deviation from the mean, \( x_i - \bar{x} \), where \( i = 1, 2, \ldots, n \). It might be thought that the next step would be to work out the average of these deviations, but because some are negative and others positive, this average turns out to be zero. Instead, square the deviations to remove the negative signs and take the average of the results. This leads to a well-known and widely used measure of dispersion called the variance. Taking the positive square root of the variance gives the standard deviation.
This is done because the units of the variable of interest are squared in the case of the variance; taking the square root returns the result to the units of the variable of interest. For data originating as a sample and not grouped in any way, the formulae are \( s^2 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n - 1} \) for the variance and \( s = \sqrt{\frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n - 1}} \) for the standard deviation.
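A short Python sketch can make the calculation concrete; the sample values are illustrative, and the check against the statistics module relies on the fact that its sample functions use the same \( n - 1 \) denominator.

```python
import math
import statistics

data = [2, 3, 7, 9]  # small illustrative sample
n = len(data)
mean = sum(data) / n

# Deviations from the mean; these sum to zero, which is why they are squared.
deviations = [x - mean for x in data]

# Sample variance uses n - 1 in the denominator.
variance = sum(d ** 2 for d in deviations) / (n - 1)
std_dev = math.sqrt(variance)

print(variance, std_dev)
# The statistics module applies the same n - 1 formula for sample data.
assert math.isclose(variance, statistics.variance(data))
assert math.isclose(std_dev, statistics.stdev(data))
```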
Note that when dealing with samples, using (n – 1) rather than n in the denominator gives a better estimate of the population standard deviation in the long run. (The population standard deviation itself is calculated with n in the denominator.) The standard deviation does not change with a change of origin: adding or subtracting the same constant to every datum value leaves it unchanged. For example, the values 2, 3, 7, 9 have the same standard deviation as 3, 4, 8, 10, obtained by adding a constant of 1 to each value. If, however, each datum value is divided by 3, the standard deviation is likewise divided by 3.
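These two properties can be verified with a brief sketch, reusing the values 2, 3, 7, 9 from the paragraph above.

```python
import math
import statistics

data = [2, 3, 7, 9]

# Adding a constant shifts every value but leaves the spread unchanged.
shifted = [x + 1 for x in data]  # 3, 4, 8, 10
assert math.isclose(statistics.stdev(data), statistics.stdev(shifted))

# Dividing every value by 3 divides the standard deviation by 3 as well.
scaled = [x / 3 for x in data]
assert math.isclose(statistics.stdev(scaled), statistics.stdev(data) / 3)
```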
The mean is additive: mean(x + y) = mean(x) + mean(y). The variance and standard deviation behave differently; the standard deviation of the sum or difference of two variables is not the sum or difference of their standard deviations. In the special case where x and y are independent (uncorrelated), variance(x + y) = variance(x) + variance(y). In general, however, variance(x + y) = variance(x) + variance(y) + 2 × covariance(x, y), where \( \operatorname{cov}(x, y) = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{n - 1} \). This sum of products of deviations applies when each value of x corresponds to exactly one value of y. For the difference x – y, the covariance term enters with a negative sign: variance(x – y) = variance(x) + variance(y) – 2 × covariance(x, y).
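A numerical sketch can confirm these identities; the paired x and y values below are purely illustrative, and the covariance is computed with the same \( n - 1 \) denominator as the variance.

```python
import math

# Paired observations: each x value corresponds to exactly one y value.
x = [2.0, 3.0, 7.0, 9.0]
y = [1.0, 4.0, 6.0, 5.0]

def mean(v):
    return sum(v) / len(v)

def variance(v):
    m = mean(v)
    return sum((vi - m) ** 2 for vi in v) / (len(v) - 1)

def covariance(a, b):
    ma, mb = mean(a), mean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / (len(a) - 1)

sums = [xi + yi for xi, yi in zip(x, y)]
diffs = [xi - yi for xi, yi in zip(x, y)]

# var(x + y) = var(x) + var(y) + 2 cov(x, y)
assert math.isclose(variance(sums), variance(x) + variance(y) + 2 * covariance(x, y))
# var(x - y) = var(x) + var(y) - 2 cov(x, y): the covariance term changes sign.
assert math.isclose(variance(diffs), variance(x) + variance(y) - 2 * covariance(x, y))
```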
In the case of skewed data, the standard deviation is unlikely to be the best measure of dispersion. Here the interquartile range, which covers the middle 50% of values, is a better choice.
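A brief sketch, using an illustrative right-skewed sample, shows how the interquartile range can be obtained from the quartiles (here via Python's statistics.quantiles) and how it resists the influence of an extreme value that inflates the standard deviation.

```python
import statistics

# A right-skewed sample: one large value pulls the mean and standard deviation upward.
data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 40]

q1, q2, q3 = statistics.quantiles(data, n=4)  # quartile cut points
iqr = q3 - q1

print(f"median={q2}, IQR={iqr}, stdev={statistics.stdev(data):.1f}")
# The IQR describes only the middle 50% of values, so it is barely affected by the
# outlier, whereas the standard deviation is greatly inflated by it.
```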