Chebyshev's inequality
Chebyshev's inequality is a fundamental result in probability theory that bounds how much of a distribution can lie far from its mean. Specifically, it states that no more than a fraction \( \frac{1}{k^2} \) of the values can lie more than \( k \) standard deviations from the mean. The theorem is notable because it applies to any probability distribution with a finite mean and variance, making it a versatile tool in statistical analysis. It is particularly useful for distributions about which little is known, as it requires only the mean and variance.
Unlike some statistical rules that apply only to normal distributions, Chebyshev's inequality can be used with many types of data. For instance, it guarantees that at least 75% of values fall within two standard deviations of the mean and at least about 89% fall within three standard deviations. This broad applicability is valuable in fields such as medicine, meteorology, and risk assessment, where understanding probability distributions is essential. Chebyshev's work, developed in the 19th century, remains influential in both theoretical and applied statistics today.
Chebyshev’s inequality is a theorem in probability. It is named for Pafnuty Chebyshev (sometimes spelled Tchebysheff), a nineteenth-century Russian mathematician. It is also called the Bienaymé–Chebyshev inequality because Chebyshev’s colleague, Irénée-Jules Bienaymé, stated it without proof in 1853, more than a decade before Chebyshev proved it (1866). Probability is a branch of mathematics that quantifies the likelihood of the possible outcomes of an event. It is used in a wide variety of fields, such as medicine, in which researchers use it to predict the spread of disease; space exploration, in which scientists assess the risks of various events; and meteorology, in which scientists predict the likelihood and severity of weather events such as hurricanes.


Background
Probability emerged in the sixteenth century when scholars, including Italian polymath and gambler Gerolamo Cardano, addressed the problem of determining odds in gaming. Gambling involves betting on a future random event, such as rolling dice or choosing a card from a deck. Cardano said that the probability of an event could be defined as the ratio of the number of favorable cases to the total number of possible cases. He wrote the first book about classical probability about 1564, although it was not published until 1663. This work is largely about gaming with dice, including ways to cheat. In it he introduced the set of all possible cases, or circuit, and his definition of probability. For example, since a die has six sides, the probability of a single face landing on top is p = 1/6, or probability = desired outcomes/total number of outcomes. Cardano initially believed that each roll of the die added to the chances of a particular side landing on top, so that three rolls would in theory give a 50 percent chance of rolling a desired side. He later recognized that this could not be true, because six rolls do not offer a 100 percent chance of rolling a particular side. Likewise, tossing a coin has two possible outcomes, heads or tails (p = 1/2), but flipping a coin two times does not guarantee that a particular outcome will result either time. He later revised his theory, counting total cases, then favorable cases, and calculating the probability.
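The flaw in simply adding 1/6 per roll can be checked with a short calculation. The sketch below is an illustration rather than Cardano's own method: it assumes a fair six-sided die and independent rolls, and it uses the modern complement rule (the chance of at least one success is one minus the chance of missing every time). The function name prob_at_least_once is hypothetical.

```python
from fractions import Fraction


def prob_at_least_once(n_rolls: int, sides: int = 6) -> Fraction:
    """Chance that one chosen face appears at least once in n_rolls.

    Assumes a fair die and independent rolls (an assumption of this sketch).
    """
    # Probability of NOT seeing the chosen face on a single roll.
    miss = Fraction(sides - 1, sides)
    # Complement rule: 1 - P(missing on every roll).
    return 1 - miss ** n_rolls


for n in (1, 3, 6):
    p = prob_at_least_once(n)
    print(f"{n} roll(s): {p} = {float(p):.3f}")
```

Under these assumptions, three rolls give 91/216 (about 42 percent) rather than 50 percent, and six rolls give 31031/46656 (about 67 percent) rather than certainty.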
Pafnuty Lvovich Chebyshev was born in Russia in 1821. He was a professor at St. Petersburg University. In addition to being a mathematician, he was an inventor and mechanical engineer. Among his creations was a continuous motion calculator that he first described in an 1876 report as a ten-decimal-place adding machine with a continuous tens carry. Although calculating machines had been built before, this was the first with a continuing tens carry, which he achieved by having the tens carry wheel move ten times slower than the units wheel. He also was keenly interested in engineering and machine efficiency. For example, he wrote that windmills were the most efficient means of performing work such as grinding grain and was interested in using mathematics to calculate the precise blade shape that would maximize their efficiency.
Chebyshev’s father had been a military officer. Chebyshev himself had one leg shorter than the other, resulting in a limp and ending his chances of a military career. As a child, he built mechanical toys and models and later, when he studied geometry, he understood the principles with which he was already familiar in practice. He attended Moscow University, where he began studying physics and mathematics at the age of sixteen. When he was in his early twenties, he began gaining attention for his mathematical papers. Chebyshev had many papers published in Russian and French journals—he spoke and wrote fluent French—and so became friends with many French intellectuals, including the leading mathematicians of his day. He submitted his dissertation in 1847 and became an assistant professor of mathematics at the University of St. Petersburg. He defended his doctoral thesis in 1849, became an associate professor in 1860, and a decade later became a full professor.
Overview
Chebyshev’s inequality gives an upper bound on the probability that a value lies far from the mean of its distribution. According to the theorem, no more than a defined fraction of values, \( 1/k^2 \), will lie more than k standard deviations from the average of the distribution. As long as the mean and variance are known (and finite), the theorem can be used with any probability distribution. In mathematics, an inequality is a comparison of two values showing that one value is less than, greater than, or not equal to the other value.
In the equation \( P(|X - \mu| \geq k\sigma) \leq \frac{1}{k^2} \) (equation 1), the probability of a value (\( X \)) being at least k standard deviations (\( \sigma \)) away from the mean (\( \mu \)) is less than or equal to \( 1/k^2 \). Here k, which must have a value greater than zero, represents a distance from the mean expressed in standard deviation units. The bound is informative only when k is greater than 1, because for smaller k the value \( 1/k^2 \) is at least 1 and every probability already satisfies it. The equation can also be used in reverse: because at most \( 1/k^2 \) of values lie outside the range, at least \( 1 - 1/k^2 \) of values lie within it.
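The arithmetic of the bound and its reverse reading can be sketched in a few lines of Python. This is a minimal illustration, not a standard library routine: the function names are hypothetical, and capping the bound at 1 for small k is a choice made here so that the result always reads as a probability.

```python
def chebyshev_tail_bound(k: float) -> float:
    """Upper bound on the share of values at least k standard deviations from the mean."""
    if k <= 0:
        raise ValueError("k must be greater than zero")
    # For k <= 1 the raw value 1/k^2 is at least 1, so cap it at 1.
    return min(1.0, 1.0 / k ** 2)


def chebyshev_within_bound(k: float) -> float:
    """Lower bound on the share of values within k standard deviations of the mean."""
    return 1.0 - chebyshev_tail_bound(k)


def k_for_coverage(coverage: float) -> float:
    """Smallest k for which the bound guarantees at least `coverage` of values."""
    if not 0 <= coverage < 1:
        raise ValueError("coverage must be in [0, 1)")
    # Solve 1 - 1/k^2 = coverage for k.
    return (1.0 / (1.0 - coverage)) ** 0.5


for k in (1, 2, 3, 4):
    print(f"k={k}: at most {chebyshev_tail_bound(k):.4f} outside, "
          f"at least {chebyshev_within_bound(k):.4f} inside")
```

For example, k_for_coverage(0.75) returns 2.0, matching the statement that at least 75 percent of values lie within two standard deviations of the mean.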
Some probability rules can be applied only to normally distributed data sets. For example, a normal (bell-shaped or Gaussian) distribution is symmetrical when graphed, with the peak in the center and data distributed equally on both sides. The empirical rule for normal distributions in statistics is 68-95-99.7, the percentages of values within successive intervals. In other words, about 68 percent of values fall within one standard deviation of the mean, about 95 percent of values fall within two standard deviations, and nearly all values (about 99.7 percent) fall within three standard deviations of the mean. However, Chebyshev’s theorem can be applied to any data set. This makes it valuable for use with distributions about which little is known. The required knowns are mean and variance. Applying Chebyshev’s inequality shows that at least 75 percent of values lie within two standard deviations of the mean and at least about 89 percent lie within three standard deviations, whatever the shape of the distribution.
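The contrast between the empirical rule and Chebyshev’s guarantee can be illustrated numerically. The sketch below rests on stated assumptions: the normal percentages are computed from the error function, the skewed sample is drawn from an exponential distribution chosen only as an example of non-normal data, and all function names are hypothetical.

```python
import math
import random
import statistics


def normal_within(k: float) -> float:
    """Share of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


def chebyshev_within(k: float) -> float:
    """Chebyshev's guaranteed minimum share within k standard deviations."""
    return max(0.0, 1.0 - 1.0 / k ** 2)


def observed_within(values, k: float) -> float:
    """Share of a data set strictly within k population standard deviations of its mean."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    inside = sum(1 for v in values if abs(v - mean) < k * sd)
    return inside / len(values)


# A skewed, non-normal sample (exponential), used purely as an illustration.
rng = random.Random(0)
sample = [rng.expovariate(1.0) for _ in range(10_000)]

for k in (1, 2, 3):
    print(f"k={k}: normal {normal_within(k):.4f}, "
          f"Chebyshev minimum {chebyshev_within(k):.4f}, "
          f"skewed sample {observed_within(sample, k):.4f}")
```

The Chebyshev minimum is always the smallest of the three figures. It is a guarantee that holds for every distribution, so it is looser than the percentages that apply to any one particular shape.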