Probability

  • Type of physical science: Mathematical methods
  • Field of study: Probability and statistics

Which side of a tossed coin will land face up? Will Flight 742 crash and burn? How many cars will pass a certain point along a highway between 2 p.m. and 3 p.m. on Tuesday? What percentage of a certain array of plutonium atoms will decay in a six-hour period? These questions belong to the realm of probability because no one can give a single answer to any of them with absolute certainty of being correct.


Overview

Probability is the mathematical science that deals with events that cannot be foreseen with certainty or determined by available information. Every probabilistic situation involves some sort of happening, action, or experiment that has a set of (possible) outcomes, called the probability space, or sample space, associated with the experiment. Different observers may associate different sets of outcomes with the same experiment, depending on their interests and concerns. In any case, the outcomes included in the set of outcomes are required to be mutually exclusive and exhaustive.

For example, suppose a coin with sides labeled "heads" and "tails" is flipped twice. If one's main concern is with the number of times the heads side comes up, the set consisting of the three outcomes "heads did not come up," "heads came up exactly once," and "heads came up both times" would be a reasonable set of outcomes to settle upon; so would the set of four outcomes, S = {HH, HT, TH, TT}, in which, for example, HT stands for "heads came up on the first flip, and tails on the second." Any set obtained from either of these by omitting one of the outcomes would be unacceptable as a set of outcomes because it would not be exhaustive. The set {H0, H1, H2}, with Hk = "heads came up at least k times," is exhaustive, but the Hk are not mutually exclusive, so this set is unacceptable as a set of outcomes.
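To make the bookkeeping concrete, the following short sketch (in Python, used here purely as a notational convenience) builds both outcome sets and exhibits the grouping that relates them:

```python
from itertools import product

# The fine-grained sample space: ordered results of two flips.
S = ["".join(flips) for flips in product("HT", repeat=2)]
print(S)  # ['HH', 'HT', 'TH', 'TT']

# The coarser outcome set groups these by the number of heads; each
# fine outcome falls in exactly one group (mutual exclusivity), and
# every fine outcome is covered (exhaustiveness).
by_heads = {k: [s for s in S if s.count("H") == k] for k in range(3)}
print(by_heads)  # {0: ['TT'], 1: ['HT', 'TH'], 2: ['HH']}
```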

In each probabilistic situation, the set of outcomes is thought to be equipped with a probability distribution that describes how likely the various outcomes are to occur whenever the associated experiment or action takes place. Sometimes, this probability distribution can be derived precisely from assumptions about the situation. For example, in the case of a coin being flipped twice, if the coin is fair and unbiased (meaning heads and tails are equally likely to come up on each flip), then there is a reasonable argument to the effect that the outcomes HH, HT, TH, TT, mentioned above, are equally probable. In this case (and, more generally, whenever the set of outcomes is finite), the probability distribution takes the form of a probability assignment; each of the four outcomes is assigned a probability of 1/4 so that the outcomes have equal probability, and their probabilities add up to one. (The requirement that the assigned probabilities add up to one is an important normalizing convention that is observed throughout probability theory.)
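A simulation can make the equal-probability assignment plausible; in the sketch below, the number of trials and the random seed are arbitrary choices, not part of the theory:

```python
import random
from collections import Counter

random.seed(1)  # arbitrary seed, for reproducibility of this sketch
n = 100_000
counts = Counter(random.choice("HT") + random.choice("HT") for _ in range(n))

for outcome in ("HH", "HT", "TH", "TT"):
    print(outcome, counts[outcome] / n)  # each settles near 0.25
print(sum(counts.values()) / n)          # exactly 1.0, the normalizing convention
```

Each relative frequency settles near 1/4, and the four frequencies sum to one exactly, in keeping with the normalizing convention.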

In many cases, the probability distribution is not determined a priori. For example, if the coin that is flipped twice is biased in some unknown way, then the probability assignment is unknown. To take an instance of a kind that is interesting to insurance companies, suppose that a certain individual wishes to purchase five-year term life insurance. The insurance will be paid for by five equal premiums, each to be paid at the beginning of a policy year. If, for example, the insured individual dies during the second year of the policy, then the insurance company will pay the policy amount to the beneficiary and will not receive the last three premiums.

The "experiment" in this case consists of observing the insured during the policy period to see if death occurs. Since it makes a difference to the company in which of the five years the insured dies, if at all, there are six possible outcomes of interest (one of which is "the insured survives the policy period"). The insurance company would like very much to know the true probability assignment of these outcomes, for the purpose of calculating (by a process that is beyond the scope of this article) a safe and reasonable premium. There is no way, however, to determine the true probability assignment exactly and certainly. The problem of approximating the correct probability assignment in such a case belongs to a branch of statistics called actuarial science. The actuarial method approximates the probability assignment by comparing information about the insurance applicant with mortality statistics. It is worthwhile to emphasize the obvious: The correct probability assignment, and its approximation, will vary with the information that is available about the applicant for insurance. A ninety-six-year-old man with emphysema will pay a much higher premium than a twenty-eight-year-old cosmetics supply salesman in apparently perfect health.

In many (but not all) cases, when the set of outcomes is infinite, it makes no sense to assign a positive probability to each of the possible outcomes. For example, suppose that the probabilistic experiment involves throwing a dart at a dartboard in the shape of a disk. If the possibilities that the dart might miss or bounce off the disk are disregarded, the set of outcomes is identifiable with the set of points of the disk. (Let the point associated with each legal throw be the center of the small hole made by the dart.) In this situation, the probability distribution takes the form of a probability measure, which assigns to each reasonably defined subset of the set of outcomes--that is, to each region on the dartboard--the probability that the dart will land in that region. For example, in the case of a supposedly terrible dart player, it may be judged that all parts of the board are equally likely to be hit. (That is, when this player throws, a point is chosen at random from the board.) In such a case, the probability measure would assign to each region the ratio of the area of the region to the area of the whole disk. In the case of a good dart player who is aiming at the center of the disk, the measure will have a subtler definition; the measure of a region will depend on the area of the region, as before, but also on how the region is situated on the board.
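For the terrible player, the area-ratio measure can be checked by simulation. In the sketch below, the board radius, the bull's-eye radius, and the trial count are arbitrary; points are drawn uniformly from the disk by rejection sampling, and the fraction landing in a central region is compared with the ratio of areas:

```python
import random

random.seed(2)                  # arbitrary seed, for reproducibility
R, r = 1.0, 0.2                 # board radius and bull's-eye radius (arbitrary)
n, hits = 200_000, 0

for _ in range(n):
    # Rejection sampling: uniform points on the disk of radius R.
    while True:
        x, y = random.uniform(-R, R), random.uniform(-R, R)
        if x * x + y * y <= R * R:
            break
    if x * x + y * y <= r * r:
        hits += 1

print(hits / n)                 # simulated probability of hitting the bull's-eye
print((r / R) ** 2)             # area ratio: pi r^2 / pi R^2 = 0.04
```

For the good player aiming at the center, one could replace the uniform draws with, say, a bivariate normal distribution centered on the bull's-eye; the measure of a region would then depend on where the region sits, not only on its area.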

Probability distributions such as those of the dartboard example are said to be continuous, while distributions that arise from an assignment of probabilities to the individual outcomes are called discrete. Hybrid distributions--part discrete, part continuous--are possible, and in purely mathematical probability theory, no great distinction is drawn between the discrete and the continuous. In practice, however, the probability distributions actually encountered tend to be either purely discrete or purely continuous, and practical methods differ somewhat between the two cases.

One of the defining characteristics of continuous probability distributions (some would say the only defining characteristic) seems paradoxical and should be carefully noted. When the dart is thrown, some point of the dartboard is hit; yet each point of the board, thought of as a degenerate region, is assigned probability zero by any continuous probability measure. That is, for each point on the board, the probability that it will be the one hit is zero; yet on each legal throw some point is hit, even though, before the throw, that particular point had probability zero of being hit.

Applications

Questions of practical interest concerning particular probability spaces are generally as follows: What is the correct probability distribution over the set of outcomes, and what are the quantitative consequences if a particular probability distribution is assumed?

Cases in which the probability distribution is known exactly are almost always hypothetical, abstract, and ideal--for example, when the coin mentioned in the preceding section is perfectly fair and unbiased. In applications of probability theory to the real world, as in the classical applications of mathematics to physics, the abstract ideal is often taken to be a good approximation of noisy, dusty reality. Thus, for example, at reputable gambling establishments, one assumes that the cards are well shuffled and the roulette wheels and dice are perfectly fair and unbiased, and calculations are performed under these assumptions.

In the physical and social sciences, in order to obtain a probability distribution to work with in particular circumstances, researchers frequently play the following game, called probabilistic or statistical modeling. The probability distribution that is sought is assumed to be of a certain type; the assumption that the distribution fits a certain form, or falls under a certain type heading, is called the modeling assumption.

Within the type chosen, the different distributions of that type are distinguished, or indexed, by one or more parameters; these are approximated, for practical purposes, by experimental sampling techniques. For example, when the situation involves single measurements of individuals chosen at random from some population--for example, when one measures the volume, in cubic meters, of an asteroid, or the height, in centimeters, of a seven-year-old North American boy, or the weight, in grams, of a one-year-old Polish chicken--it is often assumed in practice that the numerical values of the measurements (thought of as outcomes of the experiment consisting of choosing the individual at random) are "normally" distributed. The normal distributions form a famous class of continuous probability distributions, known to many as "the bell-shaped curve," indexed by two parameters, the "mean" and the "standard deviation." (That is, if one knows that a certain distribution is normal and is told what its mean and standard deviation are, then, in principle, one knows what the distribution is and can answer any reasonable probabilistic question about it.) If it is assumed that the distribution of interest is normal, then the mean and the standard deviation are estimated by sampling. It is worth noting that the measurements mentioned above cannot possibly be exactly normally distributed because negative values are impossible. The assumption of normality is, therefore, known to be wrong, strictly speaking, in these and most other cases; it is remarkable that the normal model nevertheless provides such good and usable estimates in so many different situations.
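Here is a sketch of the estimation step: assuming (strictly falsely, as noted) that the measurements are normally distributed, the two indexing parameters are estimated from a random sample by the sample mean and sample standard deviation. The "true" values below are invented stand-ins for an unknown population:

```python
import random
import statistics

random.seed(3)                      # arbitrary seed, for reproducibility
true_mean, true_sd = 122.0, 5.5     # invented "population" parameters (cm)
sample = [random.gauss(true_mean, true_sd) for _ in range(1_000)]

# Estimate the two indexing parameters of the assumed normal model.
mean_hat = statistics.mean(sample)
sd_hat = statistics.stdev(sample)   # sample standard deviation
print(mean_hat, sd_hat)             # close to 122.0 and 5.5
```

With the two estimates in hand, the fitted normal model can, in principle, answer any probabilistic question about the population, subject to the caveat above.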

Modeling assumptions are sometimes cavalier and sometimes not. Probabilistic modeling in physics may provide the best examples of applied mathematics; the reasoning that supports the modeling assumption can be shrewd and subtle; a probabilistic model can provide a mathematically simplified description of a physical situation that would not have been describable a century ago.

Here is a simple but instructive instance of probabilistic modeling. Take, for example, some radioactive material--a lump of plutonium mixed with some nonradioactive, inert material.

What percentage of the plutonium atoms will "go off," or decay, during a certain period of time--say, six hours? If that percentage is expected to be small, and if the plutonium atoms are spread out, or shielded from one another, so that the decay of one is highly unlikely to disturb its neighbors, then there are good reasons to assume that the number of atoms that decay in the six hours will be distributed according to one of the Poisson distributions. The Poisson distributions form a one-parameter family of discrete probability distributions; the value of the single parameter can be estimated in this case by doing experiments with the plutonium. The reasoning by which the Poisson distributions are chosen to model this sort of radioactive decay is beyond the scope of this article, but it is demonstrable that some sort of reasoning from the physical assumptions is necessary, because when those assumptions no longer hold--for example, when the radioactive atoms are packed together sufficiently to start a chain reaction--the modeling assumption of the nice, civilized Poisson distribution can fail rather spectacularly.
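The plausibility of the Poisson model can at least be illustrated by simulation. In the sketch below, the number of atoms and the per-atom decay probability are invented; each atom decays independently with small probability, and the single Poisson parameter is estimated by the average decay count, which should be near the product of the two invented numbers:

```python
import random

random.seed(4)               # arbitrary seed, for reproducibility
n_atoms = 10_000             # hypothetical number of shielded atoms
p_decay = 0.0003             # hypothetical per-atom decay chance in six hours
trials = 1_000

# Each trial: count how many of the independent atoms decay.
counts = [sum(random.random() < p_decay for _ in range(n_atoms))
          for _ in range(trials)]

# The Poisson family is indexed by a single parameter (its mean);
# estimate it by the average observed count.  Here it should be
# near n_atoms * p_decay = 3.0.
lam_hat = sum(counts) / trials
print(lam_hat)
```

A histogram of the simulated counts would sit close to the Poisson distribution with the estimated mean; when the independence assumption fails, as in a chain reaction, no such agreement should be expected.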

In another instructive misapplication of the Poisson distributions, one can "show" that the probability that the sun will not rise tomorrow is a little greater than one-third. The point of this absurdity is that probabilistic modeling can be a tricky business.

Probabilistic models have invaded physics through the theories of ordinary and quantum statistical mechanics and of Brownian motion. In all cases, the state of the system (consisting of one particle or many), when it can be given by the positions and momenta of the constituent particles, is considered to vary probabilistically over a number of possible states. This view contrasts with the deterministic view, according to which one could know the state exactly at each instant, if only one had enough information. The deterministic view is not exactly wrong; it is just that the probabilistic view seems to lead further, given the impossibility of gathering enough information, or of computing the exact state from a sufficiency of information even if it existed.

Context

Correspondence between the great seventeenth-century French mathematicians Pierre de Fermat and Blaise Pascal shows that they were interested in probabilistic questions of the elementary puzzle variety. The seeds of a serious interest in probability were sown, also in the seventeenth century, by the rise of the insurance business in England.

These seventeenth-century occurrences bear approximately the same relation to the modern theory of probability that the ancient Sumerian and Egyptian interest in right triangles and builders' methods of geometric construction, measurement, and estimation bears to Euclidean geometry. As in the case of geometry, practical methods ran ahead of theoretical understanding in the early days, but once the theoretical principles began to be understood, the subject deepened to an extent that would have rendered it scarcely recognizable to the naive pioneers in the field.

As late as the eighteenth century, basic principles were so poorly understood that there could be an outright disagreement between two major mathematicians, Pierre-Simon Laplace and Jean Le Rond d'Alembert, on the answer to the following question: What is the probability that heads will come up twice in two flips of a fair coin? Laplace thought the probability was one in four; d'Alembert, one in three. Both had plausible justifications for their answers. Laplace's answer was correct, but it is debatable whether his justification is more convincing than d'Alembert's. Laplace gave the formal study of probability a huge boost, however, by treating it as a mathematical science as worthy of attention as celestial mechanics, the most glamorous mathematical discipline of his era. While his most famous work was a multivolume treatise on celestial mechanics, his second most famous work was Théorie analytique des probabilités (1812; Analytic Theory of Probability), evidently the first true attempt at a systematic treatment of probability.

Despite Laplace's achievements, not much progress was made during the nineteenth century in the struggle with the fundamentals of probability. The questions about probability with which the American logician and philosopher Charles Sanders Peirce wrestled toward the end of that century were not significantly different from, or more advanced than, those dealt with by Laplace at the beginning of the century.

Yet, statistical mechanics made a start in the nineteenth century, with the work of Ludwig Boltzmann and others on the kinetic theory of gases, despite the lack of a probabilistic foundation. It has not been at all unusual since then for mathematical structures to be erected on foundations that are mythical, nonexistent, or awaiting discovery, and the kinetic theory provides an interesting example of the phenomenon.

In this case, the foundation was not long awaited. The great Émile Borel and his student Henri Lebesgue, in work extending from the end of the nineteenth century through roughly the first decade of the twentieth, placed probability squarely inside a more general "measure theory." Measure theory provides the point of view and the basic machinery necessary for formulating and attacking problems in statistical mechanics. All the different incarnations of probability theory, from dice games to insurance problems to statistical quantum mechanics, are unified under measure theory. There appear to be no cracks in this foundation; what remains is to work out the consequences of the probabilistic point of view.

Principal terms

ACTUARIAL SCIENCE: the methods of calculation of insurance-related probabilities from statistics, and the subsequent determination of fair insurance premiums

EXHAUSTIVE: covering all possibilities

MUTUALLY EXCLUSIVE: never occurring together

NORMAL DISTRIBUTION: a famous kind of continuous probability distribution, popularly described as a "bell-shaped curve"

PROBABILITY DISTRIBUTION: distribution of relative likelihoods over all possible occurrences

PROBABILITY SPACE: a parsing of possibilities into an exhaustive list of mutually exclusive occurrences; also called "sample space" or "set of (possible) outcomes"

RANDOM, RANDOMLY, AT RANDOM: without bias

Essay by Peter D. Johnson Jr.

Bibliography

Baldi, Paolo. Probability: An Introduction through Theory and Exercises. Springer, 2023.

Brush, Steven G. The Kind of Motion We Call Heat: A History of the Kinetic Theory of Gases in the Nineteenth Century. North-Holland, 1976.

De Finetti, Bruno. Theory of Probability: A Critical Introductory Treatment. Translated by Antonio Machì and Adrian Smith. Vol. 1. Wiley, 1974.

Feller, William. An Introduction to Probability Theory and Its Applications. 2nd ed. Wiley, 1957.

Grimmett, G. R., and D. J. A. Welsh. Probability: An Introduction. 2nd ed. Oxford University Press, 2014.

Kac, Mark. Statistical Independence in Probability, Analysis and Number Theory. Mathematical Association of America, 1959.

Newman, James R., editor. The World of Mathematics. 4 vols. Simon & Schuster, 1956.

Rowntree, Derek. Probability without Tears. Charles Scribner's Sons, 1984.