Distribution (mathematics)
In mathematics, a distribution refers to a function that describes the probability of the different outcomes of a random variable. For continuous random variables, this is represented by a probability density function, where the area under the curve over a specific interval gives the probability that the variable falls within that interval. Common examples of continuous distributions include the normal distribution, exponential distribution, and t-distribution. For discrete random variables, probability distributions give the probability associated with each possible value, with the Poisson and binomial distributions being notable examples.
Mathematicians often rely on assumptions about these distributions when analysing data, using parametric methods that test hypotheses about parameters, the numerical characteristics that define a population or model. The normal (or Gaussian) distribution is particularly significant because of its properties: it is symmetric, its mean, median, and mode coincide, and fixed ranges around the mean contain known proportions of the data (for example, about 95% lies within ±2 standard deviations of the mean). Overall, understanding distributions is crucial for statistical analysis and data interpretation in many fields.
A probability distribution is a mathematical formula that, for a continuous random variable, describes a curve (the probability density function). The area under the curve over a particular interval gives the probability that the variable falls within that interval. Examples include the normal distribution, the exponential distribution, and the t-distribution. For discrete random variables, the formula instead gives the probability of each possible value of the variable; examples include the Poisson distribution and the binomial distribution.
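The difference between the continuous and discrete cases can be sketched in a few lines of Python. This is an illustrative sketch only: the exponential rate of 0.5, the interval [1, 3], and the Poisson and binomial parameters are arbitrary choices, and the helper functions are hypothetical, written here rather than taken from any library. For the continuous case, the area under the exponential density is approximated numerically and compared with the closed-form answer; for the discrete case, the formula is evaluated directly at a single value.

```python
import math


def exponential_pdf(x: float, rate: float) -> float:
    """Density of an exponential distribution with the given rate parameter."""
    return rate * math.exp(-rate * x) if x >= 0 else 0.0


def area_under_curve(pdf, a: float, b: float, steps: int = 10_000) -> float:
    """Approximate the integral of pdf over [a, b] with the trapezoidal rule."""
    h = (b - a) / steps
    total = 0.5 * (pdf(a) + pdf(b))
    for i in range(1, steps):
        total += pdf(a + i * h)
    return total * h


def poisson_pmf(k: int, mean: float) -> float:
    """Probability that a Poisson variable with the given mean equals k."""
    return math.exp(-mean) * mean**k / math.factorial(k)


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


# Continuous case: probability = area under the density over an interval.
rate = 0.5  # arbitrary illustrative parameter
approx = area_under_curve(lambda x: exponential_pdf(x, rate), 1.0, 3.0)
exact = math.exp(-rate * 1.0) - math.exp(-rate * 3.0)  # closed-form area
print(f"P(1 <= X <= 3): numeric {approx:.6f}, closed form {exact:.6f}")

# Discrete case: the formula gives the probability of each value directly.
print(f"Poisson P(X = 2), mean 3: {poisson_pmf(2, 3.0):.6f}")
print(f"Binomial P(X = 2), n = 5, p = 0.3: {binomial_pmf(2, 5, 0.3):.6f}")
```

In both cases the probabilities over the whole range add up to 1: the total area under the density is 1, and the discrete probabilities sum to 1.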
Overview
First, the mathematician must assume that the empirical distribution of the data closely approximates a theoretical distribution defined by one or more parameters. Methods that rest on this assumption are called parametric methods. A parameter is a numerical characteristic of a population or model, such as the probability of a "death" (the outcome being counted) in a binomial distribution. Parametric methods test hypotheses about the parameters of a population described by a particular distribution; Student's t-test is one example.
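As a minimal sketch of a parametric method, the Python below computes the one-sample Student's t statistic, which tests a hypothesis about a population mean under the assumption that the data come from a normal distribution. The sample values and the hypothesised mean `mu0 = 5.0` are invented for illustration, and `one_sample_t` is a hypothetical helper, not a library function.

```python
import math
import statistics


def one_sample_t(sample: list[float], mu0: float) -> tuple[float, int]:
    """Return the one-sample t statistic and its degrees of freedom.

    Assumes the sample is drawn from a normal population (the parametric
    assumption); mu0 is the hypothesised value of the population mean.
    """
    n = len(sample)
    mean = statistics.mean(sample)
    s = statistics.stdev(sample)  # sample SD with an n - 1 denominator
    t = (mean - mu0) / (s / math.sqrt(n))
    return t, n - 1


# Hypothetical measurements, invented for illustration only.
sample = [5.1, 4.9, 5.6, 5.8, 5.2, 5.4, 5.0, 5.7]
t, df = one_sample_t(sample, mu0=5.0)
print(f"t = {t:.3f} on {df} degrees of freedom")
```

Turning t into a p-value requires the t-distribution with `df` degrees of freedom, which Python's standard library does not provide; `scipy.stats.ttest_1samp` is one library routine that performs the full test and could be used to cross-check the statistic.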
The Normal Distribution
When analysing data, there is a choice between methods that make distributional assumptions, as above, and those that make none (called distribution-free or non-parametric methods). For example, suppose the continuous random variable of interest is height, and that the mean and standard deviation of the height of adult men are known. If the distribution of height in the population is assumed to follow a specific probability distribution (here, the normal distribution), then the probability that an adult man is more than six feet tall can be calculated.
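The height calculation can be sketched with Python's standard-library `statistics.NormalDist`. The mean of 69 inches and standard deviation of 3 inches are hypothetical values chosen for illustration, not measured population figures. The probability of exceeding six feet (72 inches) is the area under the normal curve to the right of 72, that is, one minus the cumulative distribution function at 72.

```python
from statistics import NormalDist

# Hypothetical parameters for adult male height in inches (illustration only).
MEAN_IN = 69.0
SD_IN = 3.0
SIX_FEET_IN = 72.0

height = NormalDist(mu=MEAN_IN, sigma=SD_IN)

# P(X > 72) is the area under the normal curve to the right of 72 inches.
p_taller = 1.0 - height.cdf(SIX_FEET_IN)
print(f"P(height > 6 ft) = {p_taller:.4f}")
```

With these invented values six feet lies exactly one standard deviation above the mean, so the result is about 0.159 (roughly 16% of men). Different parameter values would give a different answer, which is why the mean and standard deviation must be known first.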
Similarly, if it is known from observation that the proportion of babies who are female is 0.58, then the binomial distribution can be used to work out the probability that a woman with three children has three sons. The mean and standard deviation in the first example, and the value of 0.58 in the second, are all parameters. Every probability distribution is described by one or more parameters. For continuous variables, the normal distribution is the distribution most often assumed by parametric methods.
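The three-sons example can be worked out as follows. The number of sons among three children follows a binomial distribution with n = 3 trials, provided one assumes, as this sketch does, that the sexes of successive children are independent and that the probability of a boy stays the same for each birth. The value 0.58 is the text's illustrative parameter, so the probability of a son is 1 − 0.58 = 0.42. The helper `binomial_pmf` is written here for illustration.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


P_FEMALE = 0.58      # parameter taken from the example in the text
p_male = 1 - P_FEMALE

# "Success" here means a son; we want all 3 of 3 children to be sons.
p_three_sons = binomial_pmf(3, 3, p_male)
print(f"P(three sons out of three children) = {p_three_sons:.4f}")
```

Because every child must be a son, the binomial formula reduces to 0.42 × 0.42 × 0.42 ≈ 0.074.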
The Gaussian Distribution
A Gaussian distribution has the following properties:
Property 1: The curve is bell-shaped and symmetric about the mean.
Property 2: The mean, median, and mode coincide.
Property 3: The limits from (mean – 2SD) to (mean + 2SD) cover the measurements of approximately 95% (more precisely, 95.4%) of subjects. These are referred to as ±2SD limits or sometimes as 2-sigma limits.
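The properties listed above can be checked numerically with Python's standard-library `statistics.NormalDist`. This sketch computes the proportion of a normal population lying within mean ± k SD for several values of k. It also confirms that the mean, median, and mode coincide for an arbitrary example distribution (mean 170, SD 10, chosen only for illustration). The helper `coverage` is hypothetical, not part of the library.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)


def coverage(k: float) -> float:
    """Fraction of a normal population within mean ± k SD."""
    return std_normal.cdf(k) - std_normal.cdf(-k)


# Property 3 (and the 3-sigma limits): coverage of ±k SD limits.
for k in (1.0, 1.96, 2.0, 3.0):
    print(f"±{k} SD covers {coverage(k):.2%}")

# Property 1: the density is symmetric about the mean.
print(std_normal.pdf(1.5) == std_normal.pdf(-1.5))

# Property 2: mean, median, and mode coincide (example parameters only).
dist = NormalDist(mu=170.0, sigma=10.0)
print(dist.mean, dist.median, dist.mode)
```

The output shows that exactly 95% corresponds to ±1.96 SD. The ±2SD limits therefore cover slightly more than 95% of subjects, and the ±3SD limits cover about 99.7%.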
Another often-cited property of a Gaussian distribution is that the limits from (mean – 3SD) to (mean + 3SD) cover almost all (99.7%) of subjects. These 3-sigma limits are rarely used in health and medicine; a notable exception is their use with Z-scores. A Z-score is calculated as (height – mean)/SD, where the mean and SD are those of healthy reference individuals of a given age or weight. These properties show why the normal distribution is of fundamental importance.
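A Z-score calculation can be sketched in a few lines of Python. The reference mean of 110 cm and SD of 5 cm are hypothetical values for a single reference group, not figures from any real growth standard. The functions `z_score` and `classify` are illustrative helpers that compare the result against the 2-sigma and 3-sigma limits described above.

```python
def z_score(value: float, ref_mean: float, ref_sd: float) -> float:
    """Standardise a measurement against a reference mean and SD."""
    return (value - ref_mean) / ref_sd


def classify(z: float) -> str:
    """Place a Z-score relative to the 2-sigma and 3-sigma limits."""
    if abs(z) > 3:
        return "beyond the 3-sigma limits"
    if abs(z) > 2:
        return "beyond the 2-sigma limits"
    return "within ±2SD"


# Hypothetical reference values (not real growth-standard data).
REF_MEAN_CM = 110.0
REF_SD_CM = 5.0

for height_cm in (94.0, 104.0, 112.5):
    z = z_score(height_cm, REF_MEAN_CM, REF_SD_CM)
    print(f"{height_cm} cm -> z = {z:+.2f} ({classify(z)})")
```

Under these assumed reference values, a height of 94 cm gives z = −3.2, which falls outside the 3-sigma limits. Under a normal reference distribution, fewer than 0.3% of healthy individuals would lie that far from the mean.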