Probability and Statistics

Summary

Probability and statistics are two related fields covering the science of collecting, measuring, and analyzing information in the form of numbers. Both are branches of applied mathematics. Probability focuses on using numeric data to predict future outcomes. Statistics applies theory to the gathering of numerical data and to drawing accurate conclusions from it. Because nearly all fields of applied science rely on the analysis of numbers in some way, probability and statistics are among the most diverse areas of study in terms of subjects and career paths. Statisticians also work in academic fields outside science and throughout industry.

Definition and Basic Principles

Probability and statistics are interconnected fields in applied mathematics. In both fields, principles of scientific theory are applied to the analysis of groups of data in the form of numbers. The main objective of probability and statistics is to ask and answer questions about data with as much accuracy as possible.


Defining probability can be a challenge, because multiple schools of thought exist. In one view, held by scholars known as frequentists, the probability of an event is the relative frequency with which it occurs over a long run of repeated trials. Frequentists therefore focus on the big picture, specifically on the collective outcome of many experiments conducted over time, rather than on specific data items or outcomes. In contrast, scholars known as Bayesians treat probability as a degree of belief: they start with a probability-based assumption about a data set, known as a prior, and then revise it as actual data are observed. On both sides of the debate, probabilists seek to understand patterns in data to predict how a population might behave in the future.
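As a rough illustration of the two views, the Python sketch below assumes a hypothetical coin whose true chance of heads is 0.6. The frequentist estimate is simply the share of heads over more and more simulated tosses. The Bayesian estimate starts from an assumed Beta(5, 5) prior, a belief centered on a fair coin that is chosen here for convenience because it updates in closed form with coin-toss data, and revises it after observing 7 heads in 10 tosses. The function names and numbers are illustrative, not drawn from any particular text.

```python
import random

def relative_frequency(true_p, n_tosses, seed=0):
    """Frequentist view: estimate P(heads) as the long-run share of heads."""
    rng = random.Random(seed)
    heads = sum(rng.random() < true_p for _ in range(n_tosses))
    return heads / n_tosses

def beta_posterior_mean(prior_a, prior_b, heads, tails):
    """Bayesian view: a Beta(a, b) prior updated with coin-toss counts
    becomes Beta(a + heads, b + tails); return that posterior's mean."""
    post_a = prior_a + heads
    post_b = prior_b + tails
    return post_a / (post_a + post_b)

# Hypothetical coin whose true P(heads) is assumed to be 0.6.
for n in (10, 1_000, 100_000):
    print(n, relative_frequency(0.6, n))

# Prior belief centered on a fair coin (Beta(5, 5)), then 7 heads in 10 tosses.
print(beta_posterior_mean(5, 5, heads=7, tails=3))  # 12 / 20 = 0.6
```

With enough tosses the relative frequency settles near 0.6, while the Bayesian estimate moves from the prior's 0.5 toward the observed 0.7 by an amount that depends on how strongly the prior is held.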

Statistics is a field with a broader scope than probability, but in some ways it is easier to define. The academic discipline of statistics is based on the study of groups of numbers in three stages: collection, measurement, and analysis. At the collection stage, statistics involves issues such as the design of experiments and surveys. Statisticians must answer questions such as whether to examine an entire population or to work from a sample. Once the data are collected, statisticians must determine the level of measurement to be used and which questions the numbers can validly answer.

No matter how rigorous an individual study might be, statistical findings are often met with doubt by scholars and the general public. An often-repeated quote, mistakenly attributed to former British prime minister Benjamin Disraeli, sums up this skepticism: “There are three kinds of lies: lies, damned lies, and statistics.”

Background and History

Probability has been a subject of interest since dice and card games were first played for money. Gambling inspired the first scholarly discussions of probability in the sixteenth and early seventeenth centuries. The Italian mathematician Gerolamo Cardano wrote Liber de ludo aleae (Book on Games of Chance) in about 1565, although the work was not published until 1663. In the mid-1600s, French mathematicians Blaise Pascal and Pierre de Fermat worked out principles of probability in a series of letters about gambling problems.

The origins of statistical study are less clear, but the earliest studies are generally thought to have involved demographics. English scholar John Graunt studied causes of death among residents of London and published his findings in 1662. Graunt found that statistical data could be biased by social factors, such as relatives' reluctance to report deaths due to syphilis. In 1710, John Arbuthnot analyzed the ratio of male to female births recorded in London since 1629. His finding that more males than females were born was used to support his argument for the existence of a divine being.

A third strand of early statistics, the problem of combining observations that contain measurement error, has its roots in the eighteenth-century work of German astronomer Tobias Mayer.

A paper by English minister and mathematician Thomas Bayes, published in 1764 after his death, is considered a turning point in the history of probability and statistics. Bayes dealt with the question of how much confidence could be placed in the predictions of a mathematical model based on probability. Over time, probability and statistics have converged further, and the development of modern computers has led to major advances in both fields.

How It Works

In terms of scope, probability and statistics are some of the widest, most diverse fields in applied mathematics. If a research project involves items that must be counted or measured in some way, statistics will be part of the analysis. It is common to associate statistics with research in the sciences, but an art historian tracking changes in the use of color in painting from one century to another is just as likely to use a form of statistical analysis as a biologist working in a laboratory. Similarly, probability is used by anyone relying on numbers to make an educated guess about events in the future.

The word “statistic” can refer to almost any number connected to data. When statistics is discussed as a discipline, though, most projects follow a multistep process: definition and design, collection, description and measurement, and analysis and interpretation.

Definition and Design. Much of the scholarship in the field of statistics focuses on data definition and the design of surveys and experiments. Statistical projects must begin with a question, such as “Does grade-point average in high school have an effect on income level after graduation?” In the data definition phase, the statistician chooses the items to be studied and measured, such as grades and annual earnings. The next step is to define other factors, such as the number of people to be studied, the areas in which they live, the years in which they graduated from high school, and the number of years in which income will be tracked. Good experimental design ensures that the rest of the project will gather enough data, and the right data, to answer the question.
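One concrete design decision is how many people to study. For estimating a proportion, a widely used textbook formula is n = z^2 p(1 - p) / E^2, where E is the acceptable margin of error, z is the critical value for the chosen confidence level, and p is a guess at the proportion (0.5 is the most conservative choice). The Python sketch below assumes simple random sampling from a large population; the function name and the numbers are illustrative only.

```python
import math
from statistics import NormalDist

def sample_size_for_proportion(margin, confidence=0.95, p=0.5):
    """Minimum simple-random-sample size to estimate a proportion p
    within +/- margin at the given confidence level."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # two-sided critical value
    n = z**2 * p * (1 - p) / margin**2
    return math.ceil(n)  # round up: a fraction of a person cannot be surveyed

print(sample_size_for_proportion(0.05))  # 385 respondents
print(sample_size_for_proportion(0.03))  # 1068 respondents
```

Halving the margin of error roughly quadruples the required sample, which is one reason survey design is often a trade-off between precision and cost.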

Collection. Once the data factors have been defined, statisticians must collect them from the population being studied. Experimental design also plays a role in this step. Statistical data collection must be thorough and must follow the rules of the study. For example, if a survey is mailed to one thousand high school graduates and only three respond, more data must be collected before the survey's findings can be considered valid. Statisticians also must ensure that collected data are accurate by finding a way to check the reliability of answers.

Description and Measurement. Collected data must be stored, arranged, and presented in a way that statisticians can use to calculate statistics and draw conclusions. Grade-point averages, for example, can be compared easily if all the survey participants attended the same school. If participants attended schools with different grading systems, the statistician must develop rules for converting the averages into a form that allows them to be compared. Once these conversions are made, the statistician decides whether to present the data in a table, chart, or other form.
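One common way to put grades from different scales on a comparable footing, offered here as an illustrative choice rather than the rule any particular study uses, is to convert each score to a z-score within its own school: subtract that school's mean and divide by its standard deviation. The school data below are made up.

```python
from statistics import mean, stdev

def z_scores(values):
    """Express each value as standard deviations above or below its group's mean."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

# Hypothetical grades from two schools using different scales.
school_a_gpa = [2.0, 2.5, 3.0, 3.5, 4.0]              # 4-point scale
school_b_percent = [60.0, 70.0, 80.0, 90.0, 100.0]    # percentage scale

print(z_scores(school_a_gpa))
print(z_scores(school_b_percent))
# A 3.5 at school A and a 90 at school B map to the same z-score
# (up to floating-point rounding), so they can be compared directly.
```

This conversion assumes the two schools' student bodies are broadly similar; if one school is more selective, equal z-scores may not mean equal achievement.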

Analysis and Interpretation. In terms of statistical theory, the most complex step is the analysis and interpretation of data. When data have been collected, described, and measured, conclusions can be drawn in several ways, none of which is right in every case. In this step, statisticians must ask themselves a few questions: What is the relationship between the variables? Does a change in one cause a change in the other? Is there a third, or lurking, variable not covered in the study that makes both change at the same time? Is further research needed?
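The lurking-variable question can be illustrated with a small simulation built on made-up assumptions: a hidden factor independently raises both high school grades and later earnings, while grades have no direct effect on earnings. In the Python sketch below, the two observed variables still come out strongly correlated, and the correlation largely disappears once each is adjusted for the hidden factor by removing its least-squares linear effect (a simple form of partial correlation).

```python
import random
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def residuals(ys, zs):
    """What is left of ys after removing the least-squares linear effect of zs."""
    mz, my = mean(zs), mean(ys)
    slope = (sum((z - mz) * (y - my) for z, y in zip(zs, ys))
             / sum((z - mz) ** 2 for z in zs))
    return [y - (my + slope * (z - mz)) for y, z in zip(ys, zs)]

rng = random.Random(42)
n = 5_000
lurking = [rng.gauss(0, 1) for _ in range(n)]        # hidden third variable
grades = [z + rng.gauss(0, 0.5) for z in lurking]     # driven by the hidden variable
earnings = [z + rng.gauss(0, 0.5) for z in lurking]   # also driven by it, not by grades

print(round(pearson(grades, earnings), 2))  # strong correlation
print(round(pearson(residuals(grades, lurking), residuals(earnings, lurking)), 2))  # near 0
```

In real studies the lurking variable is usually unmeasured, which is exactly why the question has to be asked rather than computed away.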

One of the most common methods used for statistical analysis is modeling. A model expresses an idea about how variables are related in a mathematical form, such as a formula. The data collected by the study can then be compared with the model, and how well the two agree determines how strongly the comparison supports the study's conclusions. Some models are so innovative that they have earned their creators awards such as the Nobel Prize. However, even the best models can have flaws or can fail to explain actual data. As statistician George Box put it, “All models are wrong, but some are useful.”
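A simple instance of modeling is a straight-line model, earnings = a + b * GPA, whose coefficients are chosen by ordinary least squares so that the model's predictions sit as close as possible to the observed data. The data points in the Python sketch below are invented for illustration; the residuals, the gaps between each observation and the model's prediction, are one way of comparing the model with the data.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    mx, my = mean(xs), mean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    return intercept, slope

# Hypothetical survey data: GPA and annual earnings in thousands of dollars.
gpa = [2.0, 2.5, 3.0, 3.5, 4.0]
earnings = [30.0, 34.0, 41.0, 43.0, 52.0]

a, b = fit_line(gpa, earnings)
residuals = [y - (a + b * x) for x, y in zip(gpa, earnings)]
print(f"earnings = {a:.1f} + {b:.1f} * GPA")  # earnings = 8.2 + 10.6 * GPA
print([round(r, 2) for r in residuals])       # gaps between model and data
```

Small, patternless residuals suggest the line is a useful description; large or systematic ones suggest the model is missing something.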

Prediction. Probability provides the tools for applying statistical findings to predict future behavior. The goal of many statistical studies is to establish rules that can be used to make decisions. In the grade-point example, the study might find that students who achieve a grade-point average of 3.0 or higher earn, as a group, twice as much as their fellow students whose averages were 2.9 or lower. As an academic discipline, probability offers several theory-based tools that allow a statistician to ask questions such as: How likely is a student with a grade-point average of 3.5 to earn more than $40,000 per year?
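To make that question concrete, suppose, purely as an assumption for illustration, that a fitted model predicts average earnings of $45,000 for students with a 3.5 grade-point average and that individual earnings scatter around that prediction following a normal distribution with a standard deviation of $8,000. The probability of earning more than $40,000 is then the area of that normal curve above $40,000, which the Python sketch below computes.

```python
from statistics import NormalDist

def prob_earnings_above(threshold, predicted_mean, spread):
    """P(earnings > threshold) under an assumed normal model of earnings."""
    return 1 - NormalDist(mu=predicted_mean, sigma=spread).cdf(threshold)

# Assumed model output for students with a 3.5 GPA (illustrative numbers only).
p = prob_earnings_above(40_000, predicted_mean=45_000, spread=8_000)
print(f"{p:.2f}")  # 0.73
```

The answer is only as good as the assumed distribution; earnings data are often skewed, so a real analysis would check whether a normal model fits.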

For both statistics and probability, one of the primary objectives is to ask mathematical questions about a smaller population to find answers that apply to a larger population.
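A standard tool for carrying a sample result over to a larger population is an interval estimate. The Python sketch below computes an approximate 95 percent confidence interval for a population mean from a simple random sample, using the normal approximation as a simplification (a t-distribution would be more accurate for a sample this small). The sample values are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev

def mean_confidence_interval(sample, confidence=0.95):
    """Approximate confidence interval for the population mean (normal approximation)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    m = mean(sample)
    standard_error = stdev(sample) / math.sqrt(len(sample))
    return m - z * standard_error, m + z * standard_error

# Hypothetical GPAs from a random sample of ten graduates.
sample = [2.8, 3.1, 3.4, 2.9, 3.6, 3.0, 3.3, 2.7, 3.5, 3.2]
low, high = mean_confidence_interval(sample)
print(f"population mean GPA plausibly between {low:.2f} and {high:.2f}")
```

The interval narrows as the sample grows, because the standard error shrinks with the square root of the sample size.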

Applications and Products

It would be nearly impossible to find a product or service that does not rely on probability or statistics in some way. In the case of a cup of coffee, for example, agricultural statistics guided where the coffee beans were grown and when they were harvested. Industrial statistics controlled the process by which the beans were roasted, packaged, and shipped. Statistics even influenced how strong the coffee was brewed, whether in a restaurant, a coffeehouse, or a home kitchen. Probability played a role in each step as well: weather and crop-yield forecasts, the pricing of coffee futures contracts, and anticipated caffeine levels all affected that single brewed cup.

One way to understand the applications and products of probability and statistics is to group them into general categories by function. These categories cover fields that employ some of the highest concentrations of people with a statistical background.

Process Automation. One of the broadest and most common applications of statistical methods is process automation, and quality control is a leading example. When statistical methods are applied to quality control, measures of quality, such as how closely products match manufacturing specifications, can be translated into numbers and tracked. This approach allows manufacturers to ensure that their products meet quality standards such as durability and reliability, as well as physical standards such as size and weight. Quality control can be used to evaluate service providers as well as manufacturers: if measures of quality such as customer satisfaction can be converted into numerical data, statistical methods can be applied to track and improve them. One well-known quality control method, Six Sigma, was developed by the Motorola Corporation in the 1980s to reduce product defects. Six Sigma relies on probability and statistical processes to meet specific quality targets.
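Two calculations often associated with statistical quality control are control limits and defect rates. The Python sketch below, using invented measurements, sets limits at the process mean plus or minus three standard deviations and flags any new measurement outside them, then expresses a defect count as defects per million opportunities (DPMO), the unit in which Six Sigma targets are usually stated (commonly quoted as 3.4 DPMO). As a simplification, it uses the plain sample standard deviation; production control charts usually estimate the spread from subgroup or moving ranges.

```python
from statistics import mean, stdev

def control_limits(baseline):
    """Lower and upper limits at mean +/- 3 standard deviations of an in-control baseline."""
    m, s = mean(baseline), stdev(baseline)
    return m - 3 * s, m + 3 * s

def dpmo(defects, units, opportunities_per_unit):
    """Defects per million opportunities."""
    return defects * 1_000_000 / (units * opportunities_per_unit)

# Hypothetical package weights in grams from a process known to be in control.
baseline = [500.2, 499.8, 500.1, 499.9, 500.0, 500.3, 499.7, 500.1, 499.9, 500.0]
low, high = control_limits(baseline)

new_weights = [500.1, 499.6, 501.5]
flagged = [w for w in new_weights if not low <= w <= high]
print(f"limits: {low:.2f} to {high:.2f}; out of control: {flagged}")

print(dpmo(defects=17, units=5_000, opportunities_per_unit=4))  # 850.0
```

A point outside the limits does not prove a defect by itself; it signals that the process may have changed and should be investigated.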

Another field in which process automation is supported by statistical analysis is transport and logistics. Transport makes up a significant share of the cost of manufacturing a product, and it plays a major role in reliability and customer satisfaction. A manufacturer must ensure that its products reach their destination on time without being lost or damaged, while keeping transport as inexpensive as possible. Statistical methods allow manufacturers to compare and choose the best transportation options, such as truck versus rail, and packaging options, such as cardboard versus plastic, for their products. When fuel costs rise, optimizing logistics becomes especially important. Probability also underlies the pricing of financial tools such as fuel futures and options, which manufacturers can use to manage the risk of rising fuel costs.
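One simple way to compare transport options is by expected cost: the shipping price plus the probability of loss or damage multiplied by the cost of such an incident. The prices and probabilities in the Python sketch below are made up, and the model deliberately ignores factors such as delivery time that a real comparison would also weigh.

```python
def expected_cost(shipping_price, p_damage, damage_cost):
    """Shipping price plus the probability-weighted cost of loss or damage."""
    return shipping_price + p_damage * damage_cost

# Hypothetical options for one shipment worth $20,000.
options = {
    "truck": expected_cost(1_200, p_damage=0.02, damage_cost=20_000),
    "rail": expected_cost(900, p_damage=0.03, damage_cost=20_000),
}
best = min(options, key=options.get)
print(options, "->", best)
```

Here rail wins despite its higher assumed damage rate, because its lower price more than offsets the extra expected loss; with different inputs the ranking could flip.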

In sports, statistics drawn from past games help coaches and managers make decisions. They are used to track wins and losses, compile rankings, predict game outcomes, and plan ticket sales. Used tactically, probability can estimate the likelihood that an event will occur and guide the decisions that follow.

Biostatistics. Statistics are used in the biological sciences in a variety of ways. Epidemiology, or the study of disease within a population, uses statistical techniques to measure public health problems. Statistics allow epidemiologists to measure and document the presence of a specific illness or condition in a population and to see how its concentration changes over time. With this information, epidemiologists can use probability to predict the future behavior of the health problem and recommend possible solutions.

Other fields in the biological sciences that rely on statistics include genetics and pharmaceutical research. Statistical analysis played a key role in the Human Genome Project. The amount of data generated by mapping human genes could be analyzed only with complex statistical processes, some of which are still being refined for genomic research. Probability analyses allow geneticists to predict the influence of a gene on a trait in a living organism.

Pharmaceutical researchers use statistics to design clinical trials of new drugs and to analyze their effects. Statistical methods have become so central to pharmaceutical research that it is extremely difficult to obtain approval from the US Food and Drug Administration (FDA) without them. The FDA publishes extensive documentation guiding researchers through the process of complying with the agency's statistical standards for clinical trials. These standards cover trial factors ranging from the definition of a control group (the group against which the drug's effects are measured) to the evidence needed to show that the drug effectively treats the targeted condition.
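A basic version of the treatment-versus-control comparison is a two-proportion z-test: did a larger share of patients improve on the drug than in the control group, by more than chance alone would plausibly explain? The counts in the Python sketch below are hypothetical, and actual regulatory analyses follow pre-specified statistical plans that are considerably more involved.

```python
import math
from statistics import NormalDist

def two_proportion_z_test(success_a, n_a, success_b, n_b):
    """Two-sided z-test for a difference between two proportions (pooled variance)."""
    p_a, p_b = success_a / n_a, success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical trial: 120 of 200 improved on the drug, 90 of 200 in the control group.
z, p = two_proportion_z_test(120, 200, 90, 200)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A small p-value indicates that a difference this large would be unlikely if the drug had no effect; it does not by itself measure how large or clinically important the effect is.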

Spatial Statistics. Understanding how conditions vary across geographic space requires the analysis of large amounts of data. Spatial statistics are used in fields such as climatology, agricultural science, and geology. Statistical methods provide climatologists with specialized tools to model the effects of factors such as changes in air temperature or atmospheric pollution. Meteorology also depends on statistical analysis because its data are time-based and often involve documenting repeated events over time, such as daily precipitation over several years. One of the best-known applications of probability to a field of science is weather forecasting.

In agricultural science, researchers use statistics and probability to evaluate crop yields and to predict success rates in future seasons. Statistics are also used to measure environmental impact, such as the depletion of nutrients from the soil after a certain kind of crop is grown. These findings guide recommendations, based on probabilistic techniques, about what kinds of crops to grow in seasons to come. Animal science relies on statistical analysis at every stage, from genetic decisions about the breeding of livestock to the environmental and health impacts of raising animals under certain kinds of farming conditions.

Geologists use statistics and probability in a wide range of ways. One that draws significant interest and funding from industry is the discovery of natural resources within the Earth. These efforts include mining metals and extracting petroleum and refining it into products such as gasoline. Mining and petroleum operations are leading employers of statisticians and probabilists. Statistical processes are critical in finding new geologic sites and in measuring the amount and quality of materials to be extracted. To be profitable, mining and petroleum extraction must be carried out on a large scale, with sizable amounts of capital and specialized equipment, and would not be possible without sophisticated statistical analysis. Statistics and probability are also used to measure environmental impact and to craft effective responses to disasters such as oil spills.

Risk Assessment. Probability and statistics have their roots in risk assessment, specifically the risk of losing money while gambling. Risk assessment remains one of the areas in which statistical analysis plays a chief role. One field in which statistical analysis and risk assessment are combined at an advanced level is security. Strategists are beginning to apply the tools of statistics and game theory to problems such as terrorism. Although terrorism was once studied mainly in the social sciences, researchers are developing statistically based ways to control and respond to terrorist events. Probability helps strategists take lessons from terrorism in one political context or world region and apply them to another situation that, on the surface, might look very different.

Actuarial science is one of the largest and most thoroughly developed fields within risk assessment. In actuarial science, statistics and probability help insurance and financial companies answer questions such as how to price an automobile insurance policy. Actuaries examine data such as birth rates, mortality, marriage, employment, income, and loss rates. They use these data to guide insurance companies and other providers of financial products in setting product prices and capital reserves.
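At its simplest, pricing an insurance policy starts from the expected loss: the probability of a claim multiplied by the average claim size, often called the pure premium. Insurers then add loadings for expenses, profit, and uncertainty. The Python sketch below uses invented figures and a single flat loading factor, which is a simplification of real rating methods that adjust for many policyholder characteristics.

```python
def pure_premium(claim_probability, average_claim):
    """Expected annual loss per policy."""
    return claim_probability * average_claim

def gross_premium(pure, loading):
    """Pure premium plus a proportional loading for expenses and profit."""
    return pure * (1 + loading)

# Hypothetical auto policy: 5% chance of a claim, average claim $8,000, 25% loading.
expected = pure_premium(0.05, 8_000)
print(expected, gross_premium(expected, 0.25))
```

Across many similar policies, the pure premiums collected roughly cover the claims paid, which is why the law of large numbers is central to insurance.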

Quantitative finance uses statistical models to predict the financial returns, or gains, of securities such as stocks and bonds. These models also help in designing and pricing more complex securities known as derivatives. Although derivatives are often poorly understood outside the finance industry, they have a powerful effect on the economy and influence everyday situations such as a bank's ability to lend money to a homebuyer.

Survey Design and Execution. Surveys gather information from members of a population to make statements about the population as a whole. Statistical methods ensure that surveys gather enough high-quality data to support accurate conclusions. A survey of an entire population is known as a census. One prominent example is the United States Census, conducted every ten years to count the country's population and gather basic information about each person. Other surveys collect data from selected individuals through a process known as sampling. In nearly all surveys, participants answer questions, and their answers are turned into numerical data and analyzed statistically.

Aside from government censuses, some of the most common applications of survey design are customer relationship management (CRM) and consumer product development. By gathering survey data from customers, companies can use customer relationship management to increase the effectiveness of their services and identify frequent problems to be fixed. In creating and introducing new products, most companies rely on data gathered from consumers through Internet and telephone surveys and on the results of focus groups.

Careers and Course Work

The U.S. Bureau of Labor Statistics estimates that about 30 percent of the professionals who specialize in a branch of statistics work for federal, state, and local government agencies, with the rest in private industry and academic research. Of the statisticians employed by the federal government, the highest numbers were found in the Departments of Commerce, Agriculture, and Health and Human Services. Many professionals outside the formal probability and statistics field may have a statistical component to their work, including analysts, engineers, biologists, agricultural scientists, geologists, or bankers.

Statisticians and probabilists are likely to hold master's or doctoral degrees, and many hold a second degree in their field of specialty. College-level coursework for a statistician focuses on mathematics, particularly calculus and differential equations. Statistics majors take courses in probability theory, statistical methods, mathematical modeling, experimental and survey design, and multivariate data analysis. Much of this coursework depends on skilled use of statistical software, and additional computer programming or computer science classes are useful to many students. The American Statistical Association (ASA) offers resources on research and statistical practice to aspiring statisticians.

To become an actuary in the United States, most professionals attend a four-year college and obtain a bachelor's degree in actuarial science from one of more than one hundred programs available. Coursework in an actuarial science program includes mathematics (particularly calculus), probability, statistics, economics, and finance. An internship in the field while in school is encouraged and increases the chances of securing a job after graduation. Before practicing, an actuary must pass a series of licensing exams offered jointly by the two leading professional groups, the Society of Actuaries and the Casualty Actuarial Society. The initial exams can be taken before completion of the degree. Graduates with this kind of coursework can also work as sports statisticians, biometricians, or operations research analysts, or for agencies such as the Department of Defense.

Social Context and Future Prospects

There is an increasing need for professionals with a knowledge of probability and statistics. The growth of the Internet has led to a rapid rise in the amount of information available, both in professional and private contexts. This growth has created new opportunities for the sharing of knowledge. However, much of the information being shared has not been filtered or evaluated. Statistics, in particular, can be fabricated easily and often are disseminated without a full understanding of the context in which they were generated.

Statisticians are needed to design experiments and studies, collect information based on sound research principles, and responsibly analyze and interpret their work. Aside from a strong knowledge of this process, statisticians must be effective communicators. They must consider how their findings might be used and shared, especially by people without a mathematical background. Their results must be presented in a way that will ensure a clear understanding of purpose and scope.

Although overall job growth for statisticians is predicted to remain steady, fields involving statistical modeling are likely to grow faster. The spread of software and other tools that support modeling has increased demand for professionals familiar with this area. Statistical modeling is useful in many contexts related to probability, ranging from analyzing clinical trial data for pharmaceutical products to forecasting gains and losses in a portfolio of complex financial instruments. Career prospects, particularly at the entry level, are most attractive for candidates with a background in both statistics and an applied field.

The number of jobs for actuaries is expected to grow more rapidly than for statisticians in general. However, the popularity of actuarial science as an area of study for college undergraduates may produce a surplus of qualified applicants. Insurers are expected to be the primary employers of actuaries for the foreseeable future, but demand for actuaries is growing among consulting firms. As the tools for modeling the impact of large-scale disasters become more sophisticated, companies in many industries will seek expertise in protecting themselves against risk—a trend that may heighten the need for actuarial expertise. During the COVID-19 pandemic, probability and statistics helped policymakers analyze data such as the number of coronavirus cases and vaccine distribution.
