Probability and Statistics (Business)

This article explores the importance and use of probability and statistics within a business. Companies often have to cope with a degree of uncertainty and risk in their decision making. The collection and analysis of data is a key factor in easing ambiguous circumstances. Data is effectively analyzed using probabilistic and statistical models to evaluate past performance, understand the wants and needs of consumers, and draw important conclusions about relationships in the data. It is crucial for a business to understand the differences between statistical tools and to know which statistics will best suit its needs. Probability and statistics are also useful tools in business forecasting.

Keywords Business Statistics; Correlation; Economic Forecasting; Expectation; Forecasting; Median; Mode; Outlier; Probabilistic Model; Probability; Qualitative Data; Quality Control; Quantitative Data; Regression; Risk; Statistical Model; Statistical Sampling; Statistics; Variance

Overview

Probability & Statistics in Business

Probability and statistics are essential concepts that arise in everyday situations, as well as in the business world. Business statistics is primarily concerned with drawing inferences about a particular population. The use of statistics and probability within the realm of business allows companies to make decisions while working in unpredictable circumstances. Statistics allow businesses to analyze various components of their company. Customer satisfaction, expectations, quality, and comparisons are all crucial aspects that need to be examined within a business. The application of statistics to business is what makes the exploration of these four areas possible.

Probability

The concept of probability has been in use since the 16th century and later developed into its own mathematical field in the 17th century. Early studies of probability often focused on gambling and other games of chance. Probability is currently used in a wide array of fields, including financial sectors such as insurance and investments, as well as in the technology and medical industries.

Statistics

The idea of statistical data arose in the 17th century through studies and collections of data involving population and the human life cycle. The formation of the field of statistics was driven by probability and games of chance or risk, as well as by the need to numerically quantify and describe populations. Currently, statistics is concerned primarily with making inferences and drawing conclusions based upon a set of data. The study of statistics has spread into various other fields, such as economics, business, finance, and medicine.

Risk

The use of probability and statistics is closely associated with the concept of risk, an abstract measure of the unpredictability of a result. Risk can be expressed as a ratio describing how much an outcome may vary from what is expected during a given time interval. The concept of risk is often seen in financial sectors, and certain professions assess risk routinely. One prime example is the actuary. Actuaries can work in a multitude of settings but are commonly employed by insurance companies and consulting firms. Actuaries, as well as other risk managers, analyze risk on a daily basis using several statistical measurements.

Risk often presents itself in a business under numerous circumstances. According to Kallman (2005), the three main components in measuring risk are expectation, variance, and time. The expected outcome, or mean, enables risk managers to use previous outcomes as a model for future results. For a set of observed outcomes, the mean is calculated by dividing the sum of all outcomes by the number of observations; when outcomes have different probabilities, the expectation is the sum of each outcome multiplied by its probability. Expectation can be a core part of decision-making in business. Companies can calculate expected profits and adjust their inventory accordingly in order to optimize their potential profit. Another useful measurement is variance, which describes how widely the possible values of an outcome are spread around the mean. The square root of the variance is known as the standard deviation. The length of time during which a loss of capital could occur is also an important factor in measuring risk.
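
The expectation, variance, and inventory ideas above can be sketched in a few lines of Python. This is an illustrative sketch only: the demand distribution, the selling price, the unit cost, and the assumption that unsold units are worthless are all hypothetical and do not come from the article.

```python
import math

# Hypothetical daily demand for one product: units demanded -> probability.
demand_probs = {0: 0.1, 1: 0.2, 2: 0.4, 3: 0.3}

def expectation(dist):
    """Expected value: sum of each outcome multiplied by its probability."""
    return sum(x * p for x, p in dist.items())

def variance(dist):
    """Variance: probability-weighted average squared deviation from the mean."""
    mu = expectation(dist)
    return sum(p * (x - mu) ** 2 for x, p in dist.items())

def expected_profit(stock, dist, price, unit_cost):
    """Expected profit when `stock` units are bought at `unit_cost` and sold at
    `price`; sales are capped by stock, and unsold units are assumed worthless."""
    revenue = sum(p * price * min(stock, d) for d, p in dist.items())
    return revenue - unit_cost * stock

mu = expectation(demand_probs)
var = variance(demand_probs)
sd = math.sqrt(var)  # standard deviation is the square root of the variance
print(f"mean={mu:.2f} variance={var:.2f} sd={sd:.2f}")

# Compare candidate inventory levels and keep the most profitable on average.
best = max(range(0, 4),
           key=lambda s: expected_profit(s, demand_probs, price=10.0, unit_cost=4.0))
print("best stock level:", best)
```

With these made-up numbers the expected demand is 1.9 units, and stocking 2 units gives the highest expected profit, because a third unit would sell only 30% of the time.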

There can often be a degree of uncertainty or risk present within a business. The future is of primary concern to companies, yet it is also the area of greatest uncertainty. Businesses cannot change past performance, but they can attempt to predict future economic conditions and prepare accordingly. Such predictions are known as economic forecasting. Forecasting often involves the thorough analysis of specific statistics.

Important Business Statistics

The Consumer Price Index (CPI) is a key measure of the economy's current condition: it represents the cost of living and how this cost changes over time. This statistic can help businesses better understand their customers by anticipating how price changes may affect which products, and what quantities, consumers buy. It is also a good indicator of inflation within an economy. The Producer Price Index and prices paid by farmers are also used as economic indicators (Economic Indicators, 2013).
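
As a sketch of how the CPI is read as an inflation indicator, the percentage change in the index between two periods gives the inflation rate for that interval. The CPI values below are hypothetical and are not official statistics.

```python
def inflation_rate(cpi_previous, cpi_current):
    """Percentage change in the CPI between two periods."""
    return (cpi_current - cpi_previous) / cpi_previous * 100

# Hypothetical annual CPI readings (illustrative values, not official data).
cpi = {2020: 250.0, 2021: 257.5, 2022: 270.4}

years = sorted(cpi)
for prev, curr in zip(years, years[1:]):
    print(curr, f"{inflation_rate(cpi[prev], cpi[curr]):.2f}%")
```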

When a company provides a service, Key Performance Indicators (KPIs) are critical statistics that allow the business to evaluate its performance and identify potential problem areas. One of the most important KPIs is billable hour efficiency. This statistic enables a company to measure its profit by looking at the number of hours sold, which directly contributes to the inflow of cash within a business.
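
The article does not give a formula for billable hour efficiency. The sketch below assumes a common definition, billable (sold) hours divided by total paid labor hours, and uses a hypothetical crew size, week, and billing rate purely for illustration.

```python
def billable_efficiency(billable_hours, paid_hours):
    """Share of paid labor hours that were billed to customers.
    Assumed definition: billable hours / paid hours."""
    if paid_hours <= 0:
        raise ValueError("paid_hours must be positive")
    return billable_hours / paid_hours

# Hypothetical week for a five-technician service crew.
paid = 5 * 40    # hours the company paid for
billed = 150     # hours actually sold to customers
rate = 85.0      # hypothetical hourly billing rate in dollars

eff = billable_efficiency(billed, paid)
print(f"efficiency={eff:.0%}, billed revenue=${billed * rate:,.2f}")
```

Tracking this ratio week to week shows whether unbilled time (travel, callbacks, idle time) is eroding the hours that generate cash.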

It is essential that businesses and their employees have a firm grasp of how to use probability and statistics. Consumer satisfaction and optimal performance are critical aspects of running a successful business. To achieve these goals, probability and statistics can help decision makers choose the path that best serves their interests in times of uncertainty. It is important to be able to distinguish between various statistics and to select a method that will give a valuable result.

Application

Data Collection & Analysis

The collection and analysis of data is a vital factor in probability and statistics. There are numerous ways to collect and categorize data. As mentioned, data from government indexes is a longstanding source. New techniques have emerged as well. Premise, for example, is “a US start-up that is challenging traditional economic data models by generating real-time information. The … company does this by gleaning price data from global ecommerce sites and by crowdsourcing data from people using Android phones in retail locations around the world. It can then provide real-time updates on price and inflation changes, allowing corporations and governments to react more quickly” (Bacon, 2013).

Data can be divided into two distinct types, qualitative and quantitative. Qualitative data have no numerical value associated with them and can be broken down into descriptive categories. Quantitative data can be represented numerically, and relationships can be drawn between the data and other values or measures. Quantitative data are collected either by counting or by measuring and are referred to as discrete data or continuous data, respectively. Tables, charts, and graphs are all used to display and study data. Histograms, box plots, stem-and-leaf plots, scatter plots, distributions, and other representations are widely used.

  • Histograms are bar graphs used to display frequency distributions, that is, how often the values of a variable occur. The height of each bar corresponds to the number of observations that fall within a given value or interval.
  • Box plots graphically display percentiles. The first, second, and third quartiles, or the 25th, 50th, and 75th percentiles respectively, are represented in the plot. A line through the box representing the second quartile corresponds to the median of the data. Box plots are useful in identifying outliers.
  • Stem-and-leaf plots are numerical representations of the data that divide the values into stems and leaves. Stems and leaves are separated by a vertical line. To the left of the line is the stem, or beginning of a value, and to the right are the leaves, or endings, of all values that share that stem. Like histograms, these displays show the shape of a distribution, but they also preserve the individual data values. The following stem-and-leaf display corresponds to data values of 40, 43, 47, and 49: 4 | 0 3 7 9.
  • Scatter plots are graphs in which each pair of data values is represented by a point. These graphs are useful for examining linear relationships between variables and for regression analysis.
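
The counting behind a histogram and a stem-and-leaf display can be sketched as follows. The sales figures are hypothetical, the bins are assumed to be equal-width intervals of integers, and the stem-and-leaf helper assumes non-negative two-digit integers (stem = tens digit).

```python
from collections import Counter

def frequency_table(values, bin_width):
    """Count how many integer values fall into each bin [start, start + bin_width)."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))

def stem_and_leaf(values):
    """Stem-and-leaf lines for non-negative two-digit integers (stem = tens digit)."""
    groups = {}
    for v in sorted(values):
        groups.setdefault(v // 10, []).append(v % 10)
    return [f"{stem} | {' '.join(str(leaf) for leaf in leaves)}"
            for stem, leaves in groups.items()]

# Text histogram of hypothetical daily unit sales: one '#' per observation.
sales = [12, 15, 21, 22, 27, 31, 33, 38, 39, 45]
for start, count in frequency_table(sales, 10).items():
    print(f"{start:>2}-{start + 9:<2} {'#' * count}")

# The article's stem-and-leaf example.
print("\n".join(stem_and_leaf([40, 43, 47, 49])))  # 4 | 0 3 7 9
```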

Inaccuracies in Data

When collecting and analyzing data, there can often be a result that differs greatly from all other observations. Outliers are observations that are inconsistent with the rest of the data; they can be caused merely by error or by some other factor that needs further consideration. It is important to perform a full analysis of the potential causes of inconsistent data, and sometimes retesting is necessary. Data should be presented both with and without outliers.

Certain statistics can be used to reduce the impact of extreme data values, and the size of the data set greatly influences which method is best for identifying outliers. The mean can be calculated with extreme values at both ends of the spectrum discarded; this is known as a trimmed mean. The original mean should always be readily available alongside the trimmed mean. The method of least squares is the procedure that determines the line that best fits a given set of data points.

If the data are normally distributed, outliers can be identified using a method called the extreme studentized deviate (ESD). For every data value, the absolute deviation from the mean is divided by the standard deviation, giving that value's studentized deviation; the largest of these is the test statistic. This maximum is compared with a critical value from a table that depends on the sample size and the chosen significance level. If the maximum studentized deviation is greater than the table entry, the corresponding data value can be labeled as an outlier. This method works best for a sample size greater than ten.

Box plots can also be used to identify outliers in sets of data containing more than 25 values. Box plots graphically represent the percentiles of a given set of data: the median is the 50th percentile, and the lower and upper edges of the box are the 25th and 75th percentiles, respectively. The distance between these edges is the interquartile range (IQR). Under the usual convention, any value lying more than 1.5 times the IQR below the 25th percentile or above the 75th percentile can be considered an outlier.
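
The three outlier techniques described above (trimmed mean, extreme studentized deviate, and box-plot fences) can be sketched with Python's standard `statistics` module. This is a minimal sketch under stated assumptions: the delivery times are hypothetical, the ESD critical value must be supplied from a published table (2.41 is used here as an approximate two-sided 5% value for n = 12 and should be verified before use), and the fences use the common 1.5 × IQR rule. The same small data set is reused for brevity, even though box plots are better suited to larger samples.

```python
import statistics

def trimmed_mean(values, proportion=0.1):
    """Mean after discarding `proportion` of the values from each end of the sorted data."""
    data = sorted(values)
    k = int(len(data) * proportion)
    kept = data[k:len(data) - k] if k > 0 else data
    return statistics.mean(kept)

def max_studentized_deviation(values):
    """Return (value, G): the value farthest from the mean and G = |x - mean| / s."""
    mean = statistics.mean(values)
    s = statistics.stdev(values)  # sample standard deviation
    value = max(values, key=lambda x: abs(x - mean))
    return value, abs(value - mean) / s

def esd_outlier(values, critical_value):
    """Return the most extreme value if its studentized deviation exceeds the
    table's critical value, else None. critical_value comes from a lookup table."""
    value, g = max_studentized_deviation(values)
    return value if g > critical_value else None

def iqr_fences(values, k=1.5):
    """Box-plot fences: (Q1 - k*IQR, Q3 + k*IQR)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Hypothetical delivery times in minutes, with one suspicious value.
times = [30, 32, 31, 29, 33, 30, 31, 32, 28, 30, 31, 55]

print("mean:", round(statistics.mean(times), 2), "trimmed mean:", trimmed_mean(times))
value, g = max_studentized_deviation(times)
print("most extreme:", value, "G =", round(g, 2))
print("ESD outlier:", esd_outlier(times, critical_value=2.41))  # verify table value
low, high = iqr_fences(times)
print("fences:", (low, high), "outliers:", [x for x in times if x < low or x > high])
```

Reporting the ordinary mean next to the trimmed mean, as the article recommends, makes the influence of the single slow delivery visible.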

Probabilistic & Statistical Modeling

Probabilistic and statistical models are based upon data that have been studied and analyzed. Probabilistic models allow business managers to effectively analyze past company performance, as well as predict future performance. These models are dependent upon statistical techniques. Many business decisions are made with a degree of uncertainty, and this is where probability comes into play: it assists in managing uncertainty and predicting risk. Rather than basing decisions on ambiguous and unknown information, businesses can replace this degree of doubt with a probability, or the likelihood that an outcome will occur. Estimating the chances of producing defective products and the likelihood that a consumer will use a particular service are both examples of applications of probability in business. Companies can use these probabilities to optimize their operations and attempt to operate at their fullest potential. Venn diagrams and probability tree diagrams are two models that can be used to help visualize these probabilities.

As an example, assume a company knows that the probability consumers purchased pencils during the previous back-to-school season was 0.75, and the probability they purchased erasers was 0.25. Based upon these probabilities, the company might choose to focus more on promoting the sale of pencils than erasers. If the probability that customers bought erasers, given that they also purchased a ruler, was 0.50 (twice the overall eraser probability of 0.25), then it might be in the interest of the business to run a promotion involving the purchase of rulers and erasers together. The probability of an event given that another event has occurred is known as a conditional probability.
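
One way a business might estimate these probabilities is from transaction records. The sketch below uses eight hypothetical baskets constructed so that they reproduce the example's figures; the conditional probability is computed as P(eraser | ruler) = P(eraser and ruler) / P(ruler), which is the same as the share of ruler-containing baskets that also contain an eraser.

```python
# Hypothetical back-to-school transactions; each set is one customer's basket.
baskets = [
    {"pencil", "ruler", "eraser"},
    {"pencil", "ruler"},
    {"pencil"},
    {"pencil"},
    {"pencil", "eraser"},
    {"pencil"},
    {"notebook"},
    {"notebook"},
]

def prob(item, data):
    """P(item): share of baskets that contain the item."""
    return sum(item in b for b in data) / len(data)

def cond_prob(item, given, data):
    """P(item | given): share of baskets containing `given` that also contain `item`,
    equivalent to P(item and given) / P(given)."""
    with_given = [b for b in data if given in b]
    if not with_given:
        raise ValueError(f"no baskets contain {given!r}")
    return sum(item in b for b in with_given) / len(with_given)

print("P(pencil) =", prob("pencil", baskets))                        # 0.75
print("P(eraser) =", prob("eraser", baskets))                        # 0.25
print("P(eraser | ruler) =", cond_prob("eraser", "ruler", baskets))  # 0.5
```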

Statistical modeling is often used together with regression, which represents the relationship between independent and dependent variables. This type of analysis allows businesses to examine potential cause-and-effect relationships between various factors and output.
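
A minimal regression example is a straight line fitted by ordinary least squares, the method mentioned earlier for finding the line that best fits a set of points. The advertising and sales figures below are hypothetical. The fitted slope describes an association in the data; treating it as a causal effect requires further justification.

```python
def least_squares(xs, ys):
    """Fit y = a + b*x by ordinary least squares and return (a, b)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx          # slope: change in y per one-unit change in x
    a = my - b * mx        # intercept: the line passes through (mean x, mean y)
    return a, b

# Hypothetical data: monthly advertising spend (thousands of $) vs. units sold.
ad_spend = [1, 2, 3, 4, 5]
units = [12, 15, 19, 21, 25]

a, b = least_squares(ad_spend, units)
print(f"units = {a:.1f} + {b:.1f} * spend")
print("predicted units at spend 6:", round(a + b * 6, 1))
```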

Statistical Sampling for Consumer Understanding

An important aspect of any business is being able to understand the wants and needs of its consumers. A useful and straightforward way to obtain this information is statistical sampling. The data obtained from sampling can ease the decision-making process of a business. A sample is a subset selected from the larger population of interest. Samples should be unbiased and represent the population as a whole. It is in the best interest of a company to select larger sample sizes: according to the Weak Law of Large Numbers, the sample mean converges toward the expected (population) mean as the sample size increases. Businesses can extract data from these samples and use them to draw conclusions, and a larger sample makes those inferences more reliable. Statistical tools such as the mean, median, and mode can all be calculated for a set of data and used for analytical purposes. These measurements can be extremely valuable for businesses and can be used for a multitude of applications, such as estimating the approximate time to complete a project.
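
The effect described by the Weak Law of Large Numbers can be illustrated with a small simulation. This sketch assumes a hypothetical population of 10,000 customers whose monthly spend is uniformly distributed between $0 and $200, draws simple random samples of increasing size, and then computes the mean, median, and mode of a hypothetical set of project completion times.

```python
import random
import statistics

random.seed(42)  # fixed seed so the run is reproducible

# Hypothetical population: monthly spend of 10,000 customers, uniform on $0-$200.
population = [random.uniform(0, 200) for _ in range(10_000)]
true_mean = statistics.mean(population)

# Larger samples tend to produce sample means closer to the population mean.
errors = {}
for n in (10, 100, 1_000, 5_000):
    sample = random.sample(population, n)  # simple random sample, no replacement
    errors[n] = abs(statistics.mean(sample) - true_mean)
    print(f"n={n:>5}  sample mean={statistics.mean(sample):7.2f}  "
          f"(population mean {true_mean:.2f})")

# Mean, median, and mode of hypothetical project completion times (days).
durations = [4, 5, 5, 6, 7, 9, 12]
avg = statistics.mean(durations)
mid = statistics.median(durations)
most_common = statistics.mode(durations)
print(f"mean={avg:.2f} median={mid} mode={most_common}")
```

Any single run can fluctuate, so the shrinking error is a tendency rather than a guarantee for every step from one sample size to the next.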

Z-Scores

A common and useful statistic is the z-score. The z-score for a particular value represents the number of standard deviations that value lies away from the mean. It is computed by subtracting the mean from the value and dividing the result by the standard deviation: z = (x - μ) / σ, where x is the value, μ is the mean, and σ is the standard deviation. If a data set is normally distributed, meaning it forms a bell-shaped, symmetric curve, z-scores can be used to determine the probability that a value will be greater than or less than a particular value, or fall between two specified values. A standard normal distribution has a mean of zero and a variance of one. A standard normal table lists z-scores together with the associated probability that a value will fall below (or above) that number of standard deviations from the mean.
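
Python's standard library can stand in for a printed standard normal table: `statistics.NormalDist().cdf(z)` returns the probability that a standard normal value falls below z. The order-size mean and standard deviation below are hypothetical, and the data are assumed to be normally distributed.

```python
from statistics import NormalDist

def z_score(x, mean, sd):
    """Number of standard deviations x lies from the mean: z = (x - mean) / sd."""
    return (x - mean) / sd

# Hypothetical: order sizes assumed normal with mean 50 units and sd 10 units.
mean, sd = 50, 10
std_normal = NormalDist()  # mean 0, standard deviation 1

z = z_score(65, mean, sd)                  # 1.5 standard deviations above the mean
p_below = std_normal.cdf(z)                # P(X < 65), the table lookup
p_above = 1 - p_below                      # P(X > 65)
p_between = (std_normal.cdf(z_score(60, mean, sd))
             - std_normal.cdf(z_score(40, mean, sd)))  # P(40 < X < 60)

print(f"z={z:.2f}  P(X<65)={p_below:.4f}  P(X>65)={p_above:.4f}  "
      f"P(40<X<60)={p_between:.4f}")
```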

Quality Determination

Quality assurance, improvement, and control are all substantial parts of a business that help ensure continual customer satisfaction. Measuring the quality of a product involves quantifying some property of a given item. Quality control is the assurance that all products and services offered by a particular company meet the standards of consumers. The methods applied in quality control are mainly taken from statistics (Sans, 2013), which makes probability and statistics significant determining factors of quality.

In quality control, samples with a high variance or standard deviation indicate low, inconsistent quality, while a minimal variance or standard deviation indicates higher quality. The sample mean is not a good indicator when a set of data has a high variance, because many values are dispersed far to the right and left of the mean. A low variance indicates that most of the data values are located in the general vicinity of the mean; in this case, the sample mean can serve as a general representative of the sample. As an example, assume a company uses a survey to poll its customers on their satisfaction with a particular product on a scale from 1 to 5, where 5 is extremely satisfied and 1 is dissatisfied. If responses are spread evenly across all five ratings, the mean of the data will be 3. The variability of this data set is high and indicates that the quality is low, so the sample mean is not a good indication of consumers' satisfaction with the product.
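
The survey example can be checked numerically. The two sets of ratings below are hypothetical: both have a mean of exactly 3, but their standard deviations differ sharply, which is why the mean alone does not describe satisfaction when responses are widely spread.

```python
import statistics

# Two hypothetical sets of 1-to-5 satisfaction ratings with the same mean.
spread_out = [1, 2, 3, 4, 5] * 20     # responses spread evenly across the scale
concentrated = [2, 3, 3, 3, 4] * 20   # responses clustered near the middle

for name, ratings in (("spread out", spread_out), ("concentrated", concentrated)):
    mean = statistics.mean(ratings)
    sd = statistics.pstdev(ratings)   # population standard deviation of responses
    print(f"{name:>12}: mean={mean}, sd={sd:.2f}")
```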

Issues

Careful interpretation and analysis of probabilistic and statistical data and models are vital if businesses are to use this information in the most effective and relevant way. The use of statistics should never be based solely on assumptions. All assumptions, and the knowledge derived from them, should be justified by known facts.

It is important for businesses to first narrow down the specifics of what they are looking for within the data. Once this is formulated, the proper analytical tools and arithmetical measurements can be decided upon. These values should help draw conclusions about the matter at hand.

It is important to establish the difference between correlation and causality. Correlation describes the relationship between variables: as one variable increases, another variable may also increase, or it may decrease. It is vital to understand that correlation does not necessarily imply a cause-and-effect relationship between these variables. A third, unobserved variable can often influence both of the other two, creating a relationship between them that should not be overlooked or mistaken for causation.
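
The sketch below computes a Pearson correlation coefficient for two hypothetical weekly sales series. Both are assumed to rise with temperature, the unobserved third variable, so they are strongly correlated even though neither product's sales cause the other's.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical weekly data: warmer weeks lift both series (the confounder).
temperature = [15, 18, 21, 24, 27, 30]   # degrees Celsius
ice_cream = [36, 40, 47, 52, 60, 64]     # units sold
sunscreen = [34, 45, 52, 63, 70, 81]     # units sold

print("r(ice cream, sunscreen) =", round(pearson_r(ice_cream, sunscreen), 3))
print("r(temperature, ice cream) =", round(pearson_r(temperature, ice_cream), 3))
# A high r between the two products says nothing about one causing the other.
```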

In order to draw conclusions about relationships between different variables, businesses first need to assemble data. Companies can gather data by performing research or distributing surveys to their consumers. Secondary sources can also be used to acquire data that have already been researched and published. When collecting information about a population, it is important that businesses use a random sample from within the specified population. In a simple random sample, each unit in the population has an equal probability of being selected. A sample should represent the population as a whole.
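
Python's `random.sample` draws a simple random sample without replacement. The sketch below uses hypothetical customer IDs and repeats the draw many times to check empirically that every customer is selected at roughly the same rate, n/N = 10%.

```python
import random
from collections import Counter

def simple_random_sample(population, n, rng=random):
    """Draw n units without replacement; every unit has the same chance of selection."""
    return rng.sample(population, n)

# Hypothetical population of 100 customer IDs.
customers = [f"C{i:03d}" for i in range(1, 101)]
rng = random.Random(7)  # seeded generator for a reproducible run
survey_group = simple_random_sample(customers, 10, rng)
print("survey group:", survey_group)

# Empirical check: over many draws, each customer is picked about 10% of the time.
counts = Counter()
trials = 20_000
for _ in range(trials):
    counts.update(simple_random_sample(customers, 10, rng))
rates = [counts[c] / trials for c in customers]
print(f"selection rate range: {min(rates):.3f} to {max(rates):.3f}")
```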

Terms & Concepts

Business Statistics: The use of statistics within a business to formulate inferences and make decisions in uncertain and unpredictable circumstances.

Correlation: The relationship between variables, which can be positive, negative, or zero. If the correlation between two variables is positive, both variables tend to move in the same direction. When the correlation is negative, the variables move in opposite directions. A correlation equal to zero indicates that there is no linear relationship between the variables.

Economic Forecasting: Predicting future economic conditions. Probability and statistics are important tools in forecasting, and they ease conditions of uncertainty and risk in decision making.

Expectation: The sum of all possible numerical outcomes, each multiplied by its respective probability. When every value in a data set is equally likely, the expectation equals the average of all the values. Expectation can also be referred to as the mean.

Median: The number that lies directly in the middle of a set of ordered values or, when there is an even number of values, the average of the two middle numbers.

Mode: The numeric value that occurs most frequently within a given data set.

Outlier: An observation that varies greatly from all others.

Probabilistic Model: A model that allows business managers to effectively analyze past company performance, as well as predict future performance.

Qualitative Data: Data that cannot be quantified. It has no numerical value and can often be categorized descriptively.

Quality Control: The use of probability, statistics, and other tools by a company to determine whether its services and products meet the needs and expectations of its consumers. Quality control allows a business to effectively understand and analyze the wants and needs of customers.

Quantitative Data: Data that can be represented numerically, so that statistical measures can be applied to all data values. Quantitative data can be separated into discrete data and continuous data: discrete data are counted, and continuous data are measured.

Regression: A statistical method that models the relationship between a dependent variable and one or more independent variables.

Risk: A measure of the unpredictability and uncertainty in a result or outcome. It is often represented numerically as a ratio corresponding to the amount an outcome can vary from what is expected.

Statistical Model: A model that allows businesses to analyze potential cause-and-effect relationships between various inputs and outputs. Statistical models are often used together with regression.

Variance: A measure of how widely data points are spread around their expectation, calculated as the average squared deviation from the mean. The square root of the variance is known as the standard deviation, which is also a measure of variability within a set of data.

Bibliography

Arsham, H. (1994). Statistical thinking for managerial decisions. Retrieved December 22, 2006, from http://home.ubalt.edu/ntsbarsh/Business-stat/opre504.htm

Bacon, J. (2013). Faith in government data is slipping. Marketing Week (Online Edition), 5. Retrieved November 20, 2013, from EBSCO Online Database Business Source Complete. http://search.ebscohost.com/login.aspx?direct=true&db=bth&AN=91583386&site=ehost-live

Hamilton, J. (2006). Knowing right statistics can save your profit. Contractor Magazine, 53, 58. Retrieved December 22, 2006, from EBSCO Online Database Business Source Premier. http://search.ebscohost.com/login.aspx?direct=true&db=buh&AN=22927086&site=bsi-live

Johnson, R. A. (2005). Miller & Freund's probability and statistics for engineers (7th ed.). Upper Saddle River, NJ: Pearson Prentice Hall.

Kallman, J. (2005). What is risk? Risk Management, 52, 57. Retrieved December 22, 2006, from EBSCO Online Database Business Source Complete. http://search.ebscohost.com/login.aspx?direct=true&db=bth&AN=19058641&site=ehost-live

Prices. (2013). Economic Indicators, 22-25. Retrieved November 21, 2013, from EBSCO Online Database Business Source Complete. http://search.ebscohost.com/login.aspx?direct=true&db=bth&AN=89512858&site=ehost-live

Sans, W. (2013). A critical review of statistical methods used in quality control. Economic Quality Control, 27, 97-142. Retrieved November 20, 2013, from EBSCO Online Database Business Source Complete. http://search.ebscohost.com/login.aspx?direct=true&db=bth&AN=85862366&site=ehost-live

The use and misuse of statistics. (2006). Harvard Management Update, 11, 3-4. Retrieved December 22, 2006, from EBSCO Online Database Business Source Complete. http://search.ebscohost.com/login.aspx?direct=true&db=bth&AN=19907329&site=ehost-live

Walfish, S. (2006). A review of statistical outlier methods. Pharmaceutical Technology, 30, 82-88. Retrieved December 22, 2006, from EBSCO Online Database Business Source Complete. http://search.ebscohost.com/login.aspx?direct=true&db=bth&AN=23101446&site=ehost-live

Suggested Reading

Coaker, W. J. (2007). Emphasizing low-correlated assets: The volatility of correlation. Journal of Financial Planning, 20, 52-70. Retrieved October 22, 2007, from EBSCO Online Database Business Source Complete. http://search.ebscohost.com/login.aspx?direct=true&db=bth&AN=26475371&site=ehost-live

Maneva, E. (2007). A new look at survey propagation and its generalizations. Journal of the ACM, 54, 1-41. Retrieved October 22, 2007, from EBSCO Online Database Business Source Complete. http://search.ebscohost.com/login.aspx?direct=true&db=bth&AN=26227143&site=ehost-live

Rosenzweig, P. (2007). Misunderstanding the nature of company performance: The halo effect and other business delusions. California Management Review, 49, 6-20. Retrieved October 22, 2007, from EBSCO Online Database Business Source Complete. http://search.ebscohost.com/login.aspx?direct=true&db=bth&AN=25996994&site=ehost-live

Edited by Richa S. Tiwary, Ph.D., MLS

Dr. Richa S. Tiwary holds a Doctorate in Marketing Management with a specialization in Consumer Behavior from Banaras Hindu University, India. She earned her second Master's in Library Sciences, with a dual concentration in Information Science & Technology and Library Information Services, from the Department of Information Studies, University at Albany-SUNY.