Business Statistics
Business statistics involves applying mathematical statistical techniques to address real-world business challenges. This field is essential for organizing and analyzing data, enabling businesses to make informed decisions based on quantifiable evidence. Business statistics finds applications across various domains, including marketing, operations, quality control, and forecasting, supporting businesses in making comparisons between complex data sets. It employs both descriptive statistics, which summarize and visualize data for clarity, and inferential statistics, which allow conclusions to be drawn from sample data about a larger population.
A critical aspect of effective business statistics is the formulation of testable hypotheses. These hypotheses guide research designs that seek to control extraneous variables and accurately reflect real-world situations. Analysts use statistical methods, such as t-tests and ANOVA, to evaluate the effects of different variables and determine statistical significance, which indicates whether observed results are due to chance or represent genuine differences. Through these techniques, businesses can better understand customer preferences, predict market trends, and ultimately enhance their decision-making processes for improved outcomes.
Subject Terms
Business Statistics
Business statistics is the application of mathematical statistical techniques to the real problems of the business world. In addition to helping the business analyst to organize and describe data, business statistics allows meaningful comparisons to be made between and among complex sets of data. Business statistics can be applied to a wide range of business problems including marketing, operations, quality control, and forecasting. The usefulness of statistical analysis, however, depends on the quality of the hypothesis being tested. There are a number of considerations for developing a testable hypothesis that will yield meaningful results when analyzed and for designing a research study that will control extraneous variables while emulating the real world situation to which the results will be extrapolated.
Every day, business persons are faced with a multitude of questions the answers to which can determine not only the course but the very success of the business. Will the new logo design better connect our product in the customer's mind than the current logo does? Will the new widget design attract people's attention and make them want to buy it? Are men or women more likely to buy a gizmo, and how do we best advertise it to them? How do we turn prospective customers into established customers? Will oil prices continue to rise and will people be open to alternative fuel sources? If the business answers these questions correctly, it can be on the leading edge of its industry. However, if the business answers these questions incorrectly, it can potentially lose money, its market share, or even its viability.
Mathematical statistics is a branch of mathematics that deals with the analysis and interpretation of data. Mathematical statistics provides the theoretical underpinnings for various applied statistical disciplines, including business statistics, in which data are analyzed to find answers to quantifiable questions. Business statistics is the application of these tools and techniques to the analysis of real world problems for the purpose of business decision making.
There are two general classes of statistics that are used by the business analyst. Descriptive statistics are used to describe and summarize data so that they can be more easily comprehended and studied. Among the tools of descriptive statistics are various graphing techniques, measures of central tendency, and measures of variability. Graphing techniques help the analyst aggregate and visually portray data so that they can be better understood. Included in this category are histograms, frequency distributions, and stem and leaf plots. Measures of central tendency estimate the midpoint of a distribution. These measures include the median (the number in the middle of the distribution), the mode (the number occurring most often in the distribution), and the mean (a mathematically derived measure in which the sum of all data in the distribution is divided by the number of data points in the distribution). Measures of variability summarize how widely dispersed the data are over the distribution. The range is the difference between the highest and lowest scores in the distribution. The standard deviation is a mathematically derived index of the degree to which scores differ from the mean of the distribution.
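The measures of central tendency and variability described above can be sketched in a few lines of Python using only the standard library. The sales figures are hypothetical and serve only as an illustration; the standard deviation shown treats the data as a complete population.

```python
import statistics

# Hypothetical daily unit sales for one store (illustrative data only).
sales = [12, 15, 15, 18, 20, 22, 24]

mean = statistics.mean(sales)          # sum of the values divided by their count
median = statistics.median(sales)      # middle value of the sorted data
mode = statistics.mode(sales)          # most frequently occurring value
data_range = max(sales) - min(sales)   # highest score minus lowest score
std_dev = statistics.pstdev(sales)     # population standard deviation

print(mean, median, mode, data_range, round(std_dev, 2))
```

If the data were a sample drawn from a larger population, statistics.stdev (which divides by n - 1) would usually be the more appropriate call.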
Descriptive statistics are helpful for taking large amounts of data and describing them in ways that are easily comprehensible. Pie charts, histograms, and frequency polygons are frequently used in business presentations and are all examples of ways that descriptive statistics can be used in business. Descriptive statistics, however, only summarize and describe data. Because business statistics is an applied form of mathematics, it is also a valuable tool for analyzing and interpreting data. This can be done through the use of inferential statistics, a collection of techniques that allow one to make inferences about the data, including drawing conclusions about a population from a sample.
In general, inferential statistics are used to test hypotheses to determine if the results of a study occur at a rate that is unlikely to be due to chance (i.e., have statistical significance). A hypothesis is an empirically testable declarative statement that the independent and dependent variables and their corresponding measures are related in a specific way as proposed by the theory. The independent variable is the variable that is being manipulated by the researcher. For example, a market researcher might be trying to determine which new breakfast cereal the organization should bring to market. The independent variable is the type of breakfast cereal. The dependent variable (so called because its value depends on which level of the independent variable the subject received) is the subject's response to the independent variable (e.g., whether or not the people like the breakfast cereal they are given to try). Examples of hypotheses include "the new red widget logo is better remembered than the old blue logo," "grade school children prefer the taste of new, improved Super Crunchies to original Crunchies cereal," or "Widget Corporation stores in the western states are more profitable than those in the East."
For purposes of statistical tests, the hypothesis is stated in two ways. The null hypothesis (H0) is the statement that there is no statistical difference between the status quo and the experimental condition. In other words, the treatment being studied made no difference on the end result. For example, a null hypothesis about the effectiveness of the two possible logos for Widget Corporation would be that there is no difference in the way that people react to the old logo versus the new logo. This null hypothesis states that there is no relationship between the variables of old/new logo (independent variable) and whether or not people like it (dependent variable). The alternative hypothesis (H1) states that there is a relationship between the two variables (e.g., people prefer the new logo).
Following the formulation of the null hypothesis, an experimental design is developed that allows the researcher to empirically test the hypothesis. Typically, this design will have a control group that does not receive the experimental condition (e.g., the group sees only the old logo) and an experimental group that does receive the experimental condition (e.g., the group sees the new logo). The analyst then collects data from people in the study to determine whether or not the experimental condition had any effect on the outcome. After the data have been collected, they are statistically analyzed to determine whether the null hypothesis should be accepted (i.e., the data show no statistically significant difference between the control and experimental groups) or rejected (i.e., there is a statistically significant difference between the two groups). Strictly speaking, a nonsignificant result means only that the analyst fails to reject the null hypothesis; it does not prove that no difference exists. As shown in Figure 1, accepting the null hypothesis means that if the data in the population are normally distributed, the observed difference is more than likely due to chance. This is illustrated in the figure as the unshaded portion of the distribution. By accepting the null hypothesis, the analyst is concluding that it is likely that people do not react any differently to the red logo than they do to the blue logo. For the null hypothesis to be rejected and the alternative hypothesis to be accepted, the results must lie in the shaded portion of the graph. A result in this region is statistically significant: the difference observed between the two groups is probably not due to chance but to a real underlying difference in people's attitudes toward the two logos.
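The accept-or-reject decision for the logo study can be illustrated with a short sketch. It assumes that the dependent variable is simply whether each subject liked the logo shown, and it uses a two-proportion z-test, which is one common choice for such data rather than a method prescribed by the discussion above. The function name, the counts, and the .05 significance level are all hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z_test(liked_a, n_a, liked_b, n_b):
    """Return (z, two-sided p-value) for H0: both groups like their logo at the same rate."""
    p_a = liked_a / n_a
    p_b = liked_b / n_b
    # Under H0 the two groups share one underlying rate, so pool the counts.
    pooled = (liked_a + liked_b) / (n_a + n_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    # Probability of a difference at least this large in either direction.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

ALPHA = 0.05  # assumed significance level (size of the shaded rejection region)

# Hypothetical counts: control group saw the old logo, experimental group the new one.
z, p_value = two_proportion_z_test(liked_a=48, n_a=100, liked_b=65, n_b=100)
reject_null = p_value < ALPHA
print(f"z = {z:.2f}, p = {p_value:.4f}, reject H0: {reject_null}")
```

With these made-up counts the p-value falls below .05, so the result would lie in the rejection region and the null hypothesis would be rejected.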
Part of the process of designing an experiment is determining how the data will be analyzed. There are a number of different statistical methods for testing hypotheses, each of which is appropriate to a different type of experimental design. One class of statistical tests is the t-tests. This type of statistical technique is used to analyze the mean of a population or compare the means of two different populations. When the samples are large or the population standard deviations are known, a z statistic may be used instead to compare the means of two populations.
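As a rough illustration of comparing two means, the sketch below computes a two-sample t statistic by hand. It uses Welch's form of the test, which does not assume equal variances; this is an assumption made for the example, not a choice stated in the text. The ratings are hypothetical, and because the Python standard library has no t distribution, the statistic is meant to be compared against a critical value from a t table.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(sample_a, sample_b):
    """Return (t, degrees of freedom) for the difference in means, a minus b."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard error of each sample mean (sample variance uses n - 1).
    se_sq_a = variance(sample_a) / n_a
    se_sq_b = variance(sample_b) / n_b
    t = (mean(sample_a) - mean(sample_b)) / sqrt(se_sq_a + se_sq_b)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (se_sq_a + se_sq_b) ** 2 / (
        se_sq_a ** 2 / (n_a - 1) + se_sq_b ** 2 / (n_b - 1)
    )
    return t, df

# Hypothetical 1-10 recall ratings from two groups of eight subjects.
old_logo = [5, 6, 5, 7, 6, 5, 6, 4]
new_logo = [7, 8, 6, 8, 7, 9, 7, 8]

t, df = welch_t(new_logo, old_logo)
print(f"t = {t:.2f}, df = {df:.1f}")
```

For 14 degrees of freedom, standard t tables give a two-tailed critical value of about 2.145 at the .05 level, so a t statistic well above that value would lead the analyst to reject the null hypothesis.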
Another frequently used technique for analyzing data in applied settings is analysis of variance (ANOVA). This family of techniques is used to analyze the joint and separate effects of multiple independent variables on a single dependent variable and to determine the statistical significance of the effect. For example, analysis of variance might be used if one wished to determine the relative profitability of an organization's operations in three different countries. Multivariate analysis of variance (MANOVA) is an extension of this set of techniques that allows the business analyst to test hypotheses on more complex problems involving the simultaneous effects of multiple independent variables on multiple dependent variables.
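The three-country profitability comparison can be sketched as a one-way analysis of variance, the simplest member of the ANOVA family, with country as the single independent variable. The profit figures and the function name are hypothetical. The code stops at the F statistic, which would then be compared against a critical value from an F table for the given degrees of freedom.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of samples, one per group."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual scores around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

# Hypothetical quarterly profits (in millions) for operations in three countries.
profits = {
    "Country A": [10, 12, 11, 13],
    "Country B": [14, 15, 13, 16],
    "Country C": [9, 8, 10, 9],
}

f_stat, df_b, df_w = one_way_anova_f(list(profits.values()))
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

A large F indicates that the differences among the country means are large relative to the variation within each country.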
Other types of applied statistics allow the business analyst to predict one variable from the knowledge of another variable. If one were launching a new cereal in the marketplace, it might be helpful to know the demographics of the people who prefer the cereal so that it could be introduced into the correct market. For example, if the new cereal appealed primarily to children but not to adults, it would not be a prudent strategy to put it in the grocery store in an all-adult community. One way to answer this question is by determining the relationship between the two variables (e.g., age of consumer and attitude toward the new cereal). Correlation coefficients allow analysts to determine whether the two variables are positively correlated (e.g., the older people become, the more they like the new cereal), negatively correlated (e.g., the older people become, the less they like the new cereal), or not correlated at all.
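One widely used correlation coefficient is Pearson's r, which ranges from -1 (perfect negative correlation) through 0 (no linear correlation) to +1 (perfect positive correlation). The sketch below computes it by hand for the age and cereal-rating example; the data and the function name are hypothetical.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Return Pearson's correlation coefficient for two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # Sum of cross-products of deviations from the two means.
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cross / sqrt(ss_x * ss_y)

# Hypothetical consumers: age in years and rating of the new cereal (1-10).
ages = [6, 9, 12, 25, 40, 60]
ratings = [9, 8, 8, 5, 4, 2]

r = pearson_r(ages, ratings)
print(f"r = {r:.2f}")
```

In this made-up sample r is strongly negative, which corresponds to the case in which the older people become, the less they like the new cereal.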
However, real world problems do not always have easy answers involving only two variables. For example, consumers' attitudes toward the new cereal may depend not only on their age, but on other factors as well. If, for example, the new cereal is presweetened, consumers' preferences may also be based on such factors as what type of cereal they usually eat, how much sugar they usually consume in their diet, what cereal they ate when they were children, or whether or not they have a medical condition that requires them to reduce the amount of sugar in their diets. Multiple regression analysis is a family of statistical techniques that allow one to predict the score on the dependent variable when given the scores on one or more independent variables. This statistical technique analyzes the effects of multiple predictors on behavior so that the business analyst has a better understanding of their relative contributions as well as the factors that make up a consumer's preference or other question of interest.
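A minimal sketch of multiple regression follows, assuming ordinary least squares fitted through the normal equations with two predictors, age and daily sugar servings. The function names and data are hypothetical, and the scores were constructed to follow an exact linear rule (10 - 0.1 x age + 0.5 x sugar) so that the recovered coefficients are easy to check. Real data would include noise, and a production analysis would normally rely on a statistics package rather than hand-written elimination.

```python
def fit_linear_regression(rows, targets):
    """Ordinary least squares via the normal equations (X'X) b = X'y.

    Returns [intercept, coefficient_1, coefficient_2, ...].
    No handling of singular (perfectly collinear) predictors.
    """
    X = [[1.0] + list(row) for row in rows]  # prepend an intercept column
    p = len(X[0])
    # Augmented matrix [X'X | X'y].
    aug = [
        [sum(x[i] * x[j] for x in X) for j in range(p)]
        + [sum(x[i] * y for x, y in zip(X, targets))]
        for i in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_val = aug[col][col]
        aug[col] = [v / pivot_val for v in aug[col]]
        for r in range(p):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][p] for i in range(p)]

def predict(coefs, row):
    """Predicted score on the dependent variable for one set of predictor values."""
    return coefs[0] + sum(c * v for c, v in zip(coefs[1:], row))

# Hypothetical consumers: (age in years, usual daily servings of sugar).
consumers = [(8, 3), (12, 4), (30, 2), (45, 1), (60, 0), (25, 5)]
# Preference scores built from the rule 10 - 0.1 * age + 0.5 * sugar.
scores = [10.7, 10.8, 8.0, 6.0, 4.0, 10.0]

coefs = fit_linear_regression(consumers, scores)
print([round(c, 3) for c in coefs])
print(round(predict(coefs, (20, 2)), 2))
```

The fitted coefficients indicate the relative contribution of each predictor: here the score falls as age rises and rises with habitual sugar consumption.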
These are only a few of the statistical techniques that can be used in the analysis of business data. Statistical techniques can be applied to the gamut of business problems, from marketing research and quality control to the prediction of marketplace trends or sales volume and the comparison of the relative efficiency of the various operations in a multinational organization.
Applications
As opposed to mathematical statistics, the field of business statistics is an applied field used to help make practical decisions about real world problems. This is done through a variety of types of research studies.
In general, the goal of research is to describe, explain, and predict behavior. For example, a marketer may want to know which of two proposed new company logos will be most memorable and will have the most positive image in the minds of prospective customers. Another example of applied research is when the engineering department seeks to determine which of two graphical user interfaces is more user friendly. Designing a good research study depends in part on two factors: controlling the situation so that the research is only measuring what it is supposed to measure and including as many of the relevant factors as possible so that the research fairly emulates the real world experience.
In the simplest research design, a stimulus (e.g., a new company logo) is presented to the research subjects (e.g., potential customers) and a response is observed and recorded (e.g., which logo they liked better and why). There are three types of variables that are important in research. As discussed above, the variables of most concern in the design of a research study are the independent variable, which is the stimulus or experimental condition that is hypothesized to affect behavior, and the dependent variable, which is the observed effect on behavior caused by the independent variable. As shown in Figure 2, however, these are not the only variables that need to be controlled during a study. There are also extraneous variables: variables that affect the outcome of the experiment (e.g., the subjects' response to the cereal) but that have nothing to do with the independent variable itself. For example, if the subject is tired and hungry after a long day at work and looking forward to going home and having a steak for dinner, no breakfast cereal is likely to taste good. Similarly, if a subject has a cold and is asked to rate the difference between the two breakfast cereals, s/he might not be able to do so because s/he cannot taste well at the time. There are any number of such variables that are extraneous to the research question being asked but that still affect the outcome of the research. As much as possible, these need to be controlled. In this example, the analyst could hold all the tastings of the breakfast cereal first thing in the morning when no one has eaten yet. Similarly, the researcher could make sure that subjects in the experiment do not have a cold or allergies before they taste the cereal. Although it is impossible to control literally every possible extraneous variable, the more of these that are accounted for and controlled in the experimental design, the more meaningful the results will be.
As shown in Figure 3, research design starts with a theory based on real world observation. For example, from personal experience with two types of cereal and observations of how other people react to the cereals, a researcher may develop a preliminary theory that "Super Crunchies" is more likely to be successful in the target market than "Very Flakies." From these observations, s/he forms an empirically testable hypothesis concerning the relative attractiveness of the two kinds of cereal. For example, the hypothesis may be that "People like Super Crunchies better than Very Flakies." To find out if this hypothesis is true, the researcher next needs to operationally define the various terms (i.e., constructs) in the hypothesis. Specifically, s/he needs to determine what the components of "like better" are. To do this, s/he might develop a series of rating scales that measure the various components of whether someone buys a cereal (e.g., prefers it to the current brand, likes the taste, likes the mouth feel, likes the price). The researcher would then run the experiment by letting people try both cereals in a controlled setting, analyze the resulting data using inferential statistics, and, based on the statistical significance of the results, determine whether it is likely to be cost effective to put one of the two cereals on the market.
Not all research is done in the laboratory, however. As stated above, it is important not only to control as many variables as one can when designing an experiment, but also to have the experimental situation emulate the real-world situation as much as possible. For example, although Super Crunchies may taste fine in a taste test in a laboratory, when the potential consumer is faced with the reality of feeding it to fussy children while simultaneously checking homework, making lunches, and driving the carpool, waiting for the cereal to lose a little of its crunch may be more than time allows.
There are a number of common research techniques that can be used to investigate business problems. The laboratory experiment allows the researcher the most control over extraneous variables. So, for example, the cereal tasting could always be held at the same time of day in a room with no distractions, with people allowed to eat as much of each cereal as they want at their leisure. However, this situation is far removed from the reality of how most people eat their breakfast. A second approach to research is to use a simulation. This can allow the researcher to bring in more real world variables but still control many of the extraneous variables. For example, people could be given only a limited time to eat the cereals while simultaneously being given other morning tasks to do. Alternatively, the research could be run as a field experiment in which people are given the cereal to try at home under the normal conditions in which they typically eat breakfast. This has the advantage of being more realistic, but it also has the disadvantage of giving the researcher less control over extraneous variables.
In addition to these research techniques in which the experimenter has some control over the variables, there are other approaches to studying business problems as well. The field study is an examination of how people behave in the real world. For example, if both cereals are already on the market, the researcher could observe what type of people bought each variety to determine whether the brands appealed to families with children or only to adults. This could be combined with another research technique called survey research. In survey research, subjects are interviewed by a member of the research team or asked to fill out a questionnaire regarding their preferences, reactions, habits, or other questions of interest to the researcher. For example, the researcher could ask each person buying the new cereal a list of questions such as how often they bought that particular variety of cereal, what other cereals they had tried before, what they liked about this brand, etc. However, although a very thorough interview or survey instrument can be written that would hypothetically gather all the data needed for the researcher to make decisions about the cereals, such instruments are often more lengthy than the potential research subject's attention span. In addition, as opposed to the other research techniques, surveys and interviews are not based on observation. Therefore, there is no way to know whether the subject is telling the truth.
Finally, the analyst does not necessarily have to do new research in order to statistically analyze data for decision making. Meta-analysis and other secondary analysis techniques allow researchers to analyze multiple previous research studies to look for trends or general findings. Statistical analysis can also be applied to existing data that the business has collected for other purposes or that are publicly available. Although these approaches do not give the researcher control over the way that the data are collected, they often yield interesting results that can add to the body of knowledge about a topic or that can inform business decisions.
Terms & Concepts
Analysis of Variance (ANOVA): A family of statistical techniques that analyze the joint and separate effects of multiple independent variables on a single dependent variable and determine the statistical significance of the effect.
Data: (sing. datum) In statistics, data are quantifiable observations or measurements that are used as the basis of scientific research.
Dependent Variable: The outcome variable or resulting behavior that changes depending on whether the subject receives the control or experimental condition (e.g., a consumer's reaction to a new cereal).
Descriptive Statistics: A subset of mathematical statistics that describes and summarizes data.
Hypothesis: An empirically testable declaration that certain variables and their corresponding measures are related in a specific way proposed by a theory.
Independent Variable: The variable in an experiment or research study that is intentionally manipulated in order to determine its effect on the dependent variable (e.g., the independent variable of type of cereal might affect the dependent variable of the consumer's reaction to it).
Inferential Statistics: A subset of mathematical statistics used in the analysis and interpretation of data. Inferential statistics are used to make inferences such as drawing conclusions about a population from a sample and in decision making.
Mathematical Statistics: A branch of mathematics that deals with the analysis and interpretation of data. Mathematical statistics provides the theoretical underpinnings for various applied statistical disciplines, including business statistics, in which data are analyzed to find answers to quantifiable questions.
Null Hypothesis (H0): The statement that the findings of the experiment will show no statistical difference between the current condition (control condition) and the experimental condition.
Population: The entire group of subjects belonging to a certain category (e.g., all women between the ages of 18 and 27; all dry cleaning businesses; all college students).
Sample: A subset of a population. A random sample is a sample that is chosen at random from the larger population with the assumption that such samples tend to reflect the characteristics of the larger population.
Standard Deviation: A measure of variability that describes how far the typical score in a distribution is from the mean of the distribution. The standard deviation is obtained by determining the deviation of each score from the mean (i.e., subtracting the mean from the score), squaring the deviations (i.e., multiplying them by themselves), adding the squared deviations, dividing by the total number of scores, and taking the square root of the result. The larger the standard deviation, the more widely the scores are spread around the mean of the distribution.
Statistical Significance: The degree to which an observed outcome is unlikely to have occurred due to chance.
Variable: An object in a research study that can have more than one value. Independent variables are stimuli that are manipulated in order to determine their effect on the dependent variables (response). Extraneous variables are variables that affect the response but that are not related to the question under investigation in the study.