Applied Regression and Analysis of Variance

Good business strategy is based on the rigorous analysis of empirical data. Applied statistics can help managers and other organizational decision makers develop better strategies and make better plans to help ensure the success of the organization. Two of the statistical tools most commonly used to support these activities are regression analysis and analysis of variance. Regression analysis is a family of statistical techniques used to develop mathematical models for prediction and forecasting. Analysis of variance is a family of techniques used to analyze the joint and separate effects of multiple independent variables on a single dependent variable and to determine the statistical significance of those effects. Both regression analysis and analysis of variance can be invaluable in providing managers and other organizational decision makers with the information that they need to make informed decisions and develop plans to increase the effectiveness of the organization.

In business, a strategy is a plan of action to help the organization reach its goals and objectives. A good business strategy is based on the rigorous analysis of empirical data, including market needs and trends, competitor capabilities and offerings, and the organization's resources and abilities. Applied statistics can help managers and other organizational decision makers develop better strategies and make better plans to help ensure the success of the organization. Two of the statistical tools most commonly used to support these activities are regression analysis and analysis of variance (ANOVA). These tools are part of inferential statistics, a subset of mathematical statistics used in the analysis and interpretation of data. Inferential statistics are used to make inferences from empirical data, including allowing decision makers to draw conclusions about a population from a sample.

Regression Analysis

Knowing that there is a relationship between two variables does not always provide managers with sufficient information to make good decisions. In some situations one needs to be able to predict the value of one variable from knowledge of another variable. Regression analysis is a family of statistical techniques that is used to develop mathematical models that can be used for this purpose. Data are typically graphed on a scatter plot (a graph depicting pairs of points for two-variable numerical data) so that a line of best fit can be determined and used to predict the value of the dependent variable based on different values of the independent variable. A sample scatter plot with line of best fit is shown in Figure 1.

[Figure 1: Sample scatter plot with line of best fit]

For example, a trainer might want to determine the optimal cost of running a seminar based on varying numbers of students. Although more students would mean more income, this situation would also mean more expenses such as costs for handouts, larger conference rooms, more trainers to run small group sessions, and so forth. With too few students, on the other hand, the training course would not pay for itself. A predictive model for cost versus number of trainees could be developed using data collected on these variables for a number of training courses. The slope of the line of best fit passing through the data could be mathematically calculated to determine the equation of the simple regression line. Trainers could then use this equation to determine optimal class sizes for training courses.
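The line-of-best-fit calculation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the trainee counts and course costs are made-up numbers, and the function name fit_line is hypothetical. It uses the ordinary least-squares formulas for the slope and intercept of a simple regression line.

```python
# Hypothetical data (not from any real course): number of trainees and
# total cost of running each course, in dollars.
trainees = [5, 10, 15, 20, 25]
cost = [1600, 1900, 2600, 2900, 3500]


def fit_line(x, y):
    """Return (slope, intercept) of the least-squares line of best fit."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sum of squares of the deviations from the means.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(trainees, cost)

# Use the fitted equation to predict the cost of a 12-person course.
predicted_cost_12 = intercept + slope * 12
print(f"cost = {intercept:.0f} + {slope:.0f} * trainees")
print(f"predicted cost for 12 trainees: {predicted_cost_12:.0f}")
```

With these invented figures the fitted line is cost = 1060 + 96 x trainees, so each additional trainee adds roughly $96 to the predicted cost. A trainer could compare that prediction with expected income per trainee when weighing class sizes.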

Simple linear regression can be invaluable for building models and predicting the value of one variable from the knowledge of the value of another variable. However, real-world business situations are often more complicated than this scenario. Multiple regression allows the prediction of the dependent variable by the use of more than one independent variable. For example, a marketer might want to predict the profitability of a new product line based on multiple factors such as price, product life, and packaging. Such information could help the marketer determine whether a new product line would be a sound investment based on the consideration of several variables. Similarly, an engineer might want to predict the life expectancy of a gizmo based on factors such as quality of the steel used in the frame, thickness of the housing, or other factors. This could give the engineer the information needed to make decisions for the most cost-effective design. This type of complex question can be answered using models built by multiple regression analysis.
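As a rough sketch of how a multiple regression model is fitted, the following Python example estimates an intercept and two coefficients by solving the least-squares normal equations. Everything in it is an assumption made for illustration: the price, product-life, and profit figures are invented (and were constructed to follow an exact linear rule so the result is easy to check), and the helper names solve_linear_system and fit_multiple_regression are hypothetical. In practice an analyst would normally use a statistics package rather than hand-written code.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        # Move the row with the largest entry in this column into pivot position.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_multiple_regression(rows, y):
    """Return least-squares coefficients [intercept, b1, b2, ...]."""
    design = [[1.0] + list(r) for r in rows]  # leading 1.0 gives the intercept
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Hypothetical observations: (price, product life in years) and profit.
predictors = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 6)]
profit = [18, 17, 28, 27, 38]

coefficients = fit_multiple_regression(predictors, profit)
intercept, b_price, b_life = coefficients

# Predict profit for a new product with price 3 and product life 5.
predicted_profit = intercept + b_price * 3 + b_life * 5
print([round(c, 3) for c in coefficients], round(predicted_profit, 3))
```

Each coefficient estimates the change in the dependent variable associated with a one-unit change in that predictor while the other predictor is held constant.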

Analysis of Variance

Another commonly used statistical technique in business is analysis of variance. This is a family of techniques used to analyze the joint and separate effects of multiple independent variables on a single dependent variable and to determine the statistical significance of the effect. Analysis of variance examines two sources of variability: variability between groups and variability within groups.

  • Variability between groups is the variation among the scores of subjects that are treated differently. For example, if one were trying to determine which of three batteries had the longest life, the between groups variability would look at the difference in battery life for batteries X vs. Y vs. Z.
  • Within groups variability, the second type of variability of interest in analysis of variance, looks at the variation among the scores of subjects that are treated alike. Within groups variability would look at the variation of scores among all battery Xs tested, all battery Ys tested, and all battery Zs tested. Within groups variability is sometimes referred to as the "error term" because it is due to random error resulting from uncontrolled factors such as individual differences.
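The two sources of variability can be made concrete with a short Python sketch of the battery example. The battery lifetimes below are invented for illustration, and the function name one_way_anova is hypothetical. The code computes the between groups and within groups sums of squares and their ratio (the F ratio) for a one-way design.

```python
# Hypothetical battery lifetimes in hours for three battery types.
battery_life = {
    "X": [10, 12, 11],
    "Y": [14, 15, 16],
    "Z": [9, 8, 10],
}


def one_way_anova(groups):
    """Return (ss_between, ss_within, f_ratio) for a one-way design."""
    scores = [x for g in groups.values() for x in g]
    grand_mean = sum(scores) / len(scores)

    ss_between = 0.0  # variation of group means around the grand mean
    ss_within = 0.0   # variation of scores around their own group mean
    for g in groups.values():
        group_mean = sum(g) / len(g)
        ss_between += len(g) * (group_mean - grand_mean) ** 2
        ss_within += sum((x - group_mean) ** 2 for x in g)

    df_between = len(groups) - 1
    df_within = len(scores) - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, f_ratio


ss_between, ss_within, f_ratio = one_way_anova(battery_life)
print(ss_between, ss_within, f_ratio)
```

A large F ratio means the differences among the group means are large relative to the random variation inside each group. Judging statistical significance still requires comparing the ratio with the F distribution for the given degrees of freedom, which a library routine such as SciPy's scipy.stats.f_oneway can do.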

Analysis of variance is not a single technique but a family of techniques used with experimental research. In the completely randomized design one-way analysis of variance, subjects are randomly assigned to treatments in a research design that contains only one independent variable with two or more treatment levels. For example, if one wanted to know which of several packaging options people preferred (e.g., the current packaging vs. one or more new packaging options), one could analyze the data with a one-way analysis of variance. Sometimes a second variable (referred to as a blocking variable) is used to control for confounding or extraneous variables that are not being tested in the research; this arrangement is known as a randomized block design. Another design is two-way analysis of variance, which is used when the research design includes two independent variables (treatments) that the analyst desires to examine simultaneously, along with any interaction between them. For example, the marketing department may want to know if there is a difference between the way that women and men react to the three packaging options for the widget. In addition to univariate analysis of variance, multivariate analysis of variance (MANOVA) techniques are available that allow the business analyst to test hypotheses on more complex problems involving the simultaneous effects of multiple independent variables on multiple dependent variables.
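A two-way design such as the gender-by-packaging example can be sketched as follows. The ratings are invented, the design is assumed to be balanced (the same number of respondents in every cell), and the function name two_way_anova is hypothetical. The code splits the variation into a part for each factor, a part for their interaction, and the within groups error term.

```python
# Hypothetical preference ratings: ratings[gender][packaging] -> scores.
ratings = {
    "women": {"A": [6, 8], "B": [7, 9], "C": [4, 6]},
    "men": {"A": [5, 7], "B": [3, 5], "C": [6, 8]},
}


def avg(values):
    return sum(values) / len(values)


def two_way_anova(data):
    """Return F ratios for a balanced two-factor design with replication."""
    rows = list(data)
    cols = list(data[rows[0]])
    n = len(data[rows[0]][cols[0]])  # respondents per cell (assumed equal)

    grand = avg([x for r in rows for c in cols for x in data[r][c]])
    row_mean = {r: avg([x for c in cols for x in data[r][c]]) for r in rows}
    col_mean = {c: avg([x for r in rows for x in data[r][c]]) for c in cols}
    cell_mean = {(r, c): avg(data[r][c]) for r in rows for c in cols}

    ss_rows = n * len(cols) * sum((row_mean[r] - grand) ** 2 for r in rows)
    ss_cols = n * len(rows) * sum((col_mean[c] - grand) ** 2 for c in cols)
    ss_cells = n * sum((m - grand) ** 2 for m in cell_mean.values())
    ss_inter = ss_cells - ss_rows - ss_cols  # interaction of the two factors
    ss_within = sum(
        (x - cell_mean[(r, c)]) ** 2 for r in rows for c in cols for x in data[r][c]
    )

    df_rows = len(rows) - 1
    df_cols = len(cols) - 1
    df_inter = df_rows * df_cols
    df_within = len(rows) * len(cols) * (n - 1)
    ms_within = ss_within / df_within

    return {
        "gender": (ss_rows / df_rows) / ms_within,
        "packaging": (ss_cols / df_cols) / ms_within,
        "interaction": (ss_inter / df_inter) / ms_within,
    }


f_ratios = two_way_anova(ratings)
print(f_ratios)
```

The interaction F ratio addresses the marketing question directly: it indicates whether the pattern of preferences across the packaging options differs between women and men, beyond any overall difference attributable to gender or to packaging alone.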

Applications

Both regression analysis and analysis of variance can be invaluable in providing managers and other organizational decision makers with the information that they need to make informed decisions and to develop plans to increase the effectiveness of the organization. The following sections discuss some actual examples of how these statistical techniques are used in the real world to provide information and insights for organizational effectiveness and decision making.

Regression Analysis & Forecasting

Regression analysis is used in many business situations to help managers and other decision makers estimate or predict future trends. Regression analysis is frequently used to make forecasts to support managers in making decisions about many aspects of the business including buying, selling, production, and hiring.

Online Auctions

An example of how regression techniques are used in real-world applications is offered by the work of Lucking-Reiley, Bryan, Prasad, and Reeves (2007), who explored the determinants of prices in online auctions for collectible US pennies on eBay. Online auctions allow customers to determine the price of products rather than paying a fixed, predetermined amount. eBay auctions use an ascending bid format with a fixed end time and date set by the seller. The seller can choose to run the auction based on a number of selectable parameters. Both buyers and sellers have the opportunity to rate each other after their transaction. This information is then made public for use in future transactions. The authors collected data directly from eBay's website on US one-cent category auctions held over a 30-day period. The data collected included last bid, opening and closing time and date, seller's ID and rating, minimum bid, number of bids, listing of bid history (including buyer's ID and rating, price, and time and date of bids), and information on the sellers.

Regression analysis was performed to analyze the results of the auctions. Variables in the regression analysis included estimates of the coin's value (i.e., book value), minimum bid in the auction, final price paid for the coin, number of bids made, secret reserve price, number of days for which the coin was up for sale, total number of positive ratings for the seller, and total number of negative ratings for the seller. The analysis revealed several findings. First, the analysis showed that the reputation of the seller had a measurable impact on the auction prices. Further, it was noted that negative ratings had a larger impact on the auction than did positive ratings. Second, longer auctions tended to attract more bidders and ended in higher final prices. Third, both reserve prices and minimum bids tended to have positive effects on the auction price. However, these strategies could have either positive or negative impacts on the auction. In some cases, use of these strategies resulted in higher winning bids while in other cases the strategies resulted in the coins not being sold.

Training Evaluation

Another example of how regression analysis is used in the real world explores the relationships between one's experience with training, gender, goal orientation, and training attitudes. Most organizations offer various types of training courses and programs to their employees including topics such as new employee orientation, specific job skills training, general job tools, leadership and management issues, among others. Although internal measures of training effectiveness such as course ratings and end-of-course quizzes can give trainers and managers alike some idea of how much trainees learned in a course, it is training transfer -- the degree to which skills taught in the training course are applied on the job -- that is the true measure of training effectiveness. Narayan and Steele-Johnson (2007) used regression analysis to test two hypotheses:

  • Prior experience with relevant training has a positive influence on trainees' attitudes toward training.
  • Gender moderates goal orientation effects on the attitudes of trainees toward training.

Participants in the experiment were 174 undergraduate students with a minimum of six months part-time work experience. Data collected included a 13-question demographic questionnaire, a series of scales assessing attitudes toward training, and an instrument designed to measure goal orientation. Participants in the research completed the questionnaires in a classroom-like setting with other participants.

A hierarchical multiple regression approach was used to examine the effects of training experience, gender, and goal orientation on trainees' attitudes toward training. Results of the analysis showed that a mastery-approach goal orientation had a positive effect on training attitudes for males but not for females. Prior experience with training had a positive effect on training attitudes in general, although the impact was greater for women than for men (Narayan and Steele-Johnson, 2007).
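The logic of a hierarchical regression with a moderator can be illustrated with a small Python sketch. It is not the authors' analysis or data: the scores are invented, the 0/1 gender coding is an assumption, and the helper names are hypothetical. Step 1 enters gender and goal orientation; step 2 adds their product (the interaction term), and the increase in R-squared indicates how much the moderation effect adds to the model.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(design, y):
    """Least-squares coefficients for a design matrix that includes a 1.0 column."""
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def r_squared(design, y, coefs):
    """Proportion of variance in y explained by the fitted model."""
    mean_y = sum(y) / len(y)
    fitted = [sum(c * v for c, v in zip(coefs, row)) for row in design]
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Invented scores. Assumed coding: 0 = female, 1 = male.
gender = [0, 0, 0, 0, 1, 1, 1, 1]
goal = [1, 2, 3, 4, 1, 2, 3, 4]            # goal orientation score
attitude = [3.0, 3.1, 2.9, 3.0, 2.0, 3.1, 3.9, 5.0]

# Step 1: main effects only. Step 2: add the gender x goal interaction.
step1 = [[1.0, g, s] for g, s in zip(gender, goal)]
step2 = [row + [row[1] * row[2]] for row in step1]

coefs1 = ols(step1, attitude)
coefs2 = ols(step2, attitude)
r2_step1 = r_squared(step1, attitude, coefs1)
r2_step2 = r_squared(step2, attitude, coefs2)
delta_r2 = r2_step2 - r2_step1
interaction = coefs2[3]
print(round(r2_step1, 3), round(r2_step2, 3), round(interaction, 3))
```

In this toy data set goal orientation is related to attitude only in the group coded 1, so the interaction coefficient is clearly positive and R-squared rises when it is added. A real analysis would also test whether that increase is statistically significant.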

Analysis of Variance

Analysis of variance can be used in a wide range of business situations to help managers make better decisions. Marketing, engineering, and human resources are just a few of the disciplines within the organization that can benefit from the information garnered through the use of analysis of variance techniques on real world data.

Strategic Value of Information Technology

In one example of a real-world application of analysis of variance, Oh and Pinsonneault (2007) compared two conceptual and two analytical approaches that are used to assess the strategic value of information technology in the workplace. Their work was designed to help fill in a gap in the literature concerning the lack of consensus about the strategic value of information technology. The lack of consistent results has often been attributed to issues of data and measurements, sample size, industry type, and the choice of dependent variables. However, the fact that different theoretical frameworks are often used in different research had previously not been taken into account. To help explore this aspect, they used analysis of variance to test two hypotheses:

  • The contingency-based perspective is a better predictor of the strategic value of information technology than is the resource-centered perspective.
  • In the contingency-based perspective, a nonlinear model is a better predictor of the strategic value of information technology than is a linear model.

In the resource-centered view, information technology is considered to be a strategic resource that can directly influence organizational performance. In the contingency-based view, the strategic importance of information technology is considered within the context of the organization's strategy and how the information technology strategy supports this. Oh and Pinsonneault (2007) worked with a self-selected sample of 110 small and medium-size manufacturing firms (out of 787 contacted) to collect data on the strategic value of information technology. Data were collected using a business strategy questionnaire and an information technology strategy questionnaire. The business strategy of the organizations was clustered into three broad dimensions: cost reduction, quality improvement, and revenue growth. Information technology strategy was assessed by examining the information technology applications that were in use and supporting the strategic priorities of the organization. Organizational performance was assessed using the objective measures of expense and revenue and a subjective scale of perceived profitability.

An analysis of variance was performed to compare the strength of the models under investigation. Based on the results, it was found that a nonlinear approach performed better than a linear approach in predicting expense, but that there was no statistically significant difference between the approaches in the other models. Specifically, the results indicate that the alignment of information technology with a cost reduction strategy can generate more immediate benefits for organizations than a strategy alignment that attempts to facilitate revenue growth. These findings can help managers optimize their investment in information technology.

Psychological Burnout within the Workplace

In another example of the use of analysis of variance applied to real-world business problems, Bakker, Westman, and Schaufeli (2007) investigated the hypothesis that psychological burnout on the job may cross over from one person to another. The research comprised two studies: one with teachers and a second one with soldiers. The first study tested the hypothesis that burnout crosses over from one person to another. The study used 30 high school teachers who were randomly exposed to a burned-out colleague or one who expressed negative opinions about a topic unrelated to work. The independent variable in the study was the burnout status of the stimulus person. This was exhibited in one of two bogus newspaper articles that either referred to a study discussing the decline in high school teaching quality (experimental condition) or to the escape of several prisoners (control condition). The dependent measures used in the study comprised measures of emotional exhaustion, depersonalization, and personal accomplishment. In addition, the study included a "manipulation check" to determine whether the participants believed that the person writing the article was indeed burned out. An analysis of variance was performed using burnout status of the stimulus person as the independent variable and the manipulation check as the dependent variable. The results of the analysis showed that study participants exposed to the bogus article by the burned-out writer believed that the writer was indeed burned out. To test the crossover hypothesis, a multivariate analysis of variance was used in which the burnout status of the stimulus person was the independent variable and the three burnout dimensions were the dependent variables. The findings demonstrated that the crossover effect was statistically significant for emotional exhaustion and depersonalization, but not for personal accomplishment.

The second study tested the hypothesis that the similarity between the sender and the receiver intensifies the effect of the burnout crossover. Participants in the study were 101 Dutch soldiers who were all members of the same battalion. Participants were randomly assigned to one of two conditions: a videotape of an engaged (control condition) or burned out (experimental condition) colleague. The experimental condition was further broken into two subgroups by whether the stimulus person was a peer (soldier) or a superior (squadron leader). Dependent variables in the study were exhaustion, cynicism, and professional efficacy. In addition to the control variable, the study also included a manipulation check as in the first study. The data were analyzed using a 2x2 analysis of variance design with two factors: the burnout status of the stimulus person on the videotape (i.e., burned out vs. engaged) and the similarity of the stimulus person to the participant (i.e., different vs. similar). Analysis of the manipulation check data showed that study participants believed that the stimulus person was burned out or not (depending on the condition). To test the hypothesis that soldiers were most likely to experience crossover burnout from soldiers of similar status, a 2x2 multivariate analysis of variance was used (burnout status of the stimulus person and similarity to the stimulus person) with the three burnout variables as dependent variables. Results of the analysis showed no effects for exhaustion, but a significant effect for cynicism and professional efficacy (Bakker, Westman, & Schaufeli, 2007).

Conclusion

Two of the most frequently used statistical techniques for making inferences from real-world data in the business world are regression analysis and analysis of variance. These tools can be invaluable to the business analyst and decision maker alike in understanding and solving real-world business problems.

Terms & Concepts

Analysis of Variance (ANOVA): A family of statistical techniques that analyze the joint and separate effects of multiple independent variables on a single dependent variable and determine the statistical significance of the effect.

Data: (sing. datum) In statistics, data are quantifiable observations or measurements that are used as the basis of scientific research.

Dependent Variable: The outcome variable or resulting behavior that changes depending on whether the subject receives the control or experimental condition (e.g., a consumer's reaction to a new cereal).

Hypothesis: An empirically testable declaration that certain variables and their corresponding measures are related in a specific way proposed by a theory.

Independent Variable: The variable in an experiment or research study that is intentionally manipulated in order to determine its effect on the dependent variable (e.g., the independent variable of type of cereal might affect the dependent variable of the consumer's reaction to it).

Inferential Statistics: A subset of mathematical statistics used in the analysis and interpretation of data. Inferential statistics are used to make inferences such as drawing conclusions about a population from a sample and in decision making.

Linear Regression: A statistical technique used to develop a mathematical model for use in predicting one variable from the knowledge of another variable.

Population: The entire group of subjects belonging to a certain category (e.g., all women between the ages of 18 and 27, all dry cleaning businesses, all college students).

Sample: A subset of a population. A random sample is a sample that is chosen at random from the larger population with the assumption that such samples tend to reflect the characteristics of the larger population.

Statistical Significance: The degree to which an observed outcome is unlikely to have occurred due to chance.

Variable: An object in a research study that can have more than one value. Independent variables are stimuli that are manipulated in order to determine their effect on the dependent variables (response). Extraneous variables are variables that affect the response but that are not related to the question under investigation in the study.

Bibliography

Bakker, A. B., Westman, M., & Schaufeli, W. B. (2007). Crossover of burnout: An experimental design. European Journal of Work & Organizational Psychology, 16(2), 220-239. Retrieved September 20, 2007, from EBSCO Online Database Business Source Complete. http://search.ebscohost.com/login.aspx?direct=true&db=bth&AN=25346634&site=bsi-live

Black, K. (2006). Business statistics for contemporary decision making (4th ed.). New York: John Wiley & Sons.

Lucking-Reiley, D., Bryan, D., Prasad, N., & Reeves, D. (2007). Pennies from eBay: The determinants of price in online auctions. Journal of Industrial Economics, 55(2), 223-233. Retrieved September 20, 2007, from EBSCO Online Database Business Source Complete. http://search.ebscohost.com/login.aspx?direct=true&db=bth&AN=25893978&site=bsi-live

Narayan, A. & Steele-Johnson, D. (2007). Relationships between prior experience of training, gender, goal orientation and training attitudes. International Journal of Training & Development, 11(3), 166-180. Retrieved September 20, 2007, from EBSCO Online Database Business Source Complete. http://search.ebscohost.com/login.aspx?direct=true&db=bth&AN=26334318&site=bsi-live

Oh, W. & Pinsonneault, A. (2007). On the assessment of the strategic value of information technologies: Conceptual and analytical approaches. MIS Quarterly, 31(2), 239-265. Retrieved September 20, 2007, from EBSCO Online Database Business Source Complete. http://search.ebscohost.com/login.aspx?direct=true&db=bth&AN=24838001&site=bsi-live

O'Hara, M., & Parmeter, C. F. (2013). Nonparametric generalized least squares in applied regression analysis. Pacific Economic Review, 18(4), 456-474. Retrieved November 26, 2013, from EBSCO Online Database Business Source Complete. http://search.ebscohost.com/login.aspx?direct=true&db=bth&AN=91255576&site=ehost-live

Ortega, E. M., Cordeiro, G. M., & Hashimoto, E. M. (2011). A log-linear regression model for the beta-Weibull distribution. Communications in Statistics: Simulation & Computation, 40(8), 1206-1235. Retrieved November 26, 2013, from EBSCO Online Database Business Source Complete. http://search.ebscohost.com/login.aspx?direct=true&db=bth&AN=60106711&site=ehost-live

Witte, R. S. (1980). Statistics. New York: Holt, Rinehart and Winston.

Yahya-Zadeh, M. (2012). Comprehensive variance analysis based on ex post optimal budget. Academy of Accounting & Financial Studies Journal, 16, 65-85. Retrieved November 26, 2013, from EBSCO Online Database Business Source Complete. http://search.ebscohost.com/login.aspx?direct=true&db=bth&AN=84334956&site=ehost-live

Suggested Reading

Britten-Jones, M., Neuberger, A., & Nolte, I. (2011). Improved inference in regression with overlapping observations. Journal of Business Finance & Accounting, 38(5/6), 657-683. Retrieved November 26, 2013, from EBSCO Online Database Business Source Complete. http://search.ebscohost.com/login.aspx?direct=true&db=bth&AN=62497247&site=ehost-live

Hui, T.-K. & Wan, D. (2007). Factors affecting Internet shopping behaviour in Singapore: Gender and educational issues. International Journal of Consumer Studies, 31(3), 310-316. Retrieved September 20, 2007, from EBSCO Online Database Business Source Complete. http://search.ebscohost.com/login.aspx?direct=true&db=bth&AN=24561069&site=bsi-live

Kim, H., Loh, W.-Y., Shih, Y.-S., & Chaudhuri, P. (2007). Visualizable and interpretable regression models with good prediction power. IIE Transactions, 39(6), 565-579. Retrieved September 20, 2007, from EBSCO Online Database Business Source Complete. http://search.ebscohost.com/login.aspx?direct=true&db=bth&AN=24471426&site=bsi-live

Mostafa, M. M. (2007). Gender differences in Egyptian consumers' green purchase behaviour: The effects of environmental knowledge, concern and attitude. International Journal of Consumer Studies, 31(3), 220-229. Retrieved September 20, 2007, from EBSCO Online Database Business Source Complete. http://search.ebscohost.com/login.aspx?direct=true&db=bth&AN=24561080&site=bsi-live

Schonfeld, I. S. & Rindskopf, D. (2007). Hierarchical linear modeling in organizational research: Longitudinal data outside the context of growth modeling. Organizational Research Methods, 10(3), 417-429. Retrieved September 20, 2007, from EBSCO Online Database Business Source Complete. http://search.ebscohost.com/login.aspx?direct=true&db=bth&AN=25580935&site=bsi-live

VanZante, N. (2007). Helping students see the "big picture" of variance analysis. Management Accounting Quarterly, 8(3), 39-47. Retrieved September 20, 2007, from EBSCO Online Database Business Source Complete. http://search.ebscohost.com/login.aspx?direct=true&db=bth&AN=26200914&site=bsi-live

Whiteoak, J. (2007). The relationship among group process perceptions, goal commitment and turnover intention in small committee groups. Journal of Business & Psychology, 22(1), 11-20. Retrieved September 20, 2007, from EBSCO Online Database Business Source Complete. http://search.ebscohost.com/login.aspx?direct=true&db=bth&AN=26290218&site=bsi-live

Essay by Ruth A. Wienclaw, Ph.D.

Dr. Ruth A. Wienclaw holds a Doctorate in industrial/organizational psychology with a specialization in organization development from the University of Memphis. She is the owner of a small business that works with organizations in both the public and private sectors, consulting on matters of strategic planning, training, and human/systems integration.