Regression Analysis (Business)
Regression analysis is a powerful set of statistical tools used in business to build mathematical models that predict the value of one variable based on another. It can be broadly classified into simple linear regression, which involves predicting a dependent variable using a single independent variable, and multiple linear regression, where multiple independent variables are utilized for predictions. This analysis aids in understanding complex relationships among variables, which is crucial for effective decision-making in various business contexts, such as forecasting sales, evaluating marketing strategies, or analyzing consumer behavior.
However, regression analysis operates under several assumptions, including the correctness of the model and the quality of the data, which can often be compromised in real-world scenarios. Analysts must carefully interpret the results, as the relationships between variables can be intricate and influenced by various factors. Additionally, different regression techniques can address specific needs, like time series analysis for forecasting trends over time or multivariate regression for examining several dependent variables simultaneously. Overall, regression analysis provides valuable insights that can inform strategic decisions, ensuring businesses adapt effectively to market dynamics.
Regression analysis is a set of tools for building mathematical models that can be used to predict the value of one variable from another. Simple linear regression is a bivariate tool in which the value of one dependent variable is predicted from the knowledge of one independent variable. Multiple linear regression allows the prediction of the dependent variable from the knowledge of the value of more than one independent variable. Although regression analysis is widely used in business, it makes several assumptions including that the model is correct and that the data are good. Real world data, however, tend to be messy and as a result these assumptions are rarely true. The models developed using regression analyses are not perfect and the analyst needs to demonstrate care in their interpretation.
To be successful, organizations need a plan of action to help them reach their goals and objectives. Good business strategy, however, is not made in a vacuum, but is based on an intensive analysis of empirical data that includes information on market trends, consumer demand, competitor status, and the organization's resources and abilities. Further, the relationships between variables are often complex and synergistic: a decision that may seem right when considering one variable in isolation may be wrong when other factors are thrown into the mix. For example, a company's decision to produce more widgets in order to earn more income might make sense when one takes into account the facts that the company's plants are set up to produce widgets and that its personnel are experts in designing and making widgets. However, if this decision does not also take into account the fact that the competition has just released a gizmo that makes the widget obsolete, or that the target market for the widget has stopped buying it, the same decision could lead to disastrous results. Business decisions are often complex, and frequently need to take into account multiple factors that can potentially affect outcomes.
Regression analysis is a family of statistical tools that can help managers and other organizational decision makers better understand the real world situation in which they must operate. Regression analysis allows one to build mathematical models that can be used to predict the value of one variable from knowledge of another. There are a number of specific regression techniques available to the business analyst:
- Simple linear regression analysis, which involves only two variables.
- Multiple linear regression analysis, which involves two or more independent variables used in the prediction of a dependent variable.
- Multiple curvilinear regression, in which the relationship between variables is nonlinear (e.g., quadratic).
- Multivariate linear regression, which allows the simultaneous examination of several dependent variables.
- Multivariate polynomial regression, which can be used to account for nonlinear relationships.
The most commonly used of these techniques -- simple linear regression and multiple linear regression -- are discussed in the following sections.
Simple Linear Regression
Various correlation techniques are available to help the business analyst better understand the degree to which two events or variables are consistently related. For example, correlation can help one understand the relationship between household income and the probability of widget purchase: a positive correlation means that the higher the income the more likely it is that a widget will be purchased; a negative correlation shows that the lower the household income the more likely it is that a widget will be purchased; and a zero correlation shows that there is no relationship between income and the likelihood of purchase. However, as helpful as this information may be, knowing that there is a relationship between two variables does not always provide managers with sufficient information to make good decisions. In some situations, one needs to be able to predict the value of one variable from knowledge of another variable. For example, a marketer may want to determine what household income levels an advertising campaign should target. Merely knowing that there is a positive correlation between the two variables of income and purchase likelihood is insufficient to make this determination. To make this kind of interpretation based on the data, one needs to use simple linear regression.
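The three cases described above (positive, negative, and zero correlation) can be illustrated with a minimal sketch. The income and purchase figures below are hypothetical values chosen only to produce each case, and `pearson_r` is an illustrative helper name, not part of any particular library.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical household incomes (thousands of dollars) and widgets purchased
income = [30, 45, 60, 75, 90]
widgets_up = [1, 2, 3, 4, 5]     # purchases rise with income
widgets_down = [5, 4, 3, 2, 1]   # purchases fall with income
widgets_flat = [2, 4, 1, 4, 2]   # no linear relationship with income

r_pos = pearson_r(income, widgets_up)     # perfectly positive
r_neg = pearson_r(income, widgets_down)   # perfectly negative
r_zero = pearson_r(income, widgets_flat)  # zero correlation

print(r_pos, r_neg, r_zero)
```

The coefficient summarizes the direction and strength of a linear relationship, but it does not by itself give a rule for predicting one variable from the other; that is the job of the regression line.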
Simple linear regression is a bivariate tool in which the value of one dependent variable is predicted from the knowledge of one independent variable. Examples of business applications of simple linear regression include predicting sales from population density, Dow Jones Averages from prime interest rates, and CEO bonuses from quarterly earnings. Data used in linear regression analysis are often graphed on a scatter plot (a graph depicting pairs of points for two-variable numerical data) so that a line of best fit can be determined and used to predict the value of the dependent variable based on different values of the independent variable. A sample scatter plot with line of best fit is shown in Figure 1.
The equation for the regression line is the statistical equivalent of the slope-intercept equation of a line from basic algebra (i.e., y = mx + b), as seen in Figure 2.
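For reference, the regression line is commonly written as follows. The symbols b_0 (intercept) and b_1 (slope) are a conventional choice assumed here and may differ from the notation used in the figure; the formulas are the standard least squares estimates.

```latex
\hat{y} = b_0 + b_1 x,
\qquad
b_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
b_0 = \bar{y} - b_1 \bar{x}
```

Here b_1 plays the role of m and b_0 the role of b in y = mx + b, while the bars denote sample means.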
For example, a trainer might want to determine the optimal cost of running a seminar based on how many students signed up for the course. Although more students would mean more income, more students also mean that there will be more expenses for such items as handouts and other printed materials, larger conference rooms, more trainers to run small group sessions, and so forth. If not enough students signed up for the course, on the other hand, the training course would not pay for itself because costs would be greater than income. A predictive model for cost vs. number of trainees could be developed using data collected on these variables for a number of past sessions of the training course. The slope of the line of best fit passing through the data could be mathematically calculated using these data points to determine the equation of the simple regression line. The training company or department could then use this equation to determine optimal class sizes for training courses based on the single variable of number of participants.
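A minimal sketch of the training example might look like the following. The session data are invented for illustration, and `fit_line` is a hypothetical helper that applies the usual least squares formulas for slope and intercept.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical past sessions: number of trainees and total cost in dollars
trainees = [10, 15, 20, 25, 30]
cost = [1500, 1900, 2400, 2800, 3300]

intercept, slope = fit_line(trainees, cost)

# Predict the cost of a session with 22 trainees
predicted_cost_22 = intercept + slope * 22
print(f"cost = {intercept:.0f} + {slope:.0f} * trainees")
print(f"predicted cost for 22 trainees: {predicted_cost_22:.0f}")
```

With these made-up numbers the fitted line is cost = 580 + 90 × trainees, so a class of 22 would be predicted to cost 2,560. Predictions far outside the range of class sizes actually observed should be treated with more caution.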
Residual Analysis
Of course, unless all the pairs of data collected lie on one straight line, many different lines can be drawn through a data set. The question therefore is which of these lines is the line of best fit, the one that will yield the best predictions of the dependent variable from the independent variable. This is determined through residual analysis. In regression analysis, a residual is the difference between an actual y value and the corresponding predicted y value (y - ŷ). The line of best fit is the one that minimizes the sum of the squares of the residuals. By looking at the residuals, an analyst can better understand how well the regression line fits past data in order to estimate how well it will predict future data.
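The idea can be sketched in a few lines. The data points and the alternative line below are hypothetical, chosen only to show that the least squares line has a smaller sum of squared residuals than another plausible-looking line.

```python
def sum_squared_residuals(xs, ys, intercept, slope):
    """Return (residuals, sse) for the line y = intercept + slope * x."""
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    sse = sum(r ** 2 for r in residuals)
    return residuals, sse

# Hypothetical data
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

# Least squares estimates from the standard formulas
n = len(xs)
mean_x, mean_y = sum(xs) / n, sum(ys) / n
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
intercept = mean_y - slope * mean_x

residuals, sse_best = sum_squared_residuals(xs, ys, intercept, slope)

# An arbitrary alternative line through the same data: y = 1 + 1.0 * x
_, sse_other = sum_squared_residuals(xs, ys, 1.0, 1.0)

print("least squares SSE:", round(sse_best, 4))
print("alternative SSE:  ", round(sse_other, 4))
```

For these points the least squares line (y = 2.2 + 0.6x) gives a sum of squared residuals of 2.4, against 4.0 for the alternative line. The residuals of a least squares line with an intercept also sum to zero, which is a quick sanity check on the arithmetic.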
Assumptions Regarding Standard Regression
Standard regression analysis makes several assumptions, including that the model is correct and that the data are good. Real world data, however, tend to be messy, and as a result these assumptions are rarely fully met. Common violations include use of an incorrect functional form for the regression function; correlation among variables; nonconstant variance; sample data with outliers (i.e., observations in which the value is abnormally large or small); and multicollinearity among subsets of the input variables such that they exhibit nearly identical linear relations. Where any one or more of these problems occur, the entire analysis may be invalidated. In addition, the standard summary statistics give few indications that these problems have occurred. There are other indicators and potential remedies for these situations; however, they must be used with caution. For example, nonuniform residual plots may suggest an underlying nonlinear function. Although outliers and extreme points can be deleted from the analysis, care must be taken in doing so, as they may carry important information about the data; for instance, they may indicate that other variables need to be included in the analysis. Multicollinearity may be identified and rectified in a number of ways; however, the appropriateness of the method depends on the underlying cause of the problem.
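One widely used indicator of multicollinearity is the variance inflation factor (VIF), which the article does not name; it is introduced here only as an example of such a diagnostic. The sketch below assumes NumPy is available, uses synthetic data in which two predictors are nearly identical by construction, and defines a hypothetical helper `vif`.

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of the predictor matrix X.

    Each predictor is regressed on the remaining predictors (plus an
    intercept); VIF = 1 / (1 - R^2) of that auxiliary regression.
    """
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    factors = []
    for j in range(k):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])
        coef, *_ = np.linalg.lstsq(A, target, rcond=None)
        resid = target - A @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
        factors.append(1.0 / (1.0 - r2))
    return factors

# Synthetic predictors: x2 is almost a copy of x1, x3 is unrelated
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + 0.05 * rng.normal(size=200)
x3 = rng.normal(size=200)

vifs = vif(np.column_stack([x1, x2, x3]))
print([round(v, 1) for v in vifs])
```

A VIF near 1 suggests a predictor shares little information with the others, while values above roughly 5 to 10 are often treated as a warning sign. These thresholds are rules of thumb rather than firm cutoffs, and, as noted above, the appropriate remedy depends on why the predictors are related.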
Multiple Linear Regression
Although simple linear regression can be very useful for building models and predicting the value of one variable from the knowledge of the value of another variable, real world business problems are often more complex and include multiple variables. For many such situations, multiple linear regression is appropriate. This technique allows the prediction of the dependent variable from the knowledge of the value of more than one independent variable. For example, in the illustration of training class size above, the trainer may be able to get a discount on the reproduction of class materials if more than a certain minimum quantity is ordered. Similarly, there might be several potential venues for holding the course, each having different sliding scales for costs such as room rental and break catering depending on the number of participants. By using multiple linear regression instead of simple linear regression, the trainer could take all of these independent variables into account to determine their effect on the dependent variable: the cost of the training course per person. In another example, a marketer might want to predict the profitability of a new product line based on multiple factors such as price, product life, and packaging. This information could help the marketer determine whether investing in a new product line would make sense in terms of profitability, based on these or other variables deemed to be relevant. Or, an engineer might want to predict the life expectancy of a gizmo based on various factors affecting its quality, such as the grade of the steel used in the frame, the thickness of the housing, and the number of spot welds used in its construction. These data could give the engineer the information needed to make decisions for the most cost-effective design.
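As a rough sketch of the training example with several predictors, the code below fits a multiple linear regression by least squares using NumPy. All figures are invented: the cost per person is constructed without noise from known coefficients (an assumption made purely so the fitted values can be checked), and the variable names are hypothetical.

```python
import numpy as np

# Hypothetical past sessions
trainees = np.array([10, 12, 15, 18, 20, 25, 30, 35], dtype=float)
room_rental = np.array([400, 400, 500, 500, 600, 600, 800, 800], dtype=float)
catering_per_head = np.array([12, 15, 12, 15, 12, 15, 12, 15], dtype=float)

# Cost per person built from known coefficients, with no noise added
cost_per_person = (50.0 - 1.5 * trainees
                   + 0.1 * room_rental
                   + 2.0 * catering_per_head)

# Design matrix: a column of ones for the intercept, then one column per predictor
X = np.column_stack([np.ones_like(trainees), trainees,
                     room_rental, catering_per_head])

# Least squares estimate of [intercept, b_trainees, b_room, b_catering]
coef, *_ = np.linalg.lstsq(X, cost_per_person, rcond=None)

# Predicted cost per person for a hypothetical new session
new_session = np.array([1.0, 22.0, 600.0, 15.0])
predicted = new_session @ coef

print(np.round(coef, 3))
print(round(float(predicted), 2))
```

Each coefficient estimates the change in cost per person associated with a one-unit change in that predictor while the other predictors are held fixed. With real, noisy data the estimates would not match any "true" values exactly, and the cautions about assumptions discussed earlier would apply.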
Applications
Analysis of Time Series Data
Regression analysis can be useful in any number of business situations where one needs to model real world situations and forecast future outcomes, trends, or other values. One place where regression analysis can be useful is in the analysis of time series data. Although there are a number of approaches to modeling time series data with simple data sets, these approaches tend not to account well for trends. However, linear regression and regression using quadratic models can be useful in such situations. As long as the time series data are not influenced by seasonal fluctuations, these methods can produce accurate forecasts. However, if it is assumed that there is a seasonal effect influencing the time series data, other techniques must be used. For example, Ozdeser and Ozyigit (2007) used regression analysis to analyze time series data on foreign trade and economic growth in northern Cyprus. The time series data used were acquired from the State Planning Organization's 2006 statistical yearbook and covered the years 1985 through 2005. The authors performed a series of ten regression analyses on the data to model different relationships. A Durbin-Watson test was used to test for serial correlation. Results of the analysis provide support for the authors' hypothesis that the government in northern Cyprus plays too large a role in the economy and that its expenditures are detrimental to the level of national income. Based on their analyses, the authors suggest ways that the role of the government can be limited to areas in which it is most effective.
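The following sketch shows, under stated assumptions, how a linear and a quadratic trend might be fitted to a short non-seasonal series and how a Durbin-Watson statistic can be computed from the residuals. The sales figures are hypothetical and are not the data used in the study described above; `durbin_watson` is an illustrative helper implementing the standard formula, and NumPy is assumed to be available.

```python
import numpy as np

def durbin_watson(residuals):
    """Durbin-Watson statistic: sum of squared successive differences of the
    residuals divided by the sum of squared residuals (ranges from 0 to 4)."""
    residuals = np.asarray(residuals, dtype=float)
    return float(np.sum(np.diff(residuals) ** 2) / np.sum(residuals ** 2))

# Hypothetical annual sales for ten consecutive periods (t = 0, 1, ..., 9)
t = np.arange(10, dtype=float)
sales = np.array([100, 104, 109, 113, 120, 124, 131, 135, 142, 148], dtype=float)

# Fit a linear trend and a quadratic trend (coefficients, highest power first)
lin = np.polyfit(t, sales, 1)
quad = np.polyfit(t, sales, 2)

resid_lin = sales - np.polyval(lin, t)
resid_quad = sales - np.polyval(quad, t)
sse_lin = float(resid_lin @ resid_lin)
sse_quad = float(resid_quad @ resid_quad)

# Forecast the next period (t = 10) from each model
forecast_lin = float(np.polyval(lin, 10.0))
forecast_quad = float(np.polyval(quad, 10.0))

dw_lin = durbin_watson(resid_lin)

print("SSE linear / quadratic:", round(sse_lin, 2), round(sse_quad, 2))
print("forecast linear / quadratic:", round(forecast_lin, 1), round(forecast_quad, 1))
print("Durbin-Watson (linear model):", round(dw_lin, 2))
```

A Durbin-Watson value near 2 suggests little first-order serial correlation in the residuals, while values toward 0 or 4 suggest positive or negative serial correlation respectively. The quadratic model can never fit the past data worse than the linear one, so a lower sum of squared residuals alone is not evidence that it will forecast better.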
Forecasting/Evaluation of Business Decisions
Analysis of time series data is not the only situation in which regression analysis can be meaningfully applied. Regression techniques are used in many business situations to help managers and other decision makers estimate or predict future trends. For example, regression analysis is frequently used to make forecasts to support managers in making decisions about many aspects of the business including buying, selling, production, and hiring.
Case Study: eBay Auction Analysis
As one illustration of how regression techniques are used in real world applications, Lucking-Reiley, Bryan, Prasad, and Reeves (2007) explored the determinants of prices in online auctions for collectible US pennies on eBay. Online auctions on eBay use an ascending bid format with a fixed end time and date set by the seller. In addition, the seller can choose to run the auction based on a number of selectable parameters. The site gives both buyers and sellers the opportunity to rate each other after their transaction. These ratings are then made public for use in future transactions.
The authors collected data from eBay's website on US one-cent category auctions held over a thirty-day period. The data collected included last bid, opening and closing time and date, seller's ID and rating, minimum bid, number of bids, listing of bid history (including buyer's ID and rating, price, and time and date of bids), and information on the sellers. A regression analysis was performed to analyze the results of the auctions. Variables used in the regression analysis included estimates of the coin's value (i.e., book value), minimum bid in the auction, final price paid for the coin, number of bids made, secret reserve price, number of days for which the coin was up for sale, total number of positive ratings for the seller, and total number of negative ratings for the seller.
The regression analysis revealed several findings.
- First, it was found that the reputation of the seller had a measurable impact on the auction prices. Further, contrary to the authors' expectations, it was noted that negative ratings had a larger impact on the auction than did positive ratings.
- Second, longer auctions tended to attract more bidders and ended in higher final prices than did shorter auctions.
- Third, both reserve prices and minimum bids were found to have positive effects on the auction price. However, these strategies could have either a positive or a negative impact on the auction: in some cases their use resulted in higher winning bids, while in other cases it resulted in the coins not being sold.
Case Study: Measure of Training Effectiveness
Another study of how regression analysis is used in the real world was performed by Narayan and Steele-Johnson, who explored the relationships between one's experience with training, gender, goal orientation, and training attitudes. Most organizations offer various types of training courses and programs to their employees on a wide range of general and specific topics related to job performance. Although internal measures of training effectiveness such as course ratings and end-of-course quizzes can give trainers and managers alike some idea of how much trainees learned in a course, training transfer -- the degree to which skills taught in the training course are applied on the job -- tends to be a better measure of training effectiveness.
The authors used regression analysis to test two hypotheses:
- Prior experience with relevant training has a positive influence on trainees' attitudes towards training.
- Gender moderates goal orientation effects on the attitudes of trainees toward training.
Participants in the experiment were 174 undergraduate students with a minimum of six months of part-time work experience. Data were gathered using a thirteen-question demographic questionnaire, a series of scales assessing attitudes toward training, and an instrument designed to measure goal orientation.
The authors used hierarchical multiple regression analysis to examine the effects of training experience, gender, and goal orientation on trainees' attitudes toward training. They found that a mastery-approach goal orientation had a positive effect on training attitudes for males but not for females. The analysis also showed that prior experience with training had a positive effect on training attitudes in general, although the impact was greater for women than for men (Narayan and Steele-Johnson, 2007).
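The general logic of a hierarchical regression with a moderator can be sketched as follows. This is not the authors' analysis or data: the values are simulated, the sample size of 174 is borrowed from the study only for scale, prior training experience is left out for brevity, and the effect sizes, variable names, and the helper `ols_r2` are all assumptions made for illustration. NumPy is assumed to be available.

```python
import numpy as np

def ols_r2(predictors, y):
    """Fit OLS with an intercept; return (coefficients, R squared)."""
    A = np.column_stack([np.ones(len(y)), predictors])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return coef, float(r2)

# Simulated data: goal orientation affects attitude only when male == 1
rng = np.random.default_rng(42)
n = 174
goal = rng.normal(size=n)                        # goal orientation score
male = rng.integers(0, 2, size=n).astype(float)  # 1 = male, 0 = female
attitude = 3.0 + 0.6 * goal * male + rng.normal(scale=0.5, size=n)

# Step 1: main effects only
coef1, r2_step1 = ols_r2(np.column_stack([goal, male]), attitude)

# Step 2: add the goal x gender interaction (the moderation term)
coef2, r2_step2 = ols_r2(np.column_stack([goal, male, goal * male]), attitude)

delta_r2 = r2_step2 - r2_step1
print("R^2 step 1:", round(r2_step1, 3))
print("R^2 step 2:", round(r2_step2, 3))
print("interaction coefficient:", round(float(coef2[3]), 3))
```

In this kind of analysis, a meaningful increase in R squared when the interaction term is added, together with a nonzero interaction coefficient, is read as evidence that the effect of one predictor depends on the level of the other. A formal test of the increase (for example, an F test) would normally accompany the comparison.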
Conclusion
Regression analysis is a family of statistical tools that can help business analysts build models to predict trends, make tradeoff decisions, and model the real world for decision-making support. These models can be used to predict the value of one or more variables from knowledge of the values of other variables. Specific regression techniques include simple linear regression analysis, multiple linear regression analysis, multiple curvilinear regression, multivariate linear regression, and multivariate polynomial regression. Although regression analysis is widely used in business for model building and to support decision making, the models developed using regression analysis are not perfect, and the analyst needs to take care in interpreting them.
Terms & Concepts
Correlation: The degree to which two events or variables are consistently related. Correlation may be positive (i.e., as the value of one variable increases the value of the other variable increases), negative (i.e., as the value of one variable increases the value of the other variable decreases), or zero (i.e., the values of the two variables are unrelated). Correlation does not imply causation.
Data: (sing. datum) In statistics, data are quantifiable observations or measurements that are used as the basis of scientific research.
Dependent Variable: The outcome variable or resulting behavior that changes depending on whether the subject receives the control or experimental condition (e.g., a consumer's reaction to a new cereal).
Forecasting: In business, forecasting is the science of estimating or predicting future trends. Forecasts are used to support managers in making decisions about many aspects of the business, including buying, selling, production, and hiring.
Independent Variable: The variable in an experiment or research study that is intentionally manipulated in order to determine its effect on the dependent variable (e.g., the independent variable of type of cereal might affect the dependent variable of the consumer's reaction to it).
Linear Regression: A statistical technique used to develop a mathematical model for use in predicting one variable from the knowledge of another variable or variables.
Model: A representation of a situation, system, or subsystem. Conceptual models are mental images that describe the situation or system. Mathematical or computer models are mathematical representations of the system or situation being studied.
Multicollinearity: A situation occurring when two or more independent variables in a multiple regression analysis are highly correlated.
Sample: A subset of a population. A random sample is a sample that is chosen at random from the larger population with the assumption that such samples tend to reflect the characteristics of the larger population.
Seasonal Fluctuation: Changes in economic activity that occur in a fairly regular annual pattern. Seasonal fluctuations may be related to seasons of the year, the calendar, or holidays.
Time Series Data: Data gathered on a specific characteristic over a period of time. Time series data are used in business forecasting. To be useful, time series data must be collected at intervals of regular length.
Trend: The persistent, underlying direction in which something is moving in either the short, intermediate, or long term. Identification of a trend allows one to better plan to meet future needs.
Variable: An object in a research study that can have more than one value. Independent variables are stimuli that are manipulated in order to determine their effect on the dependent variables (response). Extraneous variables are variables that affect the response but that are not related to the question under investigation in the study.
Bibliography
Black, K. (2006). Business statistics for contemporary decision making (4th ed.). New York: John Wiley & Sons.
Hidalgo, B., & Goodman, M. (2013). Multivariate or multivariable regression? American Journal of Public Health, 103(1), 39-40. Retrieved December 2, 2013, from EBSCO Online Database Business Source Complete. http://search.ebscohost.com/login.aspx?direct=true&db=bth&AN=84677454
Lucking-Reiley, D., Bryan, D., Prasad, N., & Reeves, D. (2007). Pennies from eBay: The determinants of price in online auctions. Journal of Industrial Economics, 55(2), 223-233. Retrieved September 20, 2007, from EBSCO Online Database Business Source Complete. http://search.ebscohost.com/login.aspx?direct=true&db=bth&AN=25893978&site=bsi-live
Narayan, A. & Steele-Johnson, D. (2007). Relationships between prior experience of training, gender, goal orientation and training attitudes. International Journal of Training & Development, 11(3), 166-180. Retrieved September 20, 2007, from EBSCO Online Database Business Source Complete. http://search.ebscohost.com/login.aspx?direct=true&db=bth&AN=26334318&site=bsi-live
Nimon, K. F., & Oswald, F. L. (2013). Understanding the results of multiple linear regression: Beyond standardized regression coefficients. Organizational Research Methods, 16(4), 650-674. Retrieved December 2, 2013, from EBSCO Online Database Business Source Complete. http://search.ebscohost.com/login.aspx?direct=true&db=bth&AN=90053849
Ozdeser, H. & Ozyigit, A. (2007). Foreign trade and economic growth in northern Cyprus: A time series analysis. International Research Journal of Finance and Economics, 10, 88-96. Retrieved September 27, 2007, from EBSCO Online Database Business Source Complete. http://search.ebscohost.com/login.aspx?direct=true&db=bth&AN=25963684&site=ehost-live
Spiller, S. A., Fitzsimons, G. J., Lynch, J. G., Jr., & McClelland, G. H. (2013). Spotlights, floodlights, and the magic number zero: Simple effects tests in moderated regression. Journal of Marketing Research (JMR), 50(2), 277-288. Retrieved December 2, 2013, from EBSCO Online Database Business Source Complete. http://search.ebscohost.com/login.aspx?direct=true&db=bth&AN=86184853
Timm, N. H. (1975). Multivariate analysis with applications in education and psychology. Monterey, CA: Brooks/Cole Publishing Company.
Witte, R. S. (1980). Statistics. New York: Holt, Rinehart and Winston.
Suggested Reading
Greenberg, I. (2001). Regression analysis. In S. I. Gass & C. M. Harris (Eds.), Encyclopedia of Operations Research and Management Science (pp. 704-706). New York: Wiley. Retrieved September 10, 2007, from EBSCO Online Database Business Source Complete. http://search.ebscohost.com/login.aspx?direct=true&db=bth&AN=21891821&site=bsi-live
Hannonen, M. (2005). An analysis of land prices: A structural time-series approach. International Journal of Strategic Property Management, 9(3), 145-172. Retrieved September 27, 2007, from EBSCO Online Database Business Source Complete. http://search.ebscohost.com/login.aspx?direct=true&db=bth&AN=17640834&site=bsi-live
Luigi, D., Oana, S., Mihai, T., & Simona, V. (2012). The use of regression analysis in marketing research. Studies in Business & Economics, 7(2), 94-109. Retrieved December 2, 2013, from EBSCO Online Database Business Source Complete. http://search.ebscohost.com/login.aspx?direct=true&db=bth&AN=86731167
Sargeant, A. & West, D. C. (2001). Analytical procedures. In A. Sargeant and D. C. West, Direct and Interactive Marketing (pp. 235-278). New York: Oxford University Press. Retrieved September 10, 2007, from EBSCO Online Database Business Source Complete. http://search.ebscohost.com/login.aspx?direct=true&db=bth&AN=7500276&site=bsi-live
Tonidandel, S., & LeBreton, J. (2011). Relative importance analysis: A useful supplement to regression analysis. Journal of Business & Psychology, 26(1), 1-9. Retrieved December 2, 2013, from EBSCO Online Database Business Source Complete. http://search.ebscohost.com/login.aspx?direct=true&db=bth&AN=57817735