Research Methods in Economics and Business
Research Methods in Economics and Business involve systematic techniques used to analyze economic phenomena and inform business practices. Central to these methods is econometrics, which utilizes statistical tools to test economic theories against real-world data. This discipline has evolved from purely theoretical frameworks to a more scientific approach, enabling economists to validate their models through empirical evidence. Economists employ various types of data analysis, including time-series, cross-sectional, and panel data, to draw insights about market behaviors and trends. Key statistical techniques such as regression analysis and analysis of variance (ANOVA) are crucial for understanding relationships between variables and assessing the significance of observed data patterns. By applying these methodologies, economists can make informed predictions about future market conditions and enhance decision-making processes within businesses. Overall, the integration of rigorous statistical analysis has transformed economics into a more objective and precise field, allowing for continuous refinement of theories based on empirical findings.
With statistics, economists can study the intricacies of real-world markets, forecast future business conditions and, most important of all, test the applicability of their models to the real world. Before statistics was introduced, economics was a strictly theoretical exercise. People accepted its principles because they made sense or "seemed" right. But the all-important objective corroboration of theory with fact was missing. That bothered economists keen on making their discipline more scientific. They went on to found the field of econometrics, which has since become a mainstay of economic analysis.
Keywords Analysis of Covariance; Analysis of Variance; Central Tendency; Cross-Sectional Analysis; Dependent Variable; Econometrics; Independent Variable; Null Hypothesis; Panel Analysis; Regression Analysis; Standard Deviation; Statistical Significance; Time-Series Analysis
Overview
As Adam Smith famously noted over two centuries ago, self-interest and competition collectively act like an "invisible hand" to shape the efficient allocation of resources in free and open markets. Ever since then, economists have sought to flesh out the way these forces are naturally harnessed and manifest themselves in mechanisms of exchange. Much of the work here has been, by necessity, descriptive. Observation in turn leads to the formation of intuitive models to explain otherwise random events. Scientists follow the exact same methodology. But economists cannot take the next, all-important step and experimentally validate their theories as scientists do. For no amount of ingenuity, alas, will successfully recreate real-world economic activity in a laboratory.
A hypothesis cannot be proved or disproved unless it is tested in controlled conditions free from outside influences. Depending on the premises, what's more, even the most patently absurd propositions can be argued logically. No matter how brilliant the reasoning employed, then, a theory is treated as fact only when it has been substantiated objectively. Barring that, it remains an assumption, regardless of how elegant, insightful or useful it might be. As such, its applicability to real-world phenomena is problematic, and any forecast based on it is susceptible to unintentional error. Whereas businesses might lose money when predictions prove inaccurate, model-builders lose something much harder to recoup: credibility. In the absence of objective proof, the next best thing is a track record of accurate predictions (Wallis, 1984).
What Adam Smith could not have foreseen was how intricate and complex these models could become when formulated as equations. Mathematics provides the means not only to build complex models of macroeconomic theory but also, crucially, to test their accuracy and statistical robustness. Econometrics in fact came about as a separate discipline when an enterprising economist in 1933 used the formulae describing harmonic oscillation to investigate and explain the business cycle (Boumans, 2001). Meanwhile, pure mathematics has proved a much suppler means of expressing economic concepts than words alone. So much so, in fact, that much of today's economic theory is elucidated using algebra, logarithms, calculus, group theory and symbolic logic, just as the mathematical rules governing the manipulation of symbols and the advancement of formal proofs have given economists new ways to argue the legitimacy of their constructs ("Theoretical assumptions and nonobserved facts," 1985).
Moreover, Silvia Palașcă (2013) writes that in the twenty-first century one must pay attention to the use of software in economics, since software is built on mathematical tools. The difference lies in the fact that the mathematics is hidden and not directly accessible to the user, who often forgets the substrate of such software and makes use of it without fully understanding the results produced, or while ignoring the possible problems raised by the mathematical model behind the program.
Yet only observable data can prove or disprove the veracity of economists' claims, and how can one ensure the accuracy and reliability of that data? What's to say the events in question aren't merely random, or an unrepresentative subset of the phenomenon under study? In other words, assuming the experimental design is sound, how do you know the data you've collected isn't just random? The short answer is statistics: the branch of mathematics specializing in the collection, assessment and analysis of large amounts of numerical data. And its roots, interestingly, lie in analytical geometry.
Further Insights
Econometricians gather, collate and interpret real-world macroeconomic data (yearly figures on national output, employment, inflation, the money supply, etc.) going back over 60 years. Among all economists, then, they are perhaps the most outwardly oriented and, due largely to statistics, the most "scientific." Their counterparts in the business world employ the very same statistical techniques to solve production problems, maximize customer service functions and plan and evaluate marketing campaigns. The tools of their trade are time-series data, cross-sectional data, panel data, and multidimensional panel data. In each case, a sample of the larger population is examined and the findings extrapolated. How this sample is collected is crucial: It must be selected at random from a data set large enough to encompass the full range of possible outcomes. Time-series data is measured periodically over an extended duration, while cross-sectional data is measured all at once, much as a camera takes a photograph. Panel data combines both for purposes of analysis, and multidimensional panel data introduces additional variables to examine their potential impact on the resulting findings. The mathematics employed includes single- and multiple-equation modeling, hypothesis testing, statistical significance assessment, regression analysis, and analyses of variance and covariance.
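The distinction among these data structures is easiest to see in a small example. The sketch below is only illustrative: it assumes the pandas library and uses entirely made-up growth figures to arrange hypothetical observations as a time series, a cross section, and a panel.

```python
import pandas as pd

# Time-series data: one unit (a single country) measured repeatedly over time.
time_series = pd.DataFrame({
    "year": [2019, 2020, 2021, 2022],
    "gdp_growth": [2.3, -3.4, 5.9, 2.1],        # hypothetical figures
})

# Cross-sectional data: many units measured once, like a snapshot.
cross_section = pd.DataFrame({
    "country": ["A", "B", "C", "D"],
    "gdp_growth_2022": [2.1, 1.4, 3.0, 0.7],    # hypothetical figures
})

# Panel data: many units, each measured repeatedly over time.
panel = pd.DataFrame({
    "country": ["A", "A", "B", "B"],
    "year":    [2021, 2022, 2021, 2022],
    "gdp_growth": [5.9, 2.1, 4.2, 1.4],
}).set_index(["country", "year"])               # the two-level index marks it as a panel

print(panel)
```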
Modeling
To yield useful insights, models must take certain liberties with reality, for a cause-and-effect relationship cannot be corroborated as long as any other possible influence might be in play. Economists would be particularly hard-pressed to provide any insight into the fundamental dynamics of the marketplace, in fact, if they could not avail themselves of the principle of ceteris paribus, or "all other things being equal." It's a necessary presumption, but one that nonetheless entails the deliberate oversimplification of complex processes. Model-builders are prepared to live with this trade-off because their primary concern is how faithfully a model represents the phenomenon under study. Everything else is secondary. As such, bias-based error can and sometimes does creep unobserved into a model. Usually, however, it's tolerated because the risk and impact of any such error pales in comparison to the usefulness of the correlation demonstrated.
Advantages of Mathematical Models
Mathematical models have certain built-in advantages. Data by its very nature is numerical. Equations and inequalities, moreover, can state the nature of the relationship between two or more types of events more precisely than words or diagrams. Using both, econometricians can build very sophisticated multi-variable models of macroeconomic activity. The variables, or symbols, denote a range of possible values for the events being studied. They can be either a discrete set of numbers or a continuous function stating the probable distribution of future values. When a cause-and-effect relationship is being modeled, any change in an independent variable triggers a change in its dependent variable. Critically, though, a change in the dependent variable does not cause a change in the independent variable. A model is said to be deterministic, moreover, only when its equations and prior values account for all the change. Eliminating randomness entirely, though, is very hard to do, so the values of most econometric variables are expressed differently, as a continuous probability function that only gives the odds of a particular quantity occurring. A dynamic model, finally, factors in the passage of time and thus frequently employs differential equations from calculus; a static model does not, and generally employs linear equations from algebra.
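As a minimal sketch of the distinction drawn above, the following hypothetical code (using only NumPy, with invented coefficients) contrasts a deterministic relationship, where the equation and prior values account for all the change in the dependent variable, with a stochastic one, where the dependent variable also carries a random disturbance term.

```python
import numpy as np

rng = np.random.default_rng(0)

# Independent variable: e.g., hypothetical advertising spending levels.
x = np.linspace(0, 10, 50)

# Deterministic model: the dependent variable is fully determined by x.
y_deterministic = 3.0 + 0.8 * x

# Stochastic model: the same systematic part plus a random disturbance,
# so each value of x maps to a probability distribution of outcomes.
noise = rng.normal(loc=0.0, scale=1.5, size=x.size)
y_stochastic = 3.0 + 0.8 * x + noise

print("deterministic spread around the line:", np.std(y_deterministic - (3.0 + 0.8 * x)))
print("stochastic spread around the line:   ", np.std(y_stochastic - (3.0 + 0.8 * x)))
```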
The Accuracy of Models
Curiously, though, testing the accuracy of any real-world model is not a straightforward affair, at least not when the burden of proof rests on statistics. For important reasons, in fact, the working hypothesis must first be turned into a negative statement called the null hypothesis. It asserts that there is no relationship, even though the object is to show that there is one. This indirect approach has one very big advantage: The working hypothesis is rejected only when the ensuing statistical analysis confirms the random nature of the data. Otherwise, the working hypothesis is provisionally retained. Yes, it's a roundabout way of doing things, but it has its merits. Foremost among these is that a negative assertion is far easier to test statistically than a positive one: All that has to be shown is that the observed results are unlikely to be the product of pure chance, something statistics does very well. But the only logical inference this form of analysis supports is that the data does not refute the working hypothesis. That is not the same as declaring outright that the data supports the original proposition, or even that there is a reasonable likelihood that an underlying relationship exists. As conclusions go, it's rather circumscribed. Yet it nonetheless better reflects the general limitations of experimental inquiry, where a theory holds up only as long as the data supports it.
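A minimal illustration of this indirect logic, assuming SciPy and simulated data, is sketched below: the null hypothesis of "no relationship" is rejected only when the p-value indicates that an association this strong would be very unlikely to arise from purely random data.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Simulated data in which a genuine (but noisy) relationship exists.
x = rng.normal(size=200)
y = 0.4 * x + rng.normal(size=200)

# Null hypothesis: the two series are unrelated (true correlation is zero).
r, p_value = stats.pearsonr(x, y)

alpha = 0.05  # conventional significance level
if p_value < alpha:
    print(f"r = {r:.2f}, p = {p_value:.4f}: reject the null hypothesis.")
else:
    print(f"r = {r:.2f}, p = {p_value:.4f}: the data do not refute randomness.")
```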
Statistical Significance
Any set of observed values captures a tiny fraction of the total values generated in the real world. How do we know that the historical data is not in fact just coincidence? In other words, how confident are we that the trend extrapolated from this limited amount of data actually exists (McCloskey & Ziliak, 1996)? Of course, randomness exists to one degree or another in virtually everything. So how do you quantify it, and how much is acceptable? Without a satisfactory answer, inferential statistics itself is fatally undermined. Much thought has therefore gone into the question of statistical significance: what it is and how it can be established. If we plot the results of a completely random process such as flipping a coin or throwing dice, the frequencies after a very large number of trials invariably resemble a bell-shaped curve. Here, slightly over two thirds of all random results fall in relatively close proximity to each other, within one standard deviation of the mean of the distribution, the apex of the bell. Any such curve can be divided into equal intervals on either side of the sample's mean via a measure of dispersion called the standard deviation. The fewest random events fall at the farthest reaches of the bell-shaped curve: roughly 5 percent lie beyond two standard deviations from the mean, and only around 1 percent beyond two and a half.
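These proportions can be checked directly against the normal distribution itself. The short sketch below, assuming SciPy, reproduces the share of observations expected within one, two, and two and a half standard deviations of the mean.

```python
from scipy.stats import norm

# Share of a normal distribution lying within k standard deviations of the mean.
for k in (1, 2, 2.5):
    within = norm.cdf(k) - norm.cdf(-k)
    print(f"within +/-{k} sd: {within:.1%}   beyond: {1 - within:.1%}")
# within +/-1 sd is about 68.3%, +/-2 sd about 95.4%, +/-2.5 sd about 98.8%
```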
For any "normally" distributed set of data to be significant, then, only a certain number of sample observations can fall beyond its standard deviation of plus or minus 2.5. And the sample's standard deviation can be readily calculated. First comes the basic arithmetic value where, by convention, the mean or average of the sample is subtracted from each value. Given how the result here, called a deviate, can be positive or negative, each result is then multiplied by itself, or "squared" to avoid unnecessary complications. The product of a negative number multiplied by itself is always positive. All of these values, what's more, can vary except the last. It must be fixed so the others can fluctuate; the total of all the deviations divided by the number of observations must equal the sample mean. That last value, when added or subtracted, ensures this. So the degree of freedom of a given sample is one less than its total number of observed events. And the mean of the sum of deviates is always divided by its sample's degree of freedom. All that remains now to arrive at the standard deviation is to return it to positive and negative values, which is done by taking the square root.
Regression Analysis
Regression analysis figures prominently in econometrics and operations research. Given that it measures the extent to which an independent variable directly affects a dependent one, one can see why. The existence and extent of this causal relationship is determined mathematically by examining comparable data sets for both variables. When each known data point is plotted along an X and Y axis, the resulting scatter plot shows an overall pattern that suggests either a straight line, a curve, or else simply a hodge-podge of dots. The so-called regression line here is what's most important, be it barely decipherable or readily apparent. Two factors now need to be assessed: how many data points there are and how closely they cluster around that imaginary straight line or curve. The greater the number and the nearer the proximity of each, the more likely changes in the independent variable(s) will occasion changes in the dependent variable. Conversely, the fewer the number and the farther the proximity, the less likely there's a direct cause-and-effect relationship. Thanks to analytical geometry, what's more, when the regression line is linear, the nature of the relationship can be expressed algebraically in a "regression equation," and its strength by a related statistic, the "correlation coefficient." The latter is a number between -1 and 1, where 1 signals a direct correspondence between variables, -1 an inverse correspondence and 0 none at all. Most of the time, though, the correspondence is partial and the coefficient is a decimal somewhere in between.
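As a sketch of the procedure, the code below (assuming SciPy and simulated data with invented coefficients) fits a regression line and reports both the regression equation and the correlation coefficient describing how tightly the points cluster around that line.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

# Hypothetical data: the dependent variable responds to the independent one, plus noise.
x = np.linspace(0, 10, 40)
y = 2.0 + 0.5 * x + rng.normal(scale=1.0, size=x.size)

result = stats.linregress(x, y)

print(f"regression equation: y = {result.intercept:.2f} + {result.slope:.2f} * x")
print(f"correlation coefficient r = {result.rvalue:.2f}")
```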
Noting the predominant role of linear regression analysis in empirical economics, Soyer and Hogarth (2012) asked 257 academic economists to make probabilistic inferences based on different presentations of the outputs of this statistical tool. The questions concerned the distribution of the dependent variable, conditional on known values of the independent variable. The answers based on the standard presentation mode demonstrated "an illusion of predictability"; the outcomes were perceived to be more predictable than could be justified by the model. In particular, the authors noted, many respondents failed to take the error term into account. The implications of the study are that economists need to reconsider the way in which empirical results are presented and consider the possible provision of "easy-to-use simulation tools" that would help readers of empirical papers make accurate inferences.
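The point about the error term can be illustrated with a small simulation, sketched below with NumPy and invented parameters: even when the slope of the fitted line is estimated precisely, individual outcomes still scatter widely around that line, so a prediction for a single case is far less certain than the tight fit of the line might suggest.

```python
import numpy as np

rng = np.random.default_rng(3)

# Simulated data with a real slope but a large error term.
n = 1000
x = rng.uniform(0, 10, n)
y = 1.0 + 0.5 * x + rng.normal(scale=3.0, size=n)

# Ordinary least squares fit by hand.
slope, intercept = np.polyfit(x, y, deg=1)
residuals = y - (intercept + slope * x)
sigma = residuals.std(ddof=2)          # residual standard error (two parameters estimated)

print(f"fitted line: y = {intercept:.2f} + {slope:.2f}x")
print(f"yet a single new outcome at x = 5 falls roughly within "
      f"{intercept + slope * 5:.1f} +/- {2 * sigma:.1f} about 95% of the time")
```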
Analysis of Variance & Covariance
Linear regression analysis is closely related to one-way Analysis of Variance (ANOVA) for two variables. What exactly, though, is variance? Basically, it measures the degree of difference between observed events, the "signal," and the randomness, or "noise," lurking in the background. ANOVA essentially assesses how spread out the dispersion pattern of individual data points is, both within and between groups. When only two samples are being compared, this ratio's calculation is relatively straightforward and is known as the t-test (Evans, 1999). First, the signal, the difference between the two samples, is summarized quantitatively by subtracting one mean from the other. When the samples are very large and the events normally distributed, the randomness of the overall phenomenon is said to be "known," so degrees-of-freedom corrections matter little. Each sample's variance is computed by dividing the sum of its squared deviates by its sample size; dividing each variance by its sample size once more and summing the two results gives the variance of the difference between the means, and the square root of that sum is the standard error of the difference. Computing the all-important test statistic is finally within reach: The only step left is to divide this standard error into the difference between the two sample means.
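A hedged sketch of the two-sample calculation, using SciPy and invented samples, is shown below; the manual large-sample version follows the recipe above, and the library call provides the textbook t-test for comparison.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)

# Two hypothetical samples, e.g., weekly sales under two pricing policies.
a = rng.normal(loc=100.0, scale=10.0, size=200)
b = rng.normal(loc=103.0, scale=10.0, size=200)

# Manual large-sample version: signal over noise.
signal = a.mean() - b.mean()
standard_error = np.sqrt(a.var(ddof=1) / a.size + b.var(ddof=1) / b.size)
t_manual = signal / standard_error

# Library version (Welch's t-test, which does not assume equal variances).
t_scipy, p_value = stats.ttest_ind(a, b, equal_var=False)

print(f"manual t = {t_manual:.2f}, scipy t = {t_scipy:.2f}, p = {p_value:.4f}")
```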
Adding more groups (say, additional categories or treatment levels) to the mix complicates these calculations further; the variation of each group's mean, essentially, has to be weighed against the variation inside every group, be there two, three, four of them, etc. This is best done by first computing the mean value for each group and using it to find that group's deviates. These are then squared and added together. Determining the variance "within" groups, the next step, is then simply a matter of pooling these sums across groups. With one notable additional step, the same calculations are repeated to arrive at the "between" group figure.
To begin with, the mean of the total data set is determined and then subtracted from the mean of each sample group, and the results are squared. Since the number of data points may differ between groups, however, these squared deviates cannot simply be added together: A group mean based on fewer observations is a less reliable estimate, so the groups cannot all be given equal weight, and unequal sample sizes would otherwise skew the analysis. This is accounted for by multiplying each group's squared deviate by the number of its recorded observations. The sum of these weighted figures is the "between" group sum of squares.
Separately, these two composite figures do not yet give a precise, single-number rendering of the variance of the "signal" relative to the variance of the "noise." That requires one last set of calculations. Here, the "within" and "between" sums of squares must each be divided by their respective degrees of freedom: the number of groups minus one for the "between" figure, and the total number of observations minus the number of groups for the "within" figure. Dividing the resulting "between" group quotient by the "within" group quotient, at long last, yields the desired ratio, otherwise known as the F statistic. A figure close to one suggests the group differences are no larger than random noise alone would produce, while a figure much greater than one suggests that they are not random. The higher the F score, then, the greater the likelihood there's a genuine relationship between variables.
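These steps can be traced in code. The sketch below, assuming SciPy and made-up groups, computes the within-group and between-group mean squares by hand and confirms that their ratio matches the F statistic returned by the library's one-way ANOVA routine.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)

# Three hypothetical groups (e.g., output under three management regimes).
groups = [rng.normal(loc=m, scale=2.0, size=30) for m in (10.0, 11.0, 12.5)]

all_data = np.concatenate(groups)
grand_mean = all_data.mean()
k, n = len(groups), all_data.size

# Between-group sum of squares: squared distance of each group mean from the
# grand mean, weighted by that group's size.
ss_between = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups)
# Within-group sum of squares: deviations of observations from their own group mean.
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

ms_between = ss_between / (k - 1)      # divided by between-groups degrees of freedom
ms_within = ss_within / (n - k)        # divided by within-groups degrees of freedom
f_manual = ms_between / ms_within

f_scipy, p_value = stats.f_oneway(*groups)
print(f"manual F = {f_manual:.2f}, scipy F = {f_scipy:.2f}, p = {p_value:.4f}")
```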
Sometimes, though, data for different variables tends to converge more than diverge, requiring a different type of treatment: Analysis of Covariance (ANCOVA). A synthesis of linear regression analysis and ANOVA, it gauges not the difference but the likeness between multiple sets of data and answers an all-important question: How can we be certain that two random variables are in fact causally linked? What if they are under the sway of an undetected outside factor, a "covariate"? Well, if the influence of this covariate can be statistically removed from the data samples, it no longer poses a problem. Whenever a covariate is suspected, then, its known values are plotted, a trend line computed and additional values estimated using regression analysis. The resulting correlation coefficients, in turn, are used to predict an alternate set of values for the dependent variable that can then be compared to the reported values. With these values in hand, an analysis of variance can determine how much or how little the two fluctuate together, quantifying the covariate's influence on each random variable. An ANOVA reading of zero indicates the results are coincidental; a positive reading indicates that they are not. Whatever difference remains between the "real" and the "covariate-linked" values is the error that must be filtered out, and therein lies ANCOVA's usefulness.
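One common way to carry out such an adjustment, sketched below with the statsmodels library and invented data, is to include the covariate alongside the group factor in a single linear model and then read the group effect from the resulting ANOVA table, net of the covariate's influence.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

rng = np.random.default_rng(6)

# Hypothetical data: the outcome depends on a covariate and (perhaps) on group membership.
n = 90
df = pd.DataFrame({
    "group": np.repeat(["A", "B", "C"], n // 3),
    "covariate": rng.normal(size=n),
})
df["outcome"] = (1.5 * df["covariate"]
                 + df["group"].map({"A": 0.0, "B": 0.5, "C": 1.0})
                 + rng.normal(scale=1.0, size=n))

# ANCOVA: the group effect is assessed after the covariate's influence is removed.
model = smf.ols("outcome ~ C(group) + covariate", data=df).fit()
print(anova_lm(model, typ=2))
```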
Conclusion
All in all, econometrics is an exacting discipline. The statistics employed can be relatively simple, like the techniques described above, or devilishly complicated, as is the case when a large number of variables are examined. (Recognizing this, and the importance of integrating financial accounting examples into an introductory undergraduate business statistics course, Drougas, Harrington, and Miller (2011) describe a project challenging students to review financial statement data for 27 firms in the restaurant industry, generate a multiple regression using software, and interpret the results in the language of business, i.e., which financial items impact a company's stock price.) But no matter how complex the equations, how massive the number-crunching and how limited the conclusions that can be drawn, statistics maps the uncertainty ever-present in the real world in general and in macroeconomics in particular. With statistics, economists can test assumptions, refine or discard models, make forecasts and bring scientific rigor to their analyses. Economics would not be nearly as objective or precise a field of study without it.
Terms & Concepts
Analysis of Covariance: Determines the likelihood that similar patterns of change in two random variables are the result of some underlying correspondence.
Analysis of Variance: Quantifies the difference in data samples that can be attributed to randomness.
Central Tendency: The clustering of normally distributed data around a sample mean.
Cross-Sectional Analysis: Examines data collected all at one time.
Dependent Variable: An event that could be triggered by another event.
Econometrics: The statistical analysis of macroeconomic data to predict future conditions and test the accuracy of theoretical models.
Independent Variable: An event that could trigger another event.
Null Hypothesis: The claim that the data under examination was generated randomly.
Panel Analysis: The simultaneous application of time-series and cross-sectional analysis to a set of data.
Regression Analysis: Measures the extent to which an independent variable affects a dependent one.
Statistical Significance: A basic measure of the extent of randomness active in a normally distributed data sample.
Standard Deviation: A measure of the dispersion of data around the sample mean.
Time-Series Analysis: Examines data collected at regular intervals over an extended period.
Bibliography
Boumans, M. (2001). Measure for measure: How economists model the world into numbers. Social Research, 68, 427-453. Retrieved October 8, 2007, from EBSCO Online Database Academic Search Premier. http://search.ebscohost.com/login.aspx?direct=true&db=aph&AN=4897378&site=ehost-live
Drougas, A., Harrington, S., & Miller, J. (2011). Incorporating a practical financial accounting example into an introductory statistics course. Journal of the Academy of Business Education, 12, 121-136. Retrieved November 13, 2013, from EBSCO Online Database Business Source Complete. http://search.ebscohost.com/login.aspx?direct=true&db=bth&AN=85624032&site=ehost-live
Evans, R. (1999). Chapter 2: Cherished beliefs and t-statistics. Macroeconomic forecasting (pp. 24-49). Oxfordshire: Taylor & Francis Ltd. Retrieved October 24, 2007, from EBSCO Online Database Business Source Complete. http://search.ebscohost.com/login.aspx?direct=true&db=bth&AN=16885116&site=ehost-live
Granger, C. (2004). Time series analysis, cointegration, and applications. American Economic Review, 94, 421-425. Retrieved October 8, 2007, from EBSCO Online Database Business Source Premier. http://search.ebscohost.com/login.aspx?direct=true&db=buh&AN=13785723&site=ehost-live
McCloskey, D., & Ziliak, S. (1996). The standard error of regressions. Journal of Economic Literature, 34, 97. Retrieved October 31, 2007, from EBSCO Online Database Business Source Premier. http://search.ebscohost.com/login.aspx?direct=true&db=buh&AN=9604151808&site=ehost-live
Palașcă, S. (2013). Mathematics in economics. A perspective on necessity and sufficiency. Theoretical & Applied Economics, 20, 127-144. Retrieved November 12, 2013, from EBSCO Online Database Business Source Complete. http://search.ebscohost.com/login.aspx?direct=true&db=bth&AN=91714193&site=ehost-live
Theoretical assumptions and nonobserved facts. (1985). In Essays in economic theory, theorizing, facts & policies (pp. 272-282). Piscataway, NJ: Transaction Press. Retrieved October 24, 2007, from EBSCO Online Database Business Source Premier. http://search.ebscohost.com/login.aspx?direct=true&db=buh&AN=11947813&site=ehost-live
Soyer, E., & Hogarth, R.M. (2012). The illusion of predictability: How regression statistics mislead experts. International Journal of Forecasting, 28, 695-711. Retrieved November 12, 2013, from EBSCO Online Database Business Source Complete. http://search.ebscohost.com/login.aspx?direct=true&db=bth&AN=76615553&site=ehost-live
Wallis, K. (1984). Comparing time-series and nonlinear model-based forecasts. Oxford Bulletin of Economics & Statistics, 46, 383-389. Retrieved October 8, 2007, from EBSCO Online Database Business Source Premier. http://search.ebscohost.com/login.aspx?direct=true&db=buh&AN=5174222&site=ehost-live
Suggested Reading
Jansen, E. (2002). Statistical issues in macroeconomic modelling. Scandinavian Journal of Statistics, 29, 193. Retrieved October 8, 2007, from EBSCO Online Database Academic Search Premier. http://search.ebscohost.com/login.aspx?direct=true&db=aph&AN=6632667&site=ehost-live
Klein, L. (2006). Econometric modeling at mixed frequencies. Journal of Mathematical Sciences, 133, 1445-1448. Retrieved October 8, 2007, from EBSCO Online Database Academic Search Premier. http://search.ebscohost.com/login.aspx?direct=true&db=aph&AN=19499997&site=ehost-live
Phillips, P. (1988). Reflections on econometric methodology. Economic Record, 64, 344. Retrieved October 8, 2007, from EBSCO Online Database Business Source Premier. http://search.ebscohost.com/login.aspx?direct=true&db=buh&AN=5838709&site=ehost-live