Advanced Statistical Methods
Advanced statistical methods encompass a range of techniques designed to analyze complex real-world data that may not adhere to the assumptions necessary for simpler statistical approaches. While traditional inferential statistics, such as t-tests and analysis of variance, provide valuable insights in many scenarios, more sophisticated methods are essential when dealing with multi-dimensional data or non-normally distributed variables.
Multivariate statistics, for instance, allows analysts to understand the relationships between multiple independent and dependent variables. Techniques like factor analysis and multivariate analysis of variance (MANOVA) help in uncovering underlying patterns within data sets that are influenced by multiple factors. Nonparametric statistics serve as an alternative when data do not meet the stringent requirements of parametric tests, enabling analysis even with ordinal data or unknown distributions.
Additionally, time series analysis focuses on data collected over time, facilitating trend identification and forecasting. This approach is crucial for businesses aiming to make informed decisions based on historical patterns. Overall, advanced statistical methods are increasingly important in today’s data-driven environment, empowering decision-makers across various sectors to navigate complexity and derive meaningful conclusions from their analyses.
Applied statistics is concerned with the analysis of real world problems. However, real world situations can be complex and messy, and do not always lend themselves to analysis using simple inferential statistics. This is not to say that the workhorse procedures such as t-tests, the Pearson coefficient of correlation, and analysis of variance are unimportant or less valuable. However, there are situations where these techniques are not appropriate. Multivariate statistics is used to summarize, represent, and analyze multiple quantitative measurements obtained on a number of individuals or objects. Nonparametric statistics is used in situations where it is not possible to estimate or test the values of the parameters of the distribution or where the shape of the underlying distribution is unknown. In addition, techniques are available to analyze time series data — data gathered on a specific characteristic over a period of time — in order to develop meaningful business forecasts.
Keywords Distribution; Factor Analysis; Inferential Statistics; Model; Multivariate Analysis of Variance (MANOVA); Multivariate Statistics; Nonparametric Statistics; Population; Regression; Sample; Statistical Significance; Statistics; Time Series Data; Variable
Overview
Knowledge of statistical methods is becoming increasingly important for success in business and management in many sectors in the 21st century. One of the reasons for this requirement is the ever-increasing influx of data enabled by the proliferation of information systems. Technology today allows people to communicate greater amounts of data and information faster than ever before. This trend has enabled businesses to better manage customer accounts, forecast marketplace needs, and, in general, be more proactive and effective in the marketplace. Statistical methods can be very useful for analyzing real world problems and providing decision makers with better information.
Limitations to Inferential Statistics
However, the fact is that real world situations can be complex and messy, and do not always yield the kind of neat data that are easily analyzed by descriptive or simple inferential statistics. The power of more advanced statistical procedures is readily available through advances in computer science that take the drudgery out of complex and repetitive calculations, enabling business analysts to better analyze the multitude of data that crosses their desks.
This is not to say that the workhorse procedures such as t-tests, the Pearson coefficient of correlation, and analysis of variance are unimportant or less valuable. The analyst must choose the right tool for the job, and these techniques are workhorses because they are appropriate and powerful for so many situations. However, there are situations where these techniques are not appropriate. Sometimes the real world situation is so complex that it cannot easily be fit into a simple 2x2 or 3x3 matrix. The analyst may need to know about the influence of multiple independent variables on multiple dependent variables. In other cases, the data are not clean enough to meet the assumptions required for standard parametric statistics. There may be no reason to assume, for example, that the underlying distribution is normal or no way to estimate parameters of the population such as the mean and standard deviation. At other times, the data are not static and do not hold still for a neat analysis of the past in order to forecast the future. Changes in data over time must be acknowledged in order to forecast the needs of the future. In situations such as these, other methods are required.
Statistics for Complex Computations
Multivariate statistics is a branch of statistics that is used to summarize, represent, and analyze multiple quantitative measurements obtained on a number of individuals or objects. Examples of multivariate statistics include factor analysis, cluster analysis, and multivariate analysis of variance (MANOVA). These powerful tools help the analyst derive important information from complex data sets, such as those in which it is important to determine the joint and separate effects of multiple independent variables on multiple dependent variables and to assess the statistical significance of those effects. Nonparametric statistics is another class of statistical procedures that is used in situations where it is not possible to estimate or test the values of the parameters (e.g., mean, standard deviation) of the distribution or where the shape of the underlying distribution is unknown. Although not as powerful as standard parametric techniques, nonparametric statistics allow the analyst to derive meaningful information from a less than perfect data set. In addition, techniques are available to analyze time series data — data gathered on a specific characteristic over a period of time — in order to develop meaningful business forecasts.
Applications
There are many statistical tools available to the analyst who needs advanced tools to analyze real world situations. These approaches include:
- Multivariate analyses
- Nonparametric statistics
- Time series analysis.
Multiple techniques are available in each of the categories.
Multivariate Analysis
Reasons for Use
Multivariate analysis can be very useful for the analysis of complex real world systems and situations. There are a number of reasons for using multivariate analysis. First, any treatment or independent variable can affect subjects in multiple ways. For example, a downturn in the market may not only decrease consumer spending but also increase savings or change the way that investments are made. The introduction of an innovative new product may make a task easier to perform, but may or may not change consumer buying habits, depending on how the new product performs vis-à-vis the old technology, how satisfied people are with the old technology, and so forth. In addition, the use of multiple criteria can give the analyst a better understanding of the characteristics under investigation. Behavior is influenced by many factors. Because of this, the results of laboratory experiments often translate poorly to the real world. Multivariate analysis allows the researcher to consider a more complex situation that is closer to the real world and to better explain and understand observed phenomena. In the research situation, multivariate analysis also enables researchers to cut down on the cost of research by allowing the data from several independent variables to be analyzed using a relatively small sample size compared with running multiple sequential experiments. Further, the use of powerful multivariate techniques helps the analyst avoid the false positive results that can occur when running multiple univariate tests.
Multivariate Analysis of Variance
A complete discussion of multivariate techniques is well beyond the scope of this document. However, several techniques are worthy of notice. Multivariate analysis of variance is a multivariate extension of analysis of variance for use in situations where one needs to analyze the joint and separate effects of multiple independent variables on multiple dependent variables and to determine the statistical significance of those effects. Like univariate analysis of variance, multivariate analysis of variance attempts to determine whether or not changes in the independent variables affect the dependent variables. In addition, multivariate analysis of variance attempts to identify any interactions among the independent variables and associations between the dependent variables. Multivariate analysis of variance is of particular interest in research and in complex situations that require an understanding of the effects of multiple independent variables on multiple dependent variables.
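As an illustration, the following is a minimal sketch of a MANOVA in Python using the statsmodels library; the data set, variable names, and treatment groups are all hypothetical.

```python
# A minimal MANOVA sketch using statsmodels; the data are hypothetical.
import pandas as pd
from statsmodels.multivariate.manova import MANOVA

# Hypothetical data: two dependent measures (spending, savings)
# observed under three marketing treatments.
df = pd.DataFrame({
    "treatment": ["A", "A", "A", "B", "B", "B", "C", "C", "C"],
    "spending":  [10.2, 11.1, 9.8, 12.4, 13.0, 12.1, 8.9, 9.5, 9.1],
    "savings":   [5.1, 4.8, 5.3, 3.9, 4.1, 4.0, 6.2, 5.9, 6.0],
})

# Both dependent variables appear on the left-hand side of the formula.
fit = MANOVA.from_formula("spending + savings ~ treatment", data=df)

# mv_test() reports Wilks' lambda, Pillai's trace, and related statistics
# for testing whether treatment affects the dependent variables jointly.
print(fit.mv_test())
```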
Factor Analysis
Another multivariate technique that can be useful in business settings is factor analysis. This multivariate technique analyzes the interrelationships between variables and attempts to articulate their common underlying factors. Factor analysis is used in situations where the data appear random but it is assumed that the nature of the domain is not actually chaotic; rather, the apparent disorder is attributed to multiple underlying factors. Multidimensional mathematical techniques are applied to the data to illustrate how the variables cluster together into "factors." In many ways, factor analysis is more a logical procedure than a statistical one, although it is based on the analysis of Pearson correlation coefficients between the data. Factor analysis performs a causal analysis to determine the likelihood that the same underlying processes resulted in multiple observations. Although factor analysis can yield interesting information about the relationships between seemingly unrelated data, the determination of factors, in the end, is a qualitative decision. Factor analysis does not determine "the" set of factors that underlie the data. However, considerations of the situation typically indicate that one solution is superior to the others. If such considerations are not available, the resulting factors will not be meaningful. The determination of relevant considerations, however, is one of the major problems in arriving at a better set of factors. Factor analysis began as a tool for the behavioral sciences. However, it has since become popular in marketing, product management, operations research, and other disciplines that require the understanding of large data sets.
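As a sketch of how this clustering can be recovered in practice, the following uses scikit-learn's FactorAnalysis on simulated data in which six observed items are driven by two latent factors; the data and loadings are invented for illustration.

```python
# A minimal exploratory factor analysis sketch with scikit-learn;
# the "survey item" data are simulated, not real.
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)

# Simulate 200 respondents whose six item scores are driven by
# two latent factors plus noise.
latent = rng.normal(size=(200, 2))
loadings = np.array([[0.9, 0.0], [0.8, 0.1], [0.7, 0.0],
                     [0.0, 0.9], [0.1, 0.8], [0.0, 0.7]])
items = latent @ loadings.T + rng.normal(scale=0.3, size=(200, 6))

fa = FactorAnalysis(n_components=2).fit(items)

# Rows are items, columns are factors; large absolute values show
# which items cluster together on which factor.
print(fa.components_.T.round(2))
```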
Regression
Regression is a statistical technique used to develop a mathematical model for use in predicting one variable from the knowledge of another variable. In addition to simple linear regression, there are more advanced techniques available that allow the analyst to use both multiple independent and multiple dependent variables. The resultant regression equation is a mathematical model of a real world situation that can be invaluable for forecasting and learning more about the interaction of variables in the real world. There are many types of multivariate regression including multiple linear regression, multivariate polynomial regression, and canonical correlation.
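As a brief sketch, the following fits a multiple linear regression with statsmodels; the predictor and response values are hypothetical.

```python
# A minimal multiple linear regression sketch using statsmodels;
# the advertising/price/sales figures are hypothetical.
import numpy as np
import statsmodels.api as sm

# Two hypothetical predictors (advertising spend, unit price)
# and one response (sales).
X = np.array([[1.0, 9.99], [1.5, 9.49], [2.0, 9.99],
              [2.5, 8.99], [3.0, 8.49], [3.5, 8.99]])
y = np.array([12.1, 14.0, 15.2, 18.4, 20.1, 21.0])

# add_constant appends an intercept column to the design matrix.
model = sm.OLS(y, sm.add_constant(X)).fit()

print(model.params)     # intercept and one coefficient per predictor
print(model.rsquared)   # proportion of variance explained
```

The fitted equation is the mathematical model described above: it can be used to forecast the response from new values of the predictors.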
Nonparametric Statistics
Commonly used inferential statistics such as t-tests, analyses of variance, and Pearson correlation coefficients make certain assumptions about the underlying distribution of the data that they are being used to analyze. When these assumptions cannot be met, nonparametric statistics offer an alternative. One of the most common situations where nonparametric statistics are used is when the analyst has only ordinal or ranked data, where the intervals between the data points may be uneven and interval or ratio data are not available. A number of nonparametric procedures are available that correspond to the common tests used when the shape and parameters of a distribution are known.
Advantages of Nonparametric Statistics
There are several advantages to using nonparametric statistics.
- First, nonparametric statistics are less demanding about the characteristics of the data. Parametric statistics are only valid when certain underlying assumptions are met, particularly when the samples are smaller. For example, the one-sample t-test requires that the underlying distribution for the population be normal. For independent samples, there is a further requirement that the standard deviations be equal. If these assumptions are not true, the results of the analysis cannot be trusted. The nonparametric equivalents of these tests, however, do not make these assumptions.
- Second, nonparametric statistics frequently require less time and effort to calculate. The sign test, for example, provides a quick test of whether or not two treatments are equally effective simply by counting the number of times one treatment is better than the other (a minimal sketch appears after this list).
- Third, nonparametric statistics can provide some objectivity in situations where there is no reliable underlying scale for the data or where the use of parametric statistics would depend on an artificial metric.
- Finally, in some situations, although data are available, they have not been randomly sampled from a larger population and it is impossible to acquire a random sample. In such instances, standard parametric statistics cannot be used. However, the data can sometimes be analyzed using nonparametric statistics.
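The sign test mentioned above reduces to a binomial test on the counts. The following sketch uses scipy's binomtest (available in SciPy 1.7 and later); the paired treatment scores are hypothetical.

```python
# A minimal sign test sketch; the paired scores are hypothetical.
# Under the null hypothesis that the two treatments are equally
# effective, "A beats B" should occur with probability 0.5.
from scipy.stats import binomtest

a = [7, 5, 8, 6, 9, 7, 8, 6, 7, 9]   # scores under treatment A
b = [5, 6, 6, 5, 7, 6, 7, 5, 6, 6]   # scores under treatment B

wins = sum(x > y for x, y in zip(a, b))      # times A beat B
trials = sum(x != y for x, y in zip(a, b))   # ties are dropped

print(binomtest(wins, trials, p=0.5).pvalue)
```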
Disadvantages of Nonparametric Statistics
These advantages make the choice of nonparametric statistics over parametric statistics tempting. Certainly, in many cases nonparametric statistics are quicker and easier to use and place fewer demands on the way that data are collected. However, nonparametric statistics are not without their disadvantages. First, because there are no parameters to describe the data, it is difficult to make quantitative statements about the actual differences between populations. Second, nonparametric statistics disregard valuable information. For example, the sign test mentioned previously completely disregards the values of the data and only examines whether the differences are positive or negative. Therefore, it is advisable to use parametric statistics wherever the data allow.
Mann-Whitney U Test
One of the first inferential statistics techniques taught in most courses is the t-test. This test is useful for comparing the means of two independent samples. For example, a marketer might want to know whether there is any difference between the reactions of people who responded to a proposed new logo and the reactions of those who responded to the existing logo. Sometimes, however, one cannot estimate the means of the two independent samples. In that case, the nonparametric Mann-Whitney U test (also referred to as the Wilcoxon rank-sum test) can be used instead. This test enables the analyst to make meaningful comparisons between two independent nonparametric samples by testing whether or not the two samples were drawn from the same population. The Mann-Whitney U test assumes that the two samples are independent and that the observations are ordinal or continuous, so that one can determine which value in a pair of observations is greater.
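A minimal sketch of this test with scipy follows; the ordinal logo ratings for the two groups are hypothetical.

```python
# A minimal Mann-Whitney U test sketch; the ratings are hypothetical
# ordinal data (e.g., 1-5 preference scores).
from scipy.stats import mannwhitneyu

new_logo = [4, 5, 3, 5, 4, 2, 5, 4]       # ratings of the proposed logo
existing_logo = [3, 2, 4, 2, 3, 1, 3, 2]  # ratings of the existing logo

# Two-sided test of whether the two samples come from the same population.
stat, p = mannwhitneyu(new_logo, existing_logo, alternative="two-sided")
print(stat, p)
```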
Wilcoxon Signed Rank Test
Parametric situations where one desires to examine a set of differences are typically assessed using a paired t-test. The nonparametric equivalent of this test is the Wilcoxon signed rank test, a substitute for the paired Student's t-test for cases where there are related samples or repeated measurements on the same sample. The Wilcoxon signed rank test compares the differences between the paired measurements. Although it makes no assumptions about the shape of the underlying distribution, it does require interval data. The Wilcoxon signed rank test is a viable alternative to the t-test for those situations where the distributional assumptions made by the t-test cannot be met.
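A minimal sketch with scipy; the before-and-after measurements are hypothetical repeated measures on the same subjects.

```python
# A minimal Wilcoxon signed rank test sketch; the paired
# before/after measurements are hypothetical.
from scipy.stats import wilcoxon

before = [22.1, 20.4, 25.3, 24.0, 21.8, 23.5]
after  = [20.0, 19.8, 23.1, 24.2, 20.1, 21.9]

# The test is computed on the paired differences (before - after).
stat, p = wilcoxon(before, after)
print(stat, p)
```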
The Spearman Rank Correlation Coefficient
The Pearson product moment coefficient of correlation is used to determine the degree to which two events or variables are consistently related. Correlation may be positive (i.e., when one variable increases in value, the other increases, too), negative (i.e., as the value of one variable increases, the other decreases), or zero (i.e., the values of the two variables do not vary together and are therefore unrelated). For situations where the data are nonparametric, the Spearman rank correlation coefficient is used instead. Unlike the Pearson coefficient of correlation, the Spearman does not require interval data, nor does it assume that there is a linear relationship between the two variables.
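A minimal sketch with scipy; the two sets of ranks are hypothetical.

```python
# A minimal Spearman rank correlation sketch; the ranks are hypothetical.
from scipy.stats import spearmanr

x = [1, 2, 3, 4, 5, 6, 7, 8]   # e.g., rank of each product on quality
y = [2, 1, 4, 3, 6, 5, 8, 7]   # e.g., rank of each product on sales

rho, p = spearmanr(x, y)
print(rho, p)   # rho near +1 indicates a strong positive monotonic relationship
```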
Analysis of Variance
Analysis of variance is a family of statistical techniques used on parametric data that analyze the joint and separate effects of multiple independent variables on a single dependent variable and determine the statistical significance of the effect. Two techniques are available for performing similar analyses on nonparametric data. The Kruskal-Wallis analysis of variance by ranks allows the comparison of three or more groups while the Friedman two-way analysis of variance allows the comparison of groups that are classified by two different factors.
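Minimal sketches of both tests with scipy; the group data are hypothetical.

```python
# Minimal Kruskal-Wallis and Friedman test sketches; all data are hypothetical.
from scipy.stats import kruskal, friedmanchisquare

# Kruskal-Wallis: three independent groups.
g1, g2, g3 = [3, 4, 2, 5], [6, 7, 5, 8], [1, 2, 2, 3]
print(kruskal(g1, g2, g3))

# Friedman: the same five subjects measured under three conditions
# (the observations are classified by subject and by condition).
c1, c2, c3 = [3, 4, 2, 5, 4], [5, 6, 4, 7, 6], [2, 3, 1, 4, 3]
print(friedmanchisquare(c1, c2, c3))
```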
Time Series Analysis
Methods of Time Series Analysis
Another set of advanced statistical techniques involve the analysis of time series data. These are data gathered on a specific characteristic over a period of time. Time series data are used in business to examine patterns, trends, and cycles from the past and forecast patterns, trends, and cycles in the future. A number of methods are available for analyzing time series data including:
- Naïve methods
- Averaging
- Smoothing
- Regression analysis
- Decomposition.
Time series analysis can be very helpful in forecasting future trends or needs for decision making about many aspects of the business, including buying, selling, production, and hiring. However, although there are a number of tools available for analyzing time series data, it must be borne in mind that different approaches often yield different results.
Naïve Methods
Naïve forecasting models are simple models that assume that the best predictors of future outcomes are the most recent data in the time series. This assumption means that naïve forecasting models do not consider the possible effects of trends, business cycles, or seasonal fluctuations on the data. Because of this assumption, naïve forecasting models work better on data that are reported more frequently (e.g., daily or weekly) or in situations without trends or seasonality.
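In its simplest form, the naïve forecast for the next period is just the most recent observation, as in this sketch with hypothetical weekly sales figures.

```python
# A minimal naive forecast sketch; the weekly sales figures are hypothetical.
sales = [102, 98, 105, 110, 108, 112]

# The forecast for next week is simply the most recent observation.
forecast = sales[-1]
print(forecast)   # 112
```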
Averaging Methods
Another approach to analyzing time series data uses averaging models. This approach helps circumvent the problem with naïve models of the forecast being overly sensitive to irregular fluctuations. In the simple average model, the forecast for the next time period is the average of the values for a specified number of previous time periods. Moving averages, on the other hand, use the average value from previous time periods to forecast future time periods and update this average in each ensuing time period by including the new values not available in the previous average and dropping out the data from the earliest time periods.
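A minimal sketch of a three-period moving average with pandas follows; the series values are hypothetical.

```python
# A minimal moving-average forecast sketch using pandas;
# the series values are hypothetical.
import pandas as pd

sales = pd.Series([102, 98, 105, 110, 108, 112])

# Each forecast is the mean of the three preceding observations;
# shift(1) ensures that only past values enter each forecast.
forecast = sales.rolling(window=3).mean().shift(1)
print(forecast)
```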
Regression Methods
Time series data can also be modeled using autoregression. This is a type of multiple regression technique in which future values of the variable are predicted from past values of the same variable. In autoregression, the independent variables are time-lagged versions of the dependent variable.
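A minimal sketch of an AR(2) model with statsmodels; the series is hypothetical and the lag order is chosen arbitrarily for illustration.

```python
# A minimal autoregression sketch using statsmodels; the series is
# hypothetical and lags=2 is an arbitrary illustrative choice.
import numpy as np
from statsmodels.tsa.ar_model import AutoReg

series = np.array([102., 98., 105., 110., 108., 112., 115., 111.,
                   118., 120., 117., 123.])

# AR(2): each value is regressed on its two time-lagged predecessors.
fit = AutoReg(series, lags=2).fit()

print(fit.params)   # intercept and the two lag coefficients
print(fit.predict(start=len(series), end=len(series) + 2))  # 3-step forecast
```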
Mixed/Integrated Methods
Time series data can also be modeled using mixed or integrated techniques that utilize both the moving average and autoregressive approaches. A commonly used mixed approach is the autoregressive integrated moving average (ARIMA) model (also called the Box-Jenkins model). Although ARIMA modeling techniques can be difficult to compute and interpret, they are powerful and frequently result in better forecasts than either moving averages or autoregressive techniques alone.
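A minimal sketch of an ARIMA model with statsmodels; the series is hypothetical, and the (p, d, q) order is illustrative rather than the outcome of a formal Box-Jenkins identification step.

```python
# A minimal ARIMA (Box-Jenkins) sketch using statsmodels; the series
# is hypothetical and order=(1, 1, 1) is an illustrative choice.
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

series = np.array([102., 98., 105., 110., 108., 112., 115., 111.,
                   118., 120., 117., 123.])

# order=(1, 1, 1): one autoregressive term, first differencing,
# and one moving-average term.
fit = ARIMA(series, order=(1, 1, 1)).fit()

print(fit.forecast(steps=3))   # forecast the next three periods
```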
Terms & Concepts
Distribution: A set of numbers collected from data and their associated frequencies.
Factor Analysis: A multivariate statistical technique that analyzes interrelationships between variables and attempts to articulate their common underlying factors.
Inferential Statistics: A subset of mathematical statistics used in the evaluation and summary of data. Inferential statistics are used to make inferences and decisions, such as drawing conclusions about a population based on a sample retrieved from it.
Model: A representation of a situation, system, or subsystem. Conceptual models are mental images that describe the situation or system. Mathematical or computer models are mathematical representations of the system or situation being studied.
Multivariate Analysis of Variance (MANOVA): A statistical tool used to analyze the joint and separate effects of multiple independent variables on multiple dependent variables and determine the statistical significance of the effect.
Multivariate Statistics: A branch of statistics that is used to summarize, represent, and analyze multiple quantitative measurements obtained on a number of individuals or objects. Examples of multivariate statistics include factor analysis, cluster analysis, and multivariate analysis of variance (MANOVA).
Nonparametric Statistics: A class of statistical procedures that is used in situations where it is not possible to estimate or test the values of the parameters (e.g., mean, standard deviation) of the distribution or where the shape of the underlying distribution is unknown.
Population: The entire group of subjects belonging to a certain category (e.g., all women between the ages of 18 and 27; all dry cleaning businesses; all college students).
Regression: A statistical technique used to develop a mathematical model for use in predicting one variable from the knowledge of another variable.
Sample: A subset of a population. A random sample is a sample that is chosen at random from the larger population, assuming that such samples tend to reflect the characteristics of the larger population.
Statistical Significance: The extent to which an observed outcome is unlikely to have occurred due to chance.
Statistics: A field of mathematics concerned with the analysis and interpretation of data. Mathematical statistics provides the theoretical underpinnings for various applied statistical disciplines, including business statistics, in which data are analyzed to find answers to quantifiable questions. Applied statistics uses these techniques to solve real world problems.
Time Series Data: Data gathered on a specific characteristic over a period of time. Time series data are used in business forecasting. To be useful, time series data must be collected at intervals of regular length.
Variable: An object in a research study that can have more than one value. Independent variables are stimuli that are manipulated in order to determine their effect on the dependent variables (response). Extraneous variables are variables that affect the response but that are not related to the question under investigation in the study.