Bivariate Data

Bivariate data consist of paired observations of two variables, usually collected in order to learn about the relationship between them. Such data are often displayed in a scatter plot.

If analysis is restricted to a single random variable, we speak of a univariate distribution and univariate data (observations of the random variable X, for example, x_1, ..., x_n). When analysis involves two random variables X and Y, for example, the height and weight of a person, each can be considered separately (marginal distribution), as two univariate random variables. Alternatively, both variables can be considered simultaneously (joint distribution), as a bivariate random (vector) variable. Repeated observations of the bivariate variable (pair of random variables), for example, (x_1, y_1), ..., (x_n, y_n), constitute bivariate data. Figure 1 shows an example of bivariate height and weight data obtained from a large study of n = 27791 boys.
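
The difference between the joint and the marginal view can be sketched in a few lines of Python. The five (height, weight) pairs below are invented for illustration and are not taken from the study in Figure 1; the names `pairs`, `heights`, and `weights` are hypothetical.

```python
from statistics import mean

# Hypothetical bivariate data: (height in cm, weight in kg) for five boys.
# These values are invented for illustration only.
pairs = [(95.0, 14.0), (100.5, 16.0), (110.0, 19.0), (120.0, 23.0), (130.5, 28.0)]

# Joint view: each observation stays a pair, so the link between
# one boy's height and the same boy's weight is preserved.
n = len(pairs)

# Marginal view: split the pairs into two univariate samples.
heights = [h for h, _ in pairs]
weights = [w for _, w in pairs]

mean_height = mean(heights)
mean_weight = mean(weights)
print(n, mean_height, mean_weight)
```

Once the pairs are split, the two lists can be analyzed like any univariate data, but the information about which height belongs to which weight is only available in `pairs`.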

[Figure 1: Bivariate height and weight data for n = 27791 boys.]

With bivariate data, one can still study all univariate properties of each variable by taking just one variable at a time. By considering the two variables simultaneously, however, the statistician can also investigate whether one variable tends to be related to the other. Figure 1 demonstrates that the height and weight of boys are not independent: the taller a boy is, the heavier he tends to be (with a stronger relationship at the lower heights and weights).

While the previous observations are informal and based on visual impressions, there are formal statistical tools for bivariate data that allow correlation and dependence to be investigated objectively. Using such tools, one can also examine the conditional distribution, or conditional relative frequency, of one variable given the value of the other. For instance, in Figure 1, one can ask about the distribution of weights (in kilograms) for all boys with heights between 100 and 101 centimeters. Commonly, investigators are interested in how the mean and variance of the conditional distribution change with the condition.
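
As a rough sketch of both ideas, the Python code below computes the Pearson correlation coefficient (one common measure of linear association, chosen here as an example) and the conditional mean and variance of weight within a height band. The data are invented for illustration, and the helper names `pearson_r` and `conditional_weights` are hypothetical.

```python
from math import sqrt
from statistics import mean, pvariance

# Hypothetical (height in cm, weight in kg) pairs, invented for illustration.
pairs = [
    (100.2, 15.5), (100.8, 16.5), (100.5, 16.0),
    (110.1, 18.5), (110.6, 19.5),
    (120.3, 22.0), (120.9, 24.0),
]

def pearson_r(data):
    """Pearson correlation coefficient of a list of (x, y) pairs."""
    xs = [x for x, _ in data]
    ys = [y for _, y in data]
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in data)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def conditional_weights(data, low, high):
    """Weights of all observations whose height lies in [low, high)."""
    return [w for h, w in data if low <= h < high]

r = pearson_r(pairs)

# Conditional distributions of weight for two height bands.
w_100 = conditional_weights(pairs, 100, 101)
w_120 = conditional_weights(pairs, 120, 121)

# Population variance is used here as an assumption; statistics.variance
# gives the n - 1 version instead.
cond_mean_100, cond_var_100 = mean(w_100), pvariance(w_100)
cond_mean_120, cond_var_120 = mean(w_120), pvariance(w_120)
print(round(r, 3), cond_mean_100, cond_var_100, cond_mean_120, cond_var_120)
```

Comparing the two bands shows how the conditional mean and variance can be tracked as the condition (the height band) changes.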

Bivariate data need not be continuous measurements. For example, a binary indicator of whether a given day was rainy, paired with an indicator of whether the preceding day was rainy, can be used to examine how rain on one day relates to rain on the next. These data can be summarized in a simple table:

[Table: Days cross-classified by whether the given day was rainy and whether the preceding day was rainy.]
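
A table of this kind can be built from a daily record with a short Python sketch. The ten-day record below is invented for illustration and does not reproduce the values of the table in the source; the variable names are hypothetical.

```python
from collections import Counter

# Hypothetical daily record: True means the day was rainy. Invented values.
rainy = [True, True, False, False, True, False, False, False, True, True]

# Pair each day with the preceding day: (rainy yesterday, rainy today).
day_pairs = list(zip(rainy[:-1], rainy[1:]))

# 2x2 table of counts, keyed by (yesterday, today).
table = Counter(day_pairs)

# Conditional relative frequency of rain today, given yesterday's weather.
after_rain = table[(True, True)] + table[(True, False)]
after_dry = table[(False, True)] + table[(False, False)]
p_rain_given_rain = table[(True, True)] / after_rain
p_rain_given_dry = table[(False, True)] / after_dry

for yesterday in (True, False):
    row = [table[(yesterday, today)] for today in (True, False)]
    print("rain yesterday" if yesterday else "dry yesterday ", row)
print(p_rain_given_rain, p_rain_given_dry)
```

Each row of the printed table lists the counts of rainy and dry days that followed a rainy (or dry) day, and dividing by the row total gives the conditional relative frequency.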

Bivariate data may also combine a binary variable with a continuous measurement, for example, the sex and height of each person examined. Information about whether the two variables are related, and possibly about the nature of that relationship, is often even more important than knowledge of the properties of each variable separately, as when one (unobserved) variable is to be predicted from the other (observed) variable.
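
One common way to predict one variable from the other is a least-squares straight line (simple linear regression). The sketch below uses that technique as an assumption; the source does not name a specific prediction method. The data are invented, and the names `fit_line` and `predict_weight` are hypothetical.

```python
from statistics import mean

# Hypothetical (height in cm, weight in kg) pairs, invented for illustration.
pairs = [(100.0, 16.0), (110.0, 19.0), (120.0, 22.0), (130.0, 27.0)]

def fit_line(data):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    xs = [x for x, _ in data]
    ys = [y for _, y in data]
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in data)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = my - slope * mx
    return slope, intercept

slope, intercept = fit_line(pairs)

def predict_weight(height):
    """Predicted weight (kg) for an observed height (cm)."""
    return intercept + slope * height

predicted = predict_weight(115.0)
print(slope, intercept, predicted)
```

A straight line is only a reasonable model when the relationship is roughly linear over the range of interest, so the fit should be checked against a plot of the data.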

The concept of a bivariate random variable (and of bivariate data) generalizes readily to multivariate data, in which two or more random variables are observed simultaneously. As the number of variables increases, so does the number of relationships that can be examined. For example, with trivariate data, the statistician can consider the conditional distribution of one variable given the observed values of the other two.
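
Conditioning on two variables at once can be sketched in Python in the same way as conditioning on one. The triples below add a hypothetical third variable, age, to height and weight; all values are invented for illustration, and `weights_given` is a hypothetical helper.

```python
from statistics import mean

# Hypothetical (age in years, height in cm, weight in kg) triples. Invented values.
triples = [
    (4, 100.5, 15.0), (4, 100.7, 16.0), (4, 108.0, 18.0),
    (5, 100.2, 16.5), (5, 109.0, 18.5), (5, 109.5, 19.5),
]

def weights_given(data, age, h_low, h_high):
    """Weights of observations with the given age and height in [h_low, h_high)."""
    return [w for a, h, w in data if a == age and h_low <= h < h_high]

# Conditional distribution of weight given age 4 and height 100 to 101 cm.
w_cond = weights_given(triples, 4, 100, 101)
cond_mean = mean(w_cond)
print(w_cond, cond_mean)
```

With more conditioning variables, each condition selects fewer observations, so larger samples are needed to estimate conditional distributions reliably.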
