Bivariate Data
Bivariate data refers to data that involves two variables, which are analyzed together to understand the relationship between them. This type of data is commonly visualized using scatter plots, which help observers identify correlations and trends. For example, in a study examining height and weight, bivariate data can reveal how the two measurements are related, often indicating that taller individuals tend to weigh more.
In statistical analysis, bivariate data can be approached in two ways: the marginal distribution considers each variable separately, while the joint distribution examines both variables simultaneously. This dual perspective enables statisticians to explore the dependency between variables, such as how the conditional distribution of weight might change given a specific height range.
Bivariate data can also include one binary variable alongside a continuous variable, such as sex and height. The insights gained from studying the interactions between these variables can be pivotal, particularly when predicting one variable based on another. Furthermore, bivariate concepts can extend to multivariate data, where multiple variables are analyzed together, which introduces more complex relationships between the data points.
Bivariate Data
Bivariate data is data on two variables, usually collected and compared in order to learn about the relationship between them. Such data is often displayed in a scatter plot.
If analysis is restricted to a single random variable, then we speak of a univariate distribution and univariate data (observations of the random variable $X$, for example, $x_1, x_2, \ldots, x_n$). When analysis involves two random variables, for example, the height and weight of a person, each can be considered separately (marginal distribution), as two univariate random variables. Alternatively, both variables can be considered simultaneously (joint distribution), as a bivariate random (vector) variable $(X, Y)$. Repeated observations of the bivariate variable (pair of random variables), for example, $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$, constitute bivariate data. Figure 1 shows an example of bivariate height and weight data obtained from a large study of n = 27791 boys.
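The distinction between the marginal and the joint view can be sketched in a few lines of Python. This is only an illustration: the `pairs` values are made up (they are not the data behind Figure 1), and the helper `relative_frequencies` is a hypothetical name introduced here.

```python
from collections import Counter

# Hypothetical (height_cm, weight_kg) pairs; illustrative values only,
# not taken from the study shown in Figure 1.
pairs = [(100, 15), (100, 16), (105, 17), (105, 17), (110, 19), (110, 20)]

# Marginal view: each variable on its own, as univariate data.
heights = [h for h, _ in pairs]
weights = [w for _, w in pairs]


def relative_frequencies(values):
    """Return {value: share of observations} for a list of hashable values."""
    counts = Counter(values)
    n = len(values)
    return {value: count / n for value, count in counts.items()}


marginal_height = relative_frequencies(heights)
marginal_weight = relative_frequencies(weights)

# Joint view: each (height, weight) pair is counted as a single observation.
joint = relative_frequencies(pairs)

# Summing the joint frequencies over weight recovers the height marginal.
recovered = {}
for (h, _), share in joint.items():
    recovered[h] = recovered.get(h, 0.0) + share
```

The joint frequencies carry strictly more information than the two marginals: the marginals can be recovered from the joint table by summation, but not the other way around.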
With bivariate data, one can easily study all univariate properties of $X$ (here, height) and $Y$ (here, weight) by taking just one variable at a time, but by considering the two variables simultaneously, the statistician can investigate whether one variable tends to be related to the other. Figure 1 demonstrates that the height and weight of boys are not independent: the taller a boy is, the heavier he tends to be (with a stronger relationship at the lower heights and weights).
While the previous observations are quite informal, based on visual impressions, there are formal statistical tools for bivariate data that allow correlation and dependence to be investigated objectively. Using these tools, one can also examine the conditional distribution, or conditional relative frequency, of one variable given the value of the other. For instance, in Figure 1, one can ask about the distribution of weights (in kilograms) for all boys whose height is between 100 and 101 centimeters. Commonly, investigators are interested in how the mean and variance of the conditional distribution change with the condition.
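As a rough sketch of two such tools, the Python below computes the sample Pearson correlation coefficient (one common measure of linear association, chosen here for illustration) and the mean and variance of weight conditional on a height interval. The data values and the function names `pearson_r` and `conditional_summary` are assumptions made for this example.

```python
from math import sqrt
from statistics import mean, pvariance

# Hypothetical (height_cm, weight_kg) pairs; illustrative values only.
pairs = [(100.2, 15.0), (100.7, 16.0), (100.9, 15.5),
         (110.1, 18.5), (110.4, 19.5), (120.3, 23.0)]


def pearson_r(data):
    """Sample Pearson correlation coefficient of a list of (x, y) pairs."""
    xs = [x for x, _ in data]
    ys = [y for _, y in data]
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in data)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def conditional_summary(data, low, high):
    """Mean and population variance of y among pairs with low <= x < high."""
    ys = [y for x, y in data if low <= x < high]
    return mean(ys), pvariance(ys)


r = pearson_r(pairs)

# Weights of the (hypothetical) boys whose height is in [100, 101) cm.
cond_mean, cond_var = conditional_summary(pairs, 100, 101)
```

Calling `conditional_summary` with a sequence of height intervals shows how the conditional mean and variance change with the condition. The function uses the population variance of the selected subgroup; the sample variance would be an equally reasonable choice.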
Similarly, a binary indicator of whether a given day was rainy, together with an indicator of whether the preceding day was rainy, can be used to examine weather patterns. Such data can be summarized in a simple two-by-two table of counts.
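A table of this kind can be sketched in Python from a sequence of daily indicators. The `rainy` record below is invented for illustration, and `conditional_rain_frequency` is a hypothetical helper name.

```python
from collections import Counter

# Hypothetical daily record: True means the day was rainy.
rainy = [True, True, False, False, True, False, False, False, True, True]

# Pair each day with the preceding day: (rained_yesterday, rained_today).
day_pairs = list(zip(rainy[:-1], rainy[1:]))

# Count how often each of the four combinations occurs.
counts = Counter(day_pairs)


def conditional_rain_frequency(counts, rained_yesterday):
    """Relative frequency of rain today, given yesterday's indicator.

    Assumes at least one observed day with the given condition.
    """
    rain = counts[(rained_yesterday, True)]
    dry = counts[(rained_yesterday, False)]
    return rain / (rain + dry)


# Print the two-by-two table: rows are yesterday, columns are today.
print("yesterday  today=dry  today=rain")
for yesterday in (False, True):
    label = "rain" if yesterday else "dry"
    print(f"{label:<9}  {counts[(yesterday, False)]:>9}  {counts[(yesterday, True)]:>10}")
```

Comparing `conditional_rain_frequency(counts, True)` with `conditional_rain_frequency(counts, False)` gives an informal indication of whether rain on one day is related to rain on the next.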
Bivariate data may also pair one binary variable with one continuous measurement, for example, the sex and height of each person examined. Information about whether the two variables are related, and possibly about the nature of that relationship, is often even more important than knowing the properties of each variable separately, as when we want to predict one (unobserved) variable from the other (observed) variable.
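One common way to predict one variable from the other is a least-squares straight line (simple linear regression). The text does not prescribe this method; it is used here only as an illustrative sketch, with made-up, exactly linear data and the hypothetical function names `fit_line` and `predict_weight`.

```python
from statistics import mean

# Hypothetical (height_cm, weight_kg) pairs; illustrative values only.
# They lie exactly on a line so that the fitted result is easy to check.
pairs = [(100, 15.0), (105, 17.0), (110, 19.0), (115, 21.0), (120, 23.0)]


def fit_line(data):
    """Least-squares slope and intercept for predicting y from x."""
    xs = [x for x, _ in data]
    ys = [y for _, y in data]
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in data)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = my - slope * mx
    return slope, intercept


slope, intercept = fit_line(pairs)


def predict_weight(height_cm):
    """Predicted weight (kg) for an observed height, using the fitted line."""
    return intercept + slope * height_cm
```

A straight line is only appropriate when the relationship is roughly linear over the range of interest; predictions outside the observed range of heights should be treated with caution.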
The concept of a bivariate random variable (and bivariate data) can be easily generalized to multivariate data (with two or more random variables being observed simultaneously). As the number of variables increases, so does the number of relationships that can be examined. For example, with trivariate data, the statistician can consider the conditional distribution of one variable, given the observed values of the other two.
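The trivariate case can be sketched in the same way: observations are grouped by the values of two variables, and the third is summarized within each group. The triples below are hypothetical, as is the choice of sex, height, and weight as the three variables.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (sex, height_cm, weight_kg) triples; illustrative values only.
triples = [("F", 100, 15.0), ("F", 100, 16.0), ("M", 100, 16.0),
           ("M", 100, 17.0), ("F", 110, 18.0), ("M", 110, 20.0)]

# Group the third variable (weight) by the observed values of the other two.
groups = defaultdict(list)
for sex, height, weight in triples:
    groups[(sex, height)].append(weight)

# Mean weight conditional on each observed (sex, height) combination.
conditional_means = {key: mean(ws) for key, ws in groups.items()}
```

With more conditioning variables, each group contains fewer observations, which is one reason multivariate analysis typically needs larger samples or additional modeling assumptions.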