Sampling
Sampling is a crucial research technique used to gather data from a subset of a larger population when it is impractical to collect information from every individual. Researchers, especially sociologists, rely on samples to test theories and understand behaviors within society. The effectiveness of sampling hinges on minimizing bias during the selection process, ensuring that the sample accurately reflects the broader population. Various sampling methods exist, including random sampling, stratified sampling, systematic sampling, and convenience sampling, each with its advantages and limitations.
Random sampling involves selecting individuals randomly to represent a population, while stratified sampling ensures that specific demographic segments are proportionately represented. However, challenges such as low response rates or self-selection can introduce bias, which compromises the sample's validity. Additionally, methods like quota sampling and cluster sampling provide alternative approaches but may also carry their own risks of bias. Ultimately, careful choice and implementation of sampling techniques are essential for drawing meaningful conclusions from research and gaining a better understanding of societal behaviors.
It is frequently impossible to gather data from every member of a population of interest. Therefore, sociologists and other researchers typically base their studies on samples of individuals that are drawn from the population of interest. To be useful for research purposes, these samples need to be drawn in such a way as to minimize the probability of introducing bias into the selection process so that the resulting sample truly represents the underlying population. There are several ways to do this; however, the comparative efficacy of various sampling approaches remains a matter of debate.
Overview
Why Do Researchers Take Samples?
To better understand the behavior of people within society, sociologists develop theories and collect and analyze data to test the validity of those theories. In some situations, it is relatively easy to gather data about opinions, behavior, or other characteristics of interest from every member of a population. For example, if a professor wishes to know whether their class prefers to write one ten-page paper or two five-page papers, they can simply ask their students for their preferences and make the assignment based on the majority opinion. This is easy to do because the class size is relatively small. The professor can easily collect the data and, since their motivation to respond is relatively high, the students are likely to participate in the survey, giving the professor the data they need to make their decision.
If, on the other hand, the professor wants to determine the same preference for all students in the university, or all students in all universities across the country or across the globe, the activity becomes more complicated. First, the sheer number of students in those larger populations makes the task of collecting information costly and time-consuming. Second, although the students in the class may be motivated to answer the professor’s question because they are directly affected by the outcome, the students in these larger populations have no such motivation. As a result, the probability of collecting data from them all is rather low. However, the professor cannot in good conscience extrapolate the answer of their class to university students in general, students taking all of the professor’s courses, or even university students taking this particular course from other professors at the university. There are too many differences between the professor’s class and those other, larger groups to make an accurate extrapolation. Although the members of the professor’s class have characteristics in common with the members of the other, larger groups (e.g., general age range, education level), it cannot reasonably be assumed that no other variables (e.g., workload, expectations) affect the responses of members of the other groups and make their answers different from those of this particular class.
To develop theories and build knowledge of human behavior in society, however, it is often necessary to collect data about groups of people too large to poll individually. Rather than collecting data from a manageable group that has a low likelihood of representing the population that a researcher wishes to test, they instead take a sample of individuals from the larger group using a methodology that they believe will allow them to draw a sample that reflects the characteristics of the larger population. For example, although it may be impossible to collect data from every university student across the country, it may be possible to gather data from a representative sample of the population (e.g., university students taking introductory sociology courses in several universities that have the characteristics in which the researcher is interested). The sample that is selected can then be used in research based on the assumption that the sample has the same characteristics as the population as a whole.
Selection Bias
It is very important that the method used to draw the sample gives researchers a sample that is representative of the characteristics of the population in which the researcher is interested. Otherwise, the sample will be biased, and the results of the study will not represent the results that would have been obtained from the population in general.
Selection bias occurs when the sample asked to participate in a study is selected in a way that is not representative of the underlying population. One of the classic examples of a biased sample that led to an erroneous conclusion is the Gallup Poll conducted before the 1948 presidential election, which predicted that Thomas Dewey would beat Harry Truman; Truman won. Results obtained from biased samples cannot be meaningfully extrapolated to the population at large.
Applications
Defining Sample Characteristics
Before determining the best way to draw a sample, the researcher must first operationally define which characteristics are important in the target population. Although in some cases it is of value simply to interview every fifteenth person who walks into a shopping mall, in most cases the target population needs to be better defined. For example, if the researcher is interested in the opinions of shoppers on whether they would play the latest video game, it would be better to draw the sample from those shoppers who are more likely to play the game than from shoppers in general.
Sampling Methods
Once the population in which one is interested has been operationally defined, a sample needs to be drawn from the population. There are two general approaches used to select a representative sample. The first is random sampling, in which a subset of the population is randomly chosen for the sample. Choosing names out of a hat or applying a random number generator to a list of names are examples of this approach. However, although this is a widely used technique and may in many cases accurately represent the larger population because it is based on random probability, it may also, by chance, over- or underrepresent some characteristic. As a result, in some situations it is important to use the second approach, a stratified random sample. This technique takes into account the known characteristics of the population. For example, if it is known that half the sociology students in the country are women, a researcher might randomly select 100 women and 100 men for a research study so that both genders are represented in the sample in the same proportion that they appear in the population.
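The "names out of a hat" idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the list of student IDs and the function name simple_random_sample are hypothetical, and the standard library's random.sample stands in for the hat.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct members, each equally likely to be chosen."""
    rng = random.Random(seed)  # a fixed seed makes the draw repeatable
    return rng.sample(population, n)

# Hypothetical sampling frame: a list of 1,000 student IDs.
students = [f"student_{i:04d}" for i in range(1000)]
sample = simple_random_sample(students, 50, seed=1)
print(len(sample), len(set(sample)))  # 50 50
```

Because the draw is made without replacement, no student can appear in the sample twice.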
Representative samples can be drawn in a variety of ways. The simplest approach to sampling is to merely randomly select people from the population through such methods as having a computer pick names at random from a list or by selecting names from a hat. These randomly chosen individuals are then assigned to the sample. Based on the laws of probability, a sample drawn this way will more than likely be representative of the underlying population. However, in practice, achieving a truly random sample can be more difficult than it sounds. Written surveys, for example, tend to have notoriously low return rates, and people are frequently loath to give out information over the phone. As a result, many of the people from whom one would like to collect data take themselves out of the sample. This self-selection means that the resultant sample is not truly random. Further, the characteristics that are common to the individuals who opt out of participating in the research may be less frequently observed in the rest of the sample. This means that the sample may not represent a significant segment of the underlying population.
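The effect of self-selection can be illustrated with a small simulation. The sketch below is hypothetical throughout: the population, the 60 percent and 20 percent response rates, and all variable names are assumptions chosen only to show how a randomly invited group can still yield an unrepresentative set of respondents.

```python
import random

rng = random.Random(42)

# Hypothetical population: exactly half prefer option A (True).
population = [True] * 5000 + [False] * 5000

# Step 1: a genuinely random invitation list.
invited = rng.sample(population, 2000)

# Step 2: self-selection. Assumed response rates: 60% for people who
# prefer A, 20% for everyone else.
def responds(prefers_a):
    return rng.random() < (0.6 if prefers_a else 0.2)

respondents = [p for p in invited if responds(p)]

invited_share = sum(invited) / len(invited)
respondent_share = sum(respondents) / len(respondents)
print(f"invited: {invited_share:.2f}, respondents: {respondent_share:.2f}")
```

Under these assumed rates, the share preferring A among respondents sits well above the share among those invited, even though the invitations themselves were random.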
Another way to select samples is through systematic sampling, which determines who will be included or excluded from the sample on the basis of an a priori rule. For example, the researcher could select every nth person who walks in the door of a mall to participate in the survey. Although it is easier to select the participants using this approach, it still may not be a truly random sample, depending on the self-selection that occurs through factors like which door or time of day one chooses. Another approach would be to choose a convenience sample by asking whoever looks approachable, appears to be interested in the survey, or in some other way is most convenient to survey whether they are willing to participate. Although this approach has the advantage of making the sample easy to choose, it is also very unlikely that a convenience sample will be truly representative of the underlying population. All the participants from whom it is convenient to collect data may share one or more characteristics, such as attractiveness to the person who is collecting the data or extroversion.
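The "every nth person" rule is simple enough to write down directly. The following is a minimal sketch, assuming an ordered list of arrivals; the random starting position is a common refinement added here as an assumption, and the names systematic_sample and visitors are hypothetical.

```python
import random

def systematic_sample(frame, k, seed=None):
    """Take every k-th element after a random start among the first k."""
    start = random.Random(seed).randrange(k)
    return frame[start::k]

# Hypothetical frame: 300 mall visitors numbered in order of arrival.
visitors = list(range(1, 301))
chosen = systematic_sample(visitors, 15, seed=7)
print(len(chosen))  # 20
```

The rule itself is mechanical, so any bias comes from how the frame was assembled (which door, which hours), not from the interviewer's choices.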
One approach to trying to ensure that the correct proportions of different demographics are included in the sample is through the selection of a stratified random sample. In this approach, one determines a priori what general characteristics one wants to include in the sample (e.g., an equal number of women and men; equal numbers of children, young adults, and adults). Within each of these subgroups (also called "strata"), a sample is randomly chosen in proportion to that stratum's share of the population of interest. Stratified random sampling helps one gather information about specific subgroups in the population. This approach is also more likely to yield an accurate representation of each group than are some other sampling techniques. However, stratified random sampling may also introduce bias into the selection process.
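Proportional allocation across strata can be sketched as follows. This is an illustrative sketch rather than a definitive procedure: the function stratified_sample, the population of 600 women and 400 men, and the sample size of 100 are all hypothetical.

```python
import random
from collections import defaultdict

def stratified_sample(population, stratum_of, n, seed=None):
    """Sample each stratum in proportion to its share of the population."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for person in population:
        strata[stratum_of(person)].append(person)

    sample = []
    for members in strata.values():
        # Proportional allocation; rounding can make the total differ
        # slightly from n when the shares are not whole numbers.
        k = round(n * len(members) / len(population))
        sample.extend(rng.sample(members, k))
    return sample

# Hypothetical population: 600 women and 400 men.
people = [("woman", i) for i in range(600)] + [("man", i) for i in range(400)]
chosen = stratified_sample(people, lambda p: p[0], n=100, seed=3)
women = sum(1 for p in chosen if p[0] == "woman")
print(women, len(chosen) - women)  # 60 40
```

Selection within each stratum is still random; only the number taken from each stratum is fixed in advance.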
Cluster sampling is another approach to sampling that is often used in sociological research. In cluster sampling, the population is divided into non-overlapping areas (i.e., clusters), a subset of those clusters is randomly selected, and participants are drawn from the selected clusters. In cluster sampling, as opposed to stratified random sampling, the clusters are heterogeneous rather than homogeneous. There are several advantages to cluster sampling. First, clustering makes data more convenient to obtain by restricting the areas from which the data are collected. In addition, it tends to make the data more economical to obtain by reducing expenses like travel costs related to data collection. However, if the elements within a cluster are similar to one another, cluster sampling may be statistically less efficient than random sampling. In the extreme case in which the elements in a cluster are identical, sampling the whole cluster is no better than sampling a single unit from it.
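A two-stage version of cluster sampling can be sketched as below. The sketch assumes that a few clusters are drawn at random first and that households are then drawn at random within each chosen cluster; the neighborhood data and the function name cluster_sample are hypothetical.

```python
import random

def cluster_sample(clusters, n_clusters, per_cluster, seed=None):
    """Two-stage draw: pick clusters at random, then members within each."""
    rng = random.Random(seed)
    picked = rng.sample(sorted(clusters), n_clusters)
    return {name: rng.sample(clusters[name], per_cluster) for name in picked}

# Hypothetical frame: 10 neighborhoods of 50 households each.
neighborhoods = {
    f"area_{a}": [f"area_{a}_house_{h}" for h in range(50)] for a in range(10)
}
result = cluster_sample(neighborhoods, n_clusters=3, per_cluster=5, seed=11)
for area, households in result.items():
    print(area, households)
```

Interviewers would need to visit only three neighborhoods here, which is the source of the convenience and cost savings described above.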
As noted above, it is important to use appropriate sampling methods to avoid introducing bias and obtain a truly representative sample from which one can extrapolate conclusions to a larger population. Statistically, bias is defined as the tendency for a given experimental design or implementation to unintentionally skew the results of the research. Selection bias occurs when the sample is selected in a way that introduces error and causes the resultant sample to not be representative of the underlying population. For example, if it was known that school-aged children were most likely to play a video game, trying to draw a sample from shoppers at a mall during school hours in the middle of the week would be unlikely to result in a representative sample.
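The mall example can be made concrete with a short calculation. The traffic figures below are invented purely for illustration; the point is only that bias can be expressed as the gap between what the restricted sample shows and the true value in the full population.

```python
# Hypothetical mall traffic, as (time_slot, adults, school_aged_children).
traffic = [
    ("weekday school hours", 900, 100),
    ("evenings and weekends", 500, 500),
]

def child_share(rows):
    """Share of shoppers in the given rows who are school-aged children."""
    adults = sum(a for _, a, _ in rows)
    children = sum(c for _, _, c in rows)
    return children / (adults + children)

true_share = child_share(traffic)
biased_share = child_share([r for r in traffic if r[0] == "weekday school hours"])
bias = biased_share - true_share
print(f"{true_share:.2f} {biased_share:.2f} {bias:+.2f}")  # 0.30 0.10 -0.20
```

With these assumed figures, sampling only during school hours understates the share of school-aged shoppers by twenty percentage points, no matter how many people are interviewed.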
Viewpoints
Which Sampling Methods Are the Best?
Although these approaches to sampling attempt to increase the probability that a random sample will be drawn, no method of sampling is perfect or without its drawbacks. Sometimes the choice of sampling method is limited by practical factors. In other cases, however, multiple methods may be feasible. To maximize the probability that the results of research done with a sample will be, in fact, representative of the underlying population, the choice of sampling method needs to be based on careful consideration rather than expedience. However, which method is best is a topic that has been heatedly debated for decades.
Area Sampling vs. Quota Sampling
Two approaches to sampling that are frequently used are area sampling and quota sampling. Area sampling is a type of multistage sampling that uses maps. Quota sampling is a type of stratified sampling in which the selection of the strata within the sample is not random but is rather typically left to the discretion of the interviewer. Although some theorists and practitioners believe that quota samples are so innately prone to bias as to be completely worthless, others believe that this technique can be appropriate in some situations or even—with the implementation of adequate safeguards—be made highly reliable.
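The non-random character of quota sampling can be seen in a minimal sketch. Here the interviewer's discretion is modeled, as a simplifying assumption, by accepting people in the order they happen to walk by until each quota is full; the function quota_sample and the stream of passersby are hypothetical.

```python
def quota_sample(stream, category_of, quotas):
    """Accept people in arrival order until every quota is filled."""
    remaining = dict(quotas)
    chosen = []
    for person in stream:
        category = category_of(person)
        if remaining.get(category, 0) > 0:
            chosen.append(person)
            remaining[category] -= 1
        if not any(remaining.values()):
            break  # all quotas met; later arrivals never get a chance
    return chosen

# Hypothetical stream of passersby: every third person is a man.
passersby = [("woman", i) if i % 3 else ("man", i) for i in range(30)]
chosen = quota_sample(passersby, lambda p: p[0], {"woman": 3, "man": 3})
print(chosen)
```

The quotas guarantee the right mix of categories, but nothing in the procedure gives each member of a category an equal chance of being picked.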
One of the main reasons that quota sampling continues to be used is that it is significantly less expensive than other methods, often costing only a third or half as much as random sampling techniques. In addition, quota sampling is administratively much simpler to use than other methods because there is no need to randomly select sample members or continue to attempt to contact specific sample members who were unavailable during the first sampling attempt. Further, in some cases quota sampling is the most practical approach to sampling, such as with cases in which one needs to obtain immediate public reaction to an event. In these cases, the delay associated with determining a more random sample would taint the data through the introduction of memory errors. On the other hand, there are several drawbacks to the use of quota sampling techniques. Quota samples do not allow the researcher to estimate sampling errors because of their lack of randomness. This also means that potential sampling errors cannot be controlled. In addition, the interviewer using quota sampling may not obtain a representative sample because of experimenter error or bias, lack of opportunity for a more representative sample, or some other cause. Further, quota sampling is heavily dependent on the judgment of the interviewer and, therefore, more open to bias than random techniques.
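For contrast, the sampling error of a proportion estimated from a simple random sample can be approximated with the standard formula sqrt(p(1 - p) / n). The sketch below applies it; the function names and the example figures (a 50 percent result from 1,000 respondents) are illustrative assumptions. The formula depends on random selection, which is why it cannot be carried over to a quota sample.

```python
import math

def standard_error(p_hat, n):
    """Standard error of a sample proportion under simple random sampling."""
    return math.sqrt(p_hat * (1 - p_hat) / n)

def margin_of_error(p_hat, n, z=1.96):
    """Approximate 95% margin of error (z = 1.96) for a sample proportion."""
    return z * standard_error(p_hat, n)

moe = margin_of_error(0.5, 1000)
print(f"{moe:.3f}")  # 0.031
```

In other words, a random sample of 1,000 supports a statement such as "50 percent, plus or minus about 3 points," whereas a quota sample of the same size supports no comparable statement.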
Hochstim and Smith conducted a series of experiments to compare the relative efficacy of these two methods of sampling. The first experiment was designed to answer the question of how the composition of quota-control samples and stratified block samples differ. Two interviewers in each of eleven cities with populations over 50,000 were given the same number of ballots. One interviewer was given a simple quota assignment (i.e., gender, age, socioeconomic status) and the other interviewer was given a block assignment stratified by census tract and average monthly rent with superimposed quotas for gender and age. Blocks within each stratum were systematically selected. Interviewers were to select participants from lower-, middle-, and upper-income households in the same proportions as they occurred within the blocks. The experiment was then repeated with a reversal of the quota and block assignments for the two interviewers within each pair. In both surveys, the block sample showed less bias on the education variable, although both sampling techniques resulted in bias on this variable. The block sample was somewhat superior to the quota sample for the variable of average rent, although both samples were comparable on the other three variables examined.
The second experiment examined the question of the effects of restriction on the interviewer's freedom in sampling with a block. In this experiment, a block sample was compared with a sample in which both blocks and dwelling units within the blocks were predetermined systematically to minimize bias resulting from the interviewers' choices. In this study, it was found that the less freedom an interviewer had in selecting dwelling units, the more representative the resultant sample was.
The third experiment examined the results of controlling the selection of respondents within households and requiring callbacks if the respondents did not answer. Samples were drawn from an area that contained a city of over 200,000 people, numerous small towns with populations between 2,500 and 25,000 people, and rural areas with villages, farms, and open country. In this experiment, there was very close agreement between the domal and area samples.
The authors drew three major conclusions from their research. First, they found that area samples yielded more representative cross sections of the population than did quota samples. Second, the use of mechanical or automatic selection tools to choose sample participants tends to make sample selection more representative and less subject to bias than does selection based on human opinion. Third, it was found that the requirement for callbacks for households that did not respond the first time was not always necessary. The researchers also concluded that in certain circumstances carefully selected quota samples yield cross sections equivalent to those of area samples, and that the choice of sampling method should be made with full consideration of the survey's requirements.
Conclusion
Sampling is a group of techniques that are used to select a sample from a larger population so that research can be done with a manageable group and extrapolated to the larger population. There are many approaches to sampling. However, it is important that the sampling technique used results in a sample that represents the larger population and does not systematically introduce bias. Carefully chosen, a sample can help a sociologist draw meaningful results from data and better understand the behavior of people within society.
Terms & Concepts
Bias: The tendency for a given experimental design or implementation to unintentionally skew the results of the experiment due to a nonrandom selection of participants.
Data: (sing. datum) In statistics, data are quantifiable observations or measurements that are used as the basis of scientific research.
Population: The entire group of subjects belonging to a certain category (e.g., all women between the ages of eighteen and twenty-seven; all dry-cleaning businesses; all college students).
Probability: A branch of mathematics that deals with estimating the likelihood of an event occurring. Probability is expressed as a value between 0 and 1.0, which is the ratio of the number of actual occurrences to the number of possible occurrences of the event. A probability of 0 signifies that there is no chance that the event will occur and a probability of 1.0 signifies that the event is certain to occur.
Sample: A subset of a population. A random sample is a sample that is chosen at random from the larger population with the assumption that such samples tend to reflect the characteristics of the larger population.
Sampling: A group of techniques that are used to select a sample from a larger population so that research can be done with a manageable group and extrapolated to the larger population.
Sampling Error: An error that occurs in statistical analysis when the sample does not represent the population.
Skewed: A distribution that is not symmetrical around the mean (i.e., there are more data points on one side of the mean than there are on the other).
Validity: The degree to which a survey or other data collection instrument measures what it purports to measure. A data collection instrument cannot be valid unless it is reliable.
Variable: An object in a research study that can have more than one value. Independent variables are stimuli that are manipulated to determine their effect on the dependent variables, also known as response variables. Extraneous variables are variables that affect the response but that are not related to the question under investigation in the study.
Bibliography
Arsham, H. (2008). "Questionnaire design and surveys sampling." University of Baltimore. Retrieved September 11, 2007, from
Bandalos, D. L. (2018). Measurement theory and applications for the social sciences. Guilford Press.
Black, K. (2023). Business statistics for contemporary decision making (11th ed.). John Wiley & Sons.
Crawford, I. M. (1997). Marketing research and information systems. Food and Agriculture Organization of the United Nations. Retrieved from
Hochstim, J. R. & Smith, D. M. K. (1948, Spr). Area sampling or quota control? - Three sampling experiments. Public Opinion Quarterly, 12, 73-80. Retrieved March 14, 2008, from EBSCO Online Database SocINDEX with Full Text.
Mouw, T., & Verdery, A.M. (2012). Network sampling with memory: A proposal for more efficient sampling from social networks. Sociological Methodology, 42, 206–256. Retrieved May 30, 2023, from EBSCO online database, SocINDEX with Full Text.
Vearey, J. (2013). Sampling in an urban environment: Overcoming complexities and capturing differences. Journal of Refugee Studies, 26, 155–162. Retrieved May 30, 2023, from EBSCO online database, SocINDEX with Full Text.
Suggested Reading
Anderson, M. L. & Taylor, H. F. (2008). Sociology: Understanding a diverse society (4th ed.). Wadsworth/Thomson Learning.
Chambers, R.L., & Clark, R.G. (2012). An introduction to model-based survey sampling with applications. Oxford University Press.
Levy, P.S., & Lemeshow, S. (2013). Sampling of populations: Methods and applications (4th ed.). John Wiley & Sons.
McCormick, T. C. (1938, Oct). The role of statistics in social research: An elementary interpretation. Social Forces, 17, 47-51. Retrieved March 14, 2008, from EBSCO Online Database SocINDEX with Full Text.
Schaefer, R. T. (2022). Sociology: A brief introduction (14th ed.). McGraw-Hill.
Stockard, J. (2000). Sociology: Discovering society (2nd ed.). Wadsworth/Thomson Learning.