Sample surveys
Sample surveys are a statistical method for collecting data from a representative subset of a population to gain insights into its attitudes, opinions, and behaviors. Unlike a census, which aims to survey an entire population, sample surveys use a smaller group to infer information about the larger population. The history of surveys dates back to ancient civilizations, with significant developments occurring in the early 20th century, particularly through the efforts of pioneers like George Gallup, who introduced more scientifically valid sampling techniques. These methods have evolved over time, adapting to new technologies such as telephone and internet surveys, which have become prevalent in recent years.
Sample surveys play a crucial role in various fields, including political polling, market research, and social science, guiding decision-making and policy formulation. However, challenges such as response and nonresponse biases can affect the accuracy of survey results. Efforts to minimize these biases and ensure random sampling are ongoing areas of research. Additionally, institutions like the U.S. Census Bureau employ sampling techniques to gather demographic information, impacting federal funding and representation. Overall, sample surveys remain a vital tool for understanding societal trends and shaping public opinion.
Summary: Mathematicians and statisticians help design sampling methods and techniques to better represent populations and account for biases and missing data.
A survey is a statistical process by which data are collected from a representative sample of some population of interest in order to determine the attitudes, opinions, or other facts about that population. A census is the special case where everyone in the population is surveyed.
![Jerzy Neyman, 1973. By Konrad Jacobs, Erlangen, Copyright is MFO [CC BY-SA 2.0 de (http://creativecommons.org/licenses/by-sa/2.0/de/deed.en)], via Wikimedia Commons 94982041-119261.jpg](https://imageserver.ebscohost.com/img/embimages/ers/sp/embedded/94982041-119261.jpg?ephost1=dGJyMNHX8kSepq84xNvgOLCmsE2epq5Srqa4SK6WxWXS)
![George Horace Gallup, founder of the Gallup polls. By GeorgeGallup.gif: Uncertain derivative work: Quibik (GeorgeGallup.gif) [Public domain], via Wikimedia Commons 94982041-119260.jpg](https://imageserver.ebscohost.com/img/embimages/ers/sp/embedded/94982041-119260.jpg?ephost1=dGJyMNHX8kSepq84xNvgOLCmsE2epq5Srqa4SK6WxWXS)
Surveys have ancient roots: the Babylonians are known to have taken a population census around 3800 b.c.e. In one of the first modern surveys, the Harrisburg Pennsylvanian newspaper polled city residents about the 1824 presidential election. Polling continued to be largely a local phenomenon until a 1916 national survey by Literary Digest magazine, which correctly predicted the winners of several presidential elections despite using highly unscientific survey methods. The magazine’s famously incorrect prediction that Alf Landon would beat Franklin Roosevelt in the 1936 election is cited as contributing to its failure. Journalist and market researcher George Gallup, who correctly predicted Roosevelt’s 1936 victory, was a pioneer in statistical sampling in the early twentieth century, though at the time, many considered his ideas quite radical. A post–World War II boom in manufacturing led companies to survey consumers in order to tailor products to their preferences and increase sales. In the twenty-first century, public opinion polls on all aspects of society are pervasive, and surveys frequently shape society’s opinions and actions in addition to simply measuring them.
Students begin learning how to collect survey data in the primary grades. Researchers in many disciplines also routinely rely on data gathered via surveys. Mathematicians and statisticians work on mathematically valid methods for selecting samples that are random and representative as well as methods to reduce bias in surveys, effectively analyze data, present results that adjust for random error, and account for the effects of missing data. Many of these individuals belong to the Survey Research Methods Section of the American Statistical Association. Leslie Kish, a recipient of the association’s prestigious Samuel S. Wilks Award, was especially cited for his worldwide influence on sample survey practice and for being “a humanitarian and true citizen of the world… [whose] concern for those living in less fortunate circumstances and his use of the statistical profession to help is an inspiration for all statisticians.”
History of Surveys
In practice, surveys are collections of questions administered to individuals. Organizations like Gallup (founded as the American Institute of Public Opinion in 1935) specialize in conducting scientifically valid surveys. In the early part of the twentieth century, surveys were mostly conducted door-to-door by trained surveyors, a procedure used by both Gallup and the U.S. Census. Surveyors also frequently used the mail, as in the case of Literary Digest. Telephone surveys increased notably in the 1960s. This growth was attributed in large part to the escalating costs of in-person research and to trends in nonresponse, which suggested that people were growing less willing to answer face-to-face surveys and so diminished the prior advantage of in-person surveys over phone surveys. Around 1970, statisticians Warren Mitofsky and Joseph Waksberg developed an efficient method of random digit dialing that revolutionized telephone survey research. However, some major organizations, like Gallup, continued door-to-door surveys into the mid-1980s, at which point they determined that a statistically sufficient proportion of U.S. homes had at least one telephone.
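The basic idea behind random digit dialing can be sketched in a few lines of Python. This is a deliberately simplified illustration, not the Mitofsky–Waksberg procedure itself, which is a more efficient multi-stage design. The function name `random_digit_dial` and the prefixes in `PREFIXES` are hypothetical and exist only for this example.

```python
import random

def random_digit_dial(prefixes, n, seed=None):
    """Draw n distinct phone numbers by appending random 4-digit suffixes
    to known area-code/exchange prefixes (simplified illustration)."""
    unique_prefixes = sorted(set(prefixes))
    if n > len(unique_prefixes) * 10_000:
        raise ValueError("n exceeds the number of possible phone numbers")
    rng = random.Random(seed)
    numbers = set()
    while len(numbers) < n:
        prefix = rng.choice(unique_prefixes)   # pick a prefix at random
        suffix = rng.randrange(10_000)         # pick a suffix from 0000 to 9999
        numbers.add(f"{prefix}-{suffix:04d}")
    return sorted(numbers)

# Hypothetical prefixes, made up for this example.
PREFIXES = ["555-201", "555-202", "555-203"]
sample = random_digit_dial(PREFIXES, n=5, seed=42)
print(sample)
```

Because the suffixes are generated rather than looked up in a directory, unlisted numbers have the same chance of selection as listed ones. The cost is that many generated numbers will not belong to households.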
In 2008, Gallup notably expanded its methodology to include cell phones, since an increasing proportion of people no longer used landlines. In the twenty-first century, surveys are increasingly conducted via the Internet, though the U.S. Census still uses a combination of mail and house-to-house surveys. Harris Interactive, which went public in 1999, is a company that specializes in interactive online polls like the Harris Interactive College Football Poll, which ranks the top 25 Bowl Championship Series football teams each week.
Bias
Each survey method has different implications for both response bias and nonresponse bias. Response bias arises when the answers people give differ systematically from the truth, while nonresponse bias arises when the people who do not respond differ systematically from those who do. It is unclear when mathematicians and pollsters first began to recognize the negative influences of these biases, though adjustments were made in the latter half of the twentieth century. Systematic investigations can perhaps be traced to the mid-twentieth century, coincident with similar concerns in experimental design, like the placebo effect and psychologist Henry Landsberger’s naming of the Hawthorne effect. Overall, these biases are problematic because they are non-random and cannot be accounted for by most traditional statistical methods. As a result, they may produce misleading results. Methods to combat these biases are the subject of a great deal of ongoing research, and the biases are typically addressed via incentives and proactive planning rather than adjustments after the fact.
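A small simulation can show why nonresponse bias is not fixed by random sampling alone. The sketch below is purely illustrative: the population, the 50 percent true level of support, and the response rates of 60 and 30 percent are all assumed values, and `simulate_nonresponse` is a hypothetical helper.

```python
import random

def simulate_nonresponse(pop_size=100_000, sample_size=2_000, seed=0):
    """Compare the true share of supporters with the share observed
    among respondents when response rates depend on the answer."""
    rng = random.Random(seed)
    # Assumed population: half support a measure (1), half do not (0).
    population = [1] * (pop_size // 2) + [0] * (pop_size - pop_size // 2)
    true_share = sum(population) / pop_size
    # Draw a simple random sample without replacement.
    sample = rng.sample(population, sample_size)
    # Assumed response rates: supporters 60%, non-supporters 30%.
    respondents = [y for y in sample
                   if rng.random() < (0.6 if y == 1 else 0.3)]
    observed_share = sum(respondents) / len(respondents)
    return true_share, observed_share

true_share, observed_share = simulate_nonresponse()
print(f"true share: {true_share:.3f}, observed among respondents: {observed_share:.3f}")
```

Under these assumptions the respondents overstate support even though the initial sample was random. Increasing `sample_size` narrows the random error but leaves this systematic gap in place.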
Sampling
Randomness is a critical component of survey methodology. Statistical techniques commonly assume that the sample is a random subset of the population. When this is true, the results are more likely to be representative and informative of the population. Though random sampling is the standard in modern scientific polling, early pollsters like Gallup tended to use convenience or quota sampling, that is, taking a sample of whoever was accessible or convenient, sometimes grouped according to other influential variables like political party, gender, or neighborhood. In some cases, this was simply an issue of practicality in terms of time and financial resources. Mathematical statistician Jerzy Neyman is credited with presenting, at a professional conference in 1934, the first developed notion of making inferences from random samples drawn from finite populations, which is now called “probability sampling.” He also contrasted probability sampling with non-random methods. The U.S. Department of Agriculture, in partnership with the statistical laboratory at Iowa State University, began researching probability sampling methods in the late 1930s, as did the U.S. Census Bureau. One of these influential survey researchers was William Cochran, who also helped build many academic statistics programs, including at Harvard. Through the 1940s and beyond, the formal methods of probability sampling and analysis were developed, implemented, and refined in a wide variety of situations.
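As a minimal sketch of inference from a random sample of a finite population, the Python example below draws a simple random sample without replacement and reports an estimated proportion with an approximate 95 percent margin of error. It uses the standard textbook variance estimator with a finite population correction. The population of 10,000 people with 55 percent support is invented for the example, and `srs_proportion` is a hypothetical helper name.

```python
import math
import random

def srs_proportion(population, n, seed=None, z=1.96):
    """Estimate a population proportion from a simple random sample
    drawn without replacement, with an approximate margin of error."""
    rng = random.Random(seed)
    N = len(population)
    sample = rng.sample(population, n)      # every subset of size n is equally likely
    p_hat = sum(sample) / n                 # sample proportion
    # Estimated variance of p_hat with finite population correction.
    variance = (1 - n / N) * p_hat * (1 - p_hat) / (n - 1)
    return p_hat, z * math.sqrt(variance)

# Assumed population: 10,000 people, 5,500 of whom hold some opinion (coded 1).
population = [1] * 5_500 + [0] * 4_500
p_hat, margin = srs_proportion(population, n=1_000, seed=7)
print(f"estimate: {p_hat:.3f} +/- {margin:.3f}")
```

The margin of error here reflects only random sampling error. It says nothing about systematic problems such as nonresponse or a sample that was not actually drawn at random.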
In the late 1970s and beyond, some researchers’ attention turned to more advanced concepts like model-dependent sampling. In probability sampling, the characteristics of the population are wholly inferred from the sample. Model-dependent sampling, in contrast, assumes some probability model for the population beforehand and designs both a sampling and an analysis plan around this model. This method allows the researchers conducting the survey to optimally match the statistical properties of chosen estimators to the population. Statisticians Morris Hansen, William Madow, and Benjamin Tepping discussed many of the principal advantages and limitations of this method in a 1978 presentation and 1983 publication. Morris Hansen was an internationally known expert on survey research, an associate director for research and development at the Census Bureau, and later chairman of the board for polling company Westat, Inc. He also served as president of the American Statistical Association and the Institute of Mathematical Statistics.
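One simple textbook instance of a model-dependent estimate, chosen here for illustration and not taken from the Hansen, Madow, and Tepping discussion, assumes that a survey variable y is roughly proportional to a known size measure x (y = beta * x + error, with error variance assumed to grow with x). Under that assumption the slope is estimated by the ratio of sample totals and then applied to the known population total of x. All names and data below are hypothetical.

```python
import random

def ratio_model_total(sample_x, sample_y, population_x_total):
    """Estimate the population total of y under the assumed model
    y = beta * x + error, using the ratio of sample totals as the slope."""
    beta_hat = sum(sample_y) / sum(sample_x)
    return beta_hat * population_x_total

rng = random.Random(1)
# Hypothetical population: x is known for everyone, y is observed only in the sample.
pop_x = [rng.uniform(10, 100) for _ in range(1_000)]
pop_y = [2.0 * x + rng.gauss(0, x ** 0.5) for x in pop_x]

idx = rng.sample(range(len(pop_x)), 50)
estimate = ratio_model_total([pop_x[i] for i in idx],
                             [pop_y[i] for i in idx],
                             sum(pop_x))
true_total = sum(pop_y)
print(f"estimated total: {estimate:.0f}, true total: {true_total:.0f}")
```

The estimate is accurate here because the simulated data follow the assumed model. If the real population departed from that model, the same estimator could be badly off, which is the kind of limitation the debate over model-dependent methods turned on.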
U.S. Census
Though the U.S. Constitution calls for a count of the population in the decennial census, the U.S. Census Bureau conducts other types of surveys and has been using sampling since 1937. In 1940, the bureau began asking a random sample of people counted in the decennial census extra questions to allow better characterization of population demographics as well as to estimate coverage errors. The ongoing American Community Survey helps determine how billions of federal and state dollars are distributed each year. In the late twentieth century, in large part because of substantial difficulties during the 1990 census, many statisticians proposed completely substituting sampling methods for the decennial counting process or at least substantially increasing the role of sampling. They felt that issues like undercoverage of certain subpopulations could be better addressed with increasingly sophisticated statistical methods. Cost was also considered. They had the support of many cities, states, civil rights groups, and members of Congress. The proposal was opposed by many other politicians and segments of the general population for both political reasons and because of skepticism regarding the sampling process. It ultimately required a ruling by the U.S. Supreme Court, which allowed supplemental sampling for some purposes but required a count to determine congressional apportionment.
Bibliography
Brick, J. Michael, and Clyde Tucker. “Mitofsky–Waksberg: Learning From the Past.” Public Opinion Quarterly 71, no. 5 (2007). http://poq.oxfordjournals.org/content/71/5/703.full#ref-24.
Hansen, Morris. “Some History and Reminiscences on Survey Sampling.” Statistical Science 2, no. 2 (1987). http://projecteuclid.org/DPubS/Repository/1.0/Disseminate?view=body&id=pdf_1&handle=euclid.ss/1177013352.