Sample Survey Design

Survey research is a commonly used technique to collect the data that organizational decision makers need to help design and market products and services. In this approach, a questionnaire is administered to members of a representative sample of potential customers or other persons of interest with the assumption that the responses of the sample will represent the hypothetical responses of the underlying population. Samples can be chosen in a number of ways including random sampling, systematic sampling, convenience sampling, stratified random sampling, and cluster sampling. In any of these sampling techniques, however, it is important not to introduce bias into the sample. There are a number of ways in which survey information can be collected: In-person interviews, mail surveys, telephone surveys, and Internet surveys. No matter the way in which the data are collected, however, the design of a meaningful survey requires attention to a number of items.

Keywords Bias; Data; Distribution; Hypothesis; Normal Distribution; Population; Probability; Reliability; Sample; Validity; Variable

Overview

To be successful, businesses cannot operate in a vacuum. On one end of the chain, businesses need to work with suppliers, partners, and government agencies to make sure that they are able to offer their widgets to the marketplace in a timely manner and at a reasonable quality and cost. On the other end of the chain, businesses also need to work with the consumer. No matter how well a widget works or how inexpensive it is, unless it meets the needs of some segment of the market, it is unlikely that it will be a success. Engineers, designers, and marketers all need to know what potential consumers want as well as what they don't want so that they can design, produce, and market widgets that will sell. As the failure of Ford's Edsel in the mid-20th century illustrates, no matter how innovative or well-designed a product is, if it is not well-received by the public, it will not be a commercial success. This is truer than ever in today's global economy. For example, a marketing campaign that plays well in one culture can be not only ineffective in another but actually offensive, hurting the company's reputation and sales in general.

There are many ways to collect data that can be used in decision making about how to design and market a product or service so that it will be well-received in the marketplace. These include laboratory research, field studies, and pilot introductions of the product or service in small, representative marketplaces. Another commonly used technique is survey research in which a questionnaire or interview is administered to members of a representative sample of potential customers with the assumption that the responses of the sample will represent the hypothetical responses of the underlying population.

Defining a Target

One of the first things that needs to be done when considering survey research is to operationally define the target population. Although in some cases it is of value to just randomly interview every 15th person who walks into a shopping mall, in most cases the target population needs to be better defined. For example, if the widget is a product that is aimed at preteen girls who live in the city, one will not gain good data from interviewing people in a retirement community in the suburbs.

Drawing a Sample

Once the population is operationally defined, a sample needs to be drawn from the population. A representative sample can be drawn in a number of different ways.

Random Sampling

The simplest approach is to merely randomly select people from the population (e.g., by having a computer pick names at random from a list or by selecting names from a hat) and assign them to the sample. This approach has the advantage that it will more than likely (based on the laws of probability) be representative of the underlying population. On the other hand, achieving a truly random sample can be more difficult than it sounds. Surveys tend to have notoriously low return rates. This means that many of the people from whom one would like to collect data are taking themselves out of the sample. This self-selection means that the sample is not truly random.

For example, suppose one wanted to know the reactions of teenage girls to a new widget. As an incentive, the analyst could send a dollar along with the survey as thanks for completing the questionnaire. However, if the widget is a high-priced item that can only be easily purchased by the affluent, it is unlikely that this approach to data gathering would work. The dollar might be a good incentive for teenage girls from lower income families to fill out the survey, but they are unlikely to be able to afford the widget even with the extra dollar. Those who can afford to buy the widget, on the other hand, are unlikely to find an extra dollar in their pockets to be much of an incentive to complete the questionnaire.

Even if one is conducting in-person interviews, samples can easily self-select. Certainly, people can decline to participate. However, depending on where the data are being collected, extraneous variables such as time of day can also affect the composition of the sample. For example, if one wants to know consumers' opinions about a new gizmo, one could pass out samples and collect feedback at the local mall. However, if the data were collected during the day during the work week, the probability of getting working adults as part of the sample (or even school-aged children if it were during the school year) would be greatly diminished. As a result, even if the participants in the study were randomly chosen, the sample (e.g., people in the mall at 2:00 p.m. on Tuesday) would not necessarily represent the population of all shoppers who go to that mall (let alone other malls). Further, it would be difficult to randomly pick who would participate in the study.
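
As a minimal sketch of the computer-based selection described above, the following Python snippet draws a simple random sample from a list. The sampling frame, the customer identifiers, and the function name are hypothetical and chosen only for illustration. The snippet does nothing about nonresponse, which remains the harder practical problem.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct members; every member is equally likely to be chosen."""
    rng = random.Random(seed)  # a fixed seed makes the draw repeatable
    return rng.sample(list(population), n)

# Hypothetical sampling frame of 1,000 customer identifiers.
frame = [f"customer-{i:04d}" for i in range(1000)]
sample = simple_random_sample(frame, 50, seed=42)
print(len(sample), "people selected, for example:", sample[:3])
```

Sampling without replacement is used here so that no person is asked to complete the survey twice.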

Systematic Sampling

Another way to select samples is through systematic sampling. In this approach, the researcher could select every nth person who walks in the door of the mall to participate in the survey. It is easier to select the participants in this scenario, but it still may not be a truly random sample depending on self-selection, what door one chose, the time of day, and so forth.
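
The every-nth-person rule can be sketched in a few lines of Python. The list of arrivals and the function name are hypothetical, and starting at a random offset within the first interval is an assumption added here (a common practice), not something the description above requires.

```python
import random

def systematic_sample(arrivals, step, seed=None):
    """Select every `step`-th arrival, beginning at a random offset."""
    rng = random.Random(seed)
    start = rng.randrange(step)  # assumed: random start among the first `step` arrivals
    return arrivals[start::step]

# Hypothetical log of 100 shoppers, numbered in order of arrival.
arrivals = list(range(1, 101))
selected = systematic_sample(arrivals, step=15, seed=7)
print(selected)
```

Note that the rule only fixes the spacing between selected shoppers; it cannot correct for which door or which time of day was used to build the list of arrivals.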

Convenience Sampling

One could choose a convenience sample instead by asking whoever looks approachable, appears to be interested in the survey or the product, or is otherwise convenient to survey whether he or she is willing to participate. Although this approach has the advantage of making the sample easy to choose, it is also very unlikely that a convenience sample will be truly representative of the underlying population. For example, all the participants from whom it is convenient to collect data may share one or more characteristics such as attractiveness to the person who is collecting the data, extroversion, or not being employed full time.

Stratified Random Sampling

To help ensure that the correct proportions of different demographics are included in the sample, one could use a stratified random sample. In this approach to sample selection, one determines a priori which general characteristics one wants to include in the sample (e.g., an equal number of women and men; equal numbers of children, young adults, and adults). Within each of these subgroups (i.e., strata), a sample is randomly chosen in proportion to that stratum's share of the population of interest. This approach helps one to gather information about specific subgroups in the population. In addition, stratified random sampling is more likely to yield an accurate representation of each group than are some other sampling techniques. However, this approach has the potential drawback of introducing bias in some instances.
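
A proportional stratified draw might be sketched as follows. The population, the three age groups, and their sizes are invented for illustration, and the function name is hypothetical. Each stratum's quota is simply rounded, so with less convenient numbers the quotas may not add up exactly to the requested sample size.

```python
import random
from collections import Counter, defaultdict

def stratified_sample(population, stratum_of, n, seed=None):
    """Randomly sample within each stratum, in proportion to the stratum's size."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for person in population:
        strata[stratum_of(person)].append(person)
    sample = []
    for label in sorted(strata):
        members = strata[label]
        # Quota proportional to the stratum's share of the population (rounded).
        quota = round(n * len(members) / len(population))
        sample.extend(rng.sample(members, min(quota, len(members))))
    return sample

# Hypothetical population: 600 adults, 300 young adults, 100 children.
population = (
    [("adult", i) for i in range(600)]
    + [("young adult", i) for i in range(300)]
    + [("child", i) for i in range(100)]
)
sample = stratified_sample(population, lambda p: p[0], n=50, seed=1)
counts = Counter(group for group, _ in sample)
print(counts)
```

With these figures the 50 places are split 30, 15, and 5, mirroring the 60/30/10 percent makeup of the hypothetical population.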

Cluster Sampling

Another approach to sampling for survey research is cluster sampling. In this approach, the population is divided into non-overlapping areas (i.e., clusters), some of the clusters are randomly selected, and participants are then drawn from the selected clusters (either every member or a random subset). Cluster sampling differs from stratified random sampling in that each cluster is ideally internally heterogeneous, like a miniature version of the population, whereas each stratum is relatively homogeneous. Cluster sampling offers several advantages. Not only does clustering make data more convenient to obtain (by restricting the areas from which the data are collected), but it also tends to make the data less expensive to obtain (due to reduced travel costs, etc.). However, cluster sampling also has disadvantages. If the elements within a cluster are similar to one another, cluster sampling may be statistically less efficient than random sampling. Further, if the elements in a cluster are all the same, sampling many of them is no better than sampling a single unit from that cluster.
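
One common way to carry out cluster sampling is in two stages: first pick a few clusters at random, then pick participants at random within them. The sketch below assumes that variant. The neighborhoods, household identifiers, and function name are hypothetical.

```python
import random

def cluster_sample(clusters, n_clusters, per_cluster, seed=None):
    """Two-stage draw: random clusters first, then random members within each."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # stage 1: pick clusters
    picked = {}
    for name in chosen:
        members = clusters[name]
        # Stage 2: pick members inside the chosen cluster only.
        picked[name] = rng.sample(members, min(per_cluster, len(members)))
    return picked

# Hypothetical city: 10 neighborhoods with 200 households each.
clusters = {
    f"neighborhood-{k}": [f"household-{k}-{i}" for i in range(200)]
    for k in range(10)
}
picked = cluster_sample(clusters, n_clusters=3, per_cluster=20, seed=3)
for name, households in picked.items():
    print(name, len(households))
```

Interviewers only need to travel to three neighborhoods rather than ten, which is the cost advantage described above; the price is that households in the same neighborhood may resemble one another.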

Avoiding Bias in Sampling

When selecting a sample, it is important not to introduce bias into the sample. Statistically, bias is the tendency for a given experimental design or implementation to unintentionally skew the results of the experiment. Selection bias occurs when the sample is selected in a way that is not representative of the underlying population. For example, as in the illustration above, asking people at the mall in the middle of a weekday to participate could bias the results of the study unfairly in the direction of the opinions of people who for whatever reason have the ability to be at the mall during the day. Another way that bias can be introduced into a sample is through self-selection. This occurs when members of the sample refuse to participate in the survey. This problem is frequently encountered when trying to collect data by mail. Participants are free to complete the survey or not; in the great majority of cases they do not. As a result, the self-selected sample is very likely to be biased.
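
The effect of self-selection can be illustrated with simple arithmetic. In the sketch below, every figure (group sizes, average ratings, and response rates) is invented for illustration, and the function names are hypothetical. It assumes that dissatisfied customers return the questionnaire more often than satisfied ones.

```python
# Hypothetical groups: (group size, average rating on a 1-5 scale, response rate)
groups = {
    "satisfied": (800, 4.0, 0.10),
    "dissatisfied": (200, 2.0, 0.40),
}

def population_mean(groups):
    """Average rating if everyone in the population answered."""
    total = sum(size for size, _, _ in groups.values())
    return sum(size * rating for size, rating, _ in groups.values()) / total

def expected_respondent_mean(groups):
    """Average rating among those expected to return the survey."""
    responders = sum(size * rate for size, _, rate in groups.values())
    weighted = sum(size * rate * rating for size, rating, rate in groups.values())
    return weighted / responders

print(round(population_mean(groups), 2))
print(round(expected_respondent_mean(groups), 2))
```

Under these assumptions the true average rating is 3.6, but the returned questionnaires would be expected to average only 3.0, because the smaller dissatisfied group supplies as many respondents as the larger satisfied group.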

Analysis of Survey Data

Data from surveys can be analyzed in a number of ways depending on how the survey instrument was designed. The analysis technique needs to be determined beforehand so that all necessary data can be collected in the survey. However, although inferential statistical techniques are available for analyzing survey data, it must be remembered that frequently the data being collected are not truly interval- or ratio-level data. In such cases, nonparametric statistics can often be used for data analysis. This is a class of statistical procedures that is used in situations when it is not possible to estimate or test the values of the parameters (e.g., mean, standard deviation) of the distribution or where the shape of the underlying distribution is unknown. Nonparametric statistics are also available that can be used to analyze nominal- and ordinal-level data.
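
As one example of a nonparametric procedure for ordinal ratings, the sketch below computes the Mann-Whitney U statistic, which relies only on the ordering of responses and not on the distances between scale points. The choice of this particular test, the ratings, and the variable names are illustrative assumptions. Only the statistic is computed; judging significance would require tables or a statistics library, which is not shown.

```python
def mann_whitney_u(xs, ys):
    """U statistic for xs: number of (x, y) pairs where x > y; ties count half."""
    u = 0.0
    for x in xs:
        for y in ys:
            if x > y:
                u += 1.0
            elif x == y:
                u += 0.5
    return u

# Hypothetical 1-5 ratings of two widget designs from two groups of respondents.
design_a = [4, 5, 3, 4, 5]
design_b = [2, 3, 3, 1, 4]
u_a = mann_whitney_u(design_a, design_b)
u_b = mann_whitney_u(design_b, design_a)
print(u_a, u_b)
```

The two values always add up to the number of pairs (here 5 x 5 = 25). A lopsided split such as 22 versus 3 indicates that ratings for the first design tend to rank above those for the second.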

Applications

Collection of Survey Data

There are a number of ways in which survey information can be collected: in-person interviews, mail surveys, telephone surveys, and Internet surveys. No matter the way in which the data are collected, however, the design of a meaningful survey requires attention to a number of items. Surveys potentially allow the collection of great amounts of data. However, it needs to be borne in mind that the longer the survey, the less likely people are to respond to it and the less likely they are to respond thoughtfully and truthfully (particularly for the later questions). Therefore, before getting into the nuts and bolts of developing a survey, one must first ask several questions. First, one must determine what the purpose of the survey is. If the purpose of the survey is to ask about people's reactions to a new advertising campaign for a widget, the survey is best designed to gather information only about that topic and not about other, marginally related information in which the organization might also be interested. In addition, the design of the survey needs to take into account the potential actions that the organization is going to take as a result of the feedback that it receives from the survey. In most cases, it is virtually impossible to go back and ask follow-up questions to a survey: The questions need to be asked unambiguously the first time.

The data needed by organizational decision makers can often be gathered in more than one way. Although a survey is sometimes the best way to do this, one should not simply assume so. If one wants to know, for example, what kind of people buy Super Cola, a truer picture might be obtained from observation of shoppers in the store or from mining the database of buyer preferences collected through the store's preferred customer card. If it is determined that a survey is the best method to collect the data needed, one should next develop a list of objectives for the survey. This list is used as the basis for the development of the survey questions themselves. The questions should be unambiguously stated and grammatically correct. The questions should be reviewed by others to determine whether they can be misinterpreted and whether they are truly asking what needs to be asked. In addition to the actual questions for the survey, a descriptive title and introduction to the survey need to be written. The introduction should include the purpose of the survey, how the data will be used, the importance of accurate responses, confirmation of the anonymity of the data (if, indeed, they will be anonymous), and instructions for how to fill out the survey.

Validity of Data Collection

When developing a survey, the author needs to strive to develop an instrument that is both valid and reliable. In psychometric terms, validity of a data collection instrument means that the instrument is measuring what it purports to measure. Writing specific objectives for the survey before writing the questions themselves will help the developer to write a valid survey instrument. In addition, paying attention to the wording of the questions by writing items that are clearly worded, are written at an appropriate grade level, address only one concept per question, and are grammatically correct will also help increase the validity of the instrument by decreasing the probability that the questions might be misunderstood. Doing these things will also help ensure the reliability of the instrument, or the consistency with which the instrument measures attitudes or other judgments.

Disadvantages of Survey Research

Survey research has obvious advantages. In particular, through survey research one has the potential to reach a much larger sample than could be done through more rigorous experimental methods. However, survey research is not without its drawbacks.

  • First of all, one has no control over survey research: One cannot force a person to respond to a questionnaire. In fact, the return rate for most surveys is very low. This means that no matter how carefully the original sample was selected, the final sample is self-selected and includes only those people who decided to respond. In some cases this may not matter. However, in other cases, this may mean that the data are skewed and the results do not truly represent the underlying population.
  • Second, it is very difficult to make a verbal scale on a survey meaningfully quantitative. For example, on a 100-point scale, is there truly a difference between a score of 22 and a score of 23? Even if the scale is smaller and more manageable (e.g., between five and ten points), it is difficult to meaningfully operationally define the various points on the scale. What Harvey means by "like the design somewhat" and what Mathilde means by the same statement may be very different things.
  • Third, rating errors can easily skew results and make meaningful interpretation of the results difficult. For example, Harvey may think that no product is perfect and therefore never give it the highest mark for any question even if he thinks that the new widget is the best thing he has ever seen.
  • Fourth, if the survey is administered by a researcher rather than letting the participants fill it out on their own, the interviewer effect may bias the results. An interviewer flirting with a respondent, for example, might be able to draw out much longer responses or receive more favorable responses than would one who was curt.
  • Fifth, survey research is typically done on large samples. This means that there is usually a need for more than one interviewer. It can be difficult to standardize the ways that different persons administer an interview. Although a structured interview (i.e., a form that requires the interviewer to ask only certain questions worded in a certain way and not to deviate from the format) can help with this problem, this type of interview tends to be rather difficult to administer and does not leverage the fact that the interviewer can probe for better data.
  • Finally, subjects in survey research often do not care about their answers. Poorly worded questions, overly long questionnaires, and lack of interest can lead to a poor return rate or — in some instances — outright lying on the form.

Conclusion

Survey research is a commonly used technique to collect and analyze data to help organizations design and market products and services. Although care must be exercised in selecting a representative sample and in designing a survey that will gather the data needed, this technique is an important tool for collecting and analyzing data that could not otherwise be obtained.

Terms & Concepts

Bias: The tendency for a given experimental design or implementation to unintentionally skew the results of the experiment due to a nonrandom selection of participants.

Data: (sing. datum) In statistics, data are quantifiable observations or measurements that are used as the basis of scientific research.

Distribution: A set of numbers collected from data and their associated frequencies.

Hypothesis: An empirically-testable declaration that certain variables and their corresponding measure are related in a specific way proposed by a theory.

Normal Distribution: A continuous distribution that is symmetrical about its mean and asymptotic to the horizontal axis. The area under the normal distribution is 1. The normal distribution is actually a family of curves and describes many characteristics observable in the natural world. The normal distribution is also called the Gaussian distribution or the normal curve of errors.

Population: The entire group of subjects belonging to a certain category (e.g., all women between the ages of 18 and 27; all dry cleaning businesses; all college students).

Probability: A branch of mathematics that deals with estimating the likelihood of an event occurring. Probability is expressed as a value between 0 and 1.0, which is the ratio of the number of actual occurrences to the number of possible occurrences of the event. A probability of 0 signifies that there is no chance that the event will occur and 1.0 signifies that the event is certain to occur.

Reliability: The degree to which a psychological test or assessment instrument consistently measures what it is intended to measure. An assessment instrument cannot be valid unless it is reliable.

Sample: A subset of a population. A random sample is a sample that is chosen at random from the larger population with the assumption that such samples tend to reflect the characteristics of the larger population.

Validity: The degree to which a survey or other data collection instrument measures what it purports to measure. A data collection instrument cannot be valid unless it is reliable.

Variable: An object in a research study that can have more than one value. Independent variables are stimuli that are manipulated in order to determine their effect on the dependent variables (response). Extraneous variables are variables that affect the response but that are not related to the question under investigation in the study.

Essay by Ruth A. Wienclaw, Ph.D.

Dr. Ruth A. Wienclaw holds a Doctorate in industrial/organizational psychology with a specialization in organization development from the University of Memphis. She is the owner of a small business that works with organizations in both the public and private sectors, consulting on matters of strategic planning, training, and human/systems integration.