Analysis of Secondary Data

Due to various ethical and logistical considerations, it can be impossible in some settings to gather primary data for research analysis. However, many sources of secondary data are available that can be further analyzed by researchers seeking to answer other research questions. Secondary analysis may be qualitative or quantitative in nature and may be used by itself or combined with other research data to reach conclusions. Although the use of secondary data can be more cost-effective than the use of primary data, the fact that the researcher has no control over how the data were collected means that there are several disadvantages as well. However, a well-designed meta analysis or other study that incorporates secondary data can be very useful to the researcher in answering questions about social issues and significantly aid in the advancement of the social sciences.

Keywords Experiment; Inferential Statistics; Inter-Interviewer Reliability; Interviewer Bias; Interviewer Effects; Meta Analysis; Qualitative Research; Quantitative Research; Secondary Analysis; Subject; Survey; Variable

Overview

Acquiring data for research in the social and behavioral sciences can be a difficult process that demands great creativity. Sometimes, ethical considerations make it impossible to experimentally manipulate variables. For example, when studying the detrimental effects of length of unemployment, one cannot in good conscience randomly decide which subjects will lose their jobs and which will not, or for how long they will be without income. In other cases, the mere fact that a researcher is observing the subjects changes the way that the subjects act. For example, the Hawthorne Effect takes its name from a well-known study of the effects of lighting levels on assembly line employees at the Hawthorne works of Western Electric outside Chicago. Researchers found that productivity increased not only when lighting levels were increased but also when they were decreased, because the subjects expected that the experimental interventions would enable them to increase productivity. In still other cases, it is simply not possible to gather the data needed for a research study for practical or logistical reasons. For example, to test the effectiveness of a new training program for aircraft maintenance personnel, one could easily design a controlled study to see whether personnel performed better after training or without training. It would be relatively simple to operationally define dependent variables for the study, including the number of fatal crashes. However, it is highly unlikely that any airline would be willing to risk the lives of its employees or customers to collect such data.

Fortunately, researchers are not restricted to the use of primary data (i.e., data that are collected specifically for the research study). Many types of secondary data that have been collected and analyzed for other purposes are often available for re-analysis. In secondary analysis, further analysis of existing data (typically collected by a different researcher) is conducted. The intent of secondary analysis is to use existing data in order to develop conclusions or knowledge in addition to or different from those resulting from the original analysis of the data. Secondary analysis may be qualitative or quantitative in nature and may be used by itself or combined with other research data to reach conclusions.

Sources of Secondary Data

Secondary data are available from many sources. In some cases, one must contact the researchers of previous studies and gain access to their data. In other cases, it may be possible to use public access data.

• The US Census Bureau publishes data on a wide range of topics, among them veterans' issues and women's issues. The Census Bureau can be accessed at www.census.gov.

• The Integrated Public Use Microdata Series (IPUMS) of the University of Minnesota's Minnesota Population Center is an integrated series of census microdata samples for US and international population studies. The data are intended for use by economists and social scientists. The data date back to the 1960s and include 80 samples from 26 countries, with more scheduled for release in the future. The IPUMS data can be accessed at www.ipums.umn.edu.

• The Bureau of Labor Statistics collects and maintains data on employment, earnings, living conditions, productivity, and other factors of interest to social scientists. The portal for the Bureau of Labor Statistics data is found at http://stats.bls.gov.

• The Inter-University Consortium for Political and Social Research (ICPSR) maintains the world's largest archive of digital social science data. The goals of the consortium are to acquire and preserve social science data, provide open and equitable access to these data, and promote their effective use. The ICPSR web site is found at www.icpsr.umich.edu.

In addition to these sources, secondary data can be obtained for analysis from a wide variety of other sources, including newspapers and periodicals, organizational records and archives, videotapes of motion pictures and television programs, web pages, scientific records (e.g., patent applications), speeches of public figures, votes cast in elections or by legislators, as well as personal journals, diaries, e-mail, and correspondence. Many other sources of secondary data are available depending on the needs of the researcher.

Advantages to Using Secondary Data

There are a number of advantages to using secondary data for analysis. As discussed above, there are certain situations in which it is impossible for ethical, logistical, or other practical reasons to collect primary data. The analysis of secondary data allows researchers to examine data collected for other purposes to find the answers they seek to research questions. For example, the study of the effects of unemployment could include the reanalysis of questionnaires routinely collected by government or private employment agencies. The reanalysis of previously collected survey data could also be used in some cases to answer other questions about the effects of various levels of the independent variable on the dependent variable without the presence of the researcher or other observer changing the results. Similarly, in a historical study, routinely collected maintenance data might be divided into groups for aircraft that had been worked on by technicians who had received the new training vs. those who had not. In addition, the collection of data for secondary analysis is typically much faster because the data have already been collected. The researcher also does not have to develop a new data collection instrument or run a new experiment, which reduces both the time needed to gather the data and the costs associated with data collection. A major advantage of the analysis of secondary data is that the collection of such data is non-reactive. In other words, particularly for archival data, subjects will act naturally because they do not realize that their behavior is being observed and recorded. This advantage, of course, does not extend to data collected with surveys or direct observation where subjects know that their reactions are being observed.

Disadvantages of Using Secondary Data

On the other hand, the analysis of secondary data is not without its potential disadvantages. Unless one has collected the data oneself, it is virtually impossible to be completely confident in the quality of the data. Although the survey instruments associated with data sets may be available, one does not necessarily know what the inter-interviewer reliability is for surveys not under one's own control or whether interviewer bias or other interviewer effects may have tainted the data. Further, it is not always possible to find available data sets that contain the data that one needs to analyze. Another disadvantage in the use of secondary data arises from questions concerning the way subjects were selected. In most research studies, subjects are chosen so that they form a representative sample and results can be extrapolated to the general population. However, just as it is not always possible to know if interviewer effects were unintentionally introduced into data collection, it is similarly impossible in many cases to know whether a sample selected by someone else is truly random or if it was biased. When one uses primary data in research analysis, one can be confident about the way data were collected, samples were selected, and the relevance of survey items and other measurements to the research hypothesis. However, the same cannot always be said for analyses performed on secondary data. Where sampling error or bias occurs in a sample, any conclusions drawn from the data cannot be extrapolated to the population at large.

Considerations

There are a number of issues that must be considered before embarking on a secondary analysis. First, if using secondary data collected using a survey instrument, it must be determined whether the wording of the question(s) of interest on the survey is a good fit for the purposes of the current analysis. If the wording is ambiguous or otherwise questionable for use in the current study, a better source of data needs to be found. When the results of a secondary analysis are reported, it is important to also consider the experimental conditions under which the data were originally collected. These conditions may impact the usefulness of the data for the current study. It can be tempting to "make do" with the data that are available, extrapolating from the data to relationships or conclusions that are not warranted. However, an ethical researcher will make certain that the data used in a research analysis are appropriate to the study whether performing a primary or secondary analysis.

Another type of analysis that uses secondary data is meta analysis. This is an analysis technique used to synthesize the results of multiple existing quantitative research studies of a single phenomenon into a single result. Statistically, meta analysis combines the effect size estimates of the individual studies into a single estimated effect size or a distribution of effect sizes. Due to the probabilistic nature of inferential statistics, the statistical significance of research results is only an estimate as to whether or not the hypothesis being tested is true. Therefore, even when the results of a study show statistical significance, the hypothesis still may not be true. This is one of the reasons that the results of research studies cannot always be replicated by other researchers. Through meta analysis, a body of research results performed by different researchers can be examined to help the researcher get a better picture of the overall pattern of the results of numerous studies conducted on the same phenomenon.
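The combination of effect size estimates described above can be illustrated with a short sketch. The weighting scheme is not specified in this essay; the example below assumes one common choice, a fixed-effect model in which each study is weighted by the inverse of its sampling variance. The function name pooled_effect and the three effect sizes and variances are hypothetical and chosen purely for illustration.

```python
import math


def pooled_effect(effects, variances):
    """Fixed-effect (inverse-variance) pooled effect size and its standard error.

    Assumption: each study reports an effect size estimate and the sampling
    variance of that estimate. More precise studies receive more weight.
    """
    if not effects or len(effects) != len(variances):
        raise ValueError("effects and variances must be non-empty and equal in length")
    if any(v <= 0 for v in variances):
        raise ValueError("variances must be positive")

    weights = [1.0 / v for v in variances]
    total_weight = sum(weights)
    estimate = sum(w * e for w, e in zip(weights, effects)) / total_weight
    std_error = math.sqrt(1.0 / total_weight)
    return estimate, std_error


# Hypothetical effect sizes and variances from three studies (illustrative only).
effects = [0.30, 0.10, 0.50]
variances = [0.04, 0.01, 0.09]

estimate, std_error = pooled_effect(effects, variances)
# Approximate 95% confidence interval under a normal approximation.
ci_low = estimate - 1.96 * std_error
ci_high = estimate + 1.96 * std_error
print(f"pooled effect = {estimate:.3f}, 95% CI = ({ci_low:.3f}, {ci_high:.3f})")
```

Because the second hypothetical study has the smallest variance, it pulls the pooled estimate toward its own value. A random-effects model, which also allows for true differences among studies, would weight the studies differently.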

Applications

Use of Secondary Data in Social Science

Depression in Mothers

Secondary data analysis is frequently used in the social sciences. For example, Horwitz et al. (2007) performed secondary analysis on data to understand the prevalence, correlates, and persistence of depression in the mothers of young children. The authors' review of the literature on depression found not only that depression tends to occur more frequently in women, but also that “depression is expected to replace cancer as the second leading cause of morbidity within the next decade” (Horwitz, 2007). The study used an age-stratified and gender-stratified random sample of children who were born at Yale-New Haven Hospital between July 1995 and September 1997 and who lived in the New Haven-Meriden Standard Metropolitan Statistical Area. Individuals in this group were excluded from the sample if they were born prematurely, had low birth weight, were likely to have developmental delays due to birth complications, or had chromosomal anomalies. Of an original pool of 7433 subjects, 1605 were found to be eligible under these criteria, of which 1278 participated in the initial data collection. Of these, 1095 participated in the one-year follow-up.

Several measures were collected for these individuals. Birth information was obtained from birth records provided by the State of Connecticut Department of Public Health. These data included birth weight, gestational age, 1- and 5-minute Apgar scores, parental age, maternal education, and similar measures. Information about sociodemographic variables was collected using a short written survey instrument that was answered by the mothers. Mothers were also asked to rate their financial strain on a five-point scale ranging from "easy" to "difficult." In addition, the mothers were asked to rate their child's current physical health on a five-point scale. Mothers were also asked to respond to several standard questionnaires: the Beck Anxiety Inventory, the expressiveness and conflict scales from the Family Environment Scale, an adaptation of the Life Events Inventory, the social support items of the Medical Outcomes Study parent questionnaire, the short form Parenting Stress Index, and the Quality of Marriage Inventory. Maternal depression was measured using the Center for Epidemiologic Studies Depression Scale.

The variables were placed into groups as follows: maternal sociodemographic characteristics, maternal mental and physical health characteristics, maternal support and stress measures, child characteristics, and spouse/partner characteristics. Variables in each domain were statistically evaluated. Variables that did not have an effect on the outcome were not investigated further. Data analysis indicated that elevated self-reported symptoms of depression were related to various factors. These included “younger maternal age, lower maternal education, unemployment, minority race, maternal physical health status, single parenting, poverty, difficulty paying bills, high anxiety, high family conflict, family expressiveness, high parenting stress, social support, and high parental life events” (Horwitz, 2007). However, the analysis showed no significant relationship between elevated levels of depressive symptoms and birth of a child within the previous year.

The results of the study suggested that elevated depressive symptoms in mothers of 11- to 42-month-old children are prevalent. The analysis also showed a relationship between elevated symptoms of depression and associated characteristics at the other sample points (i.e., anxiety, high parent distress, poor physical health, financial strain, high life events, low social support, having younger children). The findings also suggested that women with co-occurring anxiety who live in conflict-laden environments have a greater tendency to continue to report elevated symptoms of depression. This result supports the importance of anxiety as a predictor of non-remission of depression symptoms that was previously identified in the literature.

Self-Attribution in Victims

In another example of the use of secondary data in the social sciences, Littleton, Magee, and Axsom (2007) performed a meta analysis of self-attribution following three types of trauma: sexual victimization, illness, and injury. Self-attribution can be defined as the victim accepting responsibility or blame for the event of which s/he was a victim. Although the literature discusses theoretical models concerning the causes of self-attribution, the authors found little empirical research that investigated why self-attributions occur. Their investigation had four goals. First, they sought to determine the prevalence of self-attribution following a traumatic event. Second, they examined the effect of variations in research methodology on reports of self-attribution. Third, they attempted to identify predictors (including individual differences and trauma variables) of self-attributions. Fourth, they sought to determine whether self-attributions to behavior and to character are distinct.

To answer these research questions, the authors conducted a meta analysis of all existing studies of self-attributions that followed three types of trauma (i.e., sexual, illness, injury). These types of trauma were chosen because self-attributions are most frequently studied following them. Further, studies concerning these types of trauma typically were found to include clear operational definitions of predictor variables. In addition, the researchers found few studies of self-attribution for other kinds of trauma, which made these three types of trauma the best candidates for meta-analysis.

Studies for inclusion in the meta-analysis were identified in several ways. The authors conducted literature searches in several professional databases using a number of keywords and phrases. In addition, researchers in the field were contacted directly and asked for data from any unpublished studies. Studies compiled in these two ways were discarded if they did not involve individuals reporting self-attribution following a trauma they had actually experienced or if the data on effect size that are necessary to conduct the meta-analysis could not be obtained from the study or its authors. This resulted in a total of 69 studies of self-attributions that were used in the meta-analysis. Thirty-four of the studies analyzed self-attributions following sexual victimization (i.e., rape, sexual abuse, incest, sexual harassment, attempted rape). Twenty-two studies analyzed self-attributions following illness. The final 13 studies examined self-attribution following severe injury (i.e., spinal cord injury, severe burns, head injury). Effect sizes for self-attributions were calculated, and the reported level of self-attribution in each study was compared to the level expected due to chance. In addition, several potential predictors of self-attributions as well as degree of life threat (i.e., high or low) were coded. For each categorical variable examined, an analysis of variance was conducted.

The results of the meta analysis showed that self-attribution occurred at a lower than chance level for most of these traumas. The analysis of methodological differences showed that open-ended queries regarding attribution resulted in significantly lower levels of self-attribution than did closed-ended queries, although there was considerable heterogeneity in effect sizes among the studies. The results also showed that reported self-attribution did not vary significantly among studies based on the measure used for this variable. The analysis also found that the reported level of self-attribution decreased as the length of the trauma in the studies increased. Similarly, there was a tendency for the level of self-attribution reported to decrease as the mean age of the subjects increased. Other potential predictor variables did not appear to predict self-attribution.
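Heterogeneity in effect sizes, as mentioned above, is often quantified in meta analyses with Cochran's Q and the derived I-squared statistic. The sketch below shows how these are computed under a fixed-effect, inverse-variance weighting. It does not reproduce the procedure or the data of Littleton, Magee, and Axsom (2007); the function name heterogeneity and the numbers are hypothetical.

```python
def heterogeneity(effects, variances):
    """Cochran's Q and I-squared for a set of study effect sizes.

    Q is the weighted sum of squared deviations of each study's effect size
    from the inverse-variance pooled estimate. I-squared expresses the share
    of that variation beyond what chance alone would be expected to produce.
    """
    if len(effects) < 2 or len(effects) != len(variances):
        raise ValueError("need at least two studies with matching variances")
    if any(v <= 0 for v in variances):
        raise ValueError("variances must be positive")

    weights = [1.0 / v for v in variances]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))
    df = len(effects) - 1
    # I-squared is truncated at zero; guard against division by zero when Q is 0.
    i_squared = max(0.0, (q - df) / q) if q > 0 else 0.0
    return q, df, i_squared


# Hypothetical effect sizes and variances from three studies (illustrative only).
q, df, i_squared = heterogeneity([0.30, 0.10, 0.50], [0.04, 0.01, 0.09])
print(f"Q = {q:.3f} on {df} degrees of freedom, I^2 = {i_squared:.1%}")
```

A Q value that is large relative to its degrees of freedom suggests that the studies differ by more than sampling error, in which case a single pooled estimate should be interpreted with caution.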

Conclusion

Obtaining primary data for social science research can be difficult for a number of ethical and logistical reasons. However, there are a number of public and private resources that offer the researcher secondary data that can be reanalyzed for further investigations. Although this makes data collection for the most part both easier and less expensive, the analysis of secondary data is not without its disadvantages. Nevertheless, a well-designed study using secondary analysis or meta analysis can be of great advantage to the researcher who cannot obtain primary data and can significantly aid in the advancement of the science.

Terms & Concepts

Experiment: A situation under the control of a researcher in which an experimental condition (independent variable) is manipulated and the effect on the experimental subject (dependent variable) is measured. Most experiments are designed using the principles of the scientific method and are statistically analyzed to determine whether or not the results are statistically significant.

Inferential Statistics: A subset of mathematical statistics used in the analysis and interpretation of data. Inferential statistics are used to make inferences such as drawing conclusions about a population from a sample and in decision making.

Inter-Interviewer Reliability: The consistency with which different interviewers obtain similar responses from subjects using the same interview instrument. Interviewer bias and interviewer effects can lead to low inter-interviewer reliability.

Interviewer Bias: The expectations, beliefs, prejudices, or other attitudes that may affect the interview process and the subsequent interpretation of data collected through the interview process.

Interviewer Effects: The influence of the interviewer's behaviors and attributes on the subject's response in an interview situation. For example, the appearance, demeanor, training, age, gender, and ethnicity of an interviewer may all affect the way that a subject perceives the interview and/or responds to the questions in an interview. In some cases, the subject may try to please the interviewer by giving responses that s/he thinks the interviewer may want to hear, or in other cases may give non-responsive answers in order to negatively impact the value of the data collected by an interviewer that s/he does not like.

Meta Analysis: A secondary analysis technique used to synthesize the results of multiple existing quantitative research studies of a single phenomenon into a single result. Statistically, meta analysis combines the effect size estimates of the individual studies into a single estimated effect size or a distribution of effect sizes.

Qualitative Research: Scientific research in which observations cannot be or are not quantified (i.e., expressed in numerical form).

Quantitative Research: Scientific research in which observations are measured and expressed in numerical form (e.g., physical dimensions, rating scales).

Secondary Analysis: A further analysis of existing data typically collected by a different researcher. The intent of secondary analysis is to use existing data in order to develop conclusions or knowledge in addition to or different from those resulting from the original analysis of the data. Secondary analysis may be qualitative or quantitative in nature and may be used by itself or combined with other research data to reach conclusions.

Subject: A participant in a research study or experiment whose responses are observed, recorded, and analyzed.

Survey: (a) A data collection instrument used to acquire information on the opinions, attitudes, or reactions of people; (b) a research study in which the opinions, attitudes, or reactions of members of a selected sample are gathered using a survey instrument or questionnaire for purposes of scientific analysis; typically the results of this analysis are used to extrapolate the findings from the sample to the underlying population; (c) to conduct a survey on a sample.

Variable: An object in a research study that can have more than one value. Independent variables are stimuli that are manipulated in order to determine their effect on the dependent variables (response). Extraneous variables are variables that affect the response but that are not related to the question under investigation in the study.

Bibliography

Campbell, D. A. (2007). Secondary analysis. Orthopaedic Nursing, 26, 241-242. Retrieved April 23, 2008, from EBSCO Online Database Academic Search Premier. http://web.ebscohost.com/ehost/pdf?vid=13&hid=21&sid=62757d2f-5bcb-4125-bd0a-894bb563f1a8%40sessionmgr2

Horwitz, S. M., Briggs-Gowan, M. J., Storfer-Isser, A., & Carter, A. S. (2007). Prevalence, correlates, and persistence of maternal depression. Journal of Women's Health, 16, 678-691. Retrieved April 24, 2008, from EBSCO Online Database SocINDEX with Full Text. http://search.ebscohost.com/login.aspx?direct=true&db=sih&AN=25705603&site=ehost-live

Littleton, H. L., Magee, K. T., & Axsom, D. (2007). A meta-analysis of self-attributions following three types of trauma: Sexual victimization, illness, and injury. Journal of Applied Social Psychology, 37, 515-538. Retrieved April 24, 2008, from EBSCO Online Database SocINDEX with Full Text. http://web.ebscohost.com/ehost/pdf?vid=12&hid=13&sid=044e731e-f467-4ff0-b74c-178c235c732d%40sessionmgr9

Schaefer, R. T. (2002). Sociology: A brief introduction (4th ed.). Boston: McGraw-Hill.

Stockard, J. (2000). Sociology: Discovering society (2nd ed.). Belmont, CA: Wadsworth/Thomson Learning.

Vartanian, T. P. (2011). Secondary data analysis. New York: Oxford University Press.

Webb, E. J., Campbell, D. T., Schwartz, R. D., & Sechrest, L. (1966). Unobtrusive measures: Nonreactive research in the social sciences. Chicago: Rand McNally College Publishing Company.

Young, R., & Johnson, D. (2013). Methods for handling missing secondary respondent data. Journal of Marriage & Family, 75, 221-234. Retrieved October 23, 2013 from EBSCO online database, SocINDEX with Full Text. http://search.ebscohost.com/login.aspx?direct=true&db=sih&AN=84935698&site=ehost-live

Suggested Reading

Andrews, L., Higgins, A., Andrews, M., & Lalor, J. G. (2012). Classic grounded theory to analyse secondary data: Reality and reflections. Grounded Theory Review, 11, 12–26. Retrieved October 23, 2013 from EBSCO online database, SocINDEX with Full Text. http://search.ebscohost.com/login.aspx?direct=true&db=sih&AN=77668319&site=ehost-live

Bennett, T., Holloway, K., & Farrington, D. (2006). Does neighborhood watch reduce crime? A systematic review and meta-analysis. Journal of Experimental Criminology, 2, 437-458. Retrieved April 24, 2008, from EBSCO Online Database SocINDEX with Full Text. http://web.ebscohost.com/ehost/pdf?vid=13&hid=13&sid=044e731e-f467-4ff0-b74c-178c235c732d%40sessionmgr9

Dixon, A., Khachatryan, A., & Yang, T. (2012). Socioeconomic differences in case finding among general practices in England: Analysis of secondary data. Journal of Health Services Research & Policy, 17(s2), 18-22. Retrieved October 23, 2013 from EBSCO online database, SocINDEX with Full Text. http://search.ebscohost.com/login.aspx?direct=true&db=sih&AN=78123891&site=ehost-live

Gibbs, D., Berkman, N., Weitzenkamp, D., & Dalberth, B. (2007). Federal support for adoption subsidies: State-level variations and the impact for adoptive families. Journal of Public Child Welfare, 1, 71-90. Retrieved April 24, 2008, from EBSCO Online Database SocINDEX with Full Text. http://web.ebscohost.com/ehost/pdf?vid=9&hid=13&sid=044e731e-f467-4ff0-b74c-178c235c732d%40sessionmgr9

Hall, M., & Farkas, G. (2011). Adolescent cognitive skills, attitudinal/behavioral traits and career wages. Social Forces, 89, 1261–1285. Retrieved October 23, 2013 from EBSCO online database, SocINDEX with Full Text. http://search.ebscohost.com/login.aspx?direct=true&db=sih&AN=60914643&site=ehost-live

Petrosino, A., & Lavenberg, J. (2007). Systematic reviews and meta-analyses: Best evidence on "what works" for criminal justice decision makers. Western Criminology Review, 8, 1-15. Retrieved April 24, 2008, from EBSCO Online Database SocINDEX with Full Text. http://web.ebscohost.com/ehost/pdf?vid=11&hid=13&sid=044e731e-f467-4ff0-b74c-178c235c732d%40sessionmgr9

Trzesniewski, K. H., Donnellan, M. B., & Lucas, R. E. (2011). Secondary data analysis: An introduction for psychologists. Washington, DC: American Psychological Association.

Essay by Ruth A. Wienclaw, Ph.D.

Ruth A. Wienclaw holds a doctorate in industrial/organizational psychology with a specialization in organization development from the University of Memphis. She is the owner of a small business that works with organizations in both the public and private sectors, consulting on matters of strategic planning, training, and human/systems integration.