Archival data in psychological research

  • TYPE OF PSYCHOLOGY: Psychological methodologies
  • SUMMARY: Archival data, or information already on record, offer several advantages to resourceful researchers: time savings, access to large quantities of information, and avoidance of some ethical issues, among others. At the same time, use of such data carries with it several risks, the worst of which is potential inaccuracy.

Introduction

A major part of any research enterprise is the gathering of data: the information from which conclusions will be drawn and judgments made. This gathering can be accomplished in many ways, each with its advantages and drawbacks. Often, by using a combination of methods, a skilled researcher can let the strengths of one method compensate for the weaknesses of another. Which method, or combination of methods, is most appropriate depends on several factors. If research is intended to be only descriptive, the scientist may find observation adequate; if the research is intended to establish cause-and-effect relationships clearly, experimentation is all but essential.

In observation and experimentation, the scientist actively gathers the data to be used. This involvement allows considerable control over possible sources of error, but it also limits what can be accomplished. For example, a scientist cannot step back into the past, cannot observe (or experiment with) more than a fairly small number of subjects during most research, and cannot avoid the possibility that the subjects' knowledge that they are involved in the research will distort the answers they give or the behaviors they display. When they can be located and used, archival data eliminate many of these problems for descriptive research and, because they extend across time, may give hints of cause-and-effect relationships typically revealed only by experimentation.

The term “archival data” may first suggest only information shelved in public archives, such as courthouse records. Indeed, such a location may hold much useful information, but it is only one of dozens of possibilities. Similarly, “data” may first suggest only collections of numbers; again, however, many other possibilities exist. For example, Aurelius Augustinus, better known as Saint Augustine, wrote his autobiography, Confessiones (397–400; Confessions, 1620), well aware that its contents would fascinate his own and later generations. It seems likely that he also realized that he was presenting more than just information about himself to his readers. Personal documents may also be used by contemporary researchers in ways their authors are unlikely to have anticipated. Comparing many autobiographies written over the centuries, a modern-day developmental psychologist might examine how earlier generations behaved during the period now known as adolescence. A career counselor might examine how people who changed their original occupations in midlife managed to do so.

With the worldwide distribution of printed material, films, and electronic media, mass communications serve well as archival data for hundreds of topics. One problem, however, may be the presence of so much information that a sampling procedure must be devised to decide what to use. A caution regarding the use of mass media sources can also apply to personal documents and statistical data. Researchers who want to extract data from, for example, United States newspaper reports from 1945 must consider the reliability of what they may find. If they plan to use the reports as indicators of national public opinion, they must select several different newspapers published across the nation. A single newspaper might suggest what its own editor and readers believed in a specified area, but dozens might be required to suggest national beliefs, and even dozens might not provide the information researchers originally sought. If various papers provided very different accounts of an event, or divergent editorials regarding it, researchers might have to focus on differences rather than unanimity of belief.
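One common way to devise such a sampling procedure is stratified random sampling: the pool of sources is divided into regions, and a fixed number of newspapers is drawn at random from each, so that no single area dominates. The Python sketch below is illustrative only; the function name, the regional grouping, and the newspaper catalog are hypothetical assumptions, not part of any established archive or tool.

```python
import random
from collections import defaultdict

def stratified_sample(papers, per_region, seed=0):
    """Draw up to `per_region` newspapers at random from each region.

    papers: list of (name, region) pairs from a hypothetical catalog.
    A fixed seed makes the draw reproducible, so others can audit it.
    """
    rng = random.Random(seed)
    by_region = defaultdict(list)
    for name, region in papers:
        by_region[region].append(name)

    sample = {}
    for region, names in sorted(by_region.items()):
        k = min(per_region, len(names))  # small regions contribute all they have
        sample[region] = sorted(rng.sample(names, k))
    return sample

# Hypothetical catalog of 1945 newspapers, for illustration only.
catalog = [
    ("Paper A", "Northeast"), ("Paper B", "Northeast"), ("Paper C", "Northeast"),
    ("Paper D", "South"), ("Paper E", "Midwest"),
    ("Paper F", "West"), ("Paper G", "West"),
]
print(stratified_sample(catalog, per_region=2, seed=1))
```

Recording the seed alongside the sample lets a later researcher reproduce exactly which sources were chosen, which matters when the conclusions depend on how representative the sources were.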

In Confessions, Saint Augustine discussed the possibilities that writers might not know something about themselves or might state something they know to be untrue. This validity issue also applies to the electronic media and, like differences of opinion across several newspapers, must be dealt with by consulting several independent sources, if they can be located.

Statistical data (measurements or observations converted to numerical form) can be the most immediately useful, yet possibly the most dangerous, archival data for a researcher to use. Most typically in psychological research, information is converted to numbers, the numbers are processed in some manner, and conclusions are drawn. When researchers gather data themselves, they know where the numbers came from, whether they should be considered approximations or precise indicators, and a host of other facts essential to their interpretation. When researchers process archival data, which were gathered by others for their own purposes, the details essential to interpreting the numbers are often unknown and must be sought as part of the research.

For example, a psychologist seeking information about the education levels of employees in a company might find it directly available on application blanks on file. If those blanks recorded the applicants’ stated education levels, however, and there was no evidence that those statements had been verified as a condition of hiring, it would be risky to consider them highly accurate data. Most archival data need to be verified in some manner; how fully this is done should depend on the degree of certainty needed in the research.

Uses and Drawbacks

The use of archival data in the classic work Le Suicide (1897; Suicide, 1951) by sociologist Émile Durkheim illustrates how much a master researcher can learn from already available material. Hypothesizing that social factors are key bases for suicide, he first gathered years of suicide records from European countries where they were available and then examined these statistics in light of additional archival data to evaluate several alternative hypotheses.

Noting that suicide rates increased from January to June, then fell off through the rest of the year, he considered the possibility that suicide is influenced by temperature. Finding, again from records, that suicides did not vary directly with temperature increases and decreases, he was drawn back to his favored hypothesis that social factors were of key importance. To elaborate on such factors, he used archival data again to consider religion, family, and political atmosphere.

Durkheim gained significant advantages by using archival data rather than limiting himself to data he gathered personally. For example, had he personally interviewed families and friends of people who died by suicide, far fewer cases would have been available to him, probably ones restricted to a fairly limited geographic area. He also would have been limited by time factors: It seems unlikely that interviewing years after the event would have been possible for most cases. Unavoidably, he ran risks in accepting available records as accurate, but he judiciously chose records likely to have been carefully assembled and unlikely to have contained willful distortions. As the world has changed since Durkheim’s day, so have the opportunities to apply archival data to research questions. Part of the change resulted from the emergence of numerous and more varied archives and new methods of searching them.

For Durkheim, information that existed in print or still photographs, or could be told to him from someone’s memory, was all that was available. Since the beginning of the twentieth century, both the form and the amount of archival data in existence have changed dramatically. For modern researchers, new, mainly electronic, media have supplemented the traditional ones. Silent motion pictures, phonograph records, radio (with transcription discs), sound motion pictures, audio recording wire and then tape, television (with video recording tape), and computer storage increased available information almost immeasurably. They have also created the possibility of finding obscure, previously unavailable data fragments.

Researchers studying the events leading to war, for example, have for centuries been able to work with written sources: documents, books, and newspapers. From the early 1900s on, social psychologists could add to those archival sources newsreel footage of political leaders’ participation in war-related events as well as a few phonograph records of their speeches. From the late 1920s on, they could add transcriptions (disc recordings) of radio broadcasts and sound motion-picture coverage. From the late 1940s on, they could add films, then television broadcasts and videotapes of them.

Beginning in the mid-1980s, a new type of archive emerged, allowing enormous amounts of information to be saved, distributed worldwide, and searched electronically for desired information. Computer data storage changed the handling of information in a way comparable to the change sparked by the invention of the printing press. Pulling information from storage media ranging from magnetic tape to CD-ROM (compact disc read-only memory) to Internet databases, researchers gained access to libraries of information (from indexes of research literature to archival data) and could sort through it with revolutionary speed and accuracy.

For example, a researcher with a computer and an Internet connection could search the entire works of William Shakespeare, encyclopedias, atlases, Bartlett’s Familiar Quotations, world almanacs, and more, in a fashion that may be considered a modern version of looking for a needle in a haystack. The old phrase suggests looking for something that exists but is so hidden that chances of finding it are nil. Computer searching is the equivalent of searching the haystack with a powerful metal detector and electromagnet to pull the needle from the depths of the stack.

Studying attitudes toward older adults, for example, a scientist could direct searches for many keywords and phrases (old age, elderly, retiree, respected, and so on), some only very remotely related to the topic. The speed and accuracy of computerized text searching make feasible needle-in-haystack queries that would have been impractical by earlier methods. A world atlas might contain very few age-related references, but with searches running at lightning speed, even the one or two references to retirement might be worth seeking.
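A keyword search of this kind can be sketched in a few lines. The Python example below is a minimal illustration under stated assumptions: the documents are short hypothetical text excerpts keyed by made-up identifiers, and matching is case-insensitive and anchored at the start of a word, so that "retiree" also finds "retirees." It does not reproduce the search behavior of any particular database.

```python
import re

def keyword_hits(documents, keywords):
    """Return {doc_id: [keywords found]} for documents with at least one hit.

    Matching is case-insensitive and anchored at a word start (\\b), so a
    keyword also matches longer forms such as plurals ("retiree" -> "retirees").
    """
    patterns = {
        kw: re.compile(r"\b" + re.escape(kw), re.IGNORECASE) for kw in keywords
    }
    hits = {}
    for doc_id, text in documents.items():
        found = [kw for kw, pattern in patterns.items() if pattern.search(text)]
        if found:  # keep only documents that mention at least one keyword
            hits[doc_id] = found
    return hits

# Hypothetical excerpts standing in for pages of reference works.
documents = {
    "atlas_p12": "Many retirees settle along the coast.",
    "almanac_p3": "The elderly population grew.",
    "novel_ch1": "A storm arrived at dawn.",
}
keywords = ["old age", "elderly", "retiree", "respected"]
print(keyword_hits(documents, keywords))
```

Prefix matching trades precision for recall: it catches inflected forms, but a researcher would still need to read each hit, since a remotely related keyword can match passages that say nothing about attitudes toward older adults.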

Assessing Resources

Although scientific psychology has always taught its students how to generate data through their own research, in no way has it denied them the right to use data already available if the data meet their needs. Like other data, archival data must meet reasonable standards of reliability and validity, standards not always easy to assess when several sources, perhaps over an extended period, have generated the data.

Researchers who use other researchers’ data probably have the fewest worries. Since the 1920s, published research standards have been uniform enough that modern readers can clearly understand what was done to produce data and, from that understanding, can judge their quality. Researchers who work from personal documents have a more difficult task in determining data quality. What the writers stated might be distorted for various reasons, ranging from intentional deception to an incomplete comprehension of the subject matter. If the new researcher is working to assess the personality of an author, for example, checking the internal consistency of the document may be a useful, if not definitive, way of evaluating data quality. If the new researcher is studying some historical event, comparing the diary of one observer with those of others could help validate data obtained.

Researchers who work from mass media, which may carry carelessly assembled or intentionally slanted information, or those who work from public records that might, a century or more ago, have ignored minority populations—or even those who work from an online database that contains only works written in the English language—have special problems of data accuracy, and each must devise ways of discovering and working around them.

As compensation for the special problems that archival data present, they possess an advantage that all but eliminates worry about invasion of privacy, often a major issue in research ethics. By their very definition, archival data are already public, and rarely does a new researcher’s analysis produce sensitive conclusions. In the rare case where it does, the researcher can simply decide not to report a particular conclusion, and the information does not become public knowledge. By contrast, in certain experimental research, when subjects reveal something they would prefer had remained unknown (perhaps that they would cheat to succeed at some task), the ethical harm is already done if the subjects realize that the experimenter knows of their failing. In this case, not publishing the results cannot remove their discomfort.
