Data collection

Data collection is the process by which scientists, scholars, and other researchers gather information to test their hypotheses and arguments. There are many different ways to gather data, including visual observation, textual interpretation, interviews, surveys, and experiments. Because researchers gather and analyze data to support claims and advance arguments, it is important that they follow guidelines on accuracy and ethics. At US universities, research involving human subjects is overseen by the university’s institutional review board (IRB). Many other nations have similar government or university offices that approve and oversee research.

Brief History

Scientists, adventurers, and scholars have been collecting data for centuries. Some of the earliest written records come from merchants who kept track of sales, a type of data collection that could be used to decide which goods to buy and at what price to sell them. Sailors and explorers also collected data on weather and star patterns so that they could travel as safely as possible and avoid becoming lost. Scientists conducted experiments to better understand the natural world, and their results were written down and analyzed as data, creating a historical record that later researchers could consult. The critical element in each of these examples is that by producing a written record, the data collector is not only testing a hypothesis but also preserving data for future use by others. For scientists, this is particularly necessary, as a study’s results must be replicable in order to be verified.

While many methods of data collection are harmless to their subjects, some, especially in medical or psychological research, can cause harm to the people being studied. For example, during World War II, Nazi scientists conducted many experiments on test subjects without the subjects’ permission or, at times, their knowledge of the research. Detailed records from these experiments were kept on topics such as hypothermia, and later researchers have faced the ethical dilemma of whether to make use of data that was collected unethically. In an attempt to prevent such abuses in the future, the Nuremberg Code was established in 1947 as a result of the Nuremberg Trials; it set out ethical principles for medical experimentation on human subjects.

In the United States, between 1932 and 1972, the US Public Health Service conducted a project known as the Tuskegee Syphilis Experiment, which examined the effects of untreated syphilis on Black men. The men were told that they would be given free medical care, food, and burials in exchange for participating in the study. Although the men joined the study voluntarily, the Tuskegee Syphilis Experiment is considered unethical because they were never told that they had syphilis and were led to believe that their condition would be treated when, in fact, the researchers never intended to provide treatment. Additionally, because they did not know that they had syphilis, many of these men passed the disease on to their wives and children, who had neither been informed of nor consented to the study.

This study played a role in the establishment of institutional review boards (IRBs), which oversee medical and behavioral research involving human subjects in the United States. When an American researcher conducts research internationally, the researcher must have approval from the IRB of their home institution as well as from the ethics review board of the country in which the research takes place. Although IRBs review many types of research, their oversight applies only to data collection that seeks to produce generalizable knowledge. This means that if a study collects data from a small group of people and then uses that data to understand a larger group, the study must be approved by an IRB. However, if a study collects data from an individual and uses that data only to understand that individual, the study does not require IRB approval. Data collected from publicly available databases, publications, or the public pages of social media platforms such as Twitter and Facebook is also commonly exempt from IRB oversight.

Data Collection Today

Contemporary researchers use a wide range of data collection techniques. Textual data is collected from archival sources such as books, census records, and recordings. Visual sources such as paintings, photographs, and advertisements can also be collected and interpreted as texts.

Visual observation is a method of data collection in which the researcher visits a location of interest, observes activities in that area, and keeps detailed notes commonly known as field notes. If the researcher participates in community activities, then the research is known as participant observation. For example, studying the ways that birds build nests would be visual observation, while studying the ways that humans build houses by assisting in a barn raising would be participant observation.

Interviews and surveys are structured forms of research in which the researcher predetermines a set of questions and then asks those questions of individuals or groups of participants. A census is a set of questions asked of every member of a group, while a survey is a set of questions asked of only part of a group, with the responses used to better understand the entire group. Surveys can include closed-ended questions, which ask participants to choose from a set of answers, and open-ended questions, which ask participants to respond in their own words. Interviews can be thought of as a set of open-ended questions.

Experiments gather data by applying an intervention to a set of subjects. This intervention might be changing the amount of light that a plant receives, giving a trial medication to patients, or giving students an extra hour of tutoring. It is important that data be gathered about the subjects before, during, and after the intervention. Researchers should also collect data on a control group, a group of subjects that does not receive the intervention, to serve as a baseline against which data from the experimental group can be compared.

Regardless of the method used, collected data is analyzed and may contribute to publications or future studies. When a study is finished, the data is often deposited in a library or archive for use by future researchers.
