Digital Archiving

Overview

Digital archiving is the collecting and storing of documents, images, sounds, videos, and other digital files in a way that enables future users and researchers to recall information. Some digital archives are personal and others contain information pertaining to an entire nation. Additionally, some digital archives are made up of contemporary digital materials while others contain the scans or copies of ancient texts that have been stored in an archive. For scholars and researchers, digital archives are exciting because they allow access to a wide array of materials, often without having to make a trip to a physical library. However, as digital archives have been developed, scholars and researchers have had to address questions and concerns regarding privacy and piracy—that is, under what conditions is the privacy of individuals or groups violated when materials are archived, and who should have access to archives.

One of the first digital archives was the Online Public Access Catalogue, known as OPAC, which allowed librarians and library patrons to search for books across a variety of libraries. The OPAC system was modeled on traditional, paper-based card catalogs and includes information regarding a book's title, author, publisher, and year of publication, along with a few search terms to help librarians find similar books. OPAC was based on the online catalogs of large university libraries, such as the one at Ohio State University, developed in 1975. These early systems were slow and required the library to have dedicated computers to perform searches. As a result, it was primarily librarians who had access to online card catalogs, and library patrons had to work with a librarian to use the system. This method of searching ensured that the patron could successfully find a book or other resource, but it also decreased the anonymity of the patron's search because the patron had to explain to the librarian what the search terms were. The question of being able to search anonymously would later become important in the study of digital archiving.
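The catalog record described above can be illustrated with a short sketch. It is not the OPAC implementation. The class name, field names, search function, and sample records are all hypothetical and serve only to show how a record holding a title, author, publisher, year, and a few search terms might be searched across more than one library.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogRecord:
    """One card-catalog-style entry (hypothetical structure)."""
    title: str
    author: str
    publisher: str
    year: int
    library: str                       # which library holds the book
    subject_terms: tuple[str, ...] = ()  # a few search terms for finding similar books


def search_catalog(records: list[CatalogRecord], term: str) -> list[CatalogRecord]:
    """Return records whose title, author, or subject terms contain the term.

    Matching is case-insensitive substring matching, chosen only for simplicity.
    """
    needle = term.casefold()
    return [
        record
        for record in records
        if needle in record.title.casefold()
        or needle in record.author.casefold()
        or any(needle in subject.casefold() for subject in record.subject_terms)
    ]


# Invented sample data spanning two hypothetical libraries.
SAMPLE_RECORDS = [
    CatalogRecord("Introduction to Archival Practice", "Rivera, A.", "Example Press",
                  1974, "Main Library", ("archives", "preservation")),
    CatalogRecord("Gardening for Beginners", "Chen, L.", "Sample House",
                  1969, "Branch Library", ("horticulture",)),
    CatalogRecord("Preserving Family Papers", "Okafor, N.", "Example Press",
                  1972, "Branch Library", ("archives", "genealogy")),
]

if __name__ == "__main__":
    for hit in search_catalog(SAMPLE_RECORDS, "archives"):
        print(f"{hit.title} ({hit.year}), held at {hit.library}")
```

Storing the subject terms alongside the bibliographic fields is what lets a single query surface related books held at different libraries, which is the role the search terms play in the description above.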

Libraries were not the only institutions filing information in online formats. Doctors' offices and hospitals also began storing patient information in computer-based systems, which made it easier for a doctor to quickly access information but gave rise to a number of privacy concerns about who had access to those electronic files. Historically, medical files were kept in file folders that could only be accessed with specific permission. It would be obvious if someone without access began taking files off the shelf or out of a filing cabinet. However, with the movement toward digital archiving, it became easier for office staff to access patient files, possibly without having permission to do so. In the United States, a number of regulations have been put into place to govern this access to information. For example, the Health Insurance Portability and Accountability Act (HIPAA) of 1996 regulates who can and cannot access medical information.

While there is still a risk of unauthorized access, HIPAA requires that medical offices go to great lengths to prevent it and imposes harsh penalties on offices that do not correctly protect patient data. The law has become increasingly important as data breaches have occurred. According to statistics from the HIPAA Journal, 4,419 data breaches of 500 or more health-care records occurred from 2009 to 2021. The journal also noted that the Office for Civil Rights (OCR) reported 720 data breaches of 500 or more records in 2022 and 725 in 2023, and that more than 133 million records were exposed or impermissibly disclosed across these breaches. In addition to protecting patient privacy, HIPAA is designed to save money for both doctors and insurance companies by encouraging and standardizing the use of digital files and digital file sharing between offices.

The development of more online tools, and the use of tools such as smartphones by doctors, has required many offices to determine and reexamine their procedures for using a digital archive of patient data. It is important both that healthcare professionals have a clear set of guidelines for using these archives and that they explain clearly to their patients how that information is sorted and used. Sabin and Harland (2017) examine the ways that these changes are occurring in psychiatry, where digital tools and archives have the potential to greatly improve patient care but also carry many risks of exposing patients and their data to unauthorized users. Communication scholars are particularly interested in the ways that these risks and potential data breaches are explained to patients, the ways in which patients interpret that information, and how decisions based on that information are made. They are also interested in the communicative potential and success of massive digital archiving projects. For example, Quintana et al. (2008) examine the use of Cure4Kids, a massive online digital archive of childhood diseases and cures that enables doctors to quickly research and treat patients with rare diseases. This digital archive provides information to doctors around the world and has been credited with saving the lives of many children who would otherwise have been classified as medical mysteries or would not have been diagnosed in time to save their lives.

Online digital archives continued to expand as computer systems improved and Internet access became faster. Soon it was not only reference information that was cataloged, but also summaries of the information and, later, full texts. As file size became less of a concern, some archives began to include complex design files, such as clothing patterns, architectural models, and machine diagrams. Communication scholars are studying the way that these archives have become distributed across a wide array of computer hard drives, cloud storage, and search engines. Scholars are interested in the ways that information is archived and in the labor of producing those archives. They are also interested in how ideas about what can be archived, and what history is worth saving, are changing. These changes have the potential to open new perspectives to historians and to legitimize the work of people commonly overlooked or silenced by history (De Kosnik, 2016). As archives have become more digital than physical, the question has become less a matter of what will be archived; it is now possible to archive everything. Instead, it has become necessary to debate what should be archived first, who should have access to those archives, and how quickly and in what numbers searchers should be allowed to access information (Rotermund & Herzog, 2017).

Digital archives are used in every academic and many professional fields. They have been designed to share and store documents pertaining to subjects such as medicine, architecture, engineering, art, and agriculture. Digital archives are also commonly used for popular culture and mass media, providing access to music, books, and films. Other digital archives are produced by social media corporations, which store, organize, and sometimes sell information about their users. While some digital archives are free, others are pay-per-use, and some are blocked so that they can only be accessed by workers in a specific field or with a high level of security clearance. Communication researchers are interested in these differing uses and levels of access. As digital archives continue to evolve, so too do the ways in which scholars use and think about these tools.

Further Insights

Because many historic files exist only on paper and take time to convert into digital files, many professors and researchers have designed classes in which students produce digital files while working through a museum or library archive, creating a useful digital archive in the process. For example, Baldwin (2018) worked with students and a museum to digitize clothing patterns from 1951 to 1967. While students learned a good deal about pattern making in this project, they were also able to examine many different types of communication, from translating paper patterns to digital files, to understanding the ways in which files are coded and shared, to having direct contact with museum archivists.

Other activists and scholars are studying the ways that digital archives can be used to collect, store, and distribute community stories and history. Allowing local communities to determine how and in which ways their history is told is an empowering act. Through local recording and storage of stories, researchers are able to investigate "microhistories," or historic events that did not affect a large group of people or did not reach the mass media when they occurred. While these events may be small, they provide critical insights into human behavior, and at times they might show how a conflict emerged by revealing its origins (Caswell & Mallick, 2014).

Digital technologies make this process of storytelling possible, preserving a wide variety of voices and perspectives, not just those that have good connections with academia or the mass media. Much larger projects, such as the American StoryCorps, have set up recording booths throughout the country to encourage individuals to tell their stories. These stories are often about historic moments, personal events, and messages that individuals want to leave for their families. The stories are then saved and made available to the world through a digital archive that is searchable by topic, location, and year. These collected stories are useful for communication scholars because they provide insight into a wide range of perspectives. They also save researchers considerable time because, for some projects, researchers do not have to conduct interviews themselves and can instead spend their time on analysis. For example, Herman (2016) has examined the ways that the StoryCorps digital archive can be used to understand the experiences of Vietnamese refugees. The narratives recorded with StoryCorps were later linked together with interviews from soldiers who served in Vietnam and with documentary films and images about the war in Vietnam. These materials were used to produce a complex digital archive that can be used by researchers, students, and others interested in this time in history.
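An archive that is searchable by topic, location, and year can be sketched as a filter over story records. The example below is only an illustration under stated assumptions and does not reflect how the StoryCorps archive is built. The `Story` class, the `find_stories` function, and the sample entries are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Story:
    """A recorded story with the three searchable facets (hypothetical structure)."""
    title: str
    topic: str
    location: str
    year: int


def find_stories(
    stories: list[Story],
    topic: Optional[str] = None,
    location: Optional[str] = None,
    year: Optional[int] = None,
) -> list[Story]:
    """Return stories matching every facet that was supplied.

    A facet left as None is ignored, so a searcher can filter by any
    combination of topic, location, and year.
    """
    matches = []
    for story in stories:
        if topic is not None and story.topic.casefold() != topic.casefold():
            continue
        if location is not None and story.location.casefold() != location.casefold():
            continue
        if year is not None and story.year != year:
            continue
        matches.append(story)
    return matches


# Invented sample entries, not real archive content.
SAMPLE_STORIES = [
    Story("A first day in a new city", "immigration", "Houston", 2012),
    Story("Letters to my grandchildren", "family", "Chicago", 2012),
    Story("Leaving home", "immigration", "Chicago", 2015),
]

if __name__ == "__main__":
    for story in find_stories(SAMPLE_STORIES, topic="immigration"):
        print(f"{story.title} ({story.location}, {story.year})")
```

Treating each facet as optional mirrors how a researcher might narrow a large collection step by step, for example first by topic and then by year.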

Issues

Communication scholars and researchers are interested in the problems and challenges presented by digital archives. For example, many different groups of people are able to add information to a digital archive such as StoryCorps, yet only experts control how the archive is organized, accessed, and supported, and what information, if any, is kept private or protected. These "end user challenges" were studied by Markkula and Sormunen (2000), who examined the ways that digital newspaper photos could be searched. They found that direct searches were successful, but simply browsing through images could be problematic. The requirement that a searcher already have a good idea of what they are searching for is a concern for communications researchers.

Some researchers, such as Frick (2014), are interested in the ways that digital archives can help in document repatriation projects. Digital repatriation is a necessary and important phenomenon that is occurring throughout the world as independent countries that were once colonies request documents from their former colonizers. For example, in Frick's research, Australia and New Zealand have requested that films about their countries, produced during the period of British colonization, be returned to them. Historically, document repatriation has occurred through physical film transfer, a slow process that involves producing a copy, mailing the copy, and then ensuring that the copy is correctly filed in the new library. Using digital archives, this process can be done almost instantly, ensuring better access to archival materials and saving considerable money. Yet some colonial archives are resistant to this process because it opens up historical documents to increased scrutiny. This is particularly troublesome for the countries that did the colonizing and often committed human rights abuses in the process of colonization. For these countries, it is more comfortable to continue to suppress information. Yet, as more scholars use and build digital archives, it may become increasingly difficult for these countries to refuse to share their archival materials.

Other smaller but still significant problems arise from gaps in digital archives, that is, from information that is missing or not listed. Hansen and Paul's (2015) research and interviews with newspapers indicate that while much of the print reporting of newspapers has been archived, many newspapers did not keep copies of or archive their online-only content. This produces a spotty archive that does not accurately represent the news that was reported on a particular topic or at a specific time.

Bibliography

Baldwin, S. (2018). An object-based research study of archive pieces incorporating digital technology. Art, Design & Communication in Higher Education, 17(1), 25–32.

Blanke, T. (2024, February 2). Reassembling digital archives: Strategies for counter-archiving. Humanities and Social Sciences Communications, 11(201). doi.org/10.1057/s41599-024-02668-4

Caswell, M., & Mallick, S. (2014). Collecting the easily missed stories: Digital participatory microhistory and the South Asian American digital archive. Archives and Manuscripts, 42(1), 73–86.

De Kosnik, A. (2016). Rogue archives: Digital cultural memory and media fandom. MIT Press.

Frick, C. (2014). Repatriating American film heritage or heritage hoarding? Digital opportunities for traditional film archive policy. Convergence, 21(1), 116–131.

Hansen, K. A., & Paul, N. (2015). Newspaper archives reveal major gaps in digital age. Newspaper Research Journal, 36(3), 290–298.

Healthcare data breach statistics. (2024). HIPAA Journal. www.hipaajournal.com/healthcare-data-breach-statistics/

Herman, T. S. (2016). First days story project: Voices of the Vietnamese refugee experience. The Oral History Review, 43(1), 189–192.

Markkula, M., & Sormunen, E. (2000). End-user searching challenges indexing practices in the digital newspaper photo archive. Information Retrieval, 1(4), 259–285.

Quintana, Y., O'Brien, R., Patel, A., Becksfort, J., Shuler, A., Nambayan, A., Ogdon, D., … Ribeiro, R. C. (2008). Cure4Kids: Research challenges in the design of a website for global education and collaboration. Information Design Journal (IDJ), 16(3), 243–249.

Rotermund, H., & Herzog, C. (2017). Archives of the digital. Interactions: Studies in Communication & Culture, 8(1), 3–7.

Sabin, J. E., & Harland, J. C. (2017). Professional ethics for digital age psychiatry: Boundaries, privacy, and communication. Current Psychiatry Reports, 19(9), 55.