The Difference Between Data Backup and Data Archiving, and Why It Matters

Data is your organization’s most valuable asset — it’s paramount that you protect it.

If you hold large amounts of precious data that you will need to access in the future, it is imperative that you understand the importance of a digital archive and do not rely solely on a data backup.

Throughout this blog post, I’ll explain the high-level differences between an archive and a backup, discuss the importance of treating data differently, share a real-life example of data loss, explore the options for creating a digital archive and share the questions that you can ask solution providers to ensure they meet your archiving needs.

Storage backup versus digital archiving

When organizations create a long-term strategy for the archival of data, mistakes tend to occur when an individual involved in the process does not fully comprehend the differences between a backup and a digital archive. Assumptions that a backup will match the functionality of an archive are false and could result in data loss.

Fundamentally, a data backup is a snapshot of your data that is taken and stored elsewhere so that, if required, you can recover the data as it was at a certain point in time. Typically, this recovery method is used in disaster situations, such as an incident of data deletion or corruption.

Typically, daily backups are kept for a certain number of days, then replaced by weekly backups, which are in turn replaced by monthly backups. Businesses usually keep a number of daily and weekly backups at any one time, and these are replaced on a continual basis.
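
As a rough illustration of this kind of rotation, here is a minimal Python sketch. The function name, the retention counts and the rule that weekly backups are the ones taken on a Sunday and monthly backups the ones taken on the first of the month are all assumptions made for the example, not a description of any particular backup product.

```python
from datetime import date, timedelta


def backups_to_keep(backup_dates, daily=7, weekly=4, monthly=12):
    """Return the subset of backup dates a simple rotation would retain.

    Assumed (hypothetical) policy: keep the newest `daily` backups, plus the
    newest `weekly` Sunday backups, plus the newest `monthly` backups taken
    on the first day of a month. Everything else is eligible for replacement.
    """
    ordered = sorted(set(backup_dates), reverse=True)  # newest first
    dailies = ordered[:daily]
    weeklies = [d for d in ordered if d.weekday() == 6][:weekly]  # 6 = Sunday
    monthlies = [d for d in ordered if d.day == 1][:monthly]
    return set(dailies) | set(weeklies) | set(monthlies)


if __name__ == "__main__":
    today = date(2024, 3, 31)
    # Hypothetical history: one backup per day for the last 120 days.
    history = [today - timedelta(days=n) for n in range(120)]
    kept = backups_to_keep(history)
    print(f"{len(history)} backups taken, {len(kept)} retained")
```

The point to notice is how few copies survive: anything older than the newest monthly backups is gone, which is exactly why a rotation like this cannot stand in for an archive.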

But when disaster strikes, data can still be lost in a backup system. It is therefore important to think about your recovery point objectives and recovery time objectives, which we’ll cover in more detail in the next section of this post. Organizations must also complete ongoing checks and tests of their backups to ensure that they are not exposing themselves to ransomware, a type of malware that employs encryption to hold a victim’s information at ransom.

A data archive is an ongoing, managed environment that focuses on the preservation of your data. It goes beyond creating a copy and instead focuses on the accessibility and re-use of your data long into the future.

Relying on a backup to archive your data for a long period of time does not protect against data loss, corruption or formats becoming obsolete. This is before taking into account whether the data is being stored in such a way that it can be easily searched and retrieved if required. Examples of how this is managed within an archive include leveraging metadata to improve searchability and maintaining preservation copies of files so they can be used and read long into the future.

Leaving your data in a live system long-term with backups running on a continual basis can lead to a lot of data being stored, opening you up to risks of data loss, excessive costs or data deletion.

The build-up of this data can also cause issues around data portability. When data is stored in silos that are incompatible with one another, users are subject to vendor lock-in, once again opening your business to further costs.

Treating data differently

Another area that can cause real issues is when the individuals involved in designing your archiving and backup strategy treat all data in the same way.

When it comes to your data, there will be varying levels of retention and required accessibility; for example, you may have files with a 25-year retention, such as the eTMF in the life sciences industry, while other documents such as financial statements may require a seven-year retention. At this stage, not all data may need to be managed in an archive. Some documents, such as marketing materials, you may be comfortable leaving in your live site.
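
To make the idea of category-specific retention concrete, here is a minimal Python sketch of a retention schedule. The category names, the choice to count retention from a creation date and the function name are assumptions for illustration only; real retention periods and their start dates depend on your own legal and regulatory requirements.

```python
from datetime import date

# Hypothetical retention schedule based on the examples above.
# None means the data stays in the live site and is not archived.
RETENTION_YEARS = {
    "etmf": 25,
    "financial_statement": 7,
    "marketing_material": None,
}


def retain_until(category, created):
    """Return the date until which a record must be kept, or None.

    Assumes retention is counted from the creation date, which is a
    simplification made for this sketch.
    """
    years = RETENTION_YEARS[category]
    if years is None:
        return None
    try:
        return created.replace(year=created.year + years)
    except ValueError:
        # 29 February has no counterpart in a non-leap target year.
        return created.replace(year=created.year + years, day=28)


if __name__ == "__main__":
    for category in RETENTION_YEARS:
        print(category, retain_until(category, date(2020, 1, 15)))
```

Keeping the schedule in one place makes it easy to see which data is bound for the archive and which can stay where it is.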

When deciding on the best strategy for your organization’s data, you’ll need to think about your Recovery Point Objective (RPO) and Recovery Time Objective (RTO). RPO defines the maximum allowable amount of lost data, measured as the time between the last valid backup and the failure. RTO defines the maximum acceptable downtime: how long it takes from the incident until normal operations are restored and available to users.
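
The two objectives can be checked with simple date arithmetic. The Python sketch below is illustrative only: the function name, the timestamps and the objectives of 24 hours and 2 hours are made up for the example.

```python
from datetime import datetime, timedelta


def assess_incident(last_backup, failure, restored, rpo, rto):
    """Compare one incident against the recovery objectives.

    data_loss_window: time between the last valid backup and the failure.
    downtime: time between the failure and normal operations resuming.
    """
    data_loss_window = failure - last_backup
    downtime = restored - failure
    return {
        "data_loss_window": data_loss_window,
        "downtime": downtime,
        "rpo_met": data_loss_window <= rpo,
        "rto_met": downtime <= rto,
    }


if __name__ == "__main__":
    result = assess_incident(
        last_backup=datetime(2024, 3, 1, 2, 0),   # nightly backup at 02:00
        failure=datetime(2024, 3, 1, 14, 30),     # failure at 14:30
        restored=datetime(2024, 3, 1, 18, 0),     # service back at 18:00
        rpo=timedelta(hours=24),                  # hypothetical objective
        rto=timedelta(hours=2),                   # hypothetical objective
    )
    for key, value in result.items():
        print(f"{key}: {value}")
```

In this made-up incident the data loss window falls within the RPO, but the downtime exceeds the RTO, so the backup schedule is adequate while the restore process is not.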

Selecting a good backup provider is important. Selecting a substandard provider or one without the right safeguards in place can have a detrimental effect on your business. Remember that a backup tends to be operational while an archive focuses on long-term retention. Your archive and backup should work hand in hand.

To build the most effective strategy, you should ask yourself:

What is the purpose of my data?
Does it need to be maintained in an operational state, in a “live” production environment, or does it need to be archived to preserve the data, years into the future?
What are my recovery point objectives (RPO) and recovery time objectives (RTO)?
What are my legal or regulatory requirements for retaining some/all of my data?

Once you have determined which data you require long-term access to, you should move this into an archive. That way you ensure the data’s preservation, rather than depending on a backup which can’t always guarantee the files’ accessibility long-term.

The real effects of data loss

Typically, backup technologies have a high reliability rate for recovery. However, when you rely solely on an unverified or substandard backup provider and something goes wrong, it can have detrimental effects on a business. These could range from reputational damage and fines to the requirement to repeat previous business activities to recreate the data that has been lost.

Memorial University, Canada

In 2016, staff at the Queen Elizabeth II Library at Memorial University were undertaking routine maintenance which required power to the building to be cut and switched to a backup system. During this process, the backup system failed and more than 70 terabytes of data were lost.

Luckily for the university, the physical documents and objects still existed, but staff had to begin digitizing these collections again, a process that can be very expensive and time-consuming. As you can see in this example, the backup failed to protect the library’s data.

Incorporating digital preservation strategies into your organization and combining the creation of a long-term archive with a backup for your data can assist in alleviating the risk of data loss.

What are my options for building a digital archive?

Now that we have outlined the difference between a backup and an archive and looked at an example of data loss, you may be starting to think about your next steps and options for creating a data archive.

For all industries, when it comes to creating a data archive you have two options — building the archive in-house or working with a third-party solution provider.

When looking at new technology, IT engineers sometimes debate whether to build their solution in-house. The main benefit of this is the opportunity to own and build the solution around your organization’s specific needs, offering more flexibility. However, this takes significant time, effort and skills because of the specialist nature of corresponding workflows, systems and integrations. The Total Cost of Ownership (TCO) of on-premise solutions, especially over time, is significant.

Furthermore, organizations risk having periods where they don’t have the requisite resources to sustain this approach, due to loss of staff or fluctuations in funding, for example. Even large and well-funded institutions, including national libraries and international research facilities, are increasingly opting for cloud-hosted solutions.

To assist in the buy vs build decision, ask yourself: does your IT team have the capability to manage and access years and years of data, long into the future?

Choosing to outsource your solution could reduce that burden for them.

Questions to ask solution providers

Deciding whether to build your archiving solution in-house or work with a third party isn’t always easy. If you find yourself in this predicament, we’d suggest reaching out to third parties initially so you can understand the capabilities of their solutions and compare these with your internal capabilities.

Outlined below is a list of questions and talking points we’d suggest you use when speaking to third parties so that you can better understand their solution and their business, and how these will align with your company’s data archiving requirements.

What do you use to back up your system?
What protections do you have in place to stop data from being tampered with?
How can you meet industry regulations?
Does your solution enforce best practice approaches for good data management, such as FAIR and ALCOA+?
Is your business ISO 9001 and ISO 27001 certified?
Will you store my data in multiple geographic locations?
Do you have an exit strategy?

Arkivum is recognized internationally for expertise in the archiving and digital preservation of valuable data and digitized assets in large volumes and multiple formats. Learn more about the EBSCO/Arkivum partnership.
