Replication (computing)
Replication in computing refers to the process of duplicating databases and their elements across multiple servers, allowing for data accessibility and redundancy. This technique is essential for ensuring that users in different locations can access the same data concurrently while safeguarding against data loss due to hardware or software failures. Replication systems can be organized in various configurations, such as fan-in systems, fan-out systems, and multimaster systems.
In a fan-out system, a single master site distributes data to multiple snapshot sites, while a fan-in system collects data from several sources into one master site, facilitating data management and backup. Multimaster systems involve multiple master sites communicating and synchronizing updates, which enhances data availability even if one site fails.
Replication can occur synchronously, where updates are made in real-time, or asynchronously, where changes are logged and updated at set intervals. Each method has its advantages and challenges, particularly concerning network demands and the risk of data loss. Overall, database replication is a critical component in modern computing, supporting collaborative work environments and robust data management strategies.
When used in computing, the term replication refers to the duplication of databases and database elements. Database replication is a complex process that involves copying data found on one server and distributing the copies to additional servers over a local area network (LAN) or a wide area network (WAN). Replication allows multiple people to have access to the same set of data and prevents data loss in the event of software or hardware failure. Replication systems may be configured into fan-in systems, fan-out systems, or multimaster systems.
The process of database replication involves several parts. Any piece of data stored in a database is called a database object. A database object simultaneously stored in multiple databases or locations is called a replication object. Replication objects related to each other are then sorted into replication groups. Any place a replication group exists is referred to as a replication site. A master site is any replication site that stores a full copy of the database and can modify the contained data. Sites that are not able to modify the data they receive are called snapshot sites.
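The vocabulary above can be made concrete with a short sketch. This is a minimal illustration, not a real replication API; all class and attribute names here are invented for the example, since the article defines concepts rather than an interface.

```python
from dataclasses import dataclass, field

@dataclass
class ReplicationObject:
    """A piece of data stored simultaneously in multiple databases."""
    name: str
    value: str

@dataclass
class ReplicationGroup:
    """Related replication objects sorted together."""
    name: str
    objects: dict = field(default_factory=dict)

    def add(self, obj: ReplicationObject) -> None:
        self.objects[obj.name] = obj

@dataclass
class ReplicationSite:
    """Any place a replication group exists."""
    name: str
    can_modify: bool   # True at a master site, False at a snapshot site
    groups: dict = field(default_factory=dict)

# A master site holds a writable full copy; a snapshot site holds a
# read-only copy of the same replication group.
prices = ReplicationGroup("prices")
prices.add(ReplicationObject("widget", "5.00"))

master = ReplicationSite("HQ", can_modify=True, groups={"prices": prices})
snapshot = ReplicationSite("Branch", can_modify=False, groups={"prices": prices})
```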
Database Replication Systems
Database replication is the act of copying data stored on one server and sending it to another location. Replication systems are set up for many purposes. Most commonly, database replication allows people at several locations to read identical sets of data. In some cases, database replication allows several users to modify a single data set, which then updates the information held at many other shared locations. Finally, database replication may be used to back up data in case of a hardware or software failure.
One of the most basic replication scenarios involves a single computer sharing information with another computer. For example, Computer A contains several replication groups. Someone on Computer B needs information from one of Computer A's replication groups. Once the two computers have been connected through LAN or WAN, specialized software sends a copy of the data from Computer A to Computer B. Whenever someone changes the data on Computer A, the software will send an updated copy to Computer B. Because Computer A stores the entire database and can make modifications to it, Computer A is considered a master site. Because Computer B cannot modify the data it receives, Computer B is a snapshot site.
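The two-computer scenario can be sketched as follows. This is an illustrative model of the push behavior described above, under the assumption that the "specialized software" simply forwards each changed value; the class names are hypothetical.

```python
class Master:
    """Computer A: stores the full database and can modify it."""
    def __init__(self):
        self.data = {}
        self.subscribers = []

    def subscribe(self, snapshot):
        self.subscribers.append(snapshot)
        snapshot.data = dict(self.data)   # initial full copy over LAN/WAN

    def write(self, key, value):
        self.data[key] = value
        for s in self.subscribers:        # push an updated copy on every change
            s.data[key] = value

class Snapshot:
    """Computer B: receives copies but has no write method."""
    def __init__(self):
        self.data = {}

a, b = Master(), Snapshot()
a.subscribe(b)
a.write("price", 10)   # the change on A is immediately copied to B
```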
Database replication becomes more complicated as more sites are added to the replication network. Two of the most commonly utilized configurations of multiple replication sites are fan-out systems and fan-in systems. In a fan-out system, one master site sends data to several snapshot sites. This configuration is used whenever up-to-date information is required at multiple locations, such as franchise stores that require frequently updated price catalogues. In a fan-in system, multiple computers send data to a single master site, which compiles all the data into a locally stored database.
Fan-in systems are used to collect information and store it in a single, easily accessed location. Additionally, such systems are used to back up important information. In this circumstance, many computers periodically send data to a secure server. The server then creates an organized database of all that data. If the contributing computers ever fail, the data they stored can be retrieved from the server.
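The two configurations can be contrasted in a few lines. This is a deliberately simplified sketch: fan-out pushes one master's catalogue to every branch snapshot, while fan-in compiles data from many sources into one central database. The function and variable names are illustrative.

```python
def fan_out(master_data, snapshots):
    """One master site sends the same data to several snapshot sites."""
    for snap in snapshots:
        snap.update(master_data)

def fan_in(sources, master_db):
    """Multiple computers send data to a single master site, which
    compiles it all into one locally stored database."""
    for name, data in sources.items():
        master_db[name] = dict(data)

# Fan-out: e.g. a frequently updated price catalogue for franchise stores.
branches = [{}, {}, {}]
fan_out({"widget": 5, "gadget": 9}, branches)

# Fan-in: e.g. periodic backup of per-store data to a secure server.
backup = {}
fan_in({"store1": {"sales": 100}, "store2": {"sales": 80}}, backup)
```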
Many replication networks contain several master sites. In these multimaster systems, all the master sites are constantly communicating. Whenever someone changes a replication object at one master site, the computer sends an updated version of the object to every other master site in the system. The other master sites receive the updated object and use it to replace the old one. By constantly maintaining this process, all the master sites in a multimaster system should have exactly the same files at any given time. Each of these master sites can still distribute files to several snapshot sites.
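The multimaster update process can be sketched as below. This is a minimal illustration of peer-to-peer propagation only; real multimaster systems must also resolve conflicting concurrent writes, which is omitted here, and the names are hypothetical.

```python
class MasterSite:
    """A master site that forwards every write to all of its peers."""
    def __init__(self, name):
        self.name = name
        self.objects = {}
        self.peers = []

    def connect(self, *others):
        for other in others:
            if other is not self and other not in self.peers:
                self.peers.append(other)

    def write(self, key, value):
        self.objects[key] = value
        for peer in self.peers:        # send the updated object to every
            peer.objects[key] = value  # other master site in the system

m1, m2, m3 = MasterSite("A"), MasterSite("B"), MasterSite("C")
for m in (m1, m2, m3):
    m.connect(m1, m2, m3)

m2.write("config", "v2")   # a change at any master reaches all masters
```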
Multimaster systems are used in any situation where databases absolutely need to be available at any time. Even if one master site is rendered nonoperational, the other master sites in the system can be configured to automatically begin distributing updated replication groups to the nonoperational master site's snapshot sites. If the master sites in a multimaster system are stored in separate locations, it is extremely unlikely that the database will become completely inaccessible. Many multimaster systems are composed of several smaller fan-in and fan-out systems.
Finally, data replication can take place synchronously or asynchronously. When a replication system is configured to replicate synchronously, the system is set to update any modified replication objects on other systems as soon as they are changed. Synchronous replication is useful because it eliminates any chance of data being lost before being shared with other systems. However, synchronous replication requires a powerful connection between computers and is very demanding on networks. The farther apart two computers are, the more difficult synchronous replication is to maintain. In contrast, when a replication system is set to update asynchronously, it does not update in real time. Instead, it keeps a log of any changes made at a master site and updates other sites with the changes at predetermined intervals. While asynchronous systems are far easier to maintain, they come with the possibility that data could be lost before being transferred to other replication sites.
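The trade-off between the two modes can be shown side by side. In this sketch, assumed rather than drawn from any particular product, the synchronous replicator applies a change to every replica before the write returns, while the asynchronous replicator appends changes to a log that is flushed at predetermined intervals, so replicas lag until the next flush.

```python
class SyncReplicator:
    """Synchronous: replicas are updated in real time, as part of the write."""
    def __init__(self, replicas):
        self.replicas = replicas

    def write(self, key, value):
        for r in self.replicas:   # every replica sees the change immediately
            r[key] = value

class AsyncReplicator:
    """Asynchronous: changes are logged, then applied at intervals."""
    def __init__(self, replicas):
        self.replicas = replicas
        self.log = []             # changes queued at the master site

    def write(self, key, value):
        self.log.append((key, value))

    def flush(self):              # run at predetermined intervals
        for key, value in self.log:
            for r in self.replicas:
                r[key] = value
        self.log.clear()

sync = SyncReplicator([{}, {}])
sync.write("x", 1)                # no window in which data can be lost

async_rep = AsyncReplicator([{}])
async_rep.write("y", 2)           # replica lags; a crash now loses this change
async_rep.flush()                 # interval elapses; replica catches up
```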