Reaching Near–Real-Time Data Replication: Part 2

Apply a replication approach to typical data replication use cases for efficient migration

Replication technology can enhance a range of enterprise applications with fast data synchronization. Part 1 of this series explained what replication technology can do for organizations that need to synchronize data across databases or locations while maintaining consistent availability and avoiding downtime. This installment looks at typical use cases in which replication technologies are especially useful and can deliver efficiency benefits.

Separation of workloads

Workload separation is a common data replication use case for organizations with high-volume, high-throughput online transaction processing (OLTP) database systems. These organizations often need to guarantee performance service-level agreements (SLAs) based on the number of transactions within a given time period or on the elapsed time for each transaction. At the same time, financial or ordering departments need current reports to make decisions in near-real time. However, reports can be very complex and often have a big impact on overall database management system (DBMS) performance. Isolating the reporting workload from the operational workload is therefore good practice that helps meet SLAs. Reporting applications may need either copies of the complete schema or only a subset of the operational database tables.

Systems that generate reports frequently need data in near-real time, because higher latency can have a significant impact on business decisions. For example, an order department may need to know how many units of an item are still in stock and how many must be ordered to keep the utilization of a production line high.

A near-real-time, read-only copy of a complete schema, or copies of selected tables, can be achieved with advanced, log-based replication. This approach reads only the data changes (inserts, updates, and deletes) from the database recovery log and applies them to the target database. Other solutions can achieve the same goal. For example, database triggers can write changes made to source tables into local staging tables, and another process, such as an extract, transform, and load (ETL) tool, can then replicate these changes to the target database. But triggers are costly, always run synchronously within the source transaction, and add load to the source database system, particularly during peak workloads.
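The log-based apply step can be sketched in a few lines of Python. This is a minimal, hypothetical sketch, not any vendor's implementation: the `Change` record and its fields, the in-memory list standing in for the recovery log, and the `stock` table are all assumptions, and an in-memory SQLite database plays the role of the read-only reporting target. The loop replays inserts, updates, and deletes in log order, replicates only the subscribed tables, and returns the last applied log position so that the next cycle ships only new changes.

```python
import sqlite3
from dataclasses import dataclass
from typing import Iterable, Optional, Set

@dataclass(frozen=True)
class Change:
    lsn: int            # position in the (simulated) recovery log
    table: str          # source table name
    op: str             # "I" = insert, "U" = update, "D" = delete
    key: int            # primary key of the changed row
    qty: Optional[int]  # after-image of the row; None for deletes

def apply_changes(target: sqlite3.Connection, log: Iterable[Change],
                  subscribed: Set[str], last_lsn: int = 0) -> int:
    """Replay changes newer than last_lsn into the target, for subscribed tables only."""
    for ch in sorted(log, key=lambda c: c.lsn):
        if ch.lsn <= last_lsn:
            continue  # already applied in an earlier cycle
        if ch.table in subscribed:  # table names come from a trusted subscription set
            if ch.op == "D":
                target.execute(f"DELETE FROM {ch.table} WHERE id = ?", (ch.key,))
            elif ch.op in ("I", "U"):
                # Upsert the after-image, so applying the same change twice is harmless.
                target.execute(
                    f"INSERT OR REPLACE INTO {ch.table} (id, qty) VALUES (?, ?)",
                    (ch.key, ch.qty),
                )
            else:
                raise ValueError(f"unknown operation {ch.op!r}")
        last_lsn = ch.lsn  # advance past changes to unsubscribed tables too
    target.commit()
    return last_lsn

# Usage: replicate only the hypothetical "stock" table to a reporting copy.
target = sqlite3.connect(":memory:")
target.execute("CREATE TABLE stock (id INTEGER PRIMARY KEY, qty INTEGER)")
log = [
    Change(1, "stock", "I", 42, 10),
    Change(2, "audit", "I", 7, 0),   # not subscribed, so it is skipped
    Change(3, "stock", "U", 42, 8),
]
pos = apply_changes(target, log, {"stock"})
print(target.execute("SELECT id, qty FROM stock").fetchall())
```

Tracking the last applied log position is what keeps each cycle incremental; a production tool would also persist that position and respect source transaction boundaries.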

Disaster recovery and high availability

Traditional high-availability architectures are often based on active-warm or active-cold standby systems that require idle hardware. These architectures can limit the impact of an outage, but they have drawbacks: warm or cold standby systems cannot take on additional workloads, and they cannot be used for load balancing.

Availability in the event of a DBMS outage is just as important. Most active-warm and active-cold standby systems cover only hardware failures. But what happens if the database itself fails? That situation requires a disaster recovery of the database, which can take hours or even days. This can become critical for organizations that have to react quickly, because outages often result in lost revenue or damaged reputation. Bidirectional data replication is a suitable way to extend disaster recovery architectures, because data is synchronized between two or more database nodes: any change to one database is replicated to the other, and vice versa.

The replicated databases are always active, and applications can connect to either one. Multi-tenant workloads are well suited to an active-active architecture because they can be designed so that each tenant's data is updated at only one node, which prevents update conflicts. Another typical pattern separates read-only applications from read/write applications across the nodes. If one database fails, applications can immediately connect to the remaining active database. Meanwhile, recovery of the failed database can start without any impact on the business, and once it is back online, it is resynchronized from the active database. Another major advantage of a disaster recovery solution based on replication technology is that the source and target systems do not require identical hardware or operating systems.
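Two mechanisms make an active-active setup workable: forwarding each change to the peer without echoing it back, and routing each tenant's writes to a single home node. A toy model can illustrate both. The sketch is hypothetical: `Node`, `replicate`, and the `home` routing table are illustrative names, in-memory dictionaries stand in for the two databases, and real replication products implement origin tracking and conflict handling in their own ways.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Tuple

@dataclass
class Node:
    name: str
    data: Dict[Any, Any] = field(default_factory=dict)
    # Change log entries: (origin node, key, value).
    log: List[Tuple[str, Any, Any]] = field(default_factory=list)
    shipped: int = 0  # number of log entries already sent to the peer

    def write(self, key: Any, value: Any, origin: Optional[str] = None) -> None:
        self.data[key] = value
        self.log.append((origin or self.name, key, value))

def replicate(src: Node, dst: Node) -> None:
    """Ship src's new changes to dst, skipping changes that originated at dst."""
    for origin, key, value in src.log[src.shipped:]:
        if origin != dst.name:  # loop prevention: never echo a change back to its origin
            dst.write(key, value, origin=origin)
    src.shipped = len(src.log)

a, b = Node("A"), Node("B")
home = {"t1": a, "t2": b}  # each tenant writes to one home node, so updates cannot conflict
home["t1"].write(("t1", "order-1"), "open")
home["t2"].write(("t2", "order-9"), "shipped")
replicate(a, b)
replicate(b, a)

# Failover: node A fails, so tenant t1 is rerouted to B, which already holds t1's data.
home["t1"] = b
print(home["t1"].data[("t1", "order-1")])
```

Tagging every change with its origin is what lets both directions run continuously without changes circulating endlessly between the nodes.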

Provisioned data warehouse

Data replication techniques complement ETL tools well when provisioning a data warehouse from relational sources. Replication solutions can provide several advantages:

  • The capability to stage relevant changes from source systems with low latency and minimal impact on those systems; for example, reporting frequency can easily be shifted from monthly to daily, from daily to hourly, or even to real time
  • The ability to contribute to data historization by optionally storing the complete history of changes, the change timestamp or valid-from date, the change operation, and so on
  • The capture of all changes, regardless of which application, process, or line-of-business user has performed them
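The historization capability in the second item can be sketched as follows. This is illustrative only: the tuple layout of the change stream and the `HistoryRow` fields (`valid_from`, `valid_to`) are assumptions, loosely modeled on a versioned history table, not the output format of any particular replication product.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Iterable, List, Optional, Tuple

@dataclass
class HistoryRow:
    key: int
    value: Optional[str]                  # None marks a deletion
    op: str                               # change operation: "I", "U", or "D"
    valid_from: datetime                  # change timestamp taken from the log
    valid_to: Optional[datetime] = None   # None means this is the latest version

def historize(changes: Iterable[Tuple[int, str, Optional[str], datetime]]) -> List[HistoryRow]:
    """Turn a timestamp-ordered change stream into a history table with one row per change."""
    history: List[HistoryRow] = []
    latest: Dict[int, HistoryRow] = {}
    for key, op, value, ts in changes:
        if key in latest:
            latest[key].valid_to = ts     # close the previous version of this key
        row = HistoryRow(key, value if op != "D" else None, op, ts)
        history.append(row)
        latest[key] = row
    return history

changes = [
    (1, "I", "qty=10", datetime(2024, 1, 1, 9, 0)),
    (1, "U", "qty=8", datetime(2024, 1, 1, 12, 0)),
    (1, "D", None, datetime(2024, 1, 2, 8, 0)),
]
for row in historize(changes):
    print(row.op, row.value, row.valid_from, row.valid_to)
```

Because each change is kept with its timestamp and operation, the warehouse can answer "as of" questions, such as which value was valid at a given point in time.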

Typically, replication is only one building block for data warehouse provisioning. Other components are necessary to complete the architecture, for several reasons:

  • Replication solutions support only certain DBMSs or operating systems even though the IT landscape is usually heterogeneous.
  • ETL software is usually required for complex transformations and data cleansing, functionality that replication supports only in a limited way or not at all.
  • Semistructured or unstructured content located in file systems must also be loadable into the data warehouse.
  • New architectural components such as big data platforms are not yet fully covered as replication targets by all popular data replication providers.

Combining and integrating data replication and ETL offers many advantages. The impact of extract processes on operational systems (exclusive access, batch windows, high processor cost, and so on) can be reduced significantly, and the timeliness of data and reports can be improved. The familiar maxim "time is money" is more relevant than ever in today's business environment, where organizations need to make business decisions in real time.

Migration with zero downtime

Traditional data migration procedures usually require long batch windows and system outages during cutover, which is neither acceptable nor cost-effective for organizations that need to operate 24/7. Nonetheless, data and database migrations are unavoidable, and with traditional procedures they are often time-consuming and expensive. There are several common reasons for data migrations:

  • Moving data and applications to more powerful hardware than the hardware on which they are currently deployed
  • Changing the DBMS for enhanced performance or functionality
  • Reaching the end-of-service date for the current hardware or for the currently used version of the DBMS
  • Deciding to transfer and consolidate different relational database management systems (RDBMSs) into a single, homogeneous system
  • Switching the database code page: Unicode support is important for multicultural initiatives, and a switch to Unicode typically means migrating all existing data to a new database
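Why a code page switch means converting the data, not just copying it byte for byte, shows up in a short Python example. It assumes, purely for illustration, a legacy database that stores text in the Windows-1252 code page (Python codec `cp1252`); the `transcode` helper is hypothetical.

```python
def transcode(raw: bytes, source_codepage: str = "cp1252") -> bytes:
    """Re-encode a value stored in a legacy single-byte code page as UTF-8."""
    return raw.decode(source_codepage).encode("utf-8")

legacy = "Müller".encode("cp1252")  # one byte per character in the legacy code page
utf8 = transcode(legacy)            # 'ü' needs two bytes in UTF-8
print(len(legacy), len(utf8))
```

The same six characters occupy six bytes in the legacy code page and seven in UTF-8, so every non-ASCII value has to pass through a conversion step on its way to the new database.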

Replication technology can significantly reduce both cutover time and migration effort. The major advantage of a replication-based migration is that the old system remains active while replication processes keep the new system in sync, so the new system can be tested with current data. The final switch of applications from the old to the new database system can then be very smooth, with minimal downtime. Advanced data replication solutions allow the source and target to differ in hardware, operating system, DBMS version, and physical storage layout of the database schema (including the introduction of new DBMS features such as range partitioning), and even in DBMS vendor.
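The phases of a replication-based migration (initial load, catch-up while the old system stays online, then cutover) can be sketched with dictionaries standing in for the old and new databases. This is a simplified, hypothetical model: `initial_load`, `catch_up`, and `write` are illustrative helpers, and a real tool must also make sure the snapshot is consistent with the recorded log position and that applications are briefly quiesced while the last changes drain.

```python
from typing import Any, Dict, List, Optional, Tuple

Change = Tuple[str, Any, Optional[Any]]  # (operation, key, after-image)

def initial_load(source: Dict[Any, Any], log: List[Change]) -> Tuple[Dict[Any, Any], int]:
    """Copy a snapshot of the source and remember the log position it corresponds to."""
    return dict(source), len(log)

def catch_up(target: Dict[Any, Any], log: List[Change], pos: int) -> int:
    """Apply all changes logged since pos to the target; return the new log position."""
    for op, key, value in log[pos:]:
        if op == "D":
            target.pop(key, None)
        else:  # "I" or "U": store the after-image
            target[key] = value
    return len(log)

# The old system stays online and logs every change its applications make.
old: Dict[Any, Any] = {}
log: List[Change] = []

def write(op: str, key: Any, value: Any = None) -> None:
    if op == "D":
        old.pop(key, None)
    else:
        old[key] = value
    log.append((op, key, value))

write("I", 1, "a")
write("I", 2, "b")
new, pos = initial_load(old, log)  # phase 1: bulk copy without stopping the old system
write("U", 1, "a2")                # applications keep changing the old system
write("D", 2)
pos = catch_up(new, log, pos)      # phase 2: replicate the changes made since the snapshot
lag = len(log) - pos               # phase 3: cut over once the lag is zero and tests pass
cutover_ready = lag == 0 and new == old
print(cutover_ready, new)
```

Because the new system holds current data throughout, it can be tested well before the switch, and the cutover only has to wait for the last few changes to be applied.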

Replication-empowered data synchronization

Replication can be an essential technique for meeting real-time and 24/7 requirements in today's complex IT environments. Replication solutions can move data between database systems, help separate workloads, enable high-availability solutions, and offer migration approaches that help avoid downtime. These use cases are among the most common, but many other scenarios can also benefit from advanced replication technology.
