Why hybrid cloud environments require live data replication technology

Product Marketing Manager for Data Lake & Cloudera Partnership, IBM

The best decisions are made by extracting value from all the disparate data across your business. Yet aggregating data across external sources, regional silos and various forms of storage is not easy.

Data-powered businesses need always-on access to data to keep operations moving smoothly and to stay competitive. As data infrastructures grow more distributed, many companies rely on complex hybrid and cloud environments, often from a combination of vendors and platforms across multiple regions. Companies with big data require solutions for continuous availability and data consistency in a mixed, multicloud IT environment.

A live data replication strategy can bring together siloed data while ensuring accuracy, global accessibility and consistency. This is increasingly critical for multiregion data platform architectures that span on-premises systems and several private and public clouds from different vendors.

Benefits of live data replication

Data movement. Live data replication can simplify processes and decrease risk for data lake ingestion. It is well suited to migrating complex workloads, whether between two on-premises data lake platforms or between on-premises and hybrid or multicloud environments.

A good example comes from the recently finalized merger of Cloudera and Hortonworks. Hortonworks customers interested in Cloudera products will need technology that facilitates migration, and vice versa. In addition, as Cloudera and Hortonworks bring their systems together, customers will need the ability to move to new “unity” products without downtime and the financial repercussions it can bring. A live data replication strategy makes managing these types of migrations both safer and simpler.

Disaster recovery (DR), high availability (HA) and data governance. The data replication, backup and recovery tools provided by Hadoop distribution vendors were primarily designed for copying files, not for consistent replication of an entire cluster for backup and recovery. Live data replication for DR, HA and data governance can provide fully functional copies of data that are accessible to anyone from anywhere, with complete control over data consistency and location. This can help you satisfy strict regulatory requirements.

Analytics, data discovery and experimentation. Using a live data replication strategy enables real-time advanced analytics to be offloaded to the cloud quickly and easily. In addition, experimentation and exploration using new cloud services can be performed on replicated data to alleviate concerns about breaking existing workloads. And no matter where data is used for analysis, whether in traditional business intelligence systems, analytic environments or experimental machine learning models, live data replication helps ensure data consistency and security and helps minimize downtime.

What to look for in a live data replication solution

When selecting a software platform that maintains data consistency in a distributed environment, both on premises and in the hybrid cloud, make sure it can supply enterprise-level functionality. A high-performance coordination engine that uses consensus to keep unstructured data accessible, accurate and consistent across deployments is vital.
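
The article does not describe how such a coordination engine works internally. As a generic illustration of consensus-style agreement only, the following minimal Python sketch applies a change when a strict majority of replicas acknowledge it. The `Replica` class, the `replicate` function and the site names are hypothetical and do not represent any vendor's API.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """A hypothetical data store in one location (on premises or in a cloud)."""
    name: str
    reachable: bool = True
    log: list = field(default_factory=list)

    def accept(self, change: str) -> bool:
        # A reachable replica acknowledges the proposed change.
        return self.reachable


def replicate(change: str, replicas: list) -> bool:
    """Apply a change only if a strict majority of replicas acknowledge it."""
    acks = [r for r in replicas if r.accept(change)]
    if len(acks) * 2 <= len(replicas):
        return False  # no majority: reject so the replicas cannot diverge
    for r in acks:
        r.log.append(change)
    return True


# Assumed example: three sites, one of which is temporarily unreachable.
sites = [Replica("on-prem"), Replica("cloud-a"), Replica("cloud-b", reachable=False)]
committed = replicate("write /data/sales.csv", sites)
```

In this simplified model the change is committed because two of the three sites agree. A real engine would also need to bring the unreachable site up to date once it returns, which this sketch leaves out.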

Businesses also require live data replication solutions that can deliver:

  • A migration with zero disruptions. To avoid costly downtime, select an offering that has a near-zero recovery point objective (RPO) and recovery time objective (RTO). This will help datasets remain secure and available for use, even during petabyte-scale migrations. Multidirectional replication is also essential so that data can be moved between all necessary repositories, whether on premises or in the cloud, including Hadoop clusters (even those with different distributions), SQL databases and NoSQL databases.
  • Improved backups with integration and location options. Look for live data replication offerings that reduce latency and potential points of failure by moving from siloed to integrated data architectures. Also make sure backup data can be placed where you need it most, including offsite in the cloud to reduce on-premises infrastructure costs and time spent purchasing and configuring hardware and software. You should also be able to select cloud object stores, such as S3 and IBM Cloud object stores, as the repository for your backup data.
  • Support for IoT and edge devices. Given the rise of connected devices, the solution you choose should be able to support them. Live data replication offerings should use a multi-platform proxy server architecture that provides coalesced and aggregated data for multi-location or multi-source IoT and edge devices. Moving IoT data into data warehouse analytics in more frequent batches helps improve data accuracy.
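
To make the RPO and RTO criteria above concrete, the following Python sketch computes the two quantities from timestamps. It is an illustration under assumed values: the function names, the nightly-batch scenario and all dates are hypothetical, not measurements of any product.

```python
from datetime import datetime, timedelta


def data_loss_window(last_replicated_write: datetime, failure: datetime) -> timedelta:
    """Recovery point: how much recent data is missing from the replica."""
    return max(failure - last_replicated_write, timedelta(0))


def recovery_time(failure: datetime, service_restored: datetime) -> timedelta:
    """Recovery time: how long the data was unavailable."""
    return service_restored - failure


failure = datetime(2019, 1, 1, 12, 0, 0)

# Hypothetical nightly batch copy: the last copy finished at midnight.
batch_loss = data_loss_window(datetime(2019, 1, 1, 0, 0, 0), failure)

# Hypothetical live replication: the last change arrived two seconds before the failure.
live_loss = data_loss_window(datetime(2019, 1, 1, 11, 59, 58), failure)
```

With these assumed timestamps, the batch copy would be missing 12 hours of changes while the live replica would be missing two seconds, which is the gap a near-zero RPO is meant to close.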

One solution that fulfills these criteria and delivers all the benefits is IBM Big Replicate. This live data replication solution supports your data migration and disaster recovery needs. And because IBM is a partner to both Hortonworks and Cloudera, it can also assist with your architecture changes during the Cloudera-Hortonworks merger. Learn more about IBM Big Replicate today.