A vision of hybrid cloud for big data and analytics

Distinguished Engineer, IBM
Executive Architect, IBM

In our strategic discussions with IT leaders and their C-level business counterparts, we hear time after time that they are focused on, or at least considering, one of three moves: shifting existing workloads to the cloud, extending existing workloads to the cloud, or building new workloads on the cloud and integrating them with existing workloads. For some, this discussion underscores the need for multispeed IT, a topic that IBM Analytics Group CTO Tim Vincent discusses in our Analytics InsightOut series on Big Data Hub.

Quite often, we see that the need for data security and governance makes some organizations hesitant about migrating to the cloud. This is perfectly understandable given the types of data gathered and used by businesses today, the regulations they must adhere to on both a local and global level, and the cost to maintain data and operational infrastructure. Fortunately, the business model for cloud technology is evolving to enable more businesses to deploy a hybrid cloud, particularly in the areas of big data and analytics. 

How we define hybrid cloud

We define the hybrid cloud as the connection of the private environment with one or more public clouds, as shown in Figure 1. It leverages the best of what each environment has to offer, providing the flexibility to locate data and services based on business need. Within hybrid cloud environments, data can be located and accessed based on consumption patterns and analytical workload requirements, so that data and analytics are available to the different personas where they are needed. Access to all areas of the hybrid cloud environments is managed and controlled to uphold privacy, security and other data governance requirements.

Figure 1: The Hybrid Cloud

The digital transformation requires a new hybrid cloud: one that is open and flexible by design, and that gives clients the freedom to choose and change environments, data and services as needed. This approach allows cloud apps and services to be rapidly composed using the best relevant data and insights available, while maintaining clear visibility, integrated control, governance and security everywhere. As shown in Figure 2, the majority of the Systems of Record usually reside on the private environment, the Systems of Engagement and Systems of Automation are mostly on the public clouds, and the Systems of Insight span all environments of the hybrid cloud. The flexibility and openness of the hybrid cloud allow the data and the associated analytical workload to be placed where they make the most sense in terms of business needs. Information privacy and security are managed and controlled consistently across all the systems of the hybrid cloud environments.

Figure 2: Hybrid Cloud for the Digital Enterprise

Our definition of hybrid cloud is consistent with what the majority of our clients want: to extend their private environment to the public cloud. We believe that the private environment is essential to the model because most businesses will always require portions of their data and infrastructure to remain behind the corporate firewall due to industry standards, local regulations, or their own attitudes toward controls. Coupling that private environment with public clouds creates an even more flexible architecture by giving businesses more freedom to choose and change their environments and to deploy services and applications more quickly.

We view hybrid cloud strategy as an overall architecture solution, and not just a migration path. The goal is to allow you to extend workloads from a pure private environment model to a hybrid model that couples the private environment and public clouds. The strategy certainly can be used to direct your organization toward the cloud, but it can also help you integrate the environments within the hybrid cloud.

Why hybrid cloud for big data and analytics?

A hybrid cloud allows different personas to work with data and analytics capabilities where it makes the most sense for them to do so. This in turn helps define where the data and analytics capabilities should be placed and made available in the hybrid cloud environments. As a result, analytics workloads can run more efficiently wherever the data is stored.

It’s important to have hybrid cloud as an option because location should be one of the first architectural decisions for any analytics project. In particular, organizations need to consider where the data should be stored, and where the analytical processing should be located relative to the data. Meanwhile, legal and regulatory requirements also impact where data can be located, as many countries have data sovereignty laws that prevent data about individuals, finances and intellectual property from moving across country borders.

Systems will have multiple centers of gravity, which dictate where processing occurs. For example, if you are building a data lake as part of the Systems of Insight and the data that feeds the data lake is in the private environment, then the center of gravity is the private environment and the processing of that data should stay there. But if the Systems of Insight start to include data born on the public cloud, a second center of gravity can emerge.
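
The center-of-gravity idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `centers_of_gravity`, the environment labels, the data volumes, and the 25 percent threshold are all hypothetical and are not taken from any IBM product.

```python
from collections import defaultdict


def centers_of_gravity(sources, threshold=0.25):
    """Return the environments that hold a large share of the data.

    `sources` is an iterable of (environment, size_in_tb) pairs.
    `threshold` is a hypothetical cut-off: an environment counts as a
    center of gravity when it holds at least this fraction of all data.
    """
    totals = defaultdict(float)
    for environment, size_in_tb in sources:
        totals[environment] += size_in_tb

    grand_total = sum(totals.values())
    if grand_total == 0:
        return []  # no data yet, so no center of gravity

    return sorted(
        environment
        for environment, size in totals.items()
        if size / grand_total >= threshold
    )


# Hypothetical data lake fed mostly from the private environment.
data_lake_sources = [
    ("private", 80.0),  # systems-of-record extracts
    ("private", 15.0),  # archived transactions
    ("public", 5.0),    # data born on the public cloud
]
processing_locations = centers_of_gravity(data_lake_sources)
```

With 95 TB in the private environment and 5 TB born on the public cloud, only the private environment qualifies, so processing would stay there. If the cloud-born data grew to 60 TB, both environments would pass the assumed threshold and the public cloud would become a second center of gravity.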

Primary drivers for big data and analytics in the hybrid cloud

  • Integration: Organizations need to integrate data that is stored and managed in a hybrid environment across the private environment and public clouds. Typically, these organizations need to integrate Systems of Engagement and/or Systems of Automation (IoT) applications, such as social media, customer management systems, and devices, with Systems of Insight, such as predictive and real-time analytics hosted on public clouds, and with mission-critical applications and data stored on servers in the private environment (Systems of Record).
  • Brokerage/management for workload and resource optimization: Different workloads have different requirements for security, speed, resources and storage. Many organizations are driven to hybrid cloud because they want the option to place the data and the analytical workload where it makes the most sense based on the business requirements. These organizations want the ability to optimize cost, performance and agility, while also enjoying the flexibility to move data and analytical workloads between the private environment and the public cloud.
  • Portability: Another major case for hybrid cloud is the need to ensure portability of analytical workloads and data. In order to manage costs and effectiveness, IT management needs to be able to move workloads and data to whatever platform best meets changing customer demands.  This capability requires IT to consider the feasibility of the new analytical workload and data on a specific hybrid cloud environment based on the overall hybrid cloud architecture. 
  • Compliance: A hybrid cloud allows global applications, data and workloads to be distributed across geographically dispersed private environments and public clouds. Requirements for data sovereignty, compliance, privacy, identity management and data protection can dictate that data, and consequently the workload that uses it, be placed on a specific environment in a specific country. An organization can choose to deploy cloud environments that are already compliant with regulatory requirements (such as HIPAA, PCI, and SOX) and that are located in a specific country to comply with local privacy and data sovereignty laws.
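
The brokerage and compliance drivers above can be combined into a simple placement rule: filter environments by sovereignty and certification requirements first, then optimize among what remains. The following Python sketch assumes hypothetical names throughout (`Environment`, `Workload`, `place`, the sample environments and their costs); a real broker would weigh many more factors, such as performance and data movement cost.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Environment:
    name: str
    country: str                 # where the environment is physically located
    certifications: frozenset    # e.g. frozenset({"HIPAA", "PCI"})
    cost_per_hour: float         # hypothetical cost figure


@dataclass(frozen=True)
class Workload:
    name: str
    required_country: Optional[str]     # None means no sovereignty constraint
    required_certifications: frozenset


def is_eligible(workload, environment):
    """Check the sovereignty and certification constraints only."""
    if (workload.required_country is not None
            and environment.country != workload.required_country):
        return False
    # Every required certification must be held by the environment.
    return workload.required_certifications <= environment.certifications


def place(workload, environments):
    """Return the cheapest eligible environment, or None if none qualifies."""
    candidates = [e for e in environments if is_eligible(workload, e)]
    if not candidates:
        return None
    return min(candidates, key=lambda e: e.cost_per_hour)


# Hypothetical environments in a hybrid cloud.
ENVIRONMENTS = [
    Environment("private-de", "DE", frozenset({"HIPAA", "PCI"}), 5.0),
    Environment("public-us", "US", frozenset({"HIPAA", "PCI", "SOX"}), 2.0),
    Environment("public-de", "DE", frozenset({"PCI"}), 1.5),
]

claims = Workload("claims-analytics", "DE", frozenset({"HIPAA"}))
chosen = place(claims, ENVIRONMENTS)
```

In this example the workload must stay in Germany and needs a HIPAA-compliant environment, so the cheaper public environments are ruled out and the private environment is chosen. Treating compliance as a hard filter and cost as a secondary objective is a design choice for the sketch, not a prescription.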

Considerations for implementing a hybrid cloud strategy for big data and analytics

The key considerations for implementing a hybrid cloud strategy include: 

  • Cultural shift: One of the biggest challenges in moving to a hybrid cloud is establishing and promoting a collaborative, service-oriented approach to provisioning data, analytics and self-service capabilities, so that the private environment can be extended to the public cloud.
  • Varying levels of hybrid sophistication: A hybrid cloud strategy can have different levels of sophistication, ranging from deep integration between the private environment and public clouds to simpler, static, point-to-point connections that use a virtual private network (VPN), a secured gateway, and an API manager designed to expose Systems of Record data (private environment) to Systems of Engagement (public cloud).

Our vision for the future of hybrid cloud

Hybrid cloud requires a new approach for both IT and the business. The goal for an organization as a whole is to extend its private environment investments to the cloud, to modernize some of its private environment applications for the cloud, and to provide a seamless hybrid experience that takes into account the considerations presented above. At IBM, we refer to this goal as the North Star. The North Star delivers a consistent experience across the private environment and the public cloud. To help our customers achieve this, IBM delivers a comprehensive strategy for all fundamental areas of the hybrid cloud infrastructure.

  • Data and analytics: Provide new tools that help users generate new insights with minimal programming with a plug-and-play approach and common services for all kinds of databases (hybrid cloud data lake), thus enabling self-service analytics, multispeed IT, and collaboration among different personas. 
  • Data movement and replication: Provide new options for data movement, replication and sharing that improve network bandwidth and minimize latency, such as dedicated connections. 
  • Data preparation and integration: Deliver new tools with distributed computing that allow data transformation, integration and analytics processes to be pushed down to the data, taking into account where the data resides, whether at rest or in motion, for digital, mobile and IoT. Automation is needed to ingest and persist data and to generate metadata at the speed the business needs. Easy, plug-and-play integration is also critical among different cloud components, including storage, computing engines, application runtimes, frameworks, services, applications and APIs.
  • Data sovereignty and compliance: Implement improved cloud environments that are compliant with all industry and government regulations. Add more cloud environments across the globe to allow better management of data sovereignty constraints.
  • Data governance and security: Improve security with fully managed access, full data protection, full visibility of security risks, and optimized security operations across the private environment and the public cloud.
  • High availability and disaster recovery: Enable full HA/DR, backup and archive capabilities for every component that requires them, and on every environment of the hybrid cloud.
  • Network configuration and latency: Implement an easy network configuration based on software defined networking (SDN) and improved network latency based on location optimization, dedicated connections, cloud cache, network traffic optimization, and workload optimization.
  • Workload portability: Use containers to make workloads more portable across the private environment and different public cloud providers, thus providing a cloud-agnostic application development capability that allows quick and easy deployment.

For more information about the Analytics Group CTO Office's technical strategy for hybrid cloud, download our IBM whitepaper, Hybrid Cloud for Big Data & Analytics Solutions.

Check out the Analytics Group CTO Office's blog series on IBM Big Data Hub, Analytics InsightOut.  Read blogs and listen to podcasts from our experts, including Analytics Group CTO and IBM Fellow Tim Vincent, on key analytics strategy topics.