Success criteria for data integration scalability

Product Strategy & Marketing - InfoSphere, IBM

Every day, torrents of data inundate IT organizations and overwhelm the business managers who must sift through it all to glean insights that help them grow revenues and optimize profits. Yet, after investing hundreds of millions of dollars in new enterprise resource planning (ERP), customer relationship management (CRM) and master data management (MDM) systems, business intelligence (BI) data warehousing systems or big data environments, many companies are still plagued with disconnected, dysfunctional data.

To meet the business imperative for enterprise integration and stay competitive, companies need to bring all their corporate data together and deliver it to users as quickly as possible to maximize its value. This is particularly true of big data projects, for which the most essential requirements are scalability and high-performance data integration (to get big data into and out of Hadoop distributions). Organizations must take advantage of a fully scalable information integration architecture that supports any type of data integration technique, such as ETL, ELT (also known as ETL Pushdown), data replication or data virtualization. Success criteria for the architecture's scalability include:

  • Massive data scalability: For integrating enterprise-class data volumes, massive data scalability (MDS), the ability to dramatically reduce the time it takes to handle various workloads, is a critical factor. An enterprise-class big data integration platform optimizes the use of hardware resources so that the maximum amount of data can be processed and current and future performance requirements are met.
  • Linear performance improvements when adding hardware resources: A fully scalable data integration architecture keeps execution time steady when data volumes and the number of processing nodes increase proportionally, and completes the same workload proportionally faster when resources are added (n times the resources → n times the performance).
  • Minimal non-hardware-related costs when the environment changes due to revised data characteristics or additional hardware resources: To achieve the best return on investment for your data integration project, adding processors or nodes to the hardware environment should require no change to the design of your data transformations, replication definitions or end-to-end flow, so that you avoid recompiling, retesting and redeploying.
  • Address current and future performance requirements: The scalable architecture must be dynamic to meet organizations’ current and future data integration performance requirements.
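The linear-scaling criterion above can be checked with simple arithmetic on measured run times. The following is a minimal sketch, not part of any product: the function names and the sample timings are hypothetical, chosen only to illustrate the calculation. It compares the observed speedup with the ideal speedup implied by the increase in nodes.

```python
def speedup(baseline_seconds: float, scaled_seconds: float) -> float:
    """Return how many times faster the scaled run finished than the baseline run."""
    if baseline_seconds <= 0 or scaled_seconds <= 0:
        raise ValueError("run times must be positive")
    return baseline_seconds / scaled_seconds


def scaling_efficiency(
    baseline_seconds: float,
    scaled_seconds: float,
    baseline_nodes: int,
    scaled_nodes: int,
) -> float:
    """Return observed speedup divided by ideal (linear) speedup.

    A value of 1.0 means perfectly linear scaling for the same workload;
    values below 1.0 mean the added nodes were not fully used.
    """
    if baseline_nodes <= 0 or scaled_nodes <= 0:
        raise ValueError("node counts must be positive")
    ideal_speedup = scaled_nodes / baseline_nodes
    return speedup(baseline_seconds, scaled_seconds) / ideal_speedup


# Hypothetical measurements for the same job on 4 nodes and on 8 nodes.
linear_case = scaling_efficiency(7200.0, 3600.0, 4, 8)     # run time halves
sublinear_case = scaling_efficiency(7200.0, 4500.0, 4, 8)  # run time drops less
print(f"linear case efficiency: {linear_case:.2f}")
print(f"sub-linear case efficiency: {sublinear_case:.2f}")
```

In this sketch, doubling the nodes while halving the run time gives an efficiency of 1.0, which is what the criterion asks for. An efficiency noticeably below 1.0 suggests that part of the job does not parallelize or that the design needs rework to use the new hardware.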

It is indeed a daunting task for any data integration platform or architecture to meet the success criteria defined above, and only the best-of-breed platforms or architectures can address all the challenges and deliver enterprise-class data integration capabilities.

Read this new whitepaper to learn about the seven essential elements needed to achieve the highest performance and scalability for data integration. Also read this blog, which discusses big data integration in detail.
