Netezza and IBM Cloud Pak: A knockout combo for tough data

Offering Manager, Hybrid Data Management, IBM

A colleague recently shared a great quote with me from a mainframe CTO expounding on which platform is the “blue ribbon” winner for managing data across mainframe, IBM i, UNIX and Windows.

“While everyone scurried around to figure out what platform won,
a clear victor emerged: data”

It makes sense; there is no single landing spot for data. For decades, new systems, servers and platforms have emerged, each attempting to usurp the technology of the day. Companies that pursue the latest technology with an all-or-nothing approach have often spent far more than their idealized plans anticipated, then retreated to a middle ground where both the old and new technology are used.

In contrast, data growth shows no sign of stopping. The ongoing explosion of data will result in approximately 40 trillion gigabytes (or 40 zettabytes) in 2020 alone. That’s 1.7MB of data created every second for every person on earth.

Companies typically then move copies of that data to a final landing spot for access. Sixty percent of all storage goes toward data copies, leading to a hefty price tag of $55 billion per year in total. Of all the data created, only 15 percent is original, and the other 85 percent is derived by copying that data for various uses in the organization. The data generated by the 5.8 billion enterprise and automotive IoT endpoints that will be in use in 2020 will continue to magnify the current data growth problem as additional copies are made. This makes it even more likely that the quality and trustworthiness of data will be compromised.

However, companies can overcome this challenge by looking at their information architecture in a different way: not as a grouping of data repositories but as part of an inherently interconnected system. And while moving data may still be preferable in some unique instances, eliminating data movement for day-to-day requests from data citizens across the business can help reduce the number of copies, the potential for mistakes, and storage costs.

A flexible solution built on a cloud native platform

Companies need hybrid, cloud-ready options that allow heterogeneous data sources to be accessed seamlessly across cloud and on-premises environments and that provide the flexibility to meet organizations wherever they might be on their Journey to AI. Solutions built on cloud-native platforms, such as Netezza Performance Server on IBM Cloud Pak for Data, can help with:

  • Modernization and digital transformation
  • Migrating or bridging to the Cloud
  • Driving operational efficiency
  • Accessing data in near real-time
  • Improving the use of analytics, machine learning and artificial intelligence.

These initiatives are important to those closest to the warehouse, such as the data engineers and DBAs tasked with taking business-critical data and creating a consolidated view of it for the warehouse. They must have a reliable platform. An ideal platform would make use of existing investments and current skills while allowing flexibility and ease of use where the business benefits. Using existing investments in this context also means upgrading without significant migration costs or delays and with few, if any, application changes. A solution like Netezza Performance Server on IBM Cloud Pak for Data meets these needs by allowing data engineers and DBAs to stay true to the technology they want without significant changes to their daily routine, while gaining added extensibility through capabilities like data virtualization. In other words, they can combine and blend data from across the whole organization while maintaining the warehousing power they like.

Netezza and IBM Cloud Pak deliver extensibility

Netezza Performance Server and Cloud Pak for Data deliver a one-two punch, providing the best of each as part of a cloud-native environment. The Netezza warehouse delivers the power, simplicity and horizontal scalability that businesses have praised for years, along with built-in analytics algorithms, a Spark subsystem and containerized deployment.

IBM Cloud Pak for Data also comes with an amazing stack of data services for collecting, organizing and analyzing data regardless of where it lives. Its range of capabilities includes:

  • Asset management and classification
  • Data virtualization
  • Business glossaries
  • Business policies
  • Data governance
  • Data quality
  • Data lineage.

These services can be leveraged according to each business’s unique needs to form what analysts call a data fabric. In other words, a company can provide a single view across all data while maintaining sound governance practices such as controlling who sees what data. In addition, Cloud Pak for Data can automatically discover and securely connect devices and data stores for an added layer of protection.

Benefits of the data fabric

Netezza Performance Server leverages the data virtualization service in IBM Cloud Pak for Data to allow real-time access to persisted data. Operating on data where it lives eliminates the need to copy or duplicate data and offers a significant complement to traditional ETL workflows, while eliminating the associated data quality and data latency issues. Data Virtualization on IBM Cloud Pak for Data uses a powerful distributed computational mesh that allows for optimal processing at the source while delivering results fast. With it, IBM Netezza Performance Server can help organizations access a wide range of relational, NoSQL and other non-relational, file and mainframe sources. Other popular sources such as Hadoop, social, web and IoT data are also within reach, regardless of whether they are deployed on-premises, on private cloud or on public cloud.

Easily view data assets using automated data discovery.
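
To make the idea of operating on data where it lives more concrete, here is a minimal Python sketch. It does not use the Cloud Pak for Data or Netezza interfaces; as a stand-in, it assumes two separate SQLite files play the role of two data sources and uses SQLite's ATTACH DATABASE statement to join them in a single query without copying rows from one store to the other. The table names, column names and sample rows are hypothetical.

```python
import os
import sqlite3
import tempfile


def build_source(path, ddl, insert_sql, rows):
    """Create a small SQLite file that stands in for one independent data source."""
    conn = sqlite3.connect(path)
    with conn:  # commits the inserts on success
        conn.execute(ddl)
        conn.executemany(insert_sql, rows)
    conn.close()


def revenue_by_region(warehouse_path, crm_path):
    """Join tables held in two separate database files without moving any rows."""
    conn = sqlite3.connect(warehouse_path)
    try:
        # ATTACH exposes the second file under the schema name "crm".
        conn.execute("ATTACH DATABASE ? AS crm", (crm_path,))
        query = """
            SELECT c.region, SUM(o.amount) AS revenue
            FROM main.orders AS o
            JOIN crm.customers AS c ON c.customer_id = o.customer_id
            GROUP BY c.region
            ORDER BY c.region
        """
        return conn.execute(query).fetchall()
    finally:
        conn.close()


def demo():
    """Build two hypothetical sources in a temporary directory and query across them."""
    with tempfile.TemporaryDirectory() as tmp:
        warehouse = os.path.join(tmp, "warehouse.db")
        crm = os.path.join(tmp, "crm.db")
        build_source(
            warehouse,
            "CREATE TABLE orders (customer_id INTEGER, amount REAL)",
            "INSERT INTO orders VALUES (?, ?)",
            [(1, 120.0), (2, 80.0), (1, 40.0)],
        )
        build_source(
            crm,
            "CREATE TABLE customers (customer_id INTEGER, region TEXT)",
            "INSERT INTO customers VALUES (?, ?)",
            [(1, "EMEA"), (2, "APAC")],
        )
        return revenue_by_region(warehouse, crm)
```

Calling demo() returns one revenue total per region, computed from orders in one file and customers in the other. A production virtualization layer adds much more (pushdown to remote engines, governance, many source types), but the shape of the request is the same: one query, several sources, no intermediate copy.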

The BFSI (banking, financial services and insurance) industry is tied closely to Netezza and its ability to deliver optimized analytics over critical business data. Access to the various systems of record that help this industry meet compliance and regulatory standards is critical. Data virtualization bridges this gap and helps organizations meet risk mitigation, GDPR, PHI and PII requirements. Organizations will be able to search and discover insights using a simple point-and-click interface. Data virtualization also works in coordination with IBM Watson Knowledge Catalog to heighten security by controlling who accesses which data asset and what they actually see. This is accomplished through a combination of policy management, cache management and data masking.

Establish data caching for specific workloads to improve elapsed time for moderately static data.
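
The caching idea can be illustrated with a small time-to-live cache. This is a generic Python sketch, not the cache management feature of Cloud Pak for Data: the TTLCache class, its get_or_compute method and the injectable clock are hypothetical names chosen for illustration. The point it shows is that a result for moderately static data can be reused until it reaches a chosen age, after which the source is queried again.

```python
import time


class TTLCache:
    """Keep query results for a fixed number of seconds, then recompute them."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds
        self._clock = clock  # injectable so the expiry logic can be tested
        self._entries = {}   # key -> (expiry time, cached value)

    def get_or_compute(self, key, compute):
        """Return the cached value for key, calling compute() only when it is missing or expired."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]
        value = compute()
        self._entries[key] = (now + self.ttl_seconds, value)
        return value
```

A caller would wrap an expensive query in a function and pass it as compute, for example cache.get_or_compute("daily_totals", run_daily_totals_query), where run_daily_totals_query is whatever function actually reaches the source. Choosing the time-to-live is the real design decision: the slower the underlying data changes, the longer a cached result stays useful.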

The data fabric for Netezza Performance Server on IBM Cloud Pak for Data also helps improve the ability to turn insights into action. The platform supports a project-based approach, which lets businesses build machine learning models as they progress on their Journey to AI. Moreover, the solution not only gives organizations a platform that delivers trust, agility and flexibility, it also offers a wealth of analytics services through Watson ML, along with access to modern languages and tools such as Ruby, Python, R with RStudio, Jupyter Notebooks and Spark SQL, to uncover new opportunities and help shift the business toward better outcomes.

Leverage the Data Refinery to further prepare virtual data sets for use by Analytics and ML models.

The next step for forward looking organizations

Companies armed with this powerful combination are in a position to support their data-driven ideals, transforming the business so it can pivot and shift as needed. The value of business agility should not be underestimated.

Businesses are pushing for more insights, which demands an approach that enables them to drive strategy, not just tactics. For these companies, IoT, edge computing and 5G will be an opportunity, not a problem, because they have an information architecture and a set of consistent data services that can handle current and future demands.

Learn more about data virtualization’s ability to enable real-time analytics without data movement, duplication, or added storage requirements. You can also see how Netezza Performance Server on IBM Cloud Pak for Data can help you save money and improve your ability to generate insight by reading this infographic or scheduling a free consultation with one of our Netezza experts.

Accelerate your journey to AI.