IBM Cloud Pak for Data ready to support DataOps practices

Portfolio Product Marketing Manager, DataOps, IBM

The number of business segments requiring data to drive contextual insights is increasing. Leaders are seeking new ways to manage the pressures of delivering high-quality data faster across their businesses. To date, many of these projects have focused solely on ingesting data into a data lake, which has led to repositories of uncleansed and ungoverned data that is very hard to use. If you are being evaluated on how well the enterprise monetizes data and uses it to transform business models, then you recognize that this is an issue that needs to be resolved.

One of the biggest limitations to success is failing to address people, process and technology together as a whole. Businesses thrive only when they are able to respond to all three in tandem to drive an efficient, self-service data culture. Providing high-quality data to the right people allows them to implement improved processes and make better business decisions.

A DataOps practice orchestrates people, process, and technology to deliver continuous, trusted, high-quality data to all data citizens. The practice helps you foster collaboration across your business to drive agility, speed, and new initiatives at scale. Automation is core to the practice, so as you look to tooling to support your approach, ensure that it works to remove bottlenecks in your data operations.

IBM Cloud Pak for Data, built on open standards, provides agility by integrating and enabling interoperability for current and future business requirements. It virtualizes data access to simplify data use by data citizens and accelerates the process of applying data to analytics and AI—getting you closer to business insights, faster.

Cloud Pak for Data brings together all the critical cloud, data and automation capabilities as containerized microservices to deliver the core technology to support DataOps within one multicloud platform. The latest innovations around Cloud Pak for Data seek to alleviate three challenges all data leaders are facing today.

Challenge No. 1: Delivery of timely data-driven insights to business for growth and innovation

Cloud Pak for Data enhances the Watson Knowledge Catalog capability by integrating data quality and data governance features with the existing consumption features of the data catalog. The new data quality and data governance features help you inventory, cleanse, monitor and leverage information for timely and confident business decisions.

The new, centralized reference data management process helps data citizens record and make known the permissible values of an entity. The new concept workflow provides transparency on progress and on each collaborator's activity in real time, allowing for better collaboration with other users across the company. We made user experience and design enhancements to improve the authoring of terms; of data classes, used when defining technical metadata for automatic profiling; of classifications, which let users describe data in more depth; and of policies and rules, which let users describe governance activities around data.
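To make the idea of reference data concrete, here is a minimal sketch of recording the permissible values of an entity and checking incoming values against them. It is an illustration only and does not use the Watson Knowledge Catalog API; the `ReferenceDataSet` type and the country-code values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReferenceDataSet:
    """A named set of permissible values for one entity (hypothetical type)."""
    name: str
    permitted: frozenset

    def validate(self, values):
        """Split values into (valid, rejected) lists, preserving input order."""
        valid = [v for v in values if v in self.permitted]
        rejected = [v for v in values if v not in self.permitted]
        return valid, rejected


# Hypothetical reference data: the country codes allowed on a customer record.
country_codes = ReferenceDataSet("country_code", frozenset({"US", "DE", "JP"}))

# "XX" is unknown and "de" has the wrong case, so both are rejected.
valid, rejected = country_codes.validate(["US", "XX", "JP", "de"])
print(valid)     # ['US', 'JP']
print(rejected)  # ['XX', 'de']
```

Keeping the permitted values in one central place means every consumer validates against the same list instead of maintaining its own copy.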

Challenge No. 2: Improving data use by discovering and integrating it from multiple silos

To address the challenge of managing the number of containerized applications across different operating systems, you need a robust open source tool such as Red Hat OpenShift. This platform helps you scale and provision containers to support key IT initiatives such as microservices and cloud migration strategies. Specifically, your ETL tool needs to support a CI/CD pipeline by working with source control tools such as GitHub, so that jobs can be published frequently and released to production.

The improved DataStage capability can deploy integration components through containers on any cloud environment to reduce latency due to large data volumes. DataStage can help you strengthen performance with a variety of prebuilt functions and connectors to address the challenge of data sprawl and data in different formats. ETL (extract, transform, load) is the backbone for transforming your data and integrating it across multiple clouds. Ideally, you would write a job only once rather than rewriting it for each environment. With the DataStage capability available on Cloud Pak for Data, machine learning-based capabilities can assist users (even non-technical ones) in building flows and stages within a job.
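The "write a job only once" idea can be sketched in a few lines: the job logic stays the same and only a small block of environment settings changes. This is a generic illustration under stated assumptions, not DataStage code; the `ENVIRONMENTS` settings, the `run_job` function and the sample data are all hypothetical.

```python
import csv
import io

# Hypothetical per-environment settings; only these values differ between targets.
ENVIRONMENTS = {
    "dev": {"delimiter": ",", "reject_blank_ids": False},
    "prod": {"delimiter": ";", "reject_blank_ids": True},
}


def run_job(raw_text, env):
    """Extract rows from delimited text, normalise them, and return the loaded rows."""
    cfg = ENVIRONMENTS[env]
    reader = csv.DictReader(io.StringIO(raw_text), delimiter=cfg["delimiter"])
    loaded = []
    for row in reader:
        # Transform: trim whitespace and title-case the name.
        record = {"id": row["id"].strip(), "name": row["name"].strip().title()}
        # Environment-specific rule: stricter validation where configured.
        if cfg["reject_blank_ids"] and not record["id"]:
            continue
        loaded.append(record)
    return loaded


# The same job definition runs against both environments.
dev_rows = run_job("id,name\n1, ada lovelace\n,grace hopper\n", "dev")
prod_rows = run_job("id;name\n1; ada lovelace\n;grace hopper\n", "prod")
print(len(dev_rows), len(prod_rows))  # 2 1
```

Because the environment differences live in configuration rather than in the job body, the same definition can move through a CI/CD pipeline from development to production unchanged.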

Challenge No. 3: Identification and reduction of risk and compliance concerns stemming from data

The addition of the IBM InfoSphere Regulatory Accelerator (IIRA) capability helps simplify the complex terminology and verbiage of data privacy and industry-specific regulations, including Current Expected Credit Losses (CECL). Powered by machine learning, the capability helps you accelerate your understanding of these regulations: their contexts, policies, rules and consequences as they apply to your business.

Using the IIRA capability, data users can also cross-reference regulatory terms to business glossary terms and data assets using auto-discovery and classification capabilities. The machine learning capabilities can fine-tune algorithms to suggest increasingly accurate matches, which can then be used to assign business terms. Additionally, the capability brings opportunities to automate taxonomy generation from PDF files, glossary generation from metadata, and sensitive data identification.
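As a rough illustration of cross-referencing a regulatory term to business glossary terms, the sketch below ranks glossary entries by plain string similarity using Python's standard `difflib` module. This is a deliberately simple stand-in for the machine learning matching described above and is not how the IIRA capability works; the glossary entries and the `suggest_terms` helper are hypothetical.

```python
from difflib import get_close_matches

# Hypothetical business glossary.
glossary = [
    "Customer Name",
    "Credit Loss Allowance",
    "Date of Birth",
    "Account Balance",
]


def suggest_terms(regulatory_term, glossary_terms, limit=3, cutoff=0.5):
    """Return up to `limit` glossary terms ranked by similarity to the input."""
    # Compare in lower case, but return the glossary's original spelling.
    lowered = {term.lower(): term for term in glossary_terms}
    hits = get_close_matches(
        regulatory_term.lower(), list(lowered), n=limit, cutoff=cutoff
    )
    return [lowered[hit] for hit in hits]


# Candidate matches for a person to review before assigning a business term.
print(suggest_terms("expected credit loss", glossary))
print(suggest_terms("date of birth", glossary))
```

The suggestions are only candidates: a reviewer still confirms each match before the business term is assigned, and the `cutoff` value controls how permissive the suggestions are.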

Learn more about DataOps and Cloud Pak for Data

Because DataOps empowers organizations to deliver trusted, high-quality data across the business, embracing a platform that supports data governance, integration and quality is crucial. Continue learning about how data governance and quality benefit your business by watching this webinar.

Learn more about the power of Cloud Pak for Data. And learn how to implement DataOps to deliver a business-ready data pipeline.