Data governance: The importance of a modern machine learning knowledge catalog
Deliver business-ready data through end-to-end governance, quality, consumption, and automation capabilities with IBM Watson Knowledge Catalog
IBM Watson Knowledge Catalog (WKC) provides a modern machine learning (ML) catalog for data discovery, data cataloging, data quality, and data governance. At the center of this framework is a Knowledge Catalog that serves as a single source of truth, giving data engineers, data stewards, data scientists, and business analysts self-service access to enterprise data they can trust and use with confidence. With built-in DataOps automation and methodologies, Watson Knowledge Catalog enables organizations to make informed business decisions quickly and deliver business-ready data while maximizing the ROI of their data governance initiatives.
The importance of delivering business-ready data with a modern end-to-end machine learning knowledge catalog
Gartner research shows that a vast majority of organizations struggle to achieve business-ready data because they lack an end-to-end, fully integrated platform. IBM Watson Knowledge Catalog, powered by Cloud Pak for Data, provides a fully integrated platform for data integration, quality, governance, and consumption, without the added overhead cost and technical debt of third-party integrations. Forrester's Total Economic Impact of IBM Cloud Pak for Data found that the single integrated Cloud Pak for Data platform reduced infrastructure management effort by 65 to 85 percent. Furthermore, its integrated data management and data science solutions contributed to data science, ML, and AI benefits totaling $1.2 million to $3.4 million. IBM Watson Knowledge Catalog, powered by Cloud Pak for Data, provides an end-to-end business-ready foundation that prepares organizations to address complicated data lake challenges and scale quickly while delivering meaningful, trusted, high-quality data fast.
Best practices for delivering an end-to-end business-ready foundation
1. Establish a Robust Business Taxonomy
Organizations need a foundation for content categorization and data relationships, along with guidelines that improve the speed at which data can be found, accessed, or reused while increasing the efficiency of data stewardship and collaboration across stakeholders.
- Focus on a single high-value information area: Focus on a particular segment of the business that will drive the most significant impact. For instance, if GDPR and CCPA compliance is a high priority for your organization, begin with establishing terms and classifying assets related to personally identifiable information (PII).
- Concentrate on the meaning of business definitions: Use the language of your industry, in the form of logical or business intelligence models, to reinforce terms and standards already in place.
- Establish benefit and gain interest: Communicate across your organization so that people understand the advantage of having a single source of truth where all information is stored.
- Develop and commit to milestones: Establish official milestones that your organization will commit to for the implementation of the business categories, business terms, and correct assignment of user roles, as well as the data catalog process.
Quick Tip: Don’t know where to start or need help with implementation?
The IBM Data and AI Expert Labs is a dedicated team of over 1,000 experts that will work with your organization every step of the way. Experts can help determine where clients are in their Data Governance and AI Journey, then help them quickly install and deploy their AI initiatives.
2. Know your data: Enable governance teams to meet ever changing demands
When establishing a data governance program, organizations need to put data governance tools in place and define decision rights so that they can comply with regulatory requirements, communicate and enforce policies and standards, and incorporate metadata management guidelines for data security. The management of data must be complete, applicable, and accessible everywhere.
How Watson Knowledge Catalog delivers
- Business Glossary: Define common terminology and ensure it is used across the organization, so that everyone shares a unified understanding of the business. If assistance is needed, WKC Knowledge Accelerators provide an out-of-the-box Business Core Vocabulary with thousands of industry business terms, including standard definitions.
- Policy Management: Enable data privacy by defining data policies that describe how data in general, and sensitive data and personal information in particular, must be handled. Policies are put into effect through data protection, data quality, and automation rules.
- Reference Data Management: Create centralized management of reference data and standardize common values used across applications and data assets.
- Classification: Describe the sensitivity of a whole data asset in terms that data citizens across the organization can understand. You can also apply classifications to business terms, data classes, reference data sets, and governance rules.
- Data Lineage: Track your organization's data lifecycle and determine where it originated and how it is consumed, allowing for more trust and transparency across the organization.
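To make the lineage idea concrete, the following is a minimal sketch of tracing an asset back to where its data originated. It is not the Watson Knowledge Catalog API; the asset names, the `LINEAGE` mapping, and both functions are hypothetical and exist only for illustration.

```python
# Hypothetical lineage edges: each asset maps to the assets it is derived from.
# None of these names come from a real catalog.
LINEAGE = {
    "sales_dashboard": ["sales_mart"],
    "sales_mart": ["orders_clean", "customers_clean"],
    "orders_clean": ["orders_raw"],
    "customers_clean": ["crm_export"],
}


def upstream_sources(asset, lineage):
    """Return every asset that feeds into `asset`, directly or indirectly."""
    seen = set()
    stack = list(lineage.get(asset, []))
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # already visited; also protects against cycles
        seen.add(current)
        stack.extend(lineage.get(current, []))
    return seen


def origins(asset, lineage):
    """Return the upstream assets that have no recorded source of their own."""
    return {a for a in upstream_sources(asset, lineage) if not lineage.get(a)}


print(sorted(origins("sales_dashboard", LINEAGE)))
```

Walking the graph in the opposite direction, from a source to everything derived from it, would answer the complementary question of how the data is consumed.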
Quick Tip: Need to streamline the process of complying with new regulations?
IBM WKC Regulatory Accelerator uses machine learning to extract key terms, definitions, policies, and controls from regulatory documents, then enables organizations to visualize the different initiatives that their team must undertake to comply with the regulations. It then creates a set of related business terms that organizations can import into an enterprise data catalog to govern the regulated data.
3. Trust your data: Assess the quality of your organization's data
Data must be secure, clean, and easy to find to encourage trusted self-service access. Data citizens need to understand where the data came from and how good its quality is.
How Watson Knowledge Catalog delivers
- Data Discovery: Automatically find, import, analyze, and catalog new data from different sources, making it easier to search for, govern, and use the data.
- Business Term Suggestions: Automatically assign business terms to technical assets while continuously training the ML model for more accurate future metadata enrichment.
- Data Profiling and Analysis: Automatically profile a data asset and generate metadata, statistics, and visualizations about the textual content of the data.
- Data Quality Issue Detection: Use automation and data analytics to measure the quality of data across more than 10 out-of-the-box dimensions. You receive an overall data quality score, along with updates to that score as it changes over time.
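As a rough illustration of how profiling results can roll up into an overall quality score, here is a minimal sketch. It does not reproduce Watson Knowledge Catalog's scoring: the three dimensions, the unweighted average, the sample values, and the email pattern are all assumptions chosen for the example.

```python
import re
from statistics import mean

# Hypothetical sample column; not taken from any real data source.
EMAILS = ["ana@example.com", "bob@example.com", None, "not-an-email", "ana@example.com"]

# Deliberately simple format check, used only for this illustration.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def completeness(values):
    """Share of values that are not missing."""
    return sum(v is not None for v in values) / len(values) if values else 0.0


def uniqueness(values):
    """Share of non-missing values that are distinct."""
    present = [v for v in values if v is not None]
    return len(set(present)) / len(present) if present else 0.0


def validity(values, pattern):
    """Share of non-missing values that match the expected format."""
    present = [v for v in values if v is not None]
    if not present:
        return 0.0
    return sum(bool(pattern.match(v)) for v in present) / len(present)


def quality_score(values, pattern):
    """Return per-dimension scores and their unweighted mean as a percentage."""
    dimensions = {
        "completeness": completeness(values),
        "uniqueness": uniqueness(values),
        "validity": validity(values, pattern),
    }
    return dimensions, round(100 * mean(dimensions.values()), 1)


dimensions, score = quality_score(EMAILS, EMAIL_PATTERN)
print(dimensions, score)
```

Recomputing the score each time the data is profiled, and storing the results with a timestamp, is one simple way to see how quality changes over time.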
Quick Tip: Need to analyze unstructured data?
WKC InstaScan is an intelligent file analysis tool that leverages automation and statistical sampling models to identify risk hot spots in unstructured cloud data quickly. Today unstructured data makes up around 80 percent of all enterprise information shared on the cloud. The tool helps accelerate regulatory compliance and data governance as part of a DataOps practice.
4. Use your data: Consume and share data across the enterprise
Enterprises need to surface business-ready data to consumers, allowing them to deliver timely value to the business, make better decisions, and improve productivity through faster model development and deployment.
How Watson Knowledge Catalog delivers
- Policy enforcement: Automatically enforce policies and mask data, so when a data asset is found, the individual knows they can use it, without compromising its security.
- Data preparation: The Data Refinery tool enables organizations to discover, cleanse, and transform data with built-in operations.
- Self-service: Drive self-service discovery and automate decision making to evolve the business, by providing a view of all information to those that need it and allowing them to access it.
- Collaboration: Collaborate among business units, users, and data owners by leaving comments or assigning a rating to an asset. Ensure corporate accountability by allowing data stewards to create, update, review, and approve assets while providing domain expertise to keep users informed of progress.
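The policy enforcement idea above, masking sensitive values based on who is asking, can be sketched in a few lines. This is an illustration under stated assumptions, not the product's rule engine: `MaskingRule`, the role names, the `credit_card` data class, and the keep-last-four masking style are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MaskingRule:
    """Hypothetical rule: mask columns of one data class for non-exempt roles."""
    data_class: str
    exempt_roles: frozenset


def mask_value(value):
    """Redact all but the last four characters of a value."""
    text = str(value)
    return "*" * max(len(text) - 4, 0) + text[-4:]


def apply_rules(row, column_classes, rules, role):
    """Return a copy of the row with protected columns masked for this role."""
    masked = dict(row)
    for column, data_class in column_classes.items():
        if column not in row:
            continue  # the classification refers to a column this row lacks
        for rule in rules:
            if rule.data_class == data_class and role not in rule.exempt_roles:
                masked[column] = mask_value(row[column])
    return masked


# Hypothetical rule set, column classification, and record.
rules = [MaskingRule("credit_card", frozenset({"data_steward"}))]
column_classes = {"card_number": "credit_card"}
row = {"customer": "Ana", "card_number": "4111111111111111"}

print(apply_rules(row, column_classes, rules, role="analyst"))
print(apply_rules(row, column_classes, rules, role="data_steward"))
```

Because the rule is keyed on the data class rather than on a specific column, any newly cataloged column that is classified the same way would be covered without writing another rule.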
Quick Tip: Need to search and find data assets fast?
Watson Knowledge Catalog incorporates Smart Global Search across the Cloud Pak for Data platform, with recent search history, ML-infused autocomplete suggestions, and results ranked by relevancy score.
Earlier this year, Watson Knowledge Catalog was recognized as a 2020 Gartner Customers’ Choice for Metadata Management Solutions.
Forrester's Total Economic Impact of IBM Cloud Pak for Data provides a deep dive into how Cloud Pak for Data and Watson Knowledge Catalog services are contributing to an ROI of up to 158 percent for your peers in the market.
To hear about the latest automated metadata generation capabilities in Watson Knowledge Catalog and how they have impacted the IBM Global Chief Data Office, watch the DataOps webinar.