July 8, 2019, by Emma Tucker

Shopping for an enterprise data catalog can resemble shopping for a car. You know that all cars will get you from point A to point B, but you still must decide how you want to get there. Do you want a practical truck that will carry all of your possessions? Do you want a small sports car that will zip you there? Do you want a larger car that will fit your family and dog in the same trip? And what will you want five years from now?

As with car shopping, when you look for a data catalog you need to ask yourself what matters most. Unlike car shopping, you can't sign a year-to-year lease and easily migrate to a new data catalog on a whim, so you must make your decision with the future in mind.

The sheer variety of data catalogs makes it difficult to zero in on the products that will meet your needs and grow as your data governance and analytics initiatives grow. At IBM, we consider these five data catalog capabilities critical to delivering business-ready data to your data citizens.

1. Search experience that enables your data citizens to shop for and consume data

Just as a car gets you from point A to point B, a functioning data catalog organizes your data so you can quickly find and consume it. The most critical component of a data catalog is therefore its ability to give your data citizens self-service access to data.

When evaluating data catalogs, ask for a demo of the search experience and how it enables data citizens to shop for data. You'll want something as easy to use as Netflix search: one where you can not only browse by category and folder or use a search bar, but also see intelligent recommendations based on what matters most to you and your organization. You'll want to see which data sets your peers rated highly and which ones you might want to avoid or cleanse before using them in your data science initiatives. Once you find the data, you'll want to see how to consume it to build reports, analytics projects and models.

2. Automated compliance that helps you protect your data

Data catalogs not only organize your data; they also help you comply with regulations and company policies. Masking and restricting data by hand takes time. With a quality data catalog, data published to the catalog automatically complies with changing regulations.

When evaluating data catalogs, ask for a demo of technical metadata generation and of how to build and enforce rules and policies. A machine-learning-powered data catalog automatically profiles data assets published to the catalog, determines how to classify each column, enriches the metadata with business terminology and then enforces pre-written rules that mask or restrict access depending on how the data is classified. This automates steps that would otherwise be arduous and manual.
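The profile, classify, enrich and enforce workflow described in the paragraph above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how Watson Knowledge Catalog or any other product implements it: simple regular expressions stand in for the machine learning classifier, and the data classes, glossary terms and policy table (`CLASSIFIERS`, `GLOSSARY`, `POLICIES`) are hypothetical names chosen for the example.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical classifiers: one regex per data class. They stand in for the
# machine learning models a real catalog would use to classify columns.
CLASSIFIERS = {
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[a-z]{2,}$", re.IGNORECASE),
    "us_ssn": re.compile(r"^\d{3}-\d{2}-\d{4}$"),
}

# Hypothetical business glossary: data class -> business term.
GLOSSARY = {"email": "Customer Email Address", "us_ssn": "Social Security Number"}

# Hypothetical pre-written policies: data class -> enforcement action.
POLICIES = {"email": "mask", "us_ssn": "deny"}

Table = Dict[str, List[str]]


@dataclass
class ColumnProfile:
    name: str
    data_class: Optional[str]
    business_term: Optional[str]
    action: str  # "allow", "mask" or "deny"


def classify(values: List[str], threshold: float = 0.8) -> Optional[str]:
    """Return the first data class matched by at least `threshold` of non-empty values."""
    non_empty = [v for v in values if v]
    if not non_empty:
        return None
    for data_class, pattern in CLASSIFIERS.items():
        hits = sum(1 for v in non_empty if pattern.match(v))
        if hits / len(non_empty) >= threshold:
            return data_class
    return None


def profile(table: Table) -> List[ColumnProfile]:
    """Classify each column, enrich it with a business term and pick a policy action."""
    profiles = []
    for name, values in table.items():
        data_class = classify(values)
        profiles.append(
            ColumnProfile(
                name=name,
                data_class=data_class,
                business_term=GLOSSARY.get(data_class) if data_class else None,
                action=POLICIES.get(data_class, "allow") if data_class else "allow",
            )
        )
    return profiles


def enforce(table: Table, profiles: List[ColumnProfile]) -> Table:
    """Build the consumer's view: drop denied columns and mask masked ones."""
    view: Table = {}
    for p in profiles:
        if p.action == "deny":
            continue  # restricted column is not exposed at all
        if p.action == "mask":
            view[p.name] = ["****" if v else v for v in table[p.name]]
        else:
            view[p.name] = list(table[p.name])
    return view


# Usage: a toy table published to the catalog.
table = {
    "id": ["1", "2"],
    "contact": ["a@example.com", "b@example.org"],
    "ssn": ["123-45-6789", "987-65-4321"],
}
profiles = profile(table)
view = enforce(table, profiles)
```

In this sketch, `contact` is classified as an email column and masked, `ssn` is dropped from the consumer's view, and `id` passes through unchanged. Keeping the policy table separate from the classifier reflects the point made in the text: a rule is written once per data class and then applied automatically to every newly published asset.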

3. Connections that help you connect with data spread across disparate sources

Data catalogs are only as powerful as the connections they offer. If you want a true enterprise data catalog, look for one with connections to all of your data sources, whether they store structured, semi-structured or unstructured data. Otherwise, you won't actually see all of the data your organization keeps, and you will continue to have silos and gaps.

Ask for a list of the connections available with the data catalog, as well as plans for future additions. You'll want to confirm that the provider is continually building out its ecosystem of data sources, so that it grows as your data sources grow. Also look at the catalog's deployment options to make sure you can deploy it where your data resides, whether that is a public, private, hybrid or multicloud environment.

4. Quality and governance that helps your data governance teams

Analytics reports and data science models reflect the effort you put into data quality and data governance. If you cannot trust the data you use for analysis, you cannot trust your reports or data science models. If you do not yet have data quality or data governance programs in place, that's where you need to start.

It's important that the tools you use for those programs integrate seamlessly with your data catalog, or your efforts to deliver trusted, business-ready data will fall short. A data catalog that integrates with data quality and governance capabilities such as data quality rules, a business glossary and workflows gives you a seamless platform that grows with you as you create and deliver trusted data. Ask for a demo of how the data catalog supports your data governance and data quality needs and how it enhances the output of these initiatives, so your data citizens and data scientists know they're not building reports and models on bad data.

5. Governance for AI

Odds are that your company either has a data science and AI team or plans to create one very soon. AI is the next big disruptor. According to Gartner, by 2022, every personalized interaction between users and applications or devices will be adaptive, which means companies will use AI to build customer-centric user experiences. If you don't keep your AI initiatives in mind when shopping for a data catalog, you'll find yourself shopping for a new one within the next couple of years. Data governance teams will soon be responsible for managing AI models: understanding the data used, explaining their results, governing their usage and mitigating bias. Ask for a demo not only of how a data scientist can find data in the catalog, prepare it for AI and start building models with it, but also of how the catalog can help your enterprise governance program grow to meet the maturing demands of AI governance.

With these five capabilities in mind, you’ll find narrowing down your search much easier.

Learn more about the IBM enterprise data catalog, Watson Knowledge Catalog, and read about how IBM leads in Forrester’s report on machine learning data catalogs.
