5 things you need to look for in an enterprise data catalog
Shopping for a data catalog can resemble shopping for a car. You know that all cars will get you from point A to point B, but you still must decide how you want to get there. Do you want a practical truck that will carry all of your possessions? Do you want a small sports car that will zip you there? Do you want a larger car that will fit your family and dog in the same trip? And what will you want five years from now?
As with car shopping, you’ll need to ask yourself what is most important when looking for a data catalog. Unlike with a car, though, you can’t sign a year-to-year lease and easily migrate to a new data catalog on a whim, so you must make a decision with the future in mind.
The sheer variety of data catalogs makes it difficult to zero in on the products that will meet your needs and can grow as your data governance and analytics initiatives grow. At IBM, we consider these five data catalog capabilities critical to delivering business-ready data to your data citizens.
1. Search experience that enables your data citizens to shop for and consume data
Just as a car gets you from point A to point B, a functioning data catalog organizes your data so you can quickly find and consume it. Therefore, the most critical component of a data catalog is the ability to give your data citizens self-service access to data.
When looking at data catalogs, ask for a demo of the search experience and how it enables data citizens to shop for data. You will want something as easy as the Netflix search experience: one where you can not only search by category and folder with a search bar, but also see intelligent recommendations based on what’s most important to you and your organization. You’ll want to see data that was highly rated by your peers and know which data sets you might want to avoid or cleanse before using them for your data science initiatives. Once you find the data, you’ll want to see how to consume it for building reports, analytics projects and models.
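To make the "highly rated by your peers" idea concrete, here is a minimal Python sketch of a catalog search that ranks matches by peer rating and carries a quality warning along with each result. It is an illustration only, not how any particular product works; the `Asset` class, the `quality_warning` flag and the `search` function are all hypothetical names invented for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Asset:
    """A hypothetical catalog entry; every field name is an assumption."""
    name: str
    tags: set
    ratings: list = field(default_factory=list)  # peer ratings, 1 to 5
    quality_warning: bool = False  # True if the data should be cleansed first


def average_rating(asset):
    """Mean peer rating, or 0.0 when nobody has rated the asset yet."""
    return sum(asset.ratings) / len(asset.ratings) if asset.ratings else 0.0


def search(assets, query):
    """Return assets matching any query term, best match and rating first."""
    terms = set(query.lower().split())
    results = []
    for asset in assets:
        words = set(asset.name.lower().split()) | {t.lower() for t in asset.tags}
        matches = len(terms & words)
        if matches:
            results.append((matches, average_rating(asset), asset))
    # Sort by number of matching terms, then by average peer rating.
    results.sort(key=lambda r: (r[0], r[1]), reverse=True)
    return [asset for _, _, asset in results]


catalog = [
    Asset("customer churn raw export", {"sales"}, [2], quality_warning=True),
    Asset("customer churn 2023", {"sales"}, [5, 4]),
    Asset("web logs", {"ops"}, [5]),
]

for hit in search(catalog, "customer churn"):
    note = " (cleanse before use)" if hit.quality_warning else ""
    print(f"{hit.name}: {average_rating(hit):.1f}{note}")
```

A real catalog would replace the keyword match with its own recommendation logic, but the shape of the result is what to look for in a demo: relevant data first, with ratings and quality flags visible before you consume it.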
2. Automated compliance that helps you protect your data
Data catalogs not only organize your data; they also help you comply with regulations and company policies. Masking and restricting data manually takes time. With a quality data catalog, masking and access rules are applied automatically to data published to the catalog, which helps it stay compliant as regulations change.
When looking at data catalogs, ask for a demo of technical metadata generation and how to build and enforce rules and policies. A machine-learning-powered data catalog will automatically profile data assets published to it, determine how to classify each column, enrich the metadata with business terminology and then enforce pre-written rules for masking or restricting access depending on how the data is classified. This automates steps that would otherwise be arduous and manual.
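The profile, classify and enforce sequence described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: regular expressions stand in for the machine learning classifiers a real catalog would use, and the `CLASSIFIERS` and `POLICY` tables and all function names are hypothetical, not part of any product's API.

```python
import re

# Hypothetical classifiers: regular expressions standing in for the
# machine learning models a real catalog would use.
CLASSIFIERS = {
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    "us_ssn": re.compile(r"^\d{3}-\d{2}-\d{4}$"),
}

# Hypothetical pre-written policy: data class -> action enforced on read.
POLICY = {"email": "mask", "us_ssn": "deny"}


def classify_column(values, threshold=0.8):
    """Return the first data class matching at least `threshold` of the values."""
    non_empty = [v for v in values if v]
    if not non_empty:
        return None
    for data_class, pattern in CLASSIFIERS.items():
        hits = sum(1 for v in non_empty if pattern.match(v))
        if hits / len(non_empty) >= threshold:
            return data_class
    return None


def profile(rows):
    """Classify every column of a table given as a list of dicts."""
    columns = rows[0].keys() if rows else []
    return {c: classify_column([r.get(c, "") for r in rows]) for c in columns}


def mask(value):
    """Keep the first character and hide the rest."""
    return value[:1] + "*" * (len(value) - 1)


def apply_policy(rows, classes):
    """Return a copy of the rows with masked and restricted columns handled."""
    protected_rows = []
    for row in rows:
        safe = {}
        for col, value in row.items():
            action = POLICY.get(classes.get(col))
            if action == "deny":
                continue  # restricted column is dropped entirely
            safe[col] = mask(value) if action == "mask" else value
        protected_rows.append(safe)
    return protected_rows


rows = [
    {"name": "Ada", "contact": "ada@example.com", "tax_id": "123-45-6789"},
    {"name": "Lin", "contact": "lin@example.org", "tax_id": "987-65-4321"},
]
classes = profile(rows)
protected = apply_policy(rows, classes)
print(classes)
print(protected)
```

The point of the sketch is the separation of concerns: the policy is written once against data classes, so any newly published column that is classified as an email address or tax ID is protected without anyone editing the rule.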
3. Connections that help you connect with data spread across disparate sources
Data catalogs are only as powerful as the connections they offer. If you want a true enterprise data catalog, you’ll want to find one with connections to all of your data sources, whether they store structured, semi-structured or unstructured data. Otherwise, you will not actually see all of the data kept by your organization, and you will continue to have silos and gaps.
You should ask for a list of connections available with the data catalog as well as plans for additions in the future. You’ll want to confirm that the provider is continually building out its ecosystem of data sources, so it grows as your data sources grow. Also look at deployment options for the data catalog to make sure you can deploy your catalog where your data resides, whether you’re in a public, private, hybrid or multicloud environment.
4. Quality and governance that helps your data governance teams
Analytics reports and data science models reflect the effort you put into data quality and data governance. If you cannot trust the data you use for analysis, you cannot trust your reports or data science models. If you do not have data quality or data governance programs in place, that’s the first place that you need to start.
It’s important that the tools you use for those programs integrate seamlessly with your data catalog, or your efforts to deliver trusted, business-ready data will be unsuccessful. A data catalog that integrates with data quality and governance tools, such as data quality rules, a business glossary and workflows, gives you a seamless platform that will grow with you as you create and deliver trusted data. Ask to see a demo of how a data catalog will support your data governance and data quality needs and enhance the output of these initiatives, so your data citizens and data scientists know that they’re not creating reports and models with bad data.
5. Governance for AI
Odds are that your company either has a data science and AI team or plans to create one very soon. AI is the next big disruptor. According to Gartner, by 2022, every personalized interaction between users and applications or devices will be adaptive. That means that companies will use AI to build a customer-centric user experience. If you do not keep your AI initiatives in mind when shopping for a data catalog, then you’ll find yourself shopping for a new one in the next couple of years. Data governance teams will soon be responsible for managing AI models: understanding the data used, explaining their results, governing usage and regulating bias. Ask for a demo of not only how a data scientist can find data in the data catalog, prep it for AI and then start building models with it, but also how the catalog can help the enterprise governance program grow to support the maturing demands of AI governance.
With these five capabilities in mind, you’ll find narrowing down your search much easier.
Learn more about the IBM enterprise data catalog, Watson Knowledge Catalog, and read about how IBM leads in Forrester’s report on machine learning data catalogs.