What is a data catalog and why do you need one?
A data catalog organizes your information assets and empowers data citizens with business-ready data.
Here’s a common scenario: Alex is working on a data analytics project for her retail company to better understand the success of shoe sales versus jewelry sales and to predict future sales. Her company is split up by department, so she has to go to the shoe line-of-business and the jewelry line-of-business for the data she needs for her analysis. She submits a form to each, requesting data that would meet her business needs. She waits. And she waits. She meets with the team to clarify her request. She waits. And she waits.
Finally, they say she will have access to her data soon – the team is just masking the data for her. It ends up taking a few weeks to get her hands on the right data that she needs. Then it takes her another week to figure out what each column means and prepare the data for her project.
A few things that could have been sped up here: finding the right data, masking that data, then explaining the data.
A few things that could have been improved upon: validation that she has all the relevant, current data for her project and trust that the data is of high quality.
If Alex’s company used an enterprise data catalog, all of those pain points could potentially disappear.
What is a data catalog?
A data catalog organizes your company’s information assets so it’s easy for people like Alex to find what they’re looking for. Libraries use catalogs to help readers find all of the books available in each of their branches. Readers can search on genre, reviews, and popularity; learn more about the book they want to check out; read the librarian’s reviews of the book; and then find that book in one of the library’s branches.
A data catalog is similar. A data catalog lets data analysts find all the data available in each database or application maintained by their company. Business analysts can search on data type, reviews, and popularity; preview the data; see what others say about it; better understand its quality; and then download the data asset for their project and analyze it.
On top of that, data catalogs that are tightly integrated with a governance platform help your business comply with changing regulations and policies and help give your data citizens access to governed data. After data assets are classified, rules can be created that anonymize or restrict access to certain data, so personally identifying information does not end up in the wrong hands.
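To make the classify-then-protect idea concrete, here is a minimal sketch in Python. It is not how any particular catalog product implements this: the classifier, the rule table, and the names `classify_column`, `mask_value`, `RULES`, and `apply_rules` are all hypothetical, and a real catalog would use far richer classification than a single regular expression.

```python
import hashlib
import re

# Assumption: a column counts as "email" when every non-empty value looks like one.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def classify_column(values):
    """Return a hypothetical data class for a list of string values."""
    non_empty = [v for v in values if v]
    if non_empty and all(EMAIL_RE.match(v) for v in non_empty):
        return "email"
    return "unclassified"


def mask_value(value):
    """Replace a value with a short, stable hash so equal inputs stay equal."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()[:12]


# Hypothetical rule table: data class -> protective action.
RULES = {"email": mask_value}


def apply_rules(table):
    """Return a copy of the table with rules applied to classified columns.

    `table` maps a column name to a list of string values.
    """
    governed = {}
    for column, values in table.items():
        action = RULES.get(classify_column(values))
        governed[column] = [action(v) if action and v else v for v in values]
    return governed


sales = {
    "customer_email": ["ana@example.com", "li@example.org"],
    "product_line": ["shoes", "jewelry"],
}
governed_sales = apply_rules(sales)
```

In this sketch the `customer_email` column is masked because it was classified as email, while `product_line` passes through untouched. Nobody had to pick the columns by hand; the rule follows the classification.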
5 reasons to have an enterprise data catalog
Speed and self-service. Rather than submitting requests to an IT group for data that will meet their business needs, analysts simply search through a data catalog themselves. This frees up time for the IT group and means analysts no longer have to wait for someone to get back to them. In short, it gives data citizens self-service access to data.
Comprehensive search and access to relevant data. You don’t know what you don’t know. An analyst will not know if they’re missing relevant data or the most up-to-date asset unless they can search across all available data assets. They might find something they would not have been able to find before, which can augment their analysis and provide better insights.
Meaningful context. When an analyst finds a data asset that would be useful to them, they can read a description, view business metadata and business term definitions, and read comments provided by others about the data. That way, the analyst can put each column in a data asset in the context of their business.
Improves trust and confidence in data. By previewing the data and profiling it, an analyst can very quickly see if certain fields have null or incorrect values. This makes cleansing the data even easier. The quality scores and social recommendations on a data asset also help an analyst feel confident about using it.
Protects data while staying compliant. Instead of an IT professional masking each column by hand, data rules run automatically based on the automatic classification of data, so companies have far less reason to worry about sensitive data getting into the wrong hands.
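The profiling mentioned under trust and confidence can be as simple as counting missing values per column. The Python sketch below is illustrative only: `profile_column` and `profile_table` are hypothetical helpers, the sample data is made up, and a real catalog would report many more statistics.

```python
def profile_column(values):
    """Summarize one column: row count, missing values, and distinct values."""
    total = len(values)
    missing = sum(1 for v in values if v is None or v == "")
    distinct = len({v for v in values if v not in (None, "")})
    return {
        "rows": total,
        "missing": missing,
        "missing_ratio": missing / total if total else 0.0,
        "distinct": distinct,
    }


def profile_table(table):
    """Profile every column of a table given as {column name: list of values}."""
    return {column: profile_column(values) for column, values in table.items()}


# Made-up sample data for illustration.
shoe_sales = {
    "sku": ["S-1", "S-2", "S-3", "S-4"],
    "units_sold": [12, None, 7, None],
}
report = profile_table(shoe_sales)
```

Here `report["units_sold"]` shows that half of the values are missing, which is exactly the kind of signal an analyst wants to see before building a forecast on that column.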
Why IBM Watson Knowledge Catalog?
It can sometimes feel like the Wild West out there in the data catalog market. But remember, a standalone data catalog that cannot integrate tightly with your enterprise governance platform could put poor-quality data in the hands of your data citizens.
IBM Watson Knowledge Catalog, a machine-learning-powered data catalog, provides all of the key data catalog capabilities. It also integrates seamlessly with IBM's data integration, quality, and governance products and with other IBM Watson technologies, so analysts and data scientists can use their data in reports, analytics projects, and models.
With IBM Watson Knowledge Catalog, Alex would’ve found out that jewelry is way more profitable than shoes in the time it took to submit her request to the departments. She then would have had another month to predict buying trends in other lines-of-business. Her IT department would have had more time to finish their data projects since they would have been less distracted with data requests. Everybody wins with a data catalog.