From data collection to data consumption

A shift in enterprise data strategies

Director and Distinguished Engineer, Offering Management, IBM

Not every startup is going to become a world-changing behemoth, but when a small, agile company hits on a truly disruptive idea, it can transform an entire industry. That’s a serious concern for market leaders, who fear that their dominant position could be eroded in just a few years if they fail to evolve at the pace of these nimble new competitors.

The rise of artificial intelligence (AI) offers a significant opportunity for the major players to fight back. To build and train the machine-learning models and deep neural networks that are already starting to transform businesses, an organization needs large quantities of accurate, domain-specific data. Over many years of operation, established businesses have typically built up a treasure trove of data that newer market entrants simply can’t match.

But there’s a catch. It’s not enough just to have the data: an organization must also be able to trust it and put it to use. Many large businesses are still struggling with the challenge of giving their data scientists, business analysts and other knowledge workers access to the information they need at the time they need it.

How can harnessing data help businesses resist disruption? What are the roadblocks that prevent knowledge workers from turning data into business value? Why does the emphasis need to shift from data collection to data consumption?

Don’t be a hoarder

The focus of most enterprise-wide data initiatives has long been on collecting data. Over the years, the range of technologies available for data collection has widened from data warehouses and ad hoc collections of relational databases to document stores and data lakes, yet the dominant narrative has always remained the same: you mustn’t let valuable information slip through your fingers.

Although data collection is vital, it’s only the first step. If you focus only on creating efficient mechanisms for storing data, you are simply building a miser’s hoard. Data warehouses, master data management (MDM) systems and data lakes are all useful tools, but business value doesn’t come from putting data into a system. It comes when you take the data out again and put it to use.

See the full picture

What happens when organizations stop thinking about how data can be collected and start thinking about how it can be consumed? The focus shifts from what the data needs to what the consumer needs, and a few key principles emerge:

  1. All data should be discoverable. If a data set exists anywhere within an organization, users must be able to find it quickly and easily.
  2. Data must be well documented. It should be possible for users to understand at a glance what kind of information a data set contains so they can judge whether it will help them solve their business problem.
  3. Data must be obtainable. Regardless of where the data lives, it should be possible for a user to get hold of it immediately when they need to use it.
  4. Governance is critical. Users need to know which data sets they are allowed to use, and the organization must be able to prevent access to unauthorized or sensitive information.
  5. Data must be able to evolve. The role of data scientists and analysts is to use data to produce new assets, and those new assets also need to be captured, documented, governed and made findable and accessible to others.
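
To make the five principles concrete, here is a minimal sketch of a toy in-memory catalog in Python. It is illustrative only and is not the API of any IBM product: the names `DataSet`, `Catalog`, `register`, `search` and `locate`, the role names and the example locations are all assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class DataSet:
    """One catalog entry (hypothetical structure, for illustration only)."""
    name: str
    description: str        # principle 2: documented at a glance
    location: str           # principle 3: where the data can be obtained
    allowed_roles: set      # principle 4: who may use it
    derived_from: list = field(default_factory=list)  # principle 5: lineage


class Catalog:
    """A toy in-memory catalog; a real one would persist and audit entries."""

    def __init__(self):
        self._entries = {}

    def register(self, data_set):
        # Principle 5: assets produced from existing data are captured too,
        # and must point back at parents the catalog already knows about.
        for parent in data_set.derived_from:
            if parent not in self._entries:
                raise KeyError(f"unknown parent data set: {parent}")
        self._entries[data_set.name] = data_set

    def search(self, keyword, role):
        # Principles 1 and 4: find entries by name or description, but only
        # show those the caller's role is allowed to use.
        kw = keyword.lower()
        return [
            ds for ds in self._entries.values()
            if role in ds.allowed_roles
            and (kw in ds.name.lower() or kw in ds.description.lower())
        ]

    def locate(self, name, role):
        # Principles 3 and 4: hand out the location only to permitted roles.
        ds = self._entries[name]
        if role not in ds.allowed_roles:
            raise PermissionError(f"role {role!r} may not access {name!r}")
        return ds.location


if __name__ == "__main__":
    catalog = Catalog()
    catalog.register(DataSet(
        "sales_2023", "Monthly sales totals by region",
        "warehouse://sales/2023", {"analyst", "data_scientist"}))
    catalog.register(DataSet(
        "churn_features", "Features derived from sales for a churn model",
        "lake://features/churn", {"data_scientist"},
        derived_from=["sales_2023"]))
    for ds in catalog.search("sales", role="analyst"):
        print(ds.name, "->", catalog.locate(ds.name, role="analyst"))
```

In this sketch, search results are filtered by role, so users never see data sets they cannot use. A different design could list restricted entries and let users request access; which is preferable depends on the organization's governance policy.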

These are the principles that are driving IBM to reimagine the way enterprises manage and use their data. To complement our long-standing market leadership in metadata management, with products such as IBM InfoSphere Information Governance Catalog on the data collection side, we’ve built IBM Watson Knowledge Catalog to handle the consumption side of the equation. Learn more about the IBM data catalog strategy.

Watson Knowledge Catalog provides a single point of access for all data and knowledge assets, from the enterprise systems managed by Information Governance Catalog to the smaller departmental systems, spreadsheets and other assets that have historically fallen outside centralized governance. It provides an intuitive user interface that helps business users not only navigate the company’s data assets, but also contribute comments, feedback and new data sets of their own.

From another perspective, Watson Knowledge Catalog is the final puzzle piece that completes the picture of enterprise data management. It was named a leader in the recent Forrester Wave for Machine Learning Data Catalogs. Its native integration with IBM Watson Studio connects the world of data science with the world of traditional business analysis. It brings the machine-learning models and training data sets of the data scientist into the same environment as the data warehouse and data lake. As a result, it encourages the kind of cross-pollination and exchange of ideas that will help businesses develop AI systems that truly contribute to their objectives.

Learn more about IBM Watson Knowledge Catalog or try IBM Watson Knowledge Catalog for free.