Understanding the Data Value Chain
Adopt a different view of data as a raw material, and understand its lifecycle as a business resource
Why is there so much excitement around new data technologies? The scale and flexibility of advanced big data tools are only part of the story. The real attraction is the ability to use data as the raw material of business and to treat it as an asset from which value can be created.
The capability to get value from data is often limited by an organization’s worldview. Much of today’s thinking in IT originates from an application-centric view of the world, focused on functional units that fulfill well-defined business purposes. Unfortunately, this view leads to inflexibility, silos, and poor levels of data exploitation. While traditional data architecture may enable an existing business, it places organizations in a poor position to exploit future opportunities. For many companies, poor data use isn’t just an inefficiency; it has existential implications. The booming growth of data-driven organizations such as Amazon, Castlight Health, and Uber illustrates how vulnerable competitors are to a business strategy built on a scalable data fabric.
Moving away from cost-center thinking, toward a position from which data is also seen as a source of value-creation opportunities, is not easy, especially for an established organization. Despite the most earnest desires of IT vendor marketing folks and wishful businesspeople alike, businesses can’t simply “Apache Hadoop” the data and receive magical value. They should instead see data as a raw material and understand its lifecycle as a resource in the business: assessing its value, knowing how to grow that value, and accepting the associated costs and benefits of data processing.
The often-used analogy between data and oil is a good one. Like crude oil, data can be used for a diversity of applications, all radically more valuable than the raw product. And just as crude oil must be refined, data must go through many steps of processing and combination before it creates value.
Understanding these steps is crucial to devising a modern data architecture, which in turn gives organizations the capability to solve business problems with agility and scalability.
The data value chain
The key to understanding these steps is to take a focused look at each stage of the data lifecycle. Though the headlines are inevitably written about innovative data science, good data science cannot be accomplished without good data, and every step in this chain is vital. This first installment in a series of articles introduces the following stages, and subsequent articles take a deeper dive into each stage:
- Discover: In today’s digitized world, there are many sources of data, both internal and external to organizations, that help solve business problems. Data sources need to be located and evaluated for cost, coverage, and quality.
- Ingest: The ingest pipeline is fundamental to the reliable operation of entire data platforms. There are diverse file formats and network connections to consider, as well as the frequency and volume of incoming data.
- Process: Many applications are well served by processing data immediately following the ingest stage, to transform the data into a format that facilitates its reuse or to take immediate action based on incoming events.
- Persist: Cost-effective distributed storage offers many options for persisting data. The choice of format or database technology is often influenced by the nature of other stages in the value chain, especially analysis.
- Integrate: Much of the value in big data comes from combining a variety of data sources to find new insights. Integration, the step in which this combination occurs, is nontrivial but valuable.
- Analyze: Analysis, the star of the big data show, depends critically on every other step in the value chain: the so-called data janitorial work that makes up 80 percent of data science. New insights and actions are derived from data, enabled by an ever-growing and increasingly nuanced range of tools and platforms.
- Expose: The final step in deriving value from data is exposing the results of analytics, along with the data itself, to the organization in a way that makes them useful for value creation.
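To make the ordering of the stages concrete, here is a toy, single-process Python sketch that chains all seven stages over small in-memory CSV sources. It is an illustration under stated assumptions, not how any particular platform implements the value chain. The `DataSource` type, the source names, the cost/coverage/quality scores, the `customer_id` join key, and the thresholds are all hypothetical. Plain lists and dicts stand in for the distributed ingest, storage, and analysis tools the series discusses.

```python
"""Toy, in-memory sketch of the seven data value chain stages.

All names (DataSource, the sample sources, field names, thresholds) are
hypothetical; plain lists and dicts stand in for real distributed tools.
"""
import csv
import io
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class DataSource:
    name: str
    cost: float      # hypothetical acquisition cost
    coverage: float  # hypothetical fraction of entities covered (0..1)
    quality: float   # hypothetical quality score (0..1)
    raw: str         # CSV text standing in for a file or network feed


def discover(candidates, max_cost, min_coverage, min_quality):
    """Discover: keep sources that fit the budget and meet coverage/quality bars."""
    return [s for s in candidates
            if s.cost <= max_cost and s.coverage >= min_coverage
            and s.quality >= min_quality]


def ingest(source):
    """Ingest: parse the raw payload into records tagged with their origin."""
    rows = list(csv.DictReader(io.StringIO(source.raw)))
    for row in rows:
        row["_source"] = source.name
    return rows


def process(rows):
    """Process: normalize field names and convert numeric fields for reuse."""
    cleaned = []
    for row in rows:
        clean = {k.strip().lower(): v.strip() for k, v in row.items()}
        if "amount" in clean:
            clean["amount"] = float(clean["amount"])
        cleaned.append(clean)
    return cleaned


def persist(store, source_name, rows):
    """Persist: append records to a store keyed by source name."""
    store.setdefault(source_name, []).extend(rows)
    return store


def integrate(orders, customers):
    """Integrate: inner-join orders to customers on customer_id."""
    region_by_id = {c["customer_id"]: c["region"] for c in customers}
    return [{**o, "region": region_by_id[o["customer_id"]]}
            for o in orders if o["customer_id"] in region_by_id]


def analyze(joined):
    """Analyze: total order amount per region."""
    totals = defaultdict(float)
    for row in joined:
        totals[row["region"]] += row["amount"]
    return dict(totals)


def expose(totals):
    """Expose: render results as a sorted, human-readable report."""
    return [f"{region}: {amount:.2f}" for region, amount in sorted(totals.items())]


# Usage: three candidate sources; the low-quality one is dropped at discovery.
candidates = [
    DataSource("orders", 0.0, 1.0, 0.9,
               "customer_id,amount\nc1,10.5\nc2,4.5\nc1,5.0\nc3,2.0\n"),
    DataSource("customers", 100.0, 0.8, 0.95,
               "customer_id,region\nc1,EU\nc2,US\n"),
    DataSource("scraped_leads", 50.0, 0.3, 0.4,
               "customer_id,region\nc9,APAC\n"),
]

store = {}
for src in discover(candidates, max_cost=200.0, min_coverage=0.5, min_quality=0.5):
    persist(store, src.name, process(ingest(src)))

report = expose(analyze(integrate(store["orders"], store["customers"])))
print(report)  # → ['EU: 15.50', 'US: 4.50']
```

Each function maps to exactly one stage, so a stage can be swapped out (for example, replacing the in-memory `store` with a real database) without touching the others. Note how the order for customer `c3` silently disappears at the integrate step because no matching customer record exists. That small loss shows why the janitorial stages upstream of analysis matter.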
A new way of thinking
Creating value from data requires a new mindset. Silos are hard to escape, whether they are technical or conceptual. To holistically exploit the opportunity of big data tools and architectures, organizations need a new way of thinking that frames data as a raw material of business. The answer is to focus not on the functional components (what organizations do to the data) but on business outcomes and how they can be achieved (what they do with the data). This approach can be cultivated by looking at the data value chain. From discovery and ingest through analysis and exposing results, this series takes a detailed look at these seven data value chain steps, giving an overview of the spectrum of strategies, tools, and architectures available today for each. The resulting understanding enables data scientists and other analytics professionals to identify suitable areas for investments that can create new value from data. Please share any thoughts or questions in the comments.