The Marriage of Hadoop and the Data Warehouse

A match made in heaven

Director of Offering Management, IBM Analytics, IBM

Big data: everybody is talking about it. The buzz generated around this topic almost eclipses the buzz around traditional data warehousing. Some big data enthusiasts have even speculated that all enterprise data will be hosted by an Apache Hadoop-based system in the near future and that the enterprise data warehouse (EDW) will be dead.

Well, there is no doubt that traditional data warehouse architecture is evolving. I have been writing and blogging about that for over a year now—but dead? Hardly. In fact, while everyone is talking about how one technology or architecture may win out over the other, IBM is having a different conversation.

At IBM, we prefer to talk about the marriage of Hadoop and the data warehouse because together, they really make the perfect couple. Think about it: for a traditional data warehouse shop, the opportunity of big data is to consume data that it could not consume using traditional warehousing architectures.

But why aren’t traditional data warehouses up to the task? Well, for several reasons. First, the data warehouse has traditionally been architected to use structured data from our business systems to analyze our business. This data is cleansed, modeled, distributed, governed, and maintained for historical analysis. The data we store in the data warehouse is predictable in both structure and ingest rate.

In contrast, big data is unpredictable. It comes in many structures, and its volume is simply too great for the EDW, especially since we are likely to sift through lots of data to find what we really need. We may then decide to discard it, because in some cases the shelf life of this data is significantly shorter than that of warehouse data. If we decide to keep all that data, we need cheaper solutions than the EDW to store the unstructured data for historical analysis (which is yet another argument for using Hadoop in addition to the warehouse).

Big data is an opportunity for many customers, and Hadoop now offers us the ability to consume new sources of data that make our analytics even smarter. But this new frontier is a complement to traditional data warehouse architectures, not a replacement. We are still going to supply traditional analysis to all of the business areas (finance, marketing, sales, customer service, and so on)—none of that analysis will be going away anytime soon. But let's face it: we should be expanding our analytics menu to include new sources that offer additional insight and new tools that allow us to do things we couldn’t in the past, such as sentiment analysis.

I believe that big data was one of the key motivators in the evolution of the EDW architecture, but it wasn’t the only one. The continued growth of appliances, the demand for faster time to value, and the need for agility and even simplicity in our solutions also played large roles.

Think about it: agility and simplicity? Those were not words we used very often as we built our enterprise data warehouses! However, the facts are pretty simple. Many large EDW projects never achieved their full potential because they became too complex and therefore far less agile than the business had hoped. It’s also a fact that companies that use analytics to drive decisions are better performers. These companies show a 49 percent improvement in compound annual growth rate (CAGR), they do 20 times better on profit growth, and they show a 30 percent uptick in return on investment. No wonder most companies are in a hurry to implement analytics.

[Figure: The Marriage of Hadoop and the Data Warehouse]

The secret to building this harmonious relationship is to really understand the type of analytics you have today, as well as what you’ll need in the future. The picture we once drew of the EDW now looks more like a thriving ecosystem. We have gone from using an architecture that focuses on serving up enterprise data to using an architecture that serves up enterprise data and smarter analytics.

Think about all types of data with all types of analytics. Now that’s smarter analytics!

We have made great progress. Let’s keep it going.
