The resurgent relevance of the data warehouse

Big Data Evangelist, IBM

Every new big data platform vendor seems to want a piece of the data warehousing (DW) market. Consequently, you might say that the DW has never been cooler and more vital than it is now. And its role in the big data universe appears likely to grow.

No, a DW is not the sum total of your data management infrastructure, and it’s certainly not your only analytic database platform. What the DW does above all else (though this is far from its only role in many organizations) is serve as a hub for governing your system-of-record data to be delivered into downstream decision support, business intelligence (BI) and analytic applications. It’s the core enterprise data platform for policy-based persistence and management of an organization's official "single version of the truth."

Usually, we all assume that the official records are structured data sets, and hence that the DW must be built on a relational database, or on some columnar or dimensional variant of relational. But is the notion of an all-structured "single version of the truth" still valid in the era of big data, where what we increasingly call the “logical DW” also pulls in data from semi-structured and unstructured sources? Wendy Lucas discusses this trend in her excellent recent blog post, which highlights IBM’s standing as a leader in the latest Gartner Magic Quadrant for DW and DBMS for Analytics.

As I myself stated in this recent InfoWorld column, the DW is not only alive and kicking, but it’s evolving in exciting new directions in the era of big data. “The fact that Hadoop…is starting to assume data warehousing infrastructure roles (refinery, archiving, exploration) doesn’t mean that relational databases, which have been the heart of this space from the start, have grown less relevant. In fact, more IBM customers are moving toward a 'logical data warehouse' architecture in which relational platforms are increasingly supplemented, but not supplanted, by Hadoop platforms.”

Don’t pretend that your big data governance platform is any less of a DW simply because it runs in whole or in part on a non-relational platform such as Hadoop. And don’t imagine that changing the metaphor to “lake” or whatever makes it any less imperative that you have a core data-governance hub with all the scalability, cost-effectiveness and ease of use we’ve come to associate with a best-in-class DW platform such as IBM PureData System for Analytics (PDA).

If you have the right DW platform and an agile infrastructure, you can grow your logical DW as your needs evolve. PDA, for example, lets you flexibly grow your DW infrastructure up and out. PDA architecture incorporates such agile concepts as fit-to-purpose design, flexible licensing and deployment, and self-service "build, load and go." Over the longer term, you will want to evolve this key investment into a hybridized infrastructure in which the DW’s core data-governance role is supplemented by other functional zones that incorporate Hadoop, in-memory columnar, stream computing, NoSQL and other platforms. Many of these other platforms will be cloud databases, such as IBM dashDB, that play seamlessly with your investments in appliances such as PDA.

Get more information on PDA and where it fits into your logical DW strategy, with specific emphasis on the latest feature, IBM Fluid Query 1.0.