For a while now at IBM Netezza we’ve been observing that the holy grail of a unified Enterprise Data Warehouse (EDW) often goes unachieved. It remains an ideal - all your data, cleansed, de-duplicated and governed, in one single place - but in reality not all organizations that attempt it have achieved it, and plenty have abandoned it as a goal, preferring a core EDW surrounded by a proliferation of application-specific data marts. This is nothing new, and IBM Netezza has made a good many sales, and repeat sales, into exactly this kind of architecture. Building such data marts is an ideal use case for IBM Netezza, because no matter what the scale of the data or the complexity of the analytics, the IBM Netezza box is going to be up for it. And scale matters here: many of these marts are spun out precisely because they embrace new, potentially huge data sources, such as Telco network data, retail POS data and Web page/ad impressions.
But the downside of this capacity for agile response to new analytic requirements and opportunities can be the re-creation of the very data silos that the EDW was intended to eliminate. Blast - we’ve come full circle. And it gets worse: if you add in unstructured data (sensor data, text in blogs, XML weblogs, etc. - I like the alternative term poly-structured [1]), what started out as a strategy for simplification rapidly becomes a major management challenge.
That’s why, over the last few months, IBM has been developing a response to this challenge - Smart Consolidation - that formalizes the relationship between the different sources and structures of data and the different analytic workloads. Those of us Netezza folk involved in the effort have been taken out of our comfort zone of structured (relational) data, but there is no way that an analytic strategy for the next decade can consist of just more and more relational databases (contrary to the simplistic recent view of one of our competitors, who have finally seen the light - hallelujah: “there is more rejoicing...”).
In the last month I’ve had a couple of opportunities to hear analysts’ views on Big Data and to preview the Smart Consolidation strategy and architecture with them.
The relationship between unstructured data and structured data analytics has proved to be one of the most interesting pieces of the picture. There’s been lots of talk about how NoSQL is coming to eat SQL’s lunch, but it’s not as simple as that (SQL’s lunch has been up for grabs before; ask the OODBMS and XMLDB folks). The key is the use case: what do you want to do with the data, and where does the data come to rest? If you want to analyze unstructured data, it will have to come to rest somewhere in a file system before it gets into a relational database - if it ever does get into a relational database. So if all the data you need for your analysis is in that one source, and the use case lends itself to map-reduce (for example sentiment analysis - ‘what do people think about our products/brands?’), why would you parse it out (which you might well use M/R to do) and load it into an RDBMS? This is the sweet spot for IBM’s BigInsights.
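To make the map-reduce fit concrete, here is a minimal single-process sketch of brand-level sentiment scoring written in map/reduce style. It is an illustration only, not how BigInsights implements sentiment analysis. The word lists, the post format and the brand names are all hypothetical. The map step scores each post on its own, and the reduce step sums scores per brand. Because no step needs data from outside the posts themselves, the work partitions naturally across many nodes.

```python
import re
from collections import Counter
from functools import reduce

# Hypothetical toy lexicons; a real sentiment engine would be far richer.
POSITIVE = {"love", "great", "fast"}
NEGATIVE = {"hate", "slow", "broken"}

def map_post(post):
    """Map step: emit (brand, score) for one post, independently of all others."""
    words = re.findall(r"[a-z']+", post["text"].lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return post["brand"], score

def reduce_scores(totals, pair):
    """Reduce step: accumulate the scores emitted for each brand."""
    brand, score = pair
    totals[brand] += score
    return totals

def brand_sentiment(posts):
    """Run the map step over every post, then fold the results by brand."""
    return dict(reduce(reduce_scores, map(map_post, posts), Counter()))

# Hypothetical sample posts, standing in for text landed in a file system.
posts = [
    {"brand": "AcmePhone", "text": "Love it, great battery"},
    {"brand": "AcmePhone", "text": "Screen is broken"},
    {"brand": "AcmeTab", "text": "So slow"},
]
print(brand_sentiment(posts))  # {'AcmePhone': 1, 'AcmeTab': -1}
```

Everything the answer needs lives in the posts, so nothing has to be loaded into an RDBMS to get it.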
But if your analysis requires joining to other corporate data (for example, tying sentiment analysis to individual customers - ‘what effect does published sentiment have on buying behavior?’), then you will need to parse out the unstructured data and join it with data warehouse data, probably by loading the sentiment data into the warehouse, because combining records from different data sets on a common key (literally a ‘join’ in relational terms) is not an M/R sweet spot.
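The second case is different because the answer depends on matching records across two data sets. The sketch below uses a simple in-memory hash join, a swapped-in illustration of the kind of operation a warehouse performs internally, to tie hypothetical per-customer sentiment scores (the parsed-out output of the M/R step) to hypothetical warehouse purchase rows. It then totals spend by sentiment. All customer IDs, scores and amounts are invented. In the warehouse this would be an ordinary SQL JOIN followed by GROUP BY.

```python
# Hypothetical output of the M/R parsing step: (customer_id, net sentiment).
sentiment = [("c1", 2), ("c2", -1), ("c3", 0)]
# Hypothetical warehouse rows: (customer_id, purchase_amount).
purchases = [("c1", 120.0), ("c1", 80.0), ("c2", 15.0), ("c4", 60.0)]

def hash_join(left, right):
    """Inner join on the first field: build a hash table on the left input,
    then probe it with each row of the right input."""
    index = {}
    for key, value in left:
        index.setdefault(key, []).append(value)
    return [(key, lv, rv) for key, rv in right for lv in index.get(key, [])]

def spend_by_sentiment(sentiment, purchases):
    """Total purchase amount grouped by the sign of each customer's sentiment."""
    totals = {}
    for _customer, score, amount in hash_join(sentiment, purchases):
        label = "positive" if score > 0 else "negative" if score < 0 else "neutral"
        totals[label] = totals.get(label, 0.0) + amount
    return totals

print(spend_by_sentiment(sentiment, purchases))  # {'positive': 200.0, 'negative': 15.0}
```

Customers with sentiment but no purchases (c3) and purchases with no sentiment (c4) drop out of the inner join. That matching of keys across sources is exactly the work the relational engine is built for.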
So there is a set of use cases where you can get insight from unstructured data, and further insight by combining it with structured data. That’s what I called ‘making sloe truffles’ in my discussion with the European analysts at a briefing we did last month. (For beginners: sloes are the fruit of the blackthorn tree, the antecedent of cultivated plums.) Basically, you take your sloes and gin and you make sloe gin, and I find sloe gin to be a very fine source of ‘insight’. Most people then throw the sloes away, but if you take the sloes, marinated in gin for months, and make them into truffles - wow. Further insight! Sloe gin good; sloe gin and truffles - even better.
I’ll be at #IOD11 later this week, on the Smart Consolidation booth (and presenting the IBM Netezza 101 session - not ready to relinquish my relational roots yet); if you’re there, stop by.
[1] Thanks to Mike Ferguson: http://www.intelligentbusiness.biz/