InfoSphere Streams and the IBM Netezza Data Warehouse Appliance
I’ve spent a fair bit of time recently learning about InfoSphere Streams and BigInsights, as I’ve been working on the Smart Consolidation story. And I wrote, in my last post, about how I had been presenting it to European tech analysts. Specifically I talked about how there are some classes of use cases that benefit from combining BigInsights and IBM Netezza. And of course there are use cases that favor one or the other.
Smart Consolidation is about having all the right technology to match your analytic requirements, and having it integrated. It's about workload-optimized systems across the analytic spectrum: structured and unstructured data. But it's more than that. It's a recognition that although the EDW ideal has proven harder to achieve than its proponents anticipated over many years, we still need an integrated view of our analytic data - even if it isn't all in one box - for MDM, governance, consistency and other reasons. Gartner and some other analysts have been calling this the logical data warehouse, and it seems to be a concept with resonance.
This week I'm at IOD and I'll be talking and learning more about Smart Consolidation. One of the topics I'm most interested in exploring is the set of use cases that integrate Streams with IBM Netezza. In my last post I was going on about making Sloe Truffles from the leftovers of Sloe Gin as an analogy for extracting insight from data twice: first with BigInsights and then with IBM Netezza. (Maybe I should really beat that analogy to death this time by making my truffles from Patxaran.)
But as well as use cases that deliver a second level of insight by combining unstructured and structured data, there is an additional scenario where Streams and IBM Netezza can play together: the model refinement scenario. Whenever Streams is used to score each record, it applies an algorithm. Sometimes that algorithm is definitive (e.g. taking a premature baby's temperature with a sensor), but in other cases (e.g. micro-segmenting a web visitor by their clicks) the algorithm might be discovered by mining recorded historic data. IBM Netezza's pre-eminent in-database analytics capability makes it the ideal partner for developing those algorithms, because as well as receiving an individual score, each record potentially and incrementally affects the optimum algorithm being discovered by mining the accumulated records.
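To make the closed loop concrete, here is a minimal sketch of the pattern in Python. It is purely illustrative: the class and method names are my own invention, not Streams or Netezza APIs, and a simple threshold stands in for a real mined model, with an in-memory list standing in for the accumulated warehouse records.

```python
from statistics import mean

class ClosedLoopScorer:
    """Illustrative closed-loop scorer: score each arriving record with
    the current model, and periodically re-mine the accumulated history
    to refine that model (the role in-database analytics would play)."""

    def __init__(self, threshold, refit_every=5):
        self.threshold = threshold      # the current "model": a simple cutoff
        self.history = []               # stands in for the warehouse
        self.refit_every = refit_every  # how often to re-mine the history

    def score(self, value):
        # Score one incoming record with the current model.
        label = "high" if value > self.threshold else "low"
        self.history.append(value)
        # Periodically refine the model from the accumulated records,
        # so each record can shift the algorithm used on later records.
        if len(self.history) % self.refit_every == 0:
            self.threshold = mean(self.history)
        return label
```

In a real deployment the `score` step would run in the streaming engine and the refit step in the warehouse; the point of the sketch is only the feedback loop between them, e.g. `ClosedLoopScorer(threshold=10).score(12)` returns `"high"` while later scores reflect the refitted threshold.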
These closed-loop use cases are another example of why a Big Data strategy has to encompass structured and unstructured data, drawn from intra- and extra-enterprise sources and subjected to different kinds of analytic workloads.
My next post will be feedback from my conversations about this here at IOD...