Here are the quick-hit ponderings that I posted on the IBM Netezza Facebook page this past week. I started the week in a sentimental mood, then developed my 2020 vision, then tried my best to cram it all in memory, then into the palm of my hand, and then finally crammed far more recommendation engine down my mental maw than a mortal human should be expected to chew on:
Social sentiment as valuable market intelligence?
One of the things we tend to forget about social media is that people like me are in the minority, and even people like me skew what we say in this medium. Yeah, I tend to continuously tweet on seemingly everything in my environment, including the environment in my head, but, clearly, I have my personal passions and obsessions. If you roll up my tweets (and I digest them for you every weekend) and train your best sentiment analysis tools on them, you still probably can't guess what brand of toothpaste I use or what I choose to wear each morning. I skew my tweets to the stuff that, for me at least, falls into the "isn't this cool?" category.
I try not to bore myself with the mundane stuff that, all things considered, consumes the lion's share of my household budget.
Smartphones as Big Data analytics platforms?
I'm still waiting for somebody (anybody!) to comment on my prediction, from last year's Strata Summit, that this will be mainstream in 2020.
I'm nearsighted as all get-out, but I pride myself on my 2020 vision.
All in memory?
As the current mania for cramming more data into RAM intensifies, we have to remind ourselves that "speed of thought" is not the only criterion for deciding what information to persist on what platform. As with any resource, enterprise IT should fit various storage technologies - memory, disk, tape - to specific requirements. For example, the purpose of an archive is to hold a huge amount of historical data on the cheapest storage platform, with real-time retrieval a low priority. Likewise, the purpose of many transactional stores is to move structured data cost-effectively at the speed of the average business process, not at the lightning-fast brain-velocity of a data scientist.
The core storage-optimization criterion is fit-for-purpose. All-in-memory is not the be-all end-all data-persistence approach for all information management purposes. And it won't be till prices drop by an order of magnitude.
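The fit-for-purpose idea can be sketched in a few lines of code. This is a purely hypothetical illustration: the tier names, relative costs, and latency figures below are invented for the example, not drawn from any vendor's price list.

```python
# Hypothetical "fit-for-purpose" tiering: costs and latencies are invented
# for illustration only.
TIERS = [
    # (name, relative cost per TB, typical retrieval latency in seconds)
    ("memory", 100.0, 0.000001),
    ("disk",     1.0, 0.01),
    ("tape",     0.1, 60.0),
]

def cheapest_tier(max_latency_s):
    """Pick the cheapest tier that still meets the retrieval-latency requirement."""
    candidates = [t for t in TIERS if t[2] <= max_latency_s]
    return min(candidates, key=lambda t: t[1])[0]

print(cheapest_tier(0.000001))  # speed-of-thought analytics -> memory
print(cheapest_tier(0.05))      # average business process   -> disk
print(cheapest_tier(3600))      # archive retrieval          -> tape
```

The point of the sketch: once you state the latency requirement honestly, the cheapest qualifying tier is rarely memory.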
Petabytes in the palm of your hand?
At Strata Summit last fall, I prophesied that a petabyte would drop to as low as $4 by the end of this decade, if current trends in the storage market continue. I haven't heard any significant pushback to that provocative point of view. That actually doesn't make me happy. It makes me nervous. Did people actually pay attention?
So I'll re-present my argument here. First, the venerable Vint Cerf forecast that the price of a petabyte of raw storage would drop by as much as 100:1 by 2020 (source: Vint Cerf, Future Imperfect, IEEE Computer Society, Jan-Feb 2010), down from $80,000 per petabyte in 2010 ($120 per 1.5TB HDD retail). Second, Forrester Research reports that data deduplication rates of 20:1 are typical on structured data in production and archive (source: Forrester, Balaouras, “Use Deduplication to Store More with Less,” July 10, 2009). Third, data compression efficiencies of 10:1 are becoming more common in the data warehousing market (source: Forrester, Kobielus, “The Forrester Wave™: Enterprise Data Warehousing Platforms, Q1 2011,” February 10, 2011).
Do the math: that's $4 per petabyte (of usable data storage) by 2020. That's dirt-cheap petabytes in the palm of your hand and embedded in every device. That's 100x the capacity of your current data warehouse on your smartphone, tablet, or any other mass-market business or consumer device. Am I missing something?
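For anyone who wants to check my math, here it is spelled out. All the figures come from the sources cited above; the script just chains the ratios together.

```python
# All inputs are from the cited sources; this only chains the ratios.
raw_cost_2010 = 80_000        # dollars per raw petabyte in 2010
price_drop = 100              # Cerf's projected 100:1 price decline by 2020
dedup_ratio = 20              # typical deduplication ratio (Forrester)
compression_ratio = 10        # common warehouse compression ratio (Forrester)

raw_cost_2020 = raw_cost_2010 / price_drop                        # $800 per raw petabyte
usable_cost_2020 = raw_cost_2020 / (dedup_ratio * compression_ratio)

print(usable_cost_2020)  # -> 4.0 dollars per usable petabyte
```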
Recommendation engines as repeatable solutions?
This is one of the slipperiest, least well-bounded, most expansive categories of information technology. The recommendation engine is the heart of next best action - aka decision automation - but it's not a particular category of technology. A recommendation engine synthesizes disparate techs that drive offers, interactions, orchestrations, and other real-time responses to changing events and circumstances. Depending on the application, its embedded recommendation engine may integrate data warehousing, predictive analytics, business rules, complex event processing, natural language processing, social graph analysis, and online transaction processing. At the very least.
Considering the zillion ways that one might configure them and the myriad applications in which one might deploy them, how does one package recommendation engines as repeatable solutions? Does it make sense to think of them as general-purpose infrastructure to support every conceivable customer-facing and back-office application?
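To make the "synthesis" point concrete, here's a deliberately toy sketch of one next-best-action decision. Everything in it - the churn score, the rules, the offer names - is invented for illustration; the real thing would pull the score from a predictive model and the rules from a business rules engine.

```python
# Hypothetical next-best-action step: a business-rules layer arbitrating
# over a predictive model's output. All names and thresholds are invented.
def next_best_action(customer):
    score = customer["churn_score"]  # stand-in for a predictive model's output
    # Business rules decide the real-time response.
    if customer["on_do_not_contact_list"]:
        return "no_action"
    if score > 0.8:
        return "route_to_retention_agent"
    if score > 0.5:
        return "offer_loyalty_discount"
    return "send_newsletter"

print(next_best_action({"churn_score": 0.9, "on_do_not_contact_list": False}))
# -> route_to_retention_agent
```

Even this toy version already mixes two of the technologies listed above (predictive analytics and business rules); a production engine layers in event processing, transaction systems, and more, which is exactly why packaging it as a repeatable solution is so hard.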
At the end of the week, I'm sensing that customers are overwhelmed by the pace of innovation in all these areas. Big data analytics is evolving on so many parallel but overlapping fronts. We all need a 3-day weekend to let it chill.