Here are the quick-hit ponderings that I posted on the IBM Netezza Facebook page this past week. I went deeper on the themes of information glut, DW appliance POCs, handheld petabytes, and online recommendation engines. I started up a new thread on crowdsourcing in big data modeling, development, and exploration. I posted most of these while attending and presenting at the Hadoop Summit (more on that in next week's quick-hits):
June 11: Information glut? Next-gen analytics tools should make mincemeat out of it.
Gluttony is a very forgivable sin. Some people simply have larger appetites than others, which may or may not be due to their higher energy level or faster metabolism. Of course, that's not to deny that some people are just selfish, tending to take far more of the good stuff than they truly need or can digest.
Information glut is relative to your personal "bandwidth," a loose metaphor that we should stretch to include the accelerator effect that advanced analytics is having on everybody's information metabolism. Those of us whose births predate the dawn of the Internet digest far more information now than during our college days - joyfully, ravenously, continuously, and without strain.
Self-service business intelligence, advanced visualization, and in-database analytics are the great metabolizers in the information economy. Without these tools, we would all be swamped in a data tsunami that never subsides. But armed with them, we can navigate petabytes and beyond without breaking a sweat. Meaningful insight patterns just pop out at us, or, at the very least, suggest fruitful avenues of exploration.
Tolerance for "information glut" is generational, of course. People too young to remember the pre-Internet age rarely gripe about information glut. Why is that? Because they take the free-flowing information cornucopia for granted. And their world comes equipped with the tools to harness and harvest this largesse 24x7.
June 12: Proofs of concept as core appliance acquisition approach? Proving business value is a bit trickier.
In a proof of concept (POC), it's relatively straightforward to demonstrate the fast queries and loads for which the traditional data warehousing (DW) appliance was engineered. We can demonstrate it all out of the box with sample data and queries, or, after some prep, with your specific business data and queries.
But the DW appliances of today and tomorrow are far more than just fast analytic databases. In many ways, they are core infrastructure for the full range of business applications that depend on big data.
Think of the new generation of DW appliances as application servers and development platforms. The embedded libraries of statistical, predictive, natural language processing, MapReduce, R, and other models are the stuff of rich applications. How do we, as vendors, demonstrate to you the full business value of such an appliance? Yes, we can bundle demonstration apps, plus data, with the boxes, but those wouldn't be your specific applications. And, most likely, you wouldn't have a pre-existing application that we could accelerate in the POC without muss, fuss, or considerable rework.
Rest assured that POCs are important in this new era of analytic solution appliances. A POC should demonstrate the value, vis-a-vis the traditional "roll your own" analytic app server, of a single turnkey installation with unified deployment, optimization, and management tooling. For your analytic application developers, the POC should demonstrate that the appliance has the bundled library and tooling to help them maximize their productivity.
June 13: Petabytes in the palm of your hand? The vast majority of what you cache will be your quantified life.
I've been playing with my new smartphone (an Android, by the way), and I'm already addicted to it. They definitely have that effect.
I keep thinking that big data will come to these devices in a big way by the end of this decade. As I prophesied in my previous quick-hit on this topic, the economics will improve to the point that, by decade's end, you will have far more storage, memory, and computing power on these devices than is in the average data warehouse (DW) now.
Of course, only the most foolish enterprise DW manager would allow users to put all that sensitive business data on devices that will get lost and stolen with nauseating regularity. I won't comment any further on the security and compliance ramifications of mobile access to DWs.
But it occurred to me that the smartphones, tablets, and other gadgets of the near future don't need DW data in order to fill up their solid-state storage. They will almost certainly be fed from an inexhaustible stream of personal data that you and I generate 24x7. The uber-geeks at the forefront of the "quantified life" movement are showing the way.
To the extent that we're using the gadgets to monitor our vital signs, that data will be stored locally and also backed up to the cloud. To the extent that we need to keep track of our comings and goings, much of our geospatial history will be there as well. Most of our social communications will also reside in local storage. The devices will all come with rich analytic apps to help us continually mine, correlate, visualize, project, protect, and share this information.
And we may choose to provide some of this data to online merchants and others so that they can tailor offers, services, and experiences to our liking. Personal data is becoming a precious currency. Literally.
June 14: Recommendation engines? Federations of these will drive the interconnected global economy.
We often think of recommendation engines as embedded runtimes in next best action, digital marketing, e-commerce, ad-optimization, and other online business applications. In other words, recommendation engines are often siloed infrastructure that serves a very specific application. If you're a typical enterprise, you might have multiple recommendation engines for various applications, with each engine having its own predictive analytics, business rules, orchestrations, and other business logic.
That fragmentation is fine as long as your recommendation-engine-powered applications don't need to interoperate. But of course, you manage online supply chains that tie the front-end customer-facing applications to the back-office order fulfillment, manufacturing, logistics, and other applications. Though their runtimes should probably be loosely coupled, your front-end and back-office recommendation engines should be working from a common pool of data, models, rules, and the like. That's federation, in its most basic sense: the ability of separate application domains to interoperate through agreed-upon standards, policies, and infrastructure.
Where business-to-business (B2B) e-commerce is concerned, federation of recommendation engines is absolutely essential. The front-end channel presenting a next-best-offer recommendation should not offer it unless the back-end outsourced value chain can fulfill it by triggering next-best-action instructions to factories, warehouses, shipping, and other participants.
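The gating rule described above can be sketched in a few lines of Python. This is a minimal, hypothetical illustration (all class and function names are my own inventions, not any vendor's API): a front-end engine ranks candidate offers by model score, but consults a stand-in for the back-office fulfillment chain before presenting anything.

```python
# Hypothetical sketch of the federation rule: never present a
# next-best-offer that the back-end value chain cannot fulfill.
# All names here are illustrative, not a real product API.

from dataclasses import dataclass

@dataclass
class Offer:
    sku: str
    score: float  # predictive-model score for this customer

class FulfillmentService:
    """Stands in for the back-office value chain (inventory, logistics)."""
    def __init__(self, stock):
        self.stock = stock  # sku -> units available

    def can_fulfill(self, sku):
        return self.stock.get(sku, 0) > 0

def next_best_offer(candidates, fulfillment):
    """Return the highest-scoring offer the back end can actually
    fulfill, or None if nothing is deliverable end to end."""
    for offer in sorted(candidates, key=lambda o: o.score, reverse=True):
        if fulfillment.can_fulfill(offer.sku):
            return offer
    return None

# The top-scoring offer is out of stock, so the engine falls back
# to the best offer the value chain can actually deliver.
candidates = [Offer("tablet-x", 0.92), Offer("phone-y", 0.85)]
fulfillment = FulfillmentService({"phone-y": 12})
best = next_best_offer(candidates, fulfillment)
```

In a real federated deployment, the stock lookup would be a call across domain boundaries governed by shared standards and policies; the point of the sketch is only the ordering of the check before the offer.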
Smarter commerce refers to any federated value chain, internal and/or B2B, where everybody is being guided, from end to end, to do the right thing at all times.
June 15: Crowdsourcing Big Data creativity?
I've never been 100-percent comfortable with this "wisdom of the crowd" paradigm, aka "crowdsourcing." All of us can point to mob rule, foolish manias, and murderous mayhem perpetrated by crowds that were possessed by the opposite of wisdom.
I prefer to think of this as "cluesourcing." Large groups, if nothing else, have a deeper pool of experiences, suggestions, and inklings to work from than each of their members in isolation. How can we discover, harvest, vet, and use these clues in order to distill useful intelligence from the groupthink? Never mind "wisdom" (that's setting the bar way too high, and implies a degree of humility and reflection that groups rarely possess).
Ideally, your big data initiatives should stimulate creative groupthink among your precious cadre of data scientists. Externally, you may also want to tap into the growing movement of "crowdsourced data science" communities, such as the one pioneered by Kaggle. What's fascinating about Kaggle is that the community is encouraging the world's best and brightest data scientists to pool their expertise to solve pressing challenges facing businesses and humanity at large.
At the end of the week, I'm flying home from Hadoop Summit, rushing to get this ship-shape and sent, in flight, back to my social media coordinator to post on the blog. Need a weekend. It's been a very exciting week. I learned a lot. I will share.