More Big Data Mythbusting

Big Data Evangelist, IBM

Jamie, Adam, and the crew have nothing on me. Here, verbatim, is the inline response I posted on another "Big Data Myths" blog, this one by Jen Cohen.

"Very thought-provoking, Jen. There are plenty of big data myths floating around. We simply need to deflate them with facts. Let me add a few additional thoughts re each myth you discussed:

Myth #1: Big data is really “BIG.” I like to think of big data as, at heart, "massively scalable analytics." In other words, it's more about having headroom to scale your analytics into the petabytes, into real-time streaming, and into multi-structured data territories. I also like to think of the "scalable" part in terms of thresholds beyond the usual: beyond low-terabyte volumes, beyond batch ingest and delivery, and beyond unistructured relational data varieties. It's not about "big" in any absolute way, but about "how scalable does your analytics platform need to be now and over the next several years?" Companies of all sizes are starting to realize that they'll need to scale along one or more of the "Vs" (volume, velocity, variety) sooner than they expected. Are you doing the architectural planning, and do you have the right platform(s), to provision more storage, memory, processing, and bandwidth rapidly and cost-effectively when the Vs start to bear down on you?

Myth #2: Big data makes BETTER analytics. No, of course not. But, as I've said elsewhere, big data enables the new paradigm of "whole-population analytics." This involves having the entire population of analytic data to drill into, rather than just the traditional capacity-constrained samples/subsets. Being able to capture, aggregate, mine, model, manipulate, search, query, and visualize the entire population of any data set can give you fresh insights. For example, having a 360-degree deep-historical customer view, including rich real-time behavioral data, enables you to do more powerful micro-segmentation, fine-grained target marketing, nuanced customer experience optimization, and agile next best action.

Myth #3: You need a team of Hadoop engineers and analytics platforms to be on premises to work with big data. On-premises? That's not always necessary or prudent. One of the exciting things about the big data revolution is the growing range of outsourced, hosted, and multitenant cloud/SaaS offerings. Likewise, a growing range of consulting and professional services is helping users to bootstrap their internal competencies. You don't need to do it all in-house. You can bring in the best and brightest data scientists to help on mission-critical big data projects that involve Hadoop, NoSQL, MPP EDW, graph databases, and other platforms."