Who's afraid of the big (data) bad wolf?
Big data is a fearsomely valuable business resource. But its sheer scale, complexity and power strike fear into some people's hearts, especially among IT professionals who have to find the budgets, personnel, platforms and other ingredients for big data success.
Big data's value is amply attested by real-world implementations across many industries. But people vary widely on whether bigness alone is what makes the difference. Nevertheless, many assume that it does, which translates into a prevailing "bigger is better" outlook.
There's no single consensus on why bigger can indeed be better (if not always, then in many scenarios) in the realm of data analytics. One might characterize the different schools of thought as follows:
- Faith: This is the unswerving belief that greater volumes, velocities and varieties will somehow always deliver fresh insights. If we're unable to find those insights, according to this perspective, it's only because we're not trying hard enough, we're not smart enough or we're not using the right tools and approaches.
- Fetish: This is the unnerving notion that the sheer bigness of data is a value in its own right, regardless of whether we're deriving any specific insights from it. If we're evaluating the utility of big data solely on specific business applications it supports, according to this outlook, we're not in tune with the modern need of data scientists to hoard data indiscriminately in "data lakes" in support of future explorations.
- Burden: This is the begrudging notion that the bigness of data is a necessary evil: a fact of life with the unfortunate consequence of straining the data integration, storage and processing capacity of existing databases, thereby necessitating new platforms, such as Hadoop. If we're not able to keep up with all this burdensome new data (or so this perspective leads us to believe) the core business imperative is to change over to a new type of database.
- Opportunity: This is, in my opinion, the right approach to big data. It's focused on extracting unprecedented insights more effectively and efficiently as the data scales to new heights, streams in faster and originates in an ever-growing range of sources and formats. It doesn't treat big data as a faith or fetish, because it acknowledges that many differentiated insights can still be discovered at smaller scales. And it doesn't treat data's scale as a burden, but simply as a challenge to be addressed through new database platforms, tooling and practices. Last year, I blogged on the hardcore use cases for big data, in a discussion focused exclusively on the "opportunity" side of the equation. Later in the year, I observed that big data's core "bigness" value derives from the ability of incremental content to reveal incremental context.
However, it's important to remind ourselves that bigger isn't always better, even if you take an opportunity-centric perspective. That's because some types of data and some use cases are more conducive than others to realizing fresh insights at scale. For example, many real-world business intelligence and performance management applications have proven their value at "small data" scales.
And there is validity to the "burden" perspective, a necessary counterbalance to any focus on big data opportunities. Many Hadoop, NoSQL and other data-centric initiatives involve collecting, moving, transforming, cleansing, integrating, exploring and analyzing large volumes of information from disparate sources. The success of big data projects often depends on having access to robust, scalable data integration. It would be hopelessly naive not to acknowledge that integrating huge amounts of data into "data lakes" can be burdensome: costly, complex, time-consuming and labor-intensive.
The massive volumes and lightning velocity of big data are challenges for scaling, performance and resource provisioning. The dizzying varieties of big data (structured relational tables, unstructured text and all shades in between) are another matter altogether. Correlating and extracting useful intelligence from this mess demands a keen focus on rich metadata. Big data integration will run afoul of "Tower of Babel" syndrome unless we rely, to the greatest extent possible, on common industry interfaces and field-proven approaches.
Rest assured that big data integration doesn't need to be burdensome, especially if you're wielding the right platforms, tools, personnel and best practices. If you're suitably empowered, there's no need to fear the big data wolf at the door.
It's with that in mind that I urge you to register to attend this webinar scheduled for Thursday, September 11 at 3 p.m. ET. Guest speakers include R "Ray" Wang, principal analyst and founder of Constellation Research, Inc., and Tony Curcio of IBM InfoSphere Information Server Product Management. They will present the current state of the big data market and discuss customers who have confronted the big data integration wolf successfully. A key takeaway from this webinar: Ray and Tony will outline five best practices for big data integration that crystallize the lessons learned from those case studies.