What’s the scoop on Hadoop?

Program Director, Marketing, IBM Analytics

“Hadoop is unstoppable as its open source roots grow wildly and deeply into enterprise data management architectures,” Forrester analysts Mike Gualtieri and Noel Yuhanna wrote recently in The Forrester Wave Big Data Hadoop Solutions Q1 2014 on the Hadoop marketplace. “Forrester believes that Hadoop is a must-have data platform for large enterprises, forming the cornerstone of any flexible future data management platform. If you have lots of structured, unstructured and/or binary data, there is a sweet spot for Hadoop in your organization.”

Industry analysts and organizations alike are catching on to the power of Hadoop. But to understand why, we should first ask a few key questions: What is Hadoop? What value does it offer? Why should it be part of your information supply chain?

Hadoop is open source software that enables distributed processing of large data sets across clusters of commodity servers. There are basically two things to know: how Hadoop stores files and how it processes data. One mind-boggling feature of Hadoop is that it lets you store huge files, potentially larger than a single PC could hold, by spreading them across the machines in a cluster. And it lets you store many, many of those huge files. Hadoop removes the storage constraints that traditional IT environments ran into when big data came onto the scene.

The next step is to understand how Hadoop processes data. The framework for processing data is called MapReduce. The key difference between processing data in Hadoop and processing via traditional methods is that MapReduce brings the processing software to the data instead of moving the data over networks. Moving files as large as those Hadoop can store across networks would be slow, and could even involve risk.
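To make the idea a little more concrete, here is a minimal sketch of the map-and-reduce pattern in plain Python, using the classic word count example. It does not use Hadoop itself: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample partitions are hypothetical, chosen only for illustration. Each inner list stands in for data that would already live on one node of a cluster, so the "map" step runs where the data sits and only small (word, count) pairs need to be combined afterward.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in one line of input."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group emitted values by key, as a framework would between the phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values for one key into a single total."""
    return key, sum(values)


def word_count(partitions):
    """Run map on each partition, then shuffle and reduce the results.

    Each partition is a list of lines and stands in for the data held
    locally on one node (an assumption made for this illustration).
    """
    mapped = []
    for partition in partitions:
        for line in partition:
            mapped.extend(map_phase(line))
    return dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())


if __name__ == "__main__":
    sample = [["big data big clusters"], ["data moves slowly", "big files"]]
    print(word_count(sample))
```

In a real cluster the map step would run in parallel on many machines, but the shape of the computation is the same: small functions applied close to the data, followed by a combining step.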

The scalability that Hadoop offers in both storing and managing data, at a fraction of the cost of traditional data management solutions, provides huge cost benefits for IT organizations. Hadoop isn’t looking to replace existing infrastructures; instead, it modernizes architectures to capitalize on big data, meaning all the new sources and types of information available today. But I believe the real scoop on Hadoop is what you do with all that data: the analytics you can run, the business decisions you make and the actions you can take.


Join me next time to discuss analytics on Hadoop and more innovative features that enable real business benefits.

What’s your scoop on Hadoop? Leave me a comment to discuss, and be sure to check out the Big Data for Social Good Challenge, a global hackathon where developers compete to create innovative solutions using Hadoop that solve civic and other real-world social challenges. It’s open, it’s fun and there are big prizes too.

Participate in the #Hadoop4good challenge and change the world