The Real Scoop on Enterprise Hadoop

Big Data Product Marketing Manager, IBM

It’s hard to have a conversation about big data without talking about Hadoop. Sure, it can be done. You can discuss how big data is all data, how big data without analytics is just “same ol’ data”, or how the implications of governing big data are even more severe than in a traditional environment.  Those would all be fine conversations. But if you’re having a conversation about how to get started with Hadoop, you’d be hard-pressed to ignite the discussion without talking about the yellow elephant in the room. 

So why is Hadoop such a natural starting point? Well, first, let’s make sure we understand what it is. It’s an open source software project that enables distributed processing of large data sets across clusters of commodity servers. Hadoop changes the economics and dynamics of large-scale computing by enabling a solution that is:

  • Scalable: New nodes can be added as needed, without changing data formats, how data is loaded, how jobs are written, or the applications on top.
  • Cost-effective: Hadoop brings massively parallel computing to commodity servers. The result is a sizeable decrease in the cost per terabyte of storage, which in turn makes it affordable to model all your data.
  • Flexible: Hadoop is schema-less and can absorb any type of data, structured or not, from any number of sources. Data from multiple sources can be joined and aggregated in arbitrary ways, enabling deeper analyses than any one system can provide.
  • Fault tolerant: When you lose a node, the system redirects work to another copy of the data and continues processing without missing a beat.
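
To make “distributed processing of large data sets” concrete, here is a minimal single-process sketch of the MapReduce model that Hadoop uses: a map step turns each input block into key/value pairs, a shuffle groups those pairs by key, and a reduce step combines each group into a result. This is plain Python written for illustration, not the Hadoop Java API; the block contents and function names are hypothetical, and on a real cluster the map tasks would run in parallel on the nodes that hold each block.

```python
from collections import defaultdict
from typing import Iterable, Iterator


# Hypothetical input: each string stands in for one block of a file that
# Hadoop would spread across the cluster's nodes.
def map_phase(block: str) -> Iterator[tuple[str, int]]:
    """Emit (word, 1) for every word in one input block."""
    for word in block.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Group intermediate values by key, as the framework does between phases."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Combine all values for one key into a single result."""
    return key, sum(values)


def word_count(blocks: list[str]) -> dict[str, int]:
    # On a cluster, each map task would run on the node holding its block;
    # here they simply run one after another in a single process.
    intermediate = (pair for block in blocks for pair in map_phase(block))
    grouped = shuffle(intermediate)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    blocks = ["big data is all data", "all data needs analytics"]
    print(word_count(blocks))
```

Running it prints `{'big': 1, 'data': 3, 'is': 1, 'all': 2, 'needs': 1, 'analytics': 1}`. Because each map call only sees its own block and each reduce call only sees one key, both phases can be spread across as many nodes as are available, which is what makes the model scale out.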

In short, Hadoop allows organizations to analyze massive volumes and increasing varieties of data. When you consider that 80% of the world’s data today is unstructured, the variety aspect becomes very important. Before Hadoop, many organizations had to discard certain types of data because they lacked the ability to process it. The ability to analyze new data types leads to more informed, faster decision-making.

If Hadoop is so useful, then why isn’t everyone using it?

Who wouldn’t want the ability to tap into more information, to be able to analyze machine, social and traditional data simultaneously? Hadoop provides clear advantages, and yet organizations are lucky if they can find one Hadoop expert, which is not a scalable staffing model for this scalable software. Open source Hadoop is very complex and difficult to master. Many businesses can’t afford to invest in the skills required to work with Hadoop. This is where IBM comes into play.

InfoSphere BigInsights for Hadoop – Bringing Hadoop to the enterprise

IBM recognizes the value in open source Hadoop, but also understands that, by itself, open source Hadoop is incomplete and complex. We’ve built our Hadoop distribution on top of the open source components of Hadoop, adding features such as built-in analytics, visualization and security, all designed to take the complexity out of Hadoop. InfoSphere BigInsights for Hadoop is the most comprehensive and reliable Hadoop solution available today.

You may be thinking, “That’s all well and good, but why would I buy something that I can get for free?” It’s a fair question, and one with a two-part answer. First, I would argue that working with open source is not free. In a recent ITG analyst report titled “Business Case for Enterprise Big Data Deployments: Comparing Costs, Benefits and Risks for Use of IBM InfoSphere BigInsights and Open Source Apache Hadoop,” ITG performed an extensive cost analysis and found that the three-year cost of using open source Apache Hadoop exceeded the cost of using InfoSphere BigInsights for Hadoop.

Figure: Three-year Costs for Use of IBM InfoSphere BigInsights and Open Source Apache Hadoop for Major Applications – Averages for All Installations

Second, InfoSphere BigInsights for Hadoop offers far more capabilities than open source Hadoop. It preserves the open source components and builds on top of them to add features that increase performance, enhance analytics and make Hadoop easier to use. The performance gain is measurable: in an audited benchmark conducted by STAC®, the Securities Technology Analysis Center, InfoSphere® BigInsights™ for Hadoop was found to deliver an approximate 4x performance gain on average over open source Hadoop running jobs derived from production workload traces.


It’s your turn to get your hands on enterprise-ready Hadoop for FREE

InfoSphere BigInsights for Hadoop was first introduced in 2011 in a Basic Edition and an Enterprise Edition. The Enterprise Edition has the value-added features described above, while the Basic Edition was a free download of Apache Hadoop packaged with a web management console. In June 2013, IBM introduced the InfoSphere BigInsights Quick Start Edition. This new edition allows anyone to experience the majority of the features within the Enterprise Edition at no charge.

But Quick Start is more than just a new product edition. It’s an experience. Part of making Hadoop easier is building new features into the product. The other part is providing education to the growing group of new users. Are you a database administrator? Then you’ll want to take our Big SQL tutorial and learn how you can use a familiar query language in a Hadoop environment. Are you a business analyst? There’s a tutorial for you on how to use BigSheets, an intuitive spreadsheet environment for viewing and working with all kinds of data. Are you simply new to Hadoop and looking to learn? Our videos, guided tutorials and resources are designed for you.

Regardless of your experience level, with Quick Start, you can begin experimenting with Hadoop today.  You get access to hands-on learning through a set of tutorials designed to guide you through your Hadoop experience. Plus, there is no data capacity or time limitation, so you can work with large data sets and explore different use cases, on your own timeframe.

InfoSphere BigInsights Quick Start Edition does not come with a support option. If you need support, you’ll need InfoSphere BigInsights for Hadoop Enterprise Edition.

Let’s take a closer look at some of the key features in the Quick Start Edition.

InfoSphere BigInsights for Hadoop Quick Start features

Text analytics: Sophisticated text analytics, unique to BigInsights, with a vast library of extractors that enable actionable insights from large amounts of native textual data. This incredibly powerful engine can parse text and detect meaning using hundreds of built-in annotators. This comes in handy for sentiment analysis, understanding consumer behavior, and detecting illegal or suspicious activities.
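
The core idea behind an annotator can be shown with a toy example: scan the text, mark the spans that match a dictionary, and attach a label to each span. The sketch below is a deliberately simple, dictionary-based sentiment tagger in Python. It is not the BigInsights text analytics engine or its API; the word lists, the `Annotation` type and the scoring rule are all hypothetical. Production extractors also handle negation, context and multiword phrases, which this sketch ignores.

```python
import re
from dataclasses import dataclass

# Hypothetical dictionaries; a real extractor library would be far larger.
POSITIVE = {"love", "great", "fast", "reliable"}
NEGATIVE = {"hate", "slow", "broken", "refund"}


@dataclass(frozen=True)
class Annotation:
    start: int   # character offset where the match begins
    end: int     # character offset just past the match
    text: str    # the matched text as it appears in the document
    label: str   # "positive" or "negative"


def annotate(document: str) -> list[Annotation]:
    """Tag every dictionary word in the document with its sentiment label."""
    results = []
    for match in re.finditer(r"[A-Za-z']+", document):
        word = match.group().lower()
        if word in POSITIVE:
            label = "positive"
        elif word in NEGATIVE:
            label = "negative"
        else:
            continue  # not in either dictionary, so no annotation
        results.append(Annotation(match.start(), match.end(), match.group(), label))
    return results


def sentiment_score(document: str) -> int:
    """Positive matches minus negative matches: a crude document-level score."""
    return sum(1 if a.label == "positive" else -1 for a in annotate(document))


if __name__ == "__main__":
    post = "Love the new phone, but shipping was slow and the charger arrived broken."
    for a in annotate(post):
        print(a)
    print("score:", sentiment_score(post))
```

For the sample post, the tagger finds one positive span (“Love”) and two negative spans (“slow”, “broken”), giving a score of -1. Keeping the character offsets on each annotation is what lets downstream steps aggregate or join results back to the source text.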

BigSheets: Web-based analysis and visualization tool with a familiar, spreadsheet-like interface that enables analysis of large amounts of data and helps design and manage long-running data collection jobs. BigSheets was developed specifically for business intelligence and non-technical business users to facilitate data gathering and analysis. It can work with structured and unstructured data and combine data from different sources, allowing users to pinpoint hidden risks or opportunities in the data.


Big SQL: New, native SQL query engine that enables SQL access to data stored in BigInsights, using MapReduce for complex data sets and direct access for smaller queries. Big SQL allows people who are experienced with SQL to apply their skills in the Hadoop world.
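
To see why SQL skills carry over, it helps to notice that a typical aggregate query maps directly onto map and reduce steps: the GROUP BY column becomes the key, and the aggregate function becomes the reducer. The sketch below runs the same aggregation twice, once declaratively and once as explicit map, shuffle and reduce steps. It uses Python’s built-in `sqlite3` module purely as a stand-in SQL engine; it is not Big SQL and says nothing about how Big SQL plans queries, and the `sales` table and its rows are hypothetical.

```python
import sqlite3
from collections import defaultdict

# Hypothetical sample rows: (region, amount).
ROWS = [("east", 120), ("west", 80), ("east", 50), ("north", 30), ("west", 20)]

QUERY = "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"


def run_sql(rows):
    """Run the aggregation declaratively; SQLite stands in for a SQL engine."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


def run_mapreduce(rows):
    """The same aggregation written as explicit map, shuffle and reduce steps."""
    # Map: emit (key, value) pairs, keyed on the GROUP BY column.
    mapped = ((region, amount) for region, amount in rows)
    # Shuffle: collect all values that share a key.
    groups = defaultdict(list)
    for key, value in mapped:
        groups[key].append(value)
    # Reduce: apply the aggregate (SUM) to each group, then sort like ORDER BY.
    return sorted((key, sum(values)) for key, values in groups.items())


if __name__ == "__main__":
    print(run_sql(ROWS))
    print(run_mapreduce(ROWS))
```

Both functions return `[('east', 170), ('north', 30), ('west', 100)]` for the sample rows. The one-line query and the hand-written pipeline compute the same answer, which is the translation work a SQL-on-Hadoop engine takes off the analyst’s hands.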

Workload Optimization: Adaptive MapReduce is an installation option for using IBM Platform Symphony technology in place of the Apache MapReduce implementation. This optional installation adapts automatically to user needs and system workloads to improve performance and simplify job tuning, while the workload scheduler provides optimization and control of job scheduling based on user-selected metrics.

Development Tools: A familiar, Eclipse-based development environment for building and deploying analytic applications, plus a set of developer tools, including extractors and editors, that speed adoption and reduce coding and debugging.

Management Capabilities: Auditing helps tighten security and access control, while monitoring lets you oversee all applications from a centralized dashboard.

Download Quick Start today

InfoSphere BigInsights for Hadoop Quick Start is for all levels of Hadoop expertise. Whether you’re a newbie, have thought about using Hadoop, or are already using it on several big data projects, you can download Quick Start and immediately get to work. If you’re looking for a little more education and guidance, don’t worry: it’s there for you.

When you are ready to download, you’ll find two options: a native software installation and a VMware image. If you choose the native software option, you will install Quick Start directly on your machine. If you choose the VMware image, you’ll be presented with a single-node or cluster option. The single-node VMware image is for smaller data sets, while the cluster version allows you to add multiple nodes for larger volumes of data. Regardless of which option you choose, the features and capabilities are the same.

Download your free version of InfoSphere BigInsights for Hadoop Quick Start today, and don’t be shy about giving your feedback on the developerWorks forum.