What is free Hadoop costing you?

Product Marketing Manager for Data Lake & Cloudera Partnership, IBM

By 2025, there will be 180 trillion gigabytes of data in the world, compared to only 10 trillion gigabytes in 2015.[1]

Of this, 90 percent will be unstructured, which is why many organizations are adopting open source data lake technologies such as Apache Hadoop to handle this expanding volume and variety of data. There are many ways to get started, from generic free distributions to integrated analytics solutions that build on Hadoop technology, but to fully understand the impact that a Hadoop solution can provide, one must look at value, not just cost.

Value vs. cost for Hadoop 

Generic Hadoop, despite being free, may not actually deliver the best value for the money.

There are two reasons for this. First, much of the cost of an analytics system comes from operations, not from the upfront cost of the solution. For example, highly skilled and highly compensated data scientists “typically spend 79 percent of their time with cumbersome data preparation and cleansing tasks” needed to operate a generic Hadoop implementation. Second, a generic distribution may support valuable, complex analytics less easily, or not at all. In other words, acquisition cost is only part of the calculation. Hidden expenses and cost savings must also be taken into account to assess value.

This belief is corroborated by Cabot Partners, who break out total value of ownership (TVO) into four categories: total cost of ownership, productivity improvements, revenue or profits, and risk mitigation. Together, these represent a more complete view of what a Hadoop solution has to offer.
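
To make the four TVO categories concrete, here is a minimal Python sketch of how they might be combined into a single comparison. The formula used (benefits minus total cost of ownership, with ROI as net value divided by cost) is an illustrative assumption, not the method from the Cabot Partners report. The class name, the field names and every figure below are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TvoEstimate:
    """Hypothetical container for the four TVO categories (same currency units)."""
    total_cost_of_ownership: float
    productivity_gains: float
    revenue_or_profit_gains: float
    risk_mitigation_savings: float

    def total_benefits(self) -> float:
        # Assumption: the three benefit categories are simply additive.
        return (self.productivity_gains
                + self.revenue_or_profit_gains
                + self.risk_mitigation_savings)

    def net_value(self) -> float:
        # Assumption: net value is benefits minus total cost of ownership.
        return self.total_benefits() - self.total_cost_of_ownership

    def roi(self) -> float:
        # Assumption: ROI is net value divided by total cost of ownership.
        return self.net_value() / self.total_cost_of_ownership


# Made-up figures for illustration only; they are not taken from any report.
generic = TvoEstimate(1000, 600, 500, 200)
integrated = TvoEstimate(1200, 900, 800, 400)

for name, estimate in [("generic", generic), ("integrated", integrated)]:
    print(f"{name}: net value = {estimate.net_value():.0f}, ROI = {estimate.roi():.0%}")
```

In this made-up example the integrated option costs more to own yet still comes out ahead on net value and ROI, which is the kind of outcome a cost-only comparison would miss.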

With this view in mind, Cabot Partners listed several areas in which integrated solutions typically provide more value than generic Hadoop distributions, including easier deployment and integration, higher-value analytics, and enhanced data quality.

Total value: IBM + Hortonworks

The acknowledgement of these benefits led IBM and Hortonworks (IBM + HW) to team up to deliver a Hadoop-based analytics solution that offers value far beyond that of a generic implementation.

Together, they created a single data and application integration platform with a common interface and repository.[2] When Cabot Partners compared this integrated solution with a generic Hadoop distribution, Cloudera, the IBM + HW solution was shown to deliver better value in a number of ways.

First, the solution has a lower total cost of ownership than the generic distribution despite higher licensing costs. This is made possible through cost savings in many areas, such as deployment, hardware and acquisition, as well as in recurring software, maintenance, and operational costs. For example, the IBM + HW solution reduces the data center footprint, producing cost savings on power, cooling and facilities.

Enhanced productivity also contributed to a higher overall value for the IBM + HW solution. On an organizational level, IBM can encourage better productivity through expertise and support, but improvements can be seen on an individual level as well. For data scientists in particular, the tools provided in the IBM + HW solution “automate and simplify data discovery, curation, and governance.” Business analysts, on the other hand, can take advantage of community and social features to collaborate with their peers.

Because of this heightened productivity, the IBM + HW solution is also able to drive increased revenue and profits by enabling better insights to be delivered more quickly. This leads to rapid innovation and better decision-making capabilities that can spur quicker time to market, improved pricing models and better customer service, among other benefits.

Perhaps most importantly, the IBM + HW solution can add value while mitigating risks. The risk of project failures and delays is reduced through improvements such as reusable components and a streamlined workflow, to name a few. But the benefits go beyond the projects themselves. Better governance through data cleansing and process consistency can help decrease the risk of regulatory non-compliance.

Overall, the IBM + HW solution has a considerable value advantage over the generic Hadoop distribution Cloudera provides. When taking the factors listed above into consideration, Cabot Partners concluded that the ROI for the IBM + HW solution was higher and the gap widened as the organization’s size increased. Small organizations benefit from 56 percent better return on investment (ROI), while larger organizations see 72 percent better ROI, leading Cabot Partners to conclude that “Despite the larger software license cost over a Cloudera solution, clients deploying analytics workflows should seriously consider the IBM +HW solution.”

When it comes to enterprise analytics on Hadoop, “free” might not always be the best value. To learn more about the four components that make up total value of ownership and the projected cost savings that led to Cabot Partners' conclusion, read the full report. Inside you will find detailed cost comparisons between IBM + HW and Cloudera in each TVO category, along with information on expected payback periods.

[1] "IoT Mid-Year Update From IDC And Other Research Firms," Gil Press, Forbes, August 5, 2016.

[2] The solution integrates IBM Db2 Big SQL, IBM Data Science Experience, IBM Unified Governance and Integration and IBM Spectrum Scale with the Hortonworks Data Platform (HDP).