Unlocking business value through enterprise Hadoop adoption
The sacred elephant
In ancient Hindu mythology, the sacred white elephant Airavata is significant in several ways. It guards the door to Swarga (a loose interpretation of what might be called “heaven”). It’s the chosen carriage for the god of gods. And it’s regal in size, said to be capable of producing clouds that send rain to earth.
In the modern world, Hadoop, with its yellow elephant for a mascot, aims to play much the same role. Hadoop has opened the door to applications that handle extremely high volumes of data across clusters of hundreds or thousands of nodes, generating valuable insights on which today's businesses depend in their quest to stay competitive and drive revenue. And the beauty of it all is that an application can store any kind of data, whether structured, semi-structured or unstructured, from devices, sensors, social channels and so on. Since the inception of Hadoop in 2006, a family of Apache projects has grown around it, including a programming model for parallel processing (MapReduce), a resource manager (YARN) and related projects (Ambari, Cassandra, HBase, Hive, Pig, Spark and the like), to truly exploit the power of Hadoop.
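To make the MapReduce programming model mentioned above concrete: a map step emits key-value pairs, a shuffle step groups them by key, and a reduce step aggregates each group. Here is a minimal single-process sketch of the classic word-count job (the sample documents are hypothetical; a real Hadoop job distributes these phases across the cluster):

```python
from collections import defaultdict

def map_phase(documents):
    # Map: emit a (word, 1) pair for every word in every document
    for doc in documents:
        for word in doc.lower().split():
            yield (word, 1)

def shuffle_phase(pairs):
    # Shuffle: group intermediate values by key
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    # Reduce: sum the counts for each word
    return {word: sum(counts) for word, counts in grouped.items()}

documents = ["big data big insights", "data drives insights"]
counts = reduce_phase(shuffle_phase(map_phase(documents)))
print(counts["big"])   # 2
print(counts["data"])  # 2
```

The value of the model is that the map and reduce functions are independent per record and per key, which is what lets Hadoop parallelize them across many nodes.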
But there’s more to the story. Indeed, we’ve only just begun.
The weakest link
New startups regularly offer solutions for specific stages of the big data application life cycle, from data collection, aggregation, ingestion and ETL all the way to deriving actionable insights. Venture capitalists have poured millions of dollars into Hadoop startups making lucrative promises, but these startups are reaching a tipping point. Why?
Extracting insights from Hadoop-based frameworks requires highly specialized programming models and information governance, such as security and data life cycle management, to generate meaningful insights from the data, sometimes in real time: so that, for example, a doctor can produce a diagnosis and treatment plan while the patient waits, or a bank can detect fraud even as it occurs. Though Apache Hadoop provides a scalable and reliable big data foundation, most enterprises need additional tools and interfaces to take full advantage of it. Hence technology solution providers, including IBM, are working to bridge that gap and make Apache Hadoop easier for enterprises to adopt.
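The fraud-detection scenario gives a feel for what "real time" demands. In production this would run on a streaming engine such as Spark, but the core idea can be sketched in a few lines: compare each incoming transaction against a rolling baseline of recent activity and flag outliers. The transaction amounts, window size and threshold factor below are all hypothetical illustrations:

```python
from collections import deque

def flag_anomalies(transactions, window=5, factor=3.0):
    """Flag a transaction if it exceeds `factor` times the
    average of the preceding `window` transactions."""
    recent = deque(maxlen=window)
    flagged = []
    for amount in transactions:
        if len(recent) == window:
            baseline = sum(recent) / window
            if amount > factor * baseline:
                flagged.append(amount)
        recent.append(amount)
    return flagged

stream = [20, 25, 22, 30, 28, 500, 24]  # hypothetical card charges
print(flag_anomalies(stream))  # [500]
```

The point is latency: the decision is made per event as it arrives, rather than in a nightly batch, which is exactly the shift that stream-oriented additions to the Hadoop ecosystem enable.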
Derived value for today’s businesses
The problem most organizations face today is how to quickly derive value from their clusters so that teams of data scientists, engineers and line-of-business leaders can use it to aid decision making. The solution is twofold:
- A robust enterprise Hadoop framework should encompass the tools and technologies needed to analyze messy, heterogeneous data, fueled by open innovation. The real value of open source for businesses lies in rapid innovation driven by millions of developers around the world and by companies working collaboratively to address the growing needs of dynamic, fast-paced technological growth. For example, IBM has invested more than a billion dollars in development efforts to help grow Linux and build Linux support into all its hardware (IBM Power Systems, IBM z Systems) and software offerings (IBM Open Platform for Hadoop).
- A cost-effective, high-performance, reliable and agile IT infrastructure should leverage unique data assets to help deliver optimal business outcomes. With the right infrastructure in place, intelligence into operational events and transactions can allow optimization of decisions in real time, matching manufacturing output with demand, lowering business risk and personalizing the customer experience at the point of sale.
According to a study conducted by the IBM Center for Applied Insights, although customer-centric objectives remain most organizations' primary focus, more organizations are starting to integrate big data technologies into back-office and operational processes. Being among the first to spot new market trends, or preventing operational downtime, can fuel growth. Achieving this calls for a fully optimized end-to-end stack with efficiency, high performance, low latency, security and resiliency built in at every level: software, middleware and hardware.
A recent report by Cabot Partners states, “Clients who invest in IBM Power Systems for Big Data Analytics could lower the total cost of ownership with fewer, more reliable servers compared to x86 alternatives. But more importantly, these customers will also benefit from the high value delivered by the growing open ecosystem of IBM Partners (OpenPOWER Foundation) and game-changing innovations such as the Coherent Accelerator Processor Interface (CAPI).” A storage environment that scales with fluctuating client needs (for example, the IBM Spectrum Scale file management solution, part of the IBM Spectrum Storage family) and facilitates real-time data insights can further augment the underlying environment and hasten time to value and time to market.
When preparing an organization-wide strategy through robust application design, integration frameworks and a strong team of data scientists and analysts, pay close attention to the infrastructure on which business insights and outcomes depend, as well as to the total cost of ownership.
IBM Power Systems: Open Innovation for waitless insights and business-agile infrastructure
IBM Spectrum Storage: Simplifying storage to speed data driven innovation for the cloud era