Hadoop: Opening insights everywhere

Portfolio Marketing, Hadoop/BigInsights, IBM Analytics

Big data is a dynamo with the power to drive every aspect of modern life. Harnessed through advanced analytics and delivered across cloud-based systems, social media, mobility infrastructures and the Internet of Things, big data is a source of constant illumination and insight in every business and in a growing range of consumer applications.

This transformation in the fabric of modern life has accelerated in recent years. In the past (in other words, as recently as the beginning of the current decade), businesses ran the bulk of their analytics on platforms such as data warehouses, data marts and general-purpose database management systems (DBMSs). Before true big data platforms entered the business mainstream, data analytics professionals worked under conditions quite different from those they encounter today.

Previously, ingesting fresh data into databases was an expensive process. Data needed to be cleaned up before it arrived in a central repository, such as an enterprise data warehouse (EDW). Businesses could trust this data because it came from a thorough extract-transform-load (ETL) process and was stored in a single repository. But today, organizations are storing everything, and more data silos exist than ever. Data sprawl continues to grow within a data lake.

Data for business value

The term big data describes a way to look at new data sources and the approach that organizations take in placing that data under management. Big data has attracted a wide range of application innovators, along with many people simply looking for an alternative to relational databases and data warehouses.

The rising volumes of data and fast-changing customer needs have led many companies to a realization: they must constantly improve their capabilities, competencies and culture to turn data into business value. When considering an Apache Hadoop solution, organizations need to weigh a number of key characteristics so that they can maximize their big data potential.


A few years back, Hadoop was essentially MapReduce, a batch-oriented system for processing large amounts of data. Over the last few years, many capabilities have been added to the core, building significant robustness that makes Hadoop a better fit for the enterprise. Because of these additions, Hadoop is taking on capabilities that were considered impossible just two years ago.

The current way of developing Hadoop applications is too slow and fragmented, and it’s plagued by duplicated efforts. Service providers and customers need to verify their software against multiple distributions and even against multiple versions of the same distribution. The open data platform focuses on eliminating this effort by creating a centralized, test-once standard that takes the guesswork out of developing software for Hadoop. In doing so, it frees enterprises to build business applications on the platform instead of endlessly stitching code together and fixing it when it breaks.


CEOs and decision makers consistently acknowledge that applying analytics to their business and making analytics-based decisions are key goals for them and their organizations. Big data analytics is the process of examining big data to uncover hidden patterns, unknown correlations and other useful information that can be used to make better-informed decisions.

With big data analytics, data scientists and others can analyze huge volumes of data that conventional analytics and business intelligence solutions can’t touch. Consider that an organization could accumulate, if it hasn’t already, billions of rows of data with hundreds of millions of data combinations in multiple data stores and abundant formats. High-performance analytics is necessary to process that much data and determine what is and what isn’t important. Enter big data analytics.

Making forward-looking, proactive decisions requires proactive big data analytics such as forecasting, optimization, predictive modeling, statistical analysis and text mining. These techniques allow organizations to identify trends, spot weaknesses or determine conditions for making decisions about the future. But although it’s proactive, this kind of analytics cannot be performed on big data in traditional environments, because conventional storage and processing times cannot keep up.


Many organizations are aware that technology can transform the way they do business. In the fast-moving world of today, the hard part is often understanding which technologies do what before even thinking about incorporating them into our work lives.

Cloud technology has come of age; it’s no longer a question of if organizations are moving to the cloud but when and how. In the beginning, cloud computing was a great resource to test and develop new products. Cloud-based technology offered the pay-as-you-go opportunity to expand and experiment without the need for costly IT investment. But times and customer requirements have changed, and cloud technology is entering a new age of definition. Cloud computing is essentially the commodification of computing time and data storage by means of standardized technologies.

Hadoop is a prime technology that can clearly benefit from this commodification. Cloud-based deployments of Hadoop gain a number of the standard cloud benefits, such as cost savings and elasticity. But perhaps the biggest benefit of cloud-based technology for Hadoop is simplification. Turnkey access to clusters, a naturally distributed architecture and compute power ready to churn through data can free administrators from potentially complicated and time-consuming setup processes.

Rich content for advanced business fundamentals

With all of the foregoing in mind, you are encouraged to attend Strata+Hadoop World 2015, September 29 to October 1, 2015, in New York, New York. As in previous years, IBM will be at this exciting, content-rich industry forum in force. This event is where leading-edge science and advanced business fundamentals intersect and merge. It offers a deep-immersion experience. Analysts, data scientists and executives can get up to speed on emerging techniques and technologies by dissecting case studies, developing new skills through in-depth tutorials, sharing best practices in data science and imagining the future.

In addition, learn more about the IBM Open Platform and about how Hadoop-based data science is driving innovation in the 21st century.