Hadoop: The well-oiled insight machine

Portfolio Marketing, Hadoop/BigInsights, IBM Analytics

How should you implement Hadoop in your enterprise? Specifically, how can you incorporate Hadoop into your broader data architecture while shortening time to value and delivering meaningful insights?

As organizations have successfully deployed IBM Open Platform (IOP) with Apache Hadoop, a pattern of success has emerged that can help you make the most of Hadoop, too. By using IOP with Apache Hadoop, in conjunction with IBM BigInsights, you put powerful analytical tools at your disposal—but there’s much more to it than that. As Figure 1 shows, success lies, in large part, in how you fit those pieces together:

Figure 1. Architectural view of Hadoop in the enterprise

Bring data from multiple sources to a single platform

Be sure to have one platform for managing all your data—there’s no point in having separate silos of data, each creating separate silos of insight. Yes, Hadoop is an excellent landing zone for capturing a variety of data in native format, but IBM customers are using that landing zone for more than just data ingestion and normalization—they are using it as a sandbox that allows developers to test their ideas, build models and generate data sets for analysis.

Sources of data can include everything from web pages to system logs. Many IBM customers pull data from real-time analytics zones containing streaming engines, or from warehousing zones containing traditional data warehouses and marts. Moreover, the BigInsights SQL interface lets you draw on relational database management systems (RDBMSs) that you might already have in your environment. In short, you needn’t worry about where your data lives.

Bring the analytics to the data

A well-running Hadoop system requires access to powerful analytics to deliver meaningful insight, and BigInsights provides industry-leading analytics on top of the IOP architecture. These built-in analytic and data management technologies allow developers and analysts to explore raw data in the sandbox environment. To keep things running at high speed, this setup brings the analytics to the data instead of losing time by sending large data sets over the network to the analytics software.

Organizations that take this approach can use analytics capabilities that include business intelligence (BI) and reporting, predictive analytics, visualization and discovery. Many IBM customers regard visualization as an essential tool for making big data valuable to business users. They also view predictive analytics as an essential part of optimizing the future by making smart decisions. IBM offers easy-to-use predictive analytics solutions designed to meet the needs of users who can range from beginners to experienced analysts. Moreover, many organizations have invested in traditional analytical and reporting tools used with relational database management systems, and such tools can work directly on BigInsights data as needed.
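To make the idea of bringing the analytics to the data concrete, here is a minimal Python sketch. It does not use BigInsights or any Hadoop API. The partitions, record values and function names are all hypothetical, chosen only to show the pattern: each node summarizes its own records, and only the small summaries travel over the network to be merged.

```python
from collections import Counter

# Hypothetical partitions: each list stands in for log records stored on one cluster node.
NODE_PARTITIONS = [
    ["error", "ok", "ok", "error"],
    ["ok", "ok", "ok"],
    ["error", "ok"],
]


def local_summary(records):
    """Runs where the data lives: reduce one node's records to per-status counts."""
    return Counter(records)


def merge_summaries(summaries):
    """Runs centrally: combine the small per-node summaries into one global result."""
    total = Counter()
    for summary in summaries:
        total.update(summary)
    return total


def count_by_status(partitions):
    """Summarize each partition locally, then merge only the summaries."""
    return merge_summaries(local_summary(p) for p in partitions)


if __name__ == "__main__":
    print(dict(count_by_status(NODE_PARTITIONS)))
```

In this toy example, nine raw records stay where they are and only a handful of counts are moved. On real data sets the gap between raw volume and summary size is typically far larger, which is where the speed advantage comes from.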

Another integral part of a successful Hadoop implementation is the metadata and governance zone—indeed, many organizations deploy such a zone for extract, transform and load (ETL) database functions, master data management (MDM) and data governance. IBM MDM and data governance solutions can help ensure that the data delivered to analytical and other applications is consolidated and reliable while also observing data retention policies. Think of it as providing high-grade fuel to your Hadoop powerhouse.

Bring your data architecture together

The IBM Open Platform can help you fit together all these components to provide a scalable, robust and well-oiled Hadoop system that you can run wherever you need it most, whether in a data center or on a rack of servers. That should come as no surprise, for IBM’s work with Hadoop is an extension of its history of contributing to and sustaining open-source projects. As a founding member of the Apache Software Foundation, and the first enterprise backer of Linux, IBM has designed its Open Data Platform to embody this commitment, keeping the platform completely open and aimed at accelerating Hadoop deployments.

Explore IBM open source Hadoop to discover how it can fit smoothly into your environment. Get started by downloading IBM’s free and open-source Apache Hadoop distribution, along with a supported offering for your Hadoop workloads.

Additionally, use the IBM Analytics for Hadoop service on cloud to quickly access BigInsights capabilities. To find out more, discover how you can deploy and run BigInsights applications on a dedicated cluster with administrative access.