Big Data Integration
Given the explosion in the volume, variety and velocity of data, it is clear that big data has a lower value per byte than traditional enterprise data. An oft-repeated analogy is a gold mine, where you dig through tons of dirt to discover an ounce of gold. Even so, enterprises can derive superior insights from big data. The question is: how do we process big data and derive insights at a lower cost?
One way to keep the cost of processing low is to apply different cleansing, transformation and governance procedures to big data than to traditional data. This means that the two kinds of data will differ in quality and in usability for decision making. Also, with Hadoop emerging as a basis for big data platforms, enterprises now have traditional systems (for example, transactional systems, data warehouses and data marts) and big data technologies coexisting in their ecosystem.
The authors of the eBook "Understanding Big Data" recommend that enterprises get their traditional systems and big data technologies working together, because the combined ecosystem delivers superior value. A simple example of superior value is augmenting your understanding of a product sales decline or a production quality issue by looking at consumer sentiment. Without doubt, the two systems complement each other.
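The sales-and-sentiment example above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the weekly sales figures, the sentiment scores, and the function names (`average_sentiment_by_week`, `combine`, `flag_weeks`) are all hypothetical, and nothing here reflects how IBM InfoSphere or Hadoop would actually be used.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical weekly unit sales, standing in for data from a traditional
# system such as a data warehouse.
weekly_sales = {
    "2012-W36": 1200,
    "2012-W37": 1150,
    "2012-W38": 900,
    "2012-W39": 700,
}

# Hypothetical sentiment scores (-1.0 negative to 1.0 positive), standing in
# for the output of a big data job that scored consumer posts.
sentiment_records = [
    ("2012-W36", 0.4), ("2012-W36", 0.2),
    ("2012-W37", 0.1), ("2012-W37", -0.1),
    ("2012-W38", -0.5), ("2012-W38", -0.3),
    ("2012-W39", -0.6), ("2012-W39", -0.8),
]


def average_sentiment_by_week(records):
    """Group sentiment scores by week and average them."""
    scores = defaultdict(list)
    for week, score in records:
        scores[week].append(score)
    return {week: mean(values) for week, values in scores.items()}


def combine(sales, sentiment):
    """Put the week-over-week sales change next to the average sentiment."""
    rows = []
    previous_units = None
    for week in sorted(sales):
        units = sales[week]
        change = None if previous_units is None else units - previous_units
        rows.append({
            "week": week,
            "units": units,
            "change": change,
            "avg_sentiment": sentiment.get(week),
        })
        previous_units = units
    return rows


def flag_weeks(rows, sentiment_threshold=0.0):
    """Return weeks where sales fell and average sentiment was below the threshold."""
    return [
        row["week"]
        for row in rows
        if row["change"] is not None
        and row["change"] < 0
        and row["avg_sentiment"] is not None
        and row["avg_sentiment"] < sentiment_threshold
    ]


combined = combine(weekly_sales, average_sentiment_by_week(sentiment_records))
flagged = flag_weeks(combined)
print(flagged)
```

The point of the sketch is the join itself: neither data set alone shows that the sales decline coincides with negative sentiment, and the lower-quality sentiment data is used only to add context to the governed sales figures.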
Watch this video as Sriram Padmanabhan, an IBM Distinguished Engineer, explains how you can deliver transactional data to a Hadoop platform (IBM InfoSphere BigInsights) using IBM InfoSphere Information Server.
As enterprises embark on their big data integration journey, they need to overcome a few challenges, such as:
- Acquiring the technical skills needed to handle big data
- Identifying a business problem that can be solved using insights from big data
- Building a business case for management buy-in, given the exploratory nature of big data analytics
- Establishing a governance framework for data that is more diverse and voluminous than traditional enterprise data
The list goes on.
This post was originally published on October 5, 2012 on the Mastering Data Management Blog.