The Hadoop data warehouse – A wake-up call for traditional EDW

August 21, 2013

The era of “big data” presents new challenges to businesses. Incoming data is exploding in volume, variety, velocity and complexity, while legacy tools have not kept pace. In recent years a new tool, Apache Hadoop, has appeared on the scene, and with the rise of big data has come the rise of the analytic database platform. A few years ago, a company could leverage a traditional DBMS for a data warehouse. However, the enterprise data warehouse (EDW) concept was originally developed at a time when databases rarely exceeded a few terabytes in size.

According to a new market report published by Transparency Market Research, "Hadoop Market - Global Industry Analysis, Size, Share, Growth, Trends, and Forecast, 2012 - 2018," the global Hadoop market was worth USD 1.5 billion in 2012 and is expected to reach USD 20.9 billion in 2018, growing at a CAGR of 54.7% from 2012 to 2018. North America was the largest market for Hadoop in 2012 due to the huge amounts of data generated in the region and the growing need to store and process the accumulated data.

A big data solution is not a single product but an architecture suited to today’s business needs. IBM is moving from a consolidated architecture to a zone architecture. This type of architecture is much more modular: instead of one large data repository, data can be stored and analyzed in smaller, more specialized systems built for specific functions (see the sketch below).

[Image: Pramanick-hadoop-edw.jpg]
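To make the zone idea concrete, here is a minimal routing sketch in Python. Everything in it (the zone names, URIs and the Record type) is hypothetical and purely illustrative, not IBM terminology or a product API; it only shows how records from different sources can land in different purpose-built stores.

```python
# Illustrative sketch of zone-based routing; zone names and URIs are made up.
from dataclasses import dataclass

@dataclass
class Record:
    source: str      # e.g. "clickstream", "orders", "sensor"
    payload: bytes   # raw, unparsed data

# Each zone is a separate, purpose-built store in a zone architecture.
ZONES = {
    "clickstream": "hdfs://landing-zone/raw/clickstream/",
    "orders":      "jdbc:warehouse://edw/curated/orders",  # structured EDW zone
    "sensor":      "hdfs://exploration-zone/sensor/",
}

def route(record: Record) -> str:
    """Pick a destination zone; unknown sources land in a raw catch-all."""
    return ZONES.get(record.source, "hdfs://landing-zone/raw/misc/")

if __name__ == "__main__":
    print(route(Record("clickstream", b'{"page": "/home"}')))
    # -> hdfs://landing-zone/raw/clickstream/
```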

The size of the data sets being collected and analyzed in industry for business intelligence is growing rapidly, making traditional warehousing solutions prohibitively expensive. The downside of many EDWs built on an RDBMS is that they become rigid: it can take weeks or months for an IT/IS organization to add new data sources or change existing rules.

Hadoop is a popular open-source MapReduce implementation now used by many companies to store and process extremely large data sets on commodity hardware. In the new big data framework, the development of an analytical and business intelligence solution can be truly agile and iterative. Hadoop stores data at massive scale and low cost, and it handles variety, complexity and change much more easily because data does not have to conform to a predefined schema, such as a star or snowflake schema, before it is loaded; structure is applied when the data is read.
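To illustrate schema on read, here is a minimal Hadoop Streaming sketch in Python. The log layout (space-delimited lines with an HTTP status code in the ninth field) is an assumption made for the example; the raw lines sit in HDFS as-is, and the mapper imposes structure only when the data is read.

```python
#!/usr/bin/env python3
# mapper.py -- minimal Hadoop Streaming mapper (illustrative sketch).
# The stored lines have no predefined schema; structure is applied here,
# at read time, by splitting each line and picking out a status code.
import sys

for line in sys.stdin:
    fields = line.split()         # assumed space-delimited web-log lines
    if len(fields) > 8:           # assumed layout: status code is field 9
        print(f"{fields[8]}\t1")  # emit: <status_code> TAB 1
```

```python
#!/usr/bin/env python3
# reducer.py -- sums the per-key counts emitted by mapper.py.
# Hadoop Streaming delivers mapper output to the reducer sorted by key.
import sys

current_key, count = None, 0
for line in sys.stdin:
    key, value = line.rstrip("\n").split("\t")
    if key != current_key:
        if current_key is not None:
            print(f"{current_key}\t{count}")
        current_key, count = key, 0
    count += int(value)
if current_key is not None:
    print(f"{current_key}\t{count}")
```

The two scripts would be submitted with the distribution's hadoop-streaming jar, passed as the `-mapper` and `-reducer` options; no table definition or load step is required before processing the raw files.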

The shift toward augmenting the traditional EDW is happening gradually as big data technology matures and more solutions arrive to fill the gaps that exist today. There are now several Hadoop appliances on the market, including IBM PureData System for Hadoop.

Make sure you have all the tools to do the job: solving big data analytics challenges requires a complete ecosystem.
