
Which Big Data Sources can IBM Business Intelligence Access?

March 5, 2013

One of the most frequently asked questions I receive is this: “Which Big Data sources can Cognos Business Intelligence (BI) access?”

The short answer is IBM BigInsights, IBM PureData System for Analytics (formerly Netezza), EMC Greenplum, Teradata Aster Data, ParAccel and Vertica. Supported version numbers are listed on the Cognos Supported Software Environments page.

An adaptor for Apache Hive is in beta as of Feb. 25th, 2013. This adaptor supports Hive versions 0.8 and 0.9 from the open source project and has successfully connected to the releases of Cloudera, AWS EMR and Hortonworks that bundle these Hive versions.

The longer answer is “It depends on what you’re trying to do.”

Business Intelligence (BI) represents a group of capabilities. For some people, BI is little more than the Excel or PDF attachment that arrives in their inbox every Monday morning. For the people creating BI deliverables, however, BI spans a range of capabilities, from reporting and analysis to performance management and mobile access.

Hadoop-based sources are best suited for batch-oriented queries. Our testing indicates that Hive queries typically take about 10 seconds longer than the same query executed on an RDBMS. When queries span physical sources, such as joining Hadoop data with data warehouse data, all of that data is retrieved, joined and then processed locally on the Cognos server. This local processing adds further time.
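
To make the cross-source case concrete, here is a minimal Python sketch of an in-memory hash join, assuming both result sets have already been fetched into lists of dictionaries. It is not how Cognos implements its joins; the function name `local_hash_join` and the sample rows are hypothetical. It only illustrates that every row from both sources must be retrieved before matching can start, including rows the join later discards.

```python
from collections import defaultdict

def local_hash_join(left_rows, right_rows, key):
    """Inner-join two fully fetched result sets on `key`, in memory.

    Hypothetical illustration: both inputs must be retrieved in full
    before any matching can happen.
    """
    # Build phase: index every right-hand row by its join key.
    index = defaultdict(list)
    for row in right_rows:
        index[row[key]].append(row)

    # Probe phase: look up each left-hand row; unmatched rows are dropped.
    joined = []
    for row in left_rows:
        for match in index.get(row[key], []):
            joined.append({**row, **match})
    return joined

# Hypothetical sample rows standing in for a Hive result and a warehouse result.
hadoop_clicks = [
    {"customer_id": 1, "clicks": 120},
    {"customer_id": 2, "clicks": 45},
    {"customer_id": 3, "clicks": 7},
]
warehouse_customers = [
    {"customer_id": 1, "region": "EMEA"},
    {"customer_id": 2, "region": "APAC"},
]

result = local_hash_join(hadoop_clicks, warehouse_customers, "customer_id")
rows_fetched = len(hadoop_clicks) + len(warehouse_customers)
print(f"fetched {rows_fetched} rows, returned {len(result)}")
print(result)
```

Here five rows are fetched to produce two joined rows. At real Hadoop volumes, that transfer plus the local join work is the added cost described above.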

From a BI perspective, these query characteristics are best suited to reporting and to delivering those PDFs and Excel files that end users love.

Here’s the challenge: batch processing often doesn’t provide the performance desired for interactive queries. Performance will depend on the type and volume of data being processed. This is why interactive query engines such as Google’s Dremel and Cloudera’s Impala are so exciting. However, the technology will take time to mature.

Until then, interactive exploration and analytics will still require staging the data. Consider analytics-oriented, high-volume databases such as IBM PureData for this purpose.

What other questions do you have?

