Hadoop Summit 2015, Day 1: Innovation thrives as Hadoop nears maturity

Big Data Evangelist, IBM

If you can’t attend next week’s Spark Summit in San Francisco, California, you could still learn a lot about Spark at Hadoop Summit, which is taking place this week in San Jose, California. We count 13 sessions at Hadoop Summit that have Spark in their title. And if our session-going experiences on Day 1, Tuesday, June 9, were any indication, Spark is a main topic of discussion in many of the others.

But we would be exaggerating if we said that Spark dominates discussions at this year’s Hadoop Summit. Indeed, Spark is just one of many Apache projects that are the subject of sessions at Hadoop Summit this year. Others include HBase, YARN, Tez, Mesos, Flink, Drill, Oozie, Flume, Sqoop, Phoenix and Kylin.

Just as Spark is only one of many session themes, it is also not the dominant one in vendor announcements at this year’s Hadoop Summit. For example, IBM, along with Hortonworks and Pivotal, is emphasizing Open Data Platform support in its Hadoop product, BigInsights (which was enhanced to version 4.0 several months ago and now includes Spark). Moreover, IBM is stressing the importance of being able to deploy its Hadoop offering on a wide range of platforms. These deployment options include a cloud service (Bluemix), bare metal (SoftLayer), mainframe (z/OS) systems, Intel systems and Power Systems.

Regarding that last point, IBM is positioning Hadoop on Power Systems—due for Q3 release for BigInsights 4.0—as offering better performance, more efficient storage, superior processor use and quicker time to value than Hadoop on x86 servers. BigInsights 4.0’s Spark functionality is designed to benefit from that improved hardware use.

Other established Hadoop vendors—most notably, Hortonworks, MapR and Pentaho—also announced significant feature enhancements for their respective Hadoop solutions. Each of those vendors, like IBM, incorporates Spark features into its respective Hadoop platform enhancements. For details on these and other, less well-known vendors present at the event, read the SD Times news roundup.

Beyond product announcements and the growing industry focus on Spark, we were fascinated by several other Hadoop industry themes discussed at the summit. One speaker from Teradata discussed the relevance to data lakes of “Lambda Architecture,” the seamless blending of batch and streaming data. A speaker from Key2 Consulting discussed the importance of hybrid data architectures that integrate Hadoop with data warehouses. And someone from eBay gave an overview of Apache Kylin, an incubator project for OLAP cubing that leverages core Hadoop components: HDFS, MapReduce, Hive and HBase.

It’s clear that Hadoop is nearing maturity, but if this year’s summit is any indication, this segment remains vibrant and innovative. Indeed, many of the sessions addressed significant gaps in our own knowledge of this fast-moving space. We can’t wait for Day 2.

If you want to dive deeper into Spark, register to join us next week at Spark Summit in San Francisco.

Join fellow data scientists on June 15, 2015, at Galvanize in San Francisco for a Spark community event. Hear how IBM and Spark are changing data science and propelling the insight economy. Sign up to attend in person, or sign up for a reminder on the day of the event and watch the livestream.

Co-authored by Andrew Popp, Portfolio Marketing, Hadoop & BigInsights, IBM Analytics Platform.