Building a strong foundation for Hadoop with the Open Data Platform initiative

Big Data Evangelist, IBM

Hadoop is on the cusp of maturity as an industrywide big data analytics platform.

Hadoop’s commercial maturation took a big leap forward with the recent establishment of the Open Data Platform (ODP) group, which has created a common interoperability framework. ODP provides users and ISVs with assurances that there is a tested Hadoop core, allowing them to focus on building value-added applications on top.

In the months since ODP was announced, the initiative has quickly catalyzed the Hadoop market around the vision of cross-vendor interoperability. Even so, a meetup at this week’s Hadoop Summit 2015 illustrated how many open issues remain regarding ODP’s value, scope, adoption, roadmap and core components, among other things. The event took place on Monday evening, June 8, at the San Jose Convention Center, where participants listened to several ODP founding members discuss the initiative’s genesis and likely evolution. Audience members offered insightful comments and put tough questions to the panelists.

Pictured (L–R): Scott Gray, IBM; Roman Shaposhnik, Pivotal; Scott Andress, Hortonworks

Scott Andress, Senior Director of Strategic Alliances at Hortonworks, emphasized that ISV enablement is critical to the continued growth of the Hadoop market. He asked how to define core ODP components that make it easy for ISVs to validate their applications once and run them across all Hadoop distributions.

Roman Shaposhnik, Director of Open Source at Pivotal, said that lessons from the Apache Bigtop initiative—which has failed to gain significant industry traction as a framework for packaging, testing and configuring leading open-source big data components—can help “calibrate expectations” for ODP’s success. Moreover, he said, ODP addresses the need for a coordinated effort across the Hadoop industry to avoid incompatible distributions. He went on to say that a comprehensive Hadoop test suite would be an appropriate software artifact to include in ODP.

Scott Gray, Big Data Architect and Senior Technical Staff Member at IBM, called ODP very much a work in progress, saying that “we have to walk before we can run” with comprehensive applications. He described ODP’s current scope as “fairly limited,” saying that it hasn’t yet evolved beyond its core of HDFS, MapReduce and Ambari. However, he said, it will indeed grow, adding components with strong version management and a diverse collection of programming interfaces. Gray said that ODP will evolve into a set of software artifacts and reference implementations in which “everything will be as open as possible.” All the while, he added, ODP will continue to contribute back to Apache Software Foundation (ASF) Hadoop community projects, for “ASF projects are the starting point for ODP initiatives.”

Toward the end of the meetup, the talk turned to the future of ODP, and the panelists asked attendees to name the projects they want added to the ODP core. The responses were varied and included HBase, Hive, Sqoop, Oozie, Spark, ZooKeeper, Kafka, Storm and Kerberos.

For more about one component of that ODP wishlist, register to join us next week at Spark Summit in San Francisco.

Join fellow data scientists June 15, 2015 at Galvanize, San Francisco for a Spark community event. Hear how IBM and Spark are changing data science and propelling the insight economy. Sign up to attend in person, or register for a reminder so you can watch the livestream on the day of the event.

Co-authored by Andrew Popp, Portfolio Marketing for Hadoop/BigInsights, IBM Analytics Platform.