The emergence of a diversified industry ecosystem around Spark

Big Data Evangelist, IBM

Apache Spark is at heart an open-source community, but it is going well beyond that identity to also develop into a substantial sector of the analytics market. However, Spark will not be able to achieve its full potential if a robust industry ecosystem does not develop around it.

IBM is clearly in a strong position to catalyze the development of just such an ecosystem. Today’s announcements described how IBM is investing in growing a diversified ecosystem of partners in the Spark arena, with a clear focus on engaging with and contributing to the open-source community that drives the evolution of this important codebase. In addition to publicly committing to ramp up IBM contributions to Apache Spark, today’s announcements identified the Spark Technology Center as the focus of Spark community, partner and customer engagement. They also launched an ambitious worldwide MOOC of Spark education for data scientists and engineers and open-sourced the IBM-developed SystemML machine-learning technology to drive innovation in the Spark community.

In building a robust Spark industry ecosystem, IBM is emphasizing a wide range of strategic partner engagements in this emerging market segment. Although Spark is available as a component of commercial Hadoop distributions, the go-to vendors in the Spark market are largely a group of Spark-focused startups distinct from the partners with which IBM engages in the larger Hadoop arena. If last week’s Hadoop Summit is any indication, many of those Hadoop vendors are also quickly getting fired up about Spark, so the partner ecosystems of the Hadoop market and Spark proper may someday completely overlap. At this point in Spark’s commercial development, though, it’s still a niche market.

With that prelude, here’s a quick rundown of the Spark partners, more than a dozen of them, that figured into today’s announcement. IBM will be collaborating with these partners to take advantage of opportunities in this exciting new market. If you attended today’s Spark community event, held at the facility of one such partner (Galvanize), you’ll have noticed that many of these partners are already in the Spark market, offering commercial solutions. All these partners are certified on, or are currently certifying on, IBM BigInsights 4.1 (which includes Spark) and are committed to incorporating it into their various Spark product strategies. Moreover, these partners used BigInsights’ Spark implementation in their demos at today’s event.

Here’s a quick overview of how IBM’s Spark partnerships support its go-to-market strategy in this emerging segment.

To engage deeply in the Spark open-source community, IBM is partnering with Databricks, whose contributors include the originating team of Spark developers from UC Berkeley’s AMPLab as well as more Apache Spark committers than any other company in today’s market. Accordingly, IBM is partnering with Databricks on its own open-source contributions to the Spark community, especially those that deepen the machine learning libraries at the core of Apache Spark. Databricks will also help IBM ensure that it is providing clients with the latest Spark innovations and support from the community.

To offer customers a choice of Spark modeling and visualization tooling, IBM is supplementing its own offerings with high-quality third-party offerings. IBM’s partnerships include a mix of modeling tool vendors, some that specialize in Spark and some that are also active in the broader Hadoop arena. Today’s announcement identified Alpine Data Labs, Arcadia Data, ClearStory Data, Datameer, Looker, Platfora and ZoomData as such partners. IBM also partners with TypeSafe, which provides a rich application development platform for Scala programming in Spark.

To educate data scientists about Spark and provide training in its use, IBM has engaged AMPLab, DataCamp, Galvanize (the host for today’s community event), MetiStream and Silicon Valley Data Science. Moreover, one of IBM’s partners, Tupl, brings a specific vertical-industry focus to Spark, providing a next-generation machine learning application for cellular network performance optimization and troubleshooting.

And this is all just for starters. IBM is actively pursuing other partnerships to round out its total offering in the emerging Spark market. IBM takes very seriously the need to build an open Spark ecosystem that provides customers the option to use whichever IBM or partner offering best suits their specific requirements.

You can learn more about IBM’s deep commitment to Spark and engagement with its partner ecosystem by visiting the following online resources: