Spark Summit 2015, Day 1: Energizing a new wave of data scientists

Big Data Evangelist, IBM

It’s Tuesday, June 16, as I write this. Right after grabbing a coffee at Starbucks, I checked the morning paper to see whether there was any news to report yesterday other than the first day of Spark Summit 2015.

In case you were wondering, there was. But if you, like me, were among the packed crowds at the summit or, later in the day, at the IBM-sponsored Spark community event nearby, you probably weren’t paying attention to anything else. It was clearly a landmark day in the steady development of Spark as an enterprise-grade big data analytics technology. Take a look at the Wall Street Journal’s coverage, for example. For the high points of what IBM announced yesterday, check out the press release, or perhaps this blog post of mine, published a few hours later. To place those announcements in the broader context of IBM’s strategic focus on Spark, visit our new Spark landing page. And if you want to dive right in and join the community of Spark-using data scientists we’re cultivating at the new Spark Technology Center, please register here.

But keep in mind that Spark is much larger than IBM. Indeed, IBM’s announcements yesterday focused on its deepening engagement with the Spark open-source community and with the growing ecosystem of Spark solution providers, developers and educators, among others. I also published a blog post yesterday highlighting the growing roster of strategic Spark partners that IBM has already brought on board.

Most of those partners, and seemingly much of the global Spark community, were in San Francisco yesterday for the summit, which took place at the Hilton Union Square. IBM was one of several sponsors of the event, and Beth Smith, General Manager of IBM Analytics Platforms, was one of the keynote speakers. Beth not only presented the crux of yesterday’s IBM announcements but also placed the development of the Spark open-source platform in its broader historical context. She drew parallels with the development of Linux more than two decades ago and characterized Spark as the foundation of the new “analytic operating system” in a world moving inexorably toward in-memory, streaming, machine learning and graph analytics.

But before Beth took the stage, the main keynote speakers, Matei Zaharia and Patrick Wendell from Databricks, presented metrics on the Spark community’s growth and adoption over the past few years. Their discussion corroborated not only what IBM has been saying about Spark becoming the most significant open-source project of the next 10 years, but also what everybody present could plainly see. A standing room–only crowd filled the Hilton’s Grand Ballroom, which was buzzing with the passion of a new generation of data scientists who have standardized on Spark as the power tool for their most challenging projects. About 85 percent of the attendees were either data scientists or application developers who are starting to bring the tools of data science into their projects.

After lunch, sad to say, I had to leave the summit to begin setting up for the IBM-sponsored Spark Community Event several blocks away. Held at the San Francisco campus of IBM partner Galvanize, the event was anchored by a livestream of SiliconAngleTV’s theCUBE interviews with IBMers and partners, who elucidated the significance, value and challenges of Spark, a technology that by all accounts is very promising but still immature. Here are links to the interviews in the order in which they were streamed: myself, Beth Smith, Joel Horwitz, Harriet Fryman, Mike Tamir, Fernando Perez, Robert Parkin, David Townsend, Rod Smith and Paco Nathan.

As the evening approached and Spark Summit wrapped up for the day, we began to wind down the SiliconAngleTV interviews as the community headed to Galvanize’s campus. Activities at the event included a Hall of Innovation featuring innovative Spark projects, IBM and partner Spark demonstration stations, lightning talks from several IBM partners and customers who have embraced Spark and a wide-ranging moderated town hall panel involving several of the Spark experts who had sat for theCUBE interviews earlier in the day. In addition, we recognized the top-scoring teams from the three-day IBM-sponsored Spark hackathon hosted at the Galvanize campus.

There was a palpable sense of excitement yesterday at the community event, the summit and IBM’s new Spark Technology Center at 425 Market Street, just a few blocks from the Embarcadero. The depth of the Spark brain trust—within IBM, across IBM’s partner ecosystem and among IBM’s worldwide customer community—will continue to grow, and the Spark Technology Center (STC) has established a blog that we hope will capture the most visionary data science thinking and most exciting work that the new breed of data scientists and other developers are doing with the help of Spark.

We certainly hope that you will also tap into the deepening pool of Spark-related blogs, infographics and other content resources being published on IBM’s Big Data & Analytics Hub. As you’re probably aware by now (considering how much recent content we’ve published and promoted on Spark), we’re covering this hot technology from every possible angle; for example, see the crop of recent BD&A blogs on Spark. And we’re not going to slack off, either. A growing body of fresh thinking is coming down the pike. Much of it will come from the droves of IBM data scientists who participated in the recent and wildly successful internal Hack Spark Challenge, as well as from ongoing IBM-sponsored hackathons, meetups and developer days focusing on Spark. For example, here’s a recent blog post by Monica Fox that summarizes a Spark hackathon we held in Cambridge, Massachusetts, in late May.

Well, that’s it for my day 1 recap from Spark Summit. The day 2 keynote talks are about to begin. I’ll bring you all up to speed on those in tomorrow’s recap blog.

And before I forget (I can get a bit absent-minded on the road), I’d like to urge you to sign up for IBM’s forthcoming Apache Spark as a Service on Bluemix.