Highlights from day one of Strata + Hadoop World 2015

Big Data Evangelist, IBM

IBM is all over this year’s Strata + Hadoop World 2015 in New York, New York, which stands to reason given that IBM is a strategic sponsor of the event. Day one, Tuesday, 29 September 2015, was abuzz with IBM activity. IBM data science experts kicked off their three-day hands-on lab course, “Practical Data Science on Hadoop.” IBM spokespeople discussed IBM offerings and strategies. And the IBM pedestal was hopping with activity during the opening reception that evening. Meanwhile, IBM’s intrepid big data evangelist, ever the early bird, was in the lobby of the Javits Center at the crack of dawn.


Throughout the first day, Brandon Mackenzie, John Rollins and Jacques Roy of IBM guided a roomful of attentive students through hands-on data science lab exercises. Centered on the Apache Spark machine learning library (MLlib) and SystemML, the day one curriculum covered fundamental data science methodology, an overview of selected machine learning methods, descriptive statistics and other useful guidance.

Rollins authored a recent IBM Big Data & Analytics Hub blog posting on data science methodology.

And Roy authored a recent IBM Big Data & Analytics Hub blog posting on classification and text analytics technology.

On the morning of day one, Wikibon began streaming its theCUBE interviews. In the first IBM interview, John Furrier of Wikibon spoke with both Rob Thomas, vice president, product development, IBM Analytics, and Joel Horwitz, director of portfolio marketing, big data and analytics, at IBM.


Thomas and Horwitz answered the questions Furrier lobbed their way in tag-team fashion. Here are a few highlights of their responses:

Rob Thomas:

  • “Hadoop is about storage. Spark is about analytics. Spark moves you faster along that maturity curve.”
  • “The value in line-of-business analytics comes from machine learning. That was why we contributed SystemML to open source.”
  • “We’ve released a set of SPSS analytics algorithms that now run directly on Spark. Thousands of data scientists that know SPSS really well can run those algorithms on Spark, which opens up a corpus of data for them to work on.”
  • “We’re making a couple of big announcements this week with new products: BigIntegrate and BigQuality. These are data ingestion engines that change how you get the right data into Apache Hadoop. BigIntegrate and BigQuality equal big cleanup.”
  • “We’re number one in Spark. Nobody’s made an investment that even touches what we’re doing with Spark, and in terms of the traction we have with clients, it’s enormous.”

Joel Horwitz:

  • “Hadoop is becoming more of a storage environment, though people sometimes misname how they call Hadoop. It’s actually a much broader ecosystem. It’s not just a file system, it’s a full ecosystem of other capabilities. Spark is actually accelerating Hadoop.”
  • “We opened Watson in the Valley. I think that’s telling. There’s a lot of innovation there. It’s the epicenter of Spark.”
  • “Our focus is in making really strong contributions and deep investments in the developer community and open source community but at the same time continuing to work hand in hand with our installed base.”

In the afternoon, Dave Vellante and George Gilbert of Wikibon interviewed Larry Weber, program director for dashDB portfolio marketing, IBM Cloud Data Services.

Here are a few snippets of Weber’s interview responses:

  • “From a data perspective, we’re going to have our own division wrapped around Cloudant data services, composable data services, not just a database. But it could be anything from Cloudant, it could be dashDB, it could be BigInsights or Hadoop; we even have Spark in there as a beta. We’re doing this around developers.”
  • “We’ve seen a number of use cases around data warehousing. A lot of people say ‘I have an existing data warehouse; I need to do something today.’ There’s a lot of cloud traction here—get answers faster.” 

During the opening reception on the evening of day one, I had a chance to meet Robert Routzahn at the IBM pedestal, where he was discussing the recently released products that Thomas mentioned: IBM InfoSphere BigInsights BigIntegrate and InfoSphere BigInsights BigQuality Version 11.5. For more information on this topic, check out Routzahn’s recent blog posting at the IBM Big Data & Analytics Hub.

The always engaging Andrew Popp, manager, portfolio marketing for Hadoop and BigInsights, IBM Analytics, was also at the IBM pedestal. For a good overview of the IBM strategy in the Hadoop market, read Popp’s recent IBM Big Data & Analytics Hub blog posting, “Hadoop: Opening insights everywhere.” Popp also offers a fun one in the form of a quiz for users called “How do you Hadoop?” And I’d be remiss if I didn’t mention my latest take on the evolving role of Hadoop in the era of Spark.

That covers day one at Strata + Hadoop World 2015. Look for recaps of days two and three. In the meantime, learn more about how to accelerate your career journey into advanced analytics and Hadoop, and be sure to register for IBM Insight 2015, 25–29 October 2015, in Las Vegas, Nevada.