Putting data to work at Strata + Hadoop World 2016

Big Data Evangelist, IBM

Cognitive business took a bold leap forward in New York City the week of 26 September 2016. At two events on Manhattan’s west side, IBM led customers, partners and industry at large in an exploration of how to put machine learning, artificial intelligence (AI), and big data analytics to work.

Advanced environment

At the extremely well-attended IBM DataFirst Launch Event at Hudson Mercantile, the chief news was the announcement of Project DataWorks. This new, cloud-based offering provides a self-service environment for teams of data scientists, data engineers and other professionals to collaboratively develop, iterate and deploy sophisticated AI, cognitive computing, machine learning and other advanced analytics. Check out what Dinesh Nirmal, vice president, IBM Analytics, had to say about DataWorks in action during the Strata + Hadoop World 2016 conference.

And at nearby Strata + Hadoop World 2016, IBM further amplified, deepened and demonstrated the power of DataWorks, the new Data Science Experience and other innovative solutions to drive the productivity of data science teams working on complex initiatives.


Strata highlights

At the Strata conference, IBM also announced the release of IBM Big SQL on Hortonworks Data Platform (HDP), which you can read more about in Andrea Braida’s detailed blog. Braida explains how this new SQL-on-Apache Hadoop offering adds value to IBM BigInsights and carries forward our commitment to the Open Data Platform initiative (ODPi). In addition, check out the Cube livestream playback of remarks on Big SQL on HDP by Berni Schiefer, a fellow in the IBM Spark Technology Center.

Rob Thomas, IBM Analytics vice president for product development, delivered a keynote on how successful modern businesses think data first. Thomas gave a quick demo of how the new IBM Project DataWorks and Data Science Experience can solve real problems with real data. Also see Thomas’s related remarks on the Cube in conjunction with the DataFirst launch event.


In a breakout session, Raj Krishnamurthy presented an in-depth dissection of techniques for tuning Apache Spark machine-learning workloads. He discussed how Spark’s efficiency and speed help reduce the cost of running existing clusters. Krishnamurthy illustrated how Spark’s performance advantages can allow it to complete processing in significantly shorter batch windows with higher performance per dollar. And he walked through how tuning can improve the runtime of an alternating least squares (ALS)-based matrix factorization workload.
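For readers unfamiliar with the technique, here is a minimal, single-machine sketch of alternating least squares for matrix factorization in plain Python. It is not Spark's implementation and not the workload from the session; the tiny `RATINGS` data set, the function names and the parameter values are all invented for illustration. The idea is to hold the item factors fixed and solve a small regularized least-squares problem for each user, then swap roles and repeat.

```python
import random

# Hypothetical toy data: (user, item) -> rating. Not from the session.
RATINGS = {
    ("ann", "a"): 5.0, ("ann", "b"): 3.0, ("ann", "c"): 1.0,
    ("bob", "a"): 4.0, ("bob", "c"): 1.0, ("bob", "d"): 2.0,
    ("cat", "b"): 1.0, ("cat", "c"): 5.0, ("cat", "d"): 4.0,
}

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def solve(a, b):
    """Solve the small dense system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def update_factors(ratings, fixed, rank, reg, by_user):
    """Re-solve one side's factors while the other side's factors stay fixed."""
    grouped = {}
    for (u, i), r in ratings.items():
        key, other = (u, i) if by_user else (i, u)
        grouped.setdefault(key, []).append((other, r))
    result = {}
    for key, observed in grouped.items():
        # Normal equations: (sum f f^T + reg * I) x = sum r * f
        a = [[reg if p == q else 0.0 for q in range(rank)] for p in range(rank)]
        b = [0.0] * rank
        for other, r in observed:
            f = fixed[other]
            for p in range(rank):
                b[p] += r * f[p]
                for q in range(rank):
                    a[p][q] += f[p] * f[q]
        result[key] = solve(a, b)
    return result

def objective(ratings, users, items, reg):
    """Squared error on observed ratings plus an L2 penalty on all factors."""
    sq_err = sum((r - dot(users[u], items[i])) ** 2 for (u, i), r in ratings.items())
    penalty = sum(dot(f, f) for f in users.values()) + sum(dot(f, f) for f in items.values())
    return sq_err + reg * penalty

def als(ratings, rank=2, reg=0.1, iterations=8, seed=0):
    rng = random.Random(seed)
    items = {i: [rng.random() for _ in range(rank)] for _, i in ratings}
    history = []
    for _ in range(iterations):
        users = update_factors(ratings, items, rank, reg, by_user=True)
        items = update_factors(ratings, users, rank, reg, by_user=False)
        history.append(objective(ratings, users, items, reg))
    return users, items, history
```

Calling `als(RATINGS)` returns the user factors, the item factors and the objective value after each sweep. Because each half-step solves its subproblem exactly, the objective should never increase from one sweep to the next. A distributed implementation has to partition this same computation across a cluster, which is where tuning of the kind discussed in the session comes in.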


In another breakout session, Schiefer discussed ODPi as a foundation for cross-distribution Hadoop interoperability. He described how, with so much variance across Hadoop distributions, ODPi was established to create standards for both Hadoop components and testing applications on those components. He explored how application developers and companies considering Hadoop can benefit from ODPi. 

Holden Karau, an IBM software development engineer, and Seth Hendrickson, an IBM data scientist, presented the basics of Spark Structured Streaming for machine learning. They demonstrated how to do streaming machine learning using Spark’s new Structured Streaming and walked their audience through the process of creating their own streaming models. They also covered how to use structured machine learning algorithms, as well as Spark’s Structured Streaming application programming interface (API) and how machine learning works in Spark. Also, check out Karau’s recent Spark Technology Center blog on this topic.
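The general idea behind a streaming model can be sketched without Spark at all: the model is updated incrementally as each small batch of records arrives, rather than being retrained on the full data set. The plain-Python example below is only a conceptual illustration under that assumption. It does not use Spark's Structured Streaming API or reproduce the presenters' code, and the helper names (`micro_batches`, `sgd_update`) and the synthetic data are hypothetical.

```python
import random

def micro_batches(n_batches, batch_size, seed=0):
    """Simulate a stream: yield small batches of (features, label) pairs.

    The synthetic labels follow y = 2 + 3 * x exactly (an assumption made
    so the result is easy to check); features are [bias, x].
    """
    rng = random.Random(seed)
    for _ in range(n_batches):
        batch = []
        for _ in range(batch_size):
            x = rng.uniform(-1.0, 1.0)
            batch.append(([1.0, x], 2.0 + 3.0 * x))
        yield batch

def sgd_update(weights, batch, learning_rate=0.05):
    """Refine a linear model with one stochastic-gradient pass over a batch."""
    w = list(weights)
    for features, label in batch:
        error = sum(wi * xi for wi, xi in zip(w, features)) - label
        w = [wi - learning_rate * error * xi for wi, xi in zip(w, features)]
    return w

def train_on_stream(batches, n_features=2):
    """Carry the model state forward from one micro-batch to the next."""
    weights = [0.0] * n_features
    for batch in batches:
        weights = sgd_update(weights, batch)
    return weights

weights = train_on_stream(micro_batches(n_batches=200, batch_size=20))
```

After the simulated stream is consumed, `weights` should be close to the intercept and slope used to generate the data. In a real streaming system the batches would come from a live source and the current model would be available for predictions between updates.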

Booth buzz

The IBM booth was also buzzing with intelligent activity at Strata + Hadoop World. For example, Marvin the Robot drove the cognitive Rock, Paper, Scissors Grand Challenge by exercising his data-driven algorithmic intelligence to the delight of conference goers. Learn more about Marvin, the challenge, and the enabling technology.


For more information and perspective on these announcements, see these digital resources: