Highlights from day two of Strata + Hadoop World 2015

Big Data Evangelist, IBM

Day two (Wednesday, 30 September) of Strata + Hadoop World was essentially day one. By that, I mean it was the first day of keynotes, delivered in a plenary session that ran for two hours. The keynotes took place in Javits North, a vaulted auditorium with exposed conduits that felt a bit like an airplane hangar on the banks of the Hudson River.

This plenary session was quite stimulating. We heard a steady succession of speakers from O’Reilly (the conference organizer), Cloudera, the University of St. Thomas, Microsoft, Intel, ClearStory Data, Cisco, AudioCommon, BBC Worldwide, and the White House Office of Science and Technology Policy.

And, of course, IBM keynoted, in the person of our jolly good fellow and Ironman chief scientist of context computing, Jeff Jonas. If you’ve seen Jeff at IBM Insight or other events, you know that he’s a powerful thinker, a compelling evangelist and a lot of fun. As always, he went deep on cognitive computing, algorithmic sensemaking, entity resolution and unstructured data analytics.

Though I’ve seen Jeff cover this territory many times, what I love about his talks is that he always has fresh examples drawn from his personal life, his research, and his work with IBM customers and partners. In the day two keynote, he likened one of his core focus areas, context accumulation, to jigsaw puzzles (a comparison he has drawn before).

This time, he illustrated the metaphor literally, with a slideshow of his family tackling a jigsaw puzzle challenge that he had concocted. The principle he drew from this example, incremental discovery through the accumulation of fresh context, drove his point home for the attendees.
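To make the jigsaw metaphor a little more concrete, here is a minimal Python sketch of incremental entity resolution. It assumes a deliberately naive matching rule: any two records that share an identical feature value (here a phone number or an email address) describe the same entity. The class name `ContextAccumulator` and that rule are hypothetical illustrations, not IBM's implementation; production systems use far richer and fuzzier matching. What the sketch does show is the key idea: a single new record can join two entities that previously looked unrelated, much as one new puzzle piece connects two partial clusters.

```python
from collections import defaultdict


class ContextAccumulator:
    """Hypothetical incremental entity resolver (illustration only).

    Assumption: records sharing any identical (feature, value) pair are
    treated as the same entity. Entities are tracked with union-find.
    """

    def __init__(self):
        self.parent = {}         # record id -> parent record id (union-find)
        self.feature_index = {}  # (feature, value) -> first record id seen with it

    def _find(self, rid):
        # Walk parent pointers up to the root record of this entity.
        while self.parent[rid] != rid:
            rid = self.parent[rid]
        return rid

    def _union(self, a, b):
        # Merge the entities containing records a and b.
        ra, rb = self._find(a), self._find(b)
        if ra != rb:
            self.parent[rb] = ra

    def add(self, rid, features):
        """Add one record (a dict of feature name -> value) and merge entities."""
        self.parent[rid] = rid
        for key in features.items():
            if key in self.feature_index:
                # Shared feature value: link this record to the earlier one.
                self._union(self.feature_index[key], rid)
            else:
                self.feature_index[key] = rid

    def entities(self):
        """Return the current entities as sorted lists of record ids."""
        groups = defaultdict(set)
        for rid in self.parent:
            groups[self._find(rid)].add(rid)
        return sorted(sorted(g) for g in groups.values())


acc = ContextAccumulator()
acc.add("r1", {"phone": "555-0100"})
acc.add("r2", {"email": "pat@example.com"})
print(acc.entities())  # [['r1'], ['r2']]: two unrelated-looking entities
acc.add("r3", {"phone": "555-0100", "email": "pat@example.com"})
print(acc.entities())  # [['r1', 'r2', 'r3']]: r3 joins the two clusters
```

Because matching happens as each record arrives, earlier conclusions ("r1 and r2 are different") are revised the moment new context makes a link, rather than waiting for a periodic batch rerun.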

But then he took his context-computing discussion to a literally cosmic plane. He described how the technology is being used to discover collisions among asteroids. Reinforcing his subject in a surprising way, he described the technology’s potential to discover previously unknown asteroids, including ones that may impact our planet. As Jeff put it, “I just want to say: if we save Earth, you owe us.”

After this, he brought the discussion back to important (but not literally Earth-shattering) applications in marketing, life sciences and other practical areas. He also put in a quick plug for the debut of Datapalooza, a series of data-scientist community events, on 10–12 November in San Francisco.

In the afternoon, IBM distinguished engineer Randy Swanberg sat in the interviewee chair on theCube, discussing IBM’s investment in Spark and its development of POWER8 processor technology.

Here are my paraphrases of some of Randy’s key statements:

  • IBM has upped its investment in Linux, Spark and other open technologies.
  • We’re focused on creating value on top of open platforms.
  • The biggest thing about Spark is its in-memory design, which helps avoid disk I/O bottlenecks.
  • Spark can deliver 10x performance improvements on pipelines of work.
  • Users can experiment with Spark’s system design on Hadoop clusters.
  • The POWER8 value proposition is that it’s designed for big data: large caches, huge memory bandwidth, an in-memory architecture, support for more cores and threads, distributed pipeline execution, aggregation of data and parallelization of queries across rows.

Stay tuned for my recap of day three.

To accelerate your career journey into advanced analytics and Hadoop, you can explore this informational Hadoop resource page at IBM Analytics. And be sure to register for IBM Insight 2015, 25–29 October, in Las Vegas.