Highlights from day two of Strata + Hadoop World 2015
Day two (Wednesday, 30 September) of Strata + Hadoop World was essentially day one. By that, I mean it was the first day of keynotes, in a plenary session that stretched for two hours. The keynotes took place in Javits North, a vaulted auditorium with exposed conduits that felt a bit like an airplane hangar on the banks of the Hudson River.
This plenary session was quite stimulating. We heard a steady succession of speakers from O’Reilly (the conference organizer), Cloudera, the University of St. Thomas, Microsoft, Intel, ClearStory Data, Cisco, AudioCommon, BBC Worldwide, and the White House Office of Science and Technology Policy.
And, of course, IBM keynoted, in the person of our jolly good fellow and ironman chief scientist of context computing Jeff Jonas. If you’ve seen Jeff at IBM Insight or other events, you know that he’s a powerful thinker, a compelling evangelist and a lot of fun. As always, he went deep on cognitive computing, algorithmic sensemaking, entity resolution and unstructured data analytics.
Though I’ve seen Jeff cover this territory many times, what I love about his talks is how he always has fresh examples drawn from his personal life, his research, and his work with IBM customers and partners. In the day two keynote, he likened one of his core focus areas, “context accumulation,” to jigsaw puzzles, an analogy he has used before.
This time, he illustrated the metaphor literally with a slideshow of his family doing an actual jigsaw puzzle challenge that he concocted. The principle he drew from this example—incremental discovery through accumulation of fresh context—drove his discussion home for the attendees.
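To make the puzzle principle concrete, here is a minimal Python sketch. It is a toy illustration under my own assumptions, not the technology Jeff described: the `ContextAccumulator` class, its `observe` method and the sample records are all hypothetical, and "sharing any feature" stands in for whatever matching logic a real system would use. The point is only that a newly arrived piece can connect two clusters that looked unrelated until then.

```python
class ContextAccumulator:
    """Toy incremental matcher (hypothetical): records that share
    any feature are folded into the same entity."""

    def __init__(self):
        self.entities = {}       # entity id -> set of features seen so far
        self.feature_owner = {}  # feature -> id of the entity holding it
        self._next_id = 0

    def observe(self, features):
        """Add one observation; return the id of the entity it lands in."""
        features = set(features)
        # Existing entities that already hold at least one of these features.
        matches = sorted(
            {self.feature_owner[f] for f in features if f in self.feature_owner}
        )
        if matches:
            target = matches[0]
        else:
            target = self._next_id
            self._next_id += 1
            self.entities[target] = set()
        # A new piece can join entities that were separate until now.
        for other in matches[1:]:
            features |= self.entities.pop(other)
        self.entities[target] |= features
        for f in features:
            self.feature_owner[f] = target
        return target


acc = ContextAccumulator()
acc.observe({"name:J. Smith", "phone:555-0100"})
acc.observe({"name:John Smith", "email:js@example.com"})
entities_before = len(acc.entities)  # the two records look unrelated
acc.observe({"phone:555-0100", "email:js@example.com"})
entities_after = len(acc.entities)   # the third record links them
print(entities_before, entities_after)
```

Neither of the first two records says anything about the other; it is the third one, carrying a little context from each, that lets the picture snap together.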
But then he took his context-computing discussion to a cosmic plane. He described how the technology is being used to discover collisions among asteroids. More surprisingly, he also described its potential to discover previously unknown asteroids, including ones that may strike our planet. As Jeff put it, “I just want to say: if we save Earth, you owe us.”
After this, he brought the discussion back to important (but not literally Earth-shattering) applications in marketing, life sciences and other practical areas. Plus, he put in a quick plug for the debut, 10–12 November in San Francisco, of the Datapalooza series of data-scientist community events.
Here are my paraphrases of some of Randy’s key statements:
- IBM has upped its investment in Linux, Spark and other open technologies.
- We’re focused on creating value on top of open platforms.
- The biggest thing about Spark is its in-memory design, which helps avoid disk I/O bottlenecks.
- Spark can create 10x performance improvements on pipelines of work.
- Users can experiment with Spark’s system design on Hadoop clusters.
- The POWER8 value proposition is that it’s designed for big data: large caches, huge memory bandwidth, an in-memory architecture, support for more cores and threads, distributed pipeline execution, aggregation of data and parallelization of queries across rows.
Stay tuned for my recap of day three.
To learn more about advanced analytics and Hadoop, explore the Hadoop resource page at IBM Analytics. And be sure to register for IBM Insight 2015, 25–29 October, in Las Vegas.