Apache Spark and IBM Streams working together in streaming analytics

Executive IT Specialist, Competitive and Product Strategy, IBM Analytics, IBM

The annual IBM conference, now called IBM Insight at World of Watson 2016, is coming soon, and there are many sessions I want to attend. Here I want to highlight the two sessions I am involved in: one is on data science, and the other focuses on how Apache Spark and IBM Streams work together.

Unless you have been living under a rock for the last few years, you know that data science is a hot topic. The session, “A Data Science Introduction for Database Girls and Guys,” takes place on the Monday of the conference week (24–27 October 2016), and it is a good way to start the week. Despite all the press it gets, data science remains a complex subject with a steep learning curve in probability and statistics. How can we get into the subject without having to go back to college?

Taking the mystery out of data science

This presentation offers a methodology for approaching data science projects and includes some fundamental concepts and useful examples to help demystify the field. Its goal is to give you enough information to be a valued member of a data science project team. It starts with a brief introduction to the data science methodology and draws parallels with the data management field. It then covers some key data science terminology and concepts, along with enough basic statistics to help remove some of the magic around this subject.

At a high level, the session also covers some popular machine learning algorithms that are available in the Apache Spark machine learning library (MLlib) and in other analytics engines. These algorithms include decision trees, k-means clustering, Naive Bayes, and the Alternating Least Squares method. This session helps you prepare for your journey into data science.

Driving streaming analytics

The other session, “Apache Spark and IBM Streams Working Together in Streaming Analytics,” takes place on Thursday morning of the conference week, and it is related to the data science session described above. You may have heard some IBM executives describe Spark as the analytics operating system, and it is the central engine in many data science solutions.

Despite the hype around Spark, we have to understand that it is not the be-all and end-all of big data analytics. IBM Streams can complement Spark’s functionality and lead to a better overall solution, and this session spends some time explaining how. If you are not familiar with Spark and IBM Streams, the presentation briefly introduces Spark, Spark Streaming, and IBM Streams at a high level, with an emphasis on the functional side.

How do Spark and Streams work together? How do Spark Streaming and Streams work together? You’ll learn how at this session. And for those who are looking for something a bit more substantial, the presentation concludes with additional details on how to use a Naive Bayes model developed in Spark to score streaming documents.
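
To give a rough feel for the "train in batch, score on the stream" pattern, here is a minimal sketch in plain Python. It is not the session's implementation and it does not use the Spark MLlib or IBM Streams APIs: the model is a hand-written multinomial Naive Bayes with Laplace smoothing, the stream is just a Python iterable, and every name in it (train_naive_bayes, score_stream, TRAINING_DOCS, and the sample documents) is a hypothetical made up for illustration.

```python
import math
from collections import Counter, defaultdict

# Hypothetical labeled documents standing in for a batch training set.
TRAINING_DOCS = [
    ("sensor temperature reading high alert", "alert"),
    ("pressure alert threshold exceeded", "alert"),
    ("routine status report normal", "normal"),
    ("daily status normal no issues", "normal"),
]


def tokenize(text):
    """Lowercase and split on whitespace; a deliberately simple tokenizer."""
    return text.lower().split()


def train_naive_bayes(labeled_docs, alpha=1.0):
    """Batch step: fit a multinomial Naive Bayes model with Laplace smoothing."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in labeled_docs:
        class_counts[label] += 1
        for token in tokenize(text):
            word_counts[label][token] += 1
            vocab.add(token)

    total_docs = sum(class_counts.values())
    log_prior = {}
    log_likelihood = {}
    for label, n_docs in class_counts.items():
        # Smoothed denominator: tokens seen in this class plus alpha per vocab word.
        denom = sum(word_counts[label].values()) + alpha * len(vocab)
        log_prior[label] = math.log(n_docs / total_docs)
        log_likelihood[label] = {
            word: math.log((word_counts[label][word] + alpha) / denom)
            for word in vocab
        }
    return {"vocab": vocab, "log_prior": log_prior, "log_likelihood": log_likelihood}


def score(model, text):
    """Return the label with the highest log-posterior for one document."""
    # Tokens never seen in training are ignored in this sketch.
    tokens = [t for t in tokenize(text) if t in model["vocab"]]
    best_label, best_score = None, -math.inf
    for label, prior in model["log_prior"].items():
        total = prior + sum(model["log_likelihood"][label][t] for t in tokens)
        if total > best_score:
            best_label, best_score = label, total
    return best_label


def score_stream(model, documents):
    """Streaming step: lazily score documents one at a time as they arrive."""
    for doc in documents:
        yield doc, score(model, doc)


if __name__ == "__main__":
    model = train_naive_bayes(TRAINING_DOCS)
    incoming = iter(["temperature alert exceeded", "status report normal"])
    for doc, label in score_stream(model, incoming):
        print(label, "<-", doc)
```

The point of the sketch is the split in responsibilities: the model is fitted once on historical data, and the scoring function is then applied to each document as it arrives. In the scenario the session describes, Spark would play the first role and the streaming engine the second.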

You’ll surely benefit from attending these and many of the other great sessions at IBM Insight at World of Watson 2016. Register today for this conference, which represents the continuing evolution of the event formerly known as IBM Information On Demand and IBM Insight. I hope to see you there.