
Hear all about open source PixieDust at IBM Insight at World of Watson 2016

Architect, IBM Cloud Data Services, IBM

Jupyter Notebooks are a powerful environment for fast, flexible and interactive data analysis. They are rapidly becoming the tool of choice for data scientists and application developers: they are easy to work with, can connect to a variety of kernels and support multiple languages.

I have been working with Apache Spark and Jupyter Notebooks during the past year, building various applications written in Python and Scala (check them out on GitHub), and have become a big fan.

As a developer advocate for IBM Cloud Data Services, I am often on the road speaking about Spark at conferences and meetups. At those events I keep hearing the same question: “I am starting with Spark and notebooks; which language should I use: Java, Python, R or Scala?”

I always answer that the language really depends on the use case: Scala is better suited for engineering work that involves large, reusable components, while Python, with its rich ecosystem, is the language of choice for data scientists. However, for people who are starting out, the cost of entry for using notebooks can be steep. Simple tasks, such as creating a graph or saving a Spark DataFrame to your laptop or somewhere in the cloud, require writing complicated code.

Along with a few other motivated developer advocates from my team, I created an open source library called PixieDust to address these issues and fill other feature gaps. The current version focuses on Python only, but I hope to extend it to Scala soon. You can find a detailed introduction to PixieDust in my recent blog post, PixieDust: Magic for Your Python Notebook.

I’ll be covering PixieDust in more detail at IBM Insight at World of Watson 2016 with a lightning talk and a session. I’ll also be running a booth that shows how I used the library to create an application that runs entirely in a Jupyter notebook. It uses Spark ML, the FlightStats global flight tracker and The Weather Company Data for IBM Bluemix to predict whether a flight will be delayed because of weather. I will also be co-hosting a Python Meetup with Hilary Mason on Monday, October 24th from 5:00 PM to 6:30 PM.

Join me at IBM Insight at World of Watson 2016 in Las Vegas, Nevada, if you want to know how PixieDust helps you take Jupyter Notebooks to the next level. For example, you can learn how to create fully functional Spark applications that run in notebooks with compelling UIs and workflows. I hope to see you there.

Learn about PixieDust at IBM Insight at World of Watson 2016