Hack the Weather: Weather data presents new challenges for data scientists

Big Data Evangelist, IBM

Weather is serious business. It’s also a rich field of challenging opportunities for the new generation of business data scientists.

On the weekend of September 18–20, IBM and The Weather Company co-sponsored “Hack the Weather,” a hackathon at GalvanizeU in San Francisco. Before discussing the projects that were implemented and presented at “Hack the Weather,” we’ll explain how The Weather Company and IBM conducted the event, which began the afternoon of Friday, September 18. The Weather Company’s Matthew Porcelli and Polong Lin from IBM provided onsite guidance and instruction to attendees over the weekend.

Porcelli started the hackathon by stating the event’s ground rules. Specifically, he said all projects must be executed only with The Weather Company weather data packages and IBM data science tools, including a powerful workbench and Apache Spark as a Service, to create smart data applications for meteorological forecasting.

Porcelli went on to describe how The Weather Company and IBM are working together to address the weather data analytics needs of organizations in retail, government, insurance, energy and utilities, life sciences, and other vertical markets, and gave a quick overview of weather data analytics use cases in several of these industries. He discussed the broad coverage of The Weather Company’s diverse weather data sets, as well as the high-level analytic architecture of the company’s 15-day forecasting engine. Porcelli then outlined how The Weather Company is incorporating remote-sensing, Internet of Things and other new data sources into its weather data. He also talked about The Weather Company’s principal weather data analytics application programming interfaces (APIs): Sun, Datacloud, Weather Underground and Lightning Server.

Finally, Porcelli entertained us with the analytics-driven story of a tornado that he chased out in the field. Evidently, weather data analytics can be a participant sport for data scientists who like to live dangerously.

IBM data scientist Polong Lin then discussed the details of the IBM Data Scientist Workbench that all participants were required to use in their projects. He also described the educational courses available online at no charge to anyone through

Participants engaged Porcelli and Lin in a technical Q&A to nail down the ground rules and resources applicable to this hackathon.

At last, the work began with dinner. Teams began to form over pizza, sodas and Red Bull. Porcelli asked the participants to declare whether they were “weather geeks” or “data science geeks.” From the responses, it was clear that most were in the latter camp. Porcelli, a genuine weather geek, was glad to stoke ideation by sharing his passion and knowledge on the technical aspects of statistical modeling with weather data. Throughout the weekend, Porcelli offered ample guidance on anything relevant to the weather data The Weather Company provided, as well as any technical issues relevant to modeling with it.

As with most hackathons, the proceedings were not terribly exciting to observe. Teams spent most of their time hunched over laptops, focusing intently on their projects. From Friday evening through Sunday morning, the scene resembled a highly caffeinated still life.

At 1 PM Sunday, the teams presented their finished projects to the judges. Here are quick snapshots of their work:

  • Team 1 presented their work on modeling how weather affects retail store traffic and operations. Their specific focus was on growing revenues from customers who shop only during weather events. This involves identifying weather-driven triggers of traffic fluctuations and recommending interventions, such as alerts and emails to customers, based on weather events.
  • Team 2 modeled the impact of weather conditions on Amtrak travel delays in the northeastern United States. Their specific focus involved looking at how weather conditions in one city or on one route impacted delays in other cities and routes.
  • Team 3 modeled how weather conditions relate to traffic accidents. They found a database showing that 20 percent of total accidents are weather-related.
  • Team 4 modeled how weather affects urban pollution levels, which correlate with childhood asthma, and pointed to applications in real estate house-hunting.
  • Team 5 consisted of a single Galvanize student who used the hackathon as an opportunity to learn how to model weather data with Spark. The student discussed the data science challenges encountered in working with The Weather Company’s weather data and APIs, and with the IBM Data Scientist Workbench.
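
For readers curious what the first step of such a project might look like, here is a minimal Python sketch loosely in the spirit of Team 1's retail work. It is not any team's actual code, and it does not use The Weather Company APIs, the workbench or Spark. The numbers are made-up toy data, and the names `pearson`, `weather_trigger_days`, `rain_mm` and `visits` are hypothetical. It only shows how one might check whether daily rainfall moves with store visits and then flag the days that could trigger a customer alert.

```python
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (var_x * var_y) ** 0.5


def weather_trigger_days(rain_mm, threshold_mm):
    """Indices of days whose rainfall meets or exceeds the threshold."""
    return [day for day, rain in enumerate(rain_mm) if rain >= threshold_mm]


# Toy data for one week (assumption: invented purely for illustration).
rain_mm = [0.0, 12.5, 0.0, 30.0, 2.0, 0.0, 18.0]   # daily rainfall
visits = [120, 95, 130, 60, 115, 125, 80]          # daily store visits

r = pearson(rain_mm, visits)
trigger_days = weather_trigger_days(rain_mm, threshold_mm=10.0)

print(f"rain vs. visits correlation: {r:.2f}")
print(f"days that would trigger an alert: {trigger_days}")
```

With real data, a strongly negative correlation like the one in this toy week would suggest that rainy days depress traffic, and the flagged days are the ones where an alert or email might be worth testing. A real project would also need to control for weekday and seasonal effects, which this sketch ignores.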

The semi-finalists will be selected and notified by Tuesday, September 22. Semi-finalists will have a chance to win a grand prize of $5,000 cash and the opportunity to present their final project to a panel of judges at IBM Insight 2015 in Las Vegas on Tuesday, October 27.

Come see the finals and learn how to generate weather-driven analyses at IBM Insight 2015. IBM and The Weather Company will show you how to use The Weather Company weather data packages and IBM data science tools, including a powerful workbench and Apache Spark as a Service in the IBM Bluemix cloud, to build smart data applications for meteorological forecasting and address your most urgent weather-related business challenges.

Plus, if you want to use Apache Spark as a Service to address analytics-related business tasks, explore this educational IBM Analytics resource page.

Finally, check out my recent blog on how smart data apps could usher in the next stage of meteorological forecasting.