Go global with data science at Datapalooza

Founder, Data-Mania, LLC

Over at Data-Mania, I’ve started a blog series on analytics as a service, specifically focusing on IBM’s Watson Analytics technology. I originally planned this series because I wanted to quash some recurring worries I’ve heard among the data science community. Accordingly, I’ll be demonstrating that IBM’s Watson Analytics technology was unequivocally not designed to annihilate the careers of data science professionals. What’s more, I’ll be underscoring just how much IBM is doing to foster, cultivate and enrich the careers of working professionals in the big data and data science space.

The Datapalooza event is one major avenue by which IBM is supporting the data science community, seeking to educate the “next generation of data scientists on how to apply their minds, creativity, and tools in the creation of innovative data products.” That sounds pretty cool, right? But what does it look like in practice?

It looks even cooler in practice than it does on paper. Think of Datapalooza as a three-day rock festival for data lovers that offers absolute immersion in the wild world of big data and data science. Participants come to learn, play and create data products, all from within the event. IBM even throws in an evening rock concert to provide a break from all the data science and big data excitement.

But Datapalooza is like a rock festival in more ways than one. Like a band, Datapalooza goes on tour—and data science lovers who are passionate enough to follow the event will visit some of the most cutting-edge cities on Earth, like Tokyo, London, and Berlin. To kick things off, however, Datapalooza is making the rounds of some of the edgiest cities in America.

The inaugural Datapalooza event was held in San Francisco, on 10–12 November, 2015, at Galvanize University. Then Datapalooza moved to the Galvanize campus in Seattle, Washington, on 9–11 February, 2016. The third Datapalooza event took place at the Austin, Texas, Galvanize campus on 28 April; the fourth is scheduled for 19 May in Denver.

Welcome to Galvanize!

All this talk about Galvanize reminds me: You’re familiar with Galvanize, right? If not, you certainly should be—it’s a university that specializes in quickly teaching hands-on technical skills, touting itself as a place where attendees can “become a developer, data scientist, or . . . build your tech startup.” Galvanize operates in eight locations across the United States, spread between the states of California, Washington, Colorado, Texas and Arizona. Its programs last from 12 weeks to 24 weeks, supplemented by short-term workshops that allow attendees to learn how to program in person. But I digress—now back to Datapalooza!

Get the lowdown on Datapalooza San Francisco

Here’s a quick play-by-play on how things went in San Francisco last November. Now, Datapalooza is set up so that its sessions run several times per event—you normally won’t have to miss one session to make it to another. Indeed, with the right schedule, you can generally find a way to catch everything that strikes your fancy.

Better still, Datapalooza sessions are categorized by difficulty, with some sessions designated for beginners and others for intermediate or even advanced data scientists. You won’t be bored to death because a session is too basic, nor lost because it’s too technical. IBM committee members spent a lot of time thinking about who would attend these events, then selected offerings designed to give everyone something worthwhile, and that’s part of what makes Datapalooza so awesome.

In its data engineering track, Datapalooza offered microcourses that helped data engineering attendees build a base of foundational knowledge around data science topics such as data variables, models and scoring methods. A wide range of courses focused on hot topics such as recommendation algorithms, machine learning capabilities, full text search and geospatial search.

The data science track, by contrast, offered microcourses designed to help data science attendees acquire a suite of data engineering skills in the areas of data wrangling, data munging and data pipelines. Course topics included Twitter data analysis using Apache Spark and IBM Watson, building Word2vec models, natural language processing and more. To round out its offerings, IBM provided microcourses in Spark, web scraping, data pipeline design and more.

But the Datapalooza program also featured a chance for data app development, with attendees building data apps right on site! Indeed, this is where everything comes together at Datapalooza. Although IBM certainly aims to offer attendees the chance to learn and be inspired, Datapalooza seeks to get the “next generation of data scientists . . . to apply their minds, creativity, and tools in the creation of innovative data products”—and app development is the perfect way of doing exactly that.

To keep things from getting too heavy, IBM invited attendees to a free concert at the Mezzanine to hear a band appropriately named Big Data. Judging by the social media output from attendees, they had quite the evening.

Big Data in concert at Datapalooza San Francisco

Introducing Spark Technology Center

Spark Technology Center (STC) was the official sponsor of Datapalooza San Francisco. STC serves as a community hub showcasing the groundbreaking advances that data scientists are making using Spark. Established by IBM last year, STC has already forged partnerships with organizations such as Databricks and AMPLab. IBM brought in some Spark Technology Center staff members to demonstrate data applications and to host some of Datapalooza’s microcourses.

At Datapalooza, STC staff held an exciting demonstration of their Red Rock product.

The Red Rock application sifts through mountains of Twitter data in search of patterns that can help identify influential Twitter users, tweets and conversations. It even supplies geographic information supplementing its findings.

Red Rock relies on Apache Spark and its MLlib library: Spark serves as the analytical platform for processing and generating insights from real-time Twitter data, and MLlib is the machine learning library within Spark.
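To make the idea of spotting influential Twitter users a bit more concrete, here is a minimal sketch in plain Python. It is not Red Rock’s code, and it uses neither Spark nor MLlib. The sample tweets, the `rank_influencers` function and the scoring rule (retweets earned plus mentions received) are all assumptions made purely for illustration.

```python
from collections import Counter

# Hypothetical sample data: (author, retweet_count, mentioned_users).
SAMPLE_TWEETS = [
    ("alice", 12, ["bob"]),
    ("bob", 3, ["alice", "carol"]),
    ("carol", 7, ["alice"]),
    ("alice", 5, []),
]


def rank_influencers(tweets, top_n=3):
    """Rank users by a toy score: retweets earned plus mentions received."""
    scores = Counter()
    for author, retweets, mentions in tweets:
        # Credit the author with the retweets this tweet earned.
        scores[author] += retweets
        # Credit each mentioned user with one point per mention.
        for user in mentions:
            scores[user] += 1
    # most_common returns (user, score) pairs, highest score first.
    return scores.most_common(top_n)


if __name__ == "__main__":
    for user, score in rank_influencers(SAMPLE_TWEETS):
        print(user, score)
```

A real system working on live Twitter streams would need distributed processing and a far richer notion of influence, which is where a platform like Spark comes in.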

Stay tuned for these important announcements

Of the many important announcements made at Datapalooza, some undeniably stand out, such as the partnership between IBM and DataKind in support of DataKind’s MicroCred project.

DataKind is a nonprofit organization of data scientists from all over the world who work together as volunteers in hopes of solving some of the developing world’s most pressing problems. The MicroCred project aims to use predictive modeling to enhance loan servicing for microloans to small businesses in the developing world.

But even such a momentous announcement couldn’t overshadow other, equally exciting ones. Among them: IBM is supporting Galvanize’s Women in Data Science and Engineering Program, offering $150,000 in scholarships, enough for at least 10 winners, to help women enroll in Galvanize’s Data Science Training Program. If you or someone you know would be a good candidate for this award, apply now.

Coming to a city near you

As Datapalooza travels the world, keep watching to see how it affects each region in turn, igniting data science excitement in its host cities one after another. Register now to attend the next Datapalooza, scheduled for 19 May in Denver. If Denver isn’t near your stomping grounds, then find out when Datapalooza will be coming to a city near you.