Collaborations and correlations in the common cause

Big Data Evangelist, IBM

What makes the world a better place? If any of us feels that we have the last word on that topic, we're either some great religious figure or insufferably self-important.

Many people consider wealth, capitalism, democracy and civil liberties to be among the central pillars of the good life, though not the sum total. So, if you're of the mind that open markets, robust commerce and a vibrant media culture are central to our well-being, you might take umbrage at the following, much-quoted crack by a Silicon Valley player: "the best minds of my generation are thinking about how to make people click ads. That sucks."

That's really unfair. Say what you will about specific ads, the ad industry, the companies and industries that depend on ad revenues and advertising as a pervasive presence in our lives, but there's no denying that advertising is a legitimate livelihood for many of the most creative people in our society. But, clearly, there are plenty of other things that artists, advertising professionals and data scientists could be doing with their lives. And not all of those things entail the possibility of them growing filthy rich. Some of the most socially beneficial lines of work might require that they take the proverbial vow of poverty.

You don't need to be a bomb-throwing radical to realize that there are many problems in this world that can't and won't be addressed by for-profit commercial enterprises. Among the various ways by which society deals with these, we have government agencies, philanthropic foundations, religion-sponsored (faith-based) endeavors, cause-specific charities and so on. And they continue to employ many of the best minds (and souls) of this or any other generation. They may in fact employ statistical analysts and other data scientists on occasion on various projects. However, the shoestring budgets of many non-profits tend to constrain their ability to engage specialized technical resources.

I'm impressed with initiatives in the U.S. data scientist community to volunteer their time to worthy causes at home and abroad. Clearly, most of the data scientists who participate in communities such as New York-based DataKind have day jobs to pay the bills. But they see larger humanitarian causes (reuniting refugees, curing infectious diseases, feeding hungry populations and guaranteeing civil rights to the disenfranchised, for example) that can benefit from the smartest data scientists applying their best efforts and most sophisticated tools to the task.

Data scientists' contributions can illuminate real-world variables that aggravate the woes of impacted populations. Statistical correlations, data mining, advanced visualization, predictive models and other analytical tools can reveal the root causes of problems that may not be readily apparent to the people suffering them or to the institutions that have tried and failed to address them head-on. Data scientists' insights—developed in close collaboration with subject-matter experts—can provide the decision support needed by agencies, community groups and others who are in a position to fix the problems.

Though I laud what DataKind is doing, as discussed in this article from last year, I'm not sure that this business model is sustainable. Founder Jake Porway describes DataKind as "a global network of data scientists rushing in from around the world, any time they are needed for some humanitarian cause or crisis." That's a noble sentiment, but it implies that data scientists' insights are exactly what's needed in real-time emergency-response scenarios. However, I doubt that parachuting a bunch of quantitative analysts (or quants) into a disaster area will do as much good as airlifting bulk shipments of food, water and medicine.

Indeed, what data scientists are best at is showing the correlations associated with chronic, systemic patterns (historical, behavioral, social, geospatial and so on) that might be contributing to issues on the ground. Data scientists work best when they engage long-term with the groups who are addressing these issues. In other words, a "do-gooder" data scientist needs to make this work his or her day job in order to have a truly significant impact. Volunteerism only goes so far in these efforts.

To sustain the engagement of the data-science community in these common causes, what's needed is for people and institutions to open-source all of their decision-support assets: data, analytics, tools, platforms and, of course, expertise. I discussed the openness requirements for a smarter (and more humane) planet in this blog.

Already, open data is playing a big role in many global initiatives for improving the lives of whole populations. In this article, I discussed the growing role of open government data in helping civic watchdogs fight against corruption in many countries. In this piece, I reported on a McKinsey study that highlights the macroeconomic benefits of open-data initiatives around the world. This one examines the role of open data in consumer-protection initiatives. And in this one, I speculated on the potential for open crowdsourced environmental data to improve urban quality-of-life.

It's good for data scientists to open their hearts to the wider world of causes that might benefit from their talents. It's also good that they're opening their schedules to do more pro bono work. But it's even more critical that they convince whoever pays their day-job salary to open their full resources for humanity's sake.

If you're a Hadoop developer who would like to donate your data science expertise to worthy causes (and to win cash prizes), IBM welcomes your participation in the new, worldwide Big Data for Social Good Challenge. In this challenge, participants will build applications that drive social good using IBM's Hadoop platform on the Bluemix cloud service. We'll get you started with the platform, data and civic problems in need of solutions. All you need to bring are your magnificent data science skills, imagination and passion for changing the world. The challenge is open to developers everywhere on the planet; sign up today so that you're ready when submissions open on November 10.

We look forward to engaging with you in solving problems to make this a better, smarter planet.

Participate in the #Hadoop4good challenge and change the world