Distributing data science brainpower more equitably among the haves and have-nots

Big Data Evangelist, IBM

People enter the data science profession for many reasons. Chief among them may be an innate love of statistical analysis, whether or not it satisfies any socially redeeming purpose. Another may be the skills they acquired while pursuing their education, which they found more marketable than the majors (such as physics or economics) under which they gained them.

Everybody has to make a living. Data scientists, like anybody else, tend to gravitate to where the jobs are, especially those that fetch higher salaries, offer the resources needed to achieve their dreams and promise more rewarding career paths. For that reason, larger employers with well-established, amply funded big-data initiatives tend to have an advantage over smaller organizations when it comes to recruiting the best and brightest data scientists. As a result, nonprofits, charities and small businesses tend not to have full-time staff data scientists, even though they may benefit from data mining and predictive modeling as much as their Fortune 1000 counterparts.

That's not to say that data scientists are necessarily in it for the money. In fact, the growing range of public interest data science communities such as DataKind shows that some are glad to donate at least a part of their time for worthy pro bono causes. Here's a post I did recently on public-interest data science. And here's a great recent article by Claudia Perlich on initiatives that put data scientists in the service of nongovernmental organizations (NGOs) working on charitable and philanthropic initiatives.

To some degree, these initiatives rely on individual effort and sacrifice: data scientists seek out nonprofit causes and either donate their time pro bono or accept salaries substantially below the prevailing market rate. In her article, Perlich points to some for-profit employers that help their staff data scientists find part-time pro bono causes for which to volunteer.

Well-meaning as these efforts may be, nothing can match the power of an open market in efficiently allocating resources among competing uses. Perlich acknowledges that fact: "[T]o be honest, [voluntarism and corporate good-citizenship initiatives] target only a tiny fraction of interesting problems, and collectively deploy nowhere near the full capacity of the data science community to do good…How can we start connecting socially-minded data experts to important data problems at scale?"

She proposes a market-based approach to that requirement: a year-round virtual marketplace where data scientists can find NGOs whose needs are well matched to their skills and availability.

My personal feeling is that, though well-intentioned, even this would be inadequate. A virtual marketplace, however structured, would focus on more efficient allocation of a still limited, still expensive, highly skilled resource: professional data scientists. Though some of these professionals may choose to work for NGOs for peanuts, or nothing at all, most would still be reluctant to commit to permanent, full-time positions. The reason is simple: most data scientists, unless they are independently wealthy, still need to eat.

To distribute data science expertise more equitably among the haves and have-nots, the relevant skills, tools and platforms need to become more widely available at low or no cost. Clearly, open source statistical modeling tools, such as R, will be fundamental for budget-constrained NGOs. Likewise, self-service, user-friendly statistical modeling and visualization tools will prove essential for encouraging more people at NGOs to acquire data science skills on their own without needing to recruit external resources. And big-data cloud services, such as IBM Watson Analytics, for which a zero-cost "freemium" licensed version is available to one and all, will bring the enabling tools and platforms within reach of every NGO's constrained budget.

Actually, all of these trends (open source, self-service, cloud and freemium licensing) are the result of market forces. I would encourage virtual marketplaces for pro bono data science talent. But I would primarily point NGOs to the market forces that are already out there to help them empower existing domain experts on their staffs to do low-cost data science in the public interest.

You can bootstrap your organization out of data science have-not territory if you point existing personnel toward tools for deepening their own skills.