Data Scientist: Closing the Talent Gap

Big Data Evangelist, IBM

Nobody doubts that companies everywhere will continue to ramp up their hiring, recruitment and training of data scientists. But there seems to be growing alarm that we won’t have enough data scientists to go around.

Will the big data revolution screech to a halt due to a shortage of data scientists? Don’t worry. I think those concerns are misplaced. The longstanding alarm about shortages of computer scientists has proven unfounded, and I expect this one will too: the data scientists of the future will emerge organically from the world’s increasingly social-centric, open-source-oriented big data analytics ecosystem.

To a degree, I have corroboration from analyst Neil Raden, who last year published an excellent blog post, “The Fallacy of the Data Scientist Shortage.” He noted several important trends that make it highly unlikely that we’ll see a data scientist deficit any time soon.

First, today’s data scientists spend the majority of their time doing data discovery, acquisition and preparation. As more of these functions are automated through better tools such as IBM InfoSphere Information Server, data scientists will have more time for the core of their jobs: statistical analysis, modeling and interactive exploration. And even those core functions will be automated to a greater degree through productivity features built into modeling tools such as IBM SPSS Modeler.

Second, data scientists are developing fewer models from scratch. That’s because more and more big data projects run on application-embedded analytic models integrated into commercial solutions, such as IBM Unica.

Third, data scientists will increasingly be sourced, as needed, from external professional services firms, such as IBM’s Business Analytics and Optimization group, or will be developed in-house. Raden specifies four skill levels – from most advanced (“true data scientist”) to least (“business intelligence/discovery”) – that will fill various data scientist roles in the public and private sectors.

I’d like to note a fourth trend that will continue to close the talent gap in enterprise data science. As I noted in a recent blog post, more organizations are establishing data science centers of excellence. With these ongoing programs, organizations are fostering standardization, reuse, collaboration, governance and automation within and across data science initiatives. Centers of excellence leverage and extend established analytics best practices. They provide a convergence point for statistical analysts and subject-matter experts looking to share expertise. They also provide forums and resources for long-time BI and data management professionals to enhance their skills in hot new areas such as text mining, sentiment analysis, social network analysis, behavioral analytics and ensemble modeling. They will almost certainly evolve into lifelong learning programs through which business analysts can acquire full-blown data-science skills. In the process, they will make a significant dent in the talent gap that hamstrings many of today’s business big-data initiatives.

And there’s yet another trend that will alleviate any talent gap: the democratization of data science. While I agree wholeheartedly with Raden’s statement that “the crème-de-la-crème of data scientists will fill roles in academia, technology vendors, Wall Street, research and government,” I think he’s understating the extent to which autodidacts – the self-taught, uncredentialed, data-passionate people – will come to play a significant role in many organizations’ data science initiatives.

As I’ve noted elsewhere, academic credentials are important but not necessary for high-quality data science. The core aptitudes – curiosity, intellectual agility, statistical fluency, research stamina, scientific rigor, skeptical nature – that distinguish the best data scientists are widely distributed throughout the population.

We’re likely to see more uncredentialed, inexperienced individuals try their hands at data science, bootstrapping their skills on the open-source ecosystem and using the diversity of modeling tools available. Just as data-science platforms and tools are proliferating through the magic of open source, so will the pool of data scientists. Already, we see crowdsourcing at work in platforms such as Kaggle, which pool the world’s data-science expertise in wide-ranging development, investigation and exploration of analytics- and data-infused business problems. Indeed, open-source communities are where much of the fresh action in data science is happening. You should tap into this resource, find these people, and bring them into your team – if they don’t reach out to you first.

Yes, data science is a highly skilled profession with a significant learning curve. But throngs of smart people are flocking to it in ever greater numbers to advance their careers in the age of big data. So any worries about a chronic talent gap are misplaced. What you should worry about is how best to find, recruit and train the right data scientists for your specific big-data initiatives.

What's your opinion? Are you looking to hire more data scientists this year? Are you concerned about a shortage? Leave us your thoughts in the comments section.
