Data Scientists: Credentialed or Otherwise

August 13, 2012

For the past 2 months, a LinkedIn discussion group has been debating the burning question "Do You Need a PhD to Analyze Big Data?" Always itching for fresh chat, yours truly has stepped into the fray with a humble opinion or two. And I got flamed in no uncertain terms. In fact, one PhD who didn't actually respond to the substance of my arguments felt the need to label a certain IBM big data evangelist a "very scary person."

Whew! Interesting how sharp comments give some people the willies. Clearly, the discussion focused on a loaded question, and perhaps a rhetorical one as well. The question was loaded in the sense that it seems to have been designed to rouse some people to whip out serious ammunition. Some used the discussion to vehemently defend the doctorate holders of this world against any insinuation that they might not be ideal candidates for every data scientist position. It was rhetorical in the sense that, I believe, the person who started the discussion probably already knew that the correct answer is "no, but it's complicated."

Yes, indeed, it was a fine question on which to spark heated debate among brainy people, many of whom have a career stake in the matter. As for yours truly, the gist of my blood-curdling argument was that PhD-itude, in and of itself, does not make you an expert data scientist. It also doesn't necessarily qualify you to sift through elephantine data sets for gems of breathtaking intelligence. Though you may have gained your degree after extensive data-science-based research, you weren't necessarily plowing through petabytes (though it may have felt that way).

The general opinion on the discussion group seems to be twofold, and it's consistent with my own perspective. First, an advanced degree in some relevant discipline can be a positive thing, but no specific credential is perfectly suited to all data scientist scenarios. And second, scholastic credentials might themselves be unnecessary if you have the core aptitudes–curiosity, intellectual agility, statistical fluency, research stamina, scientific rigor, skeptical nature–that distinguish the best data scientists. Some people are simply brilliant auto-didacts.

What truly ticked off some LinkedIn discussants was when I pointed out that, in some circumstances, going to the trouble of getting a PhD might limit your downstream effectiveness as a data scientist. Here, specifically, is the bombshell I posted:

  • "Of course you don't necessarily require a PhD to analyze big data. In fact, a PhD may be counterproductive. A PhD often indicates you've spent several years acquiring a substantial mastery over an extremely narrow topic and perhaps an extremely specialized set of analytical/statistical methodologies and tools. You've trained yourself to such a niche-y subject matter expert you are at risk of losing the flexibility, curiosity and collaborative aptitude to do the sort of cross-disciplinary data science that the most innovative big data experts need to do in today's free-wheeling business world. Besides, whatever subject matter and tool mastery you gained when you got your doctorate may be obsolete or unmarketable 5 years from now in the fast-evolving big data marketplace. On the other hand, uncredentialed Big Data auto-didacts often have the flexibility, curiosity and collaborative chops in spades, and may be more valuable players in the new world of data science."

I don't see anything terribly controversial in any of this. It echoes what many other discussants have posted. And, in fact, in a follow-up post I pointed out that I've heard many PhDs over the years express exactly these sorts of concerns.

PhDs are brilliant people, no doubt. I certainly wasn't arguing that they can't be world-class data scientists (for example, IBM employs many of them and they are the backbone of our world-class research labs). The best keep their curiosity, flexibility and collaborative nature intact through the long, lonely, isolated, hyper-specialized years of researching their dissertations.

But, in real-world data scientist positions, smart people, credentialed or otherwise, still need to keep growing with the times. Regardless of whether they've done post-doctoral work or are totally self-taught, they will still need to evolve their skills and competencies to keep pace with the fast-moving world of business-oriented data science.

When hiring data scientists, you should regard advanced degrees as valuable, but also as something that might conceal warning signs about a person's suitability within your business context. What should you look for in a data-scientist job candidate? And what are the signs that somebody with a fancy degree may be wrong for your organization?

The warning signs are numerous. Among other things, does the candidate act as if they have all the answers? Excessive pride does not make for an effective data scientist. Their job is to ask great questions, and not to presume that they know the answers until they've tested their statistical models against fresh data. Just as important, they will need to be rigorously self-critical on the job, continually questioning the assumptions that shape their models so that they can iterate rapidly through alternate versions to get closer to the truth of whatever they're exploring.

Also, does the candidate seem unwilling to learn whatever statistical modeling tools your organization has standardized on? Does it seem as if they intend to stick stubbornly to whatever data mining, predictive modeling, or other tool they mastered in the halls of academe? That's a bad omen. If they get the job, they will be producing and documenting models that other data scientists on your team will need to understand, maintain and evolve. Private pet tools are a no-no when you're talking about reusable business assets. Artisanal arrogance of the "I'll use my own, thanks" variety is out of the question.

And are they unwilling to evolve beyond the strict specialization of the data science focus upon which they built their doctoral research? If so, why don't they simply go back to the university and see if they can get tenure in the department that spawned them? They must recognize that real-world business problems sprawl across all disciplines, demanding the sort of generalist outlook that graduate school may have beaten out of them.

OK, PhDs, flame me to a cinder, if you wish.