Data Scientists: Myths and Mathemagical Superpowers

June 29, 2012

Mythbusting season is upon us. In the spirit of magnanimous debunkitude, I'll line up the top 10 myths that many people seem to believe about data scientists. Just watch me as I shoot them down one by one (the myths, not the people or scientists) like so many (empty) bottles of beer on the wall.

The leading myths of the data scientist are as follows:

Myth #1: Data scientists are themselves mythical beings, like the unicorns.

Data scientists are in fact very real. They have been in existence for as long as humans have performed multivariate statistical analysis, time-series analysis, and other core approaches. And to the extent that you build statistical models and use various analytics tools to find non-obvious patterns in data, you yourself may be a data scientist. You exist, don't you?  

Myth #2: Data scientists are just an elite bunch of precious eggheads.

Actually, data scientists are working stiffs...of the white-collar brainiac variety. Every day they get their clean fingernails virtually dirty by moving piles of raw data from all over, dumping it into analytical sandboxes, cleansing and sifting through it, and searching for useful patterns that may or may not exist. Then, in very short order, they pour fresh data into the sandbox and do it all over again. It's often mind-numbingly detailed grunt work, not the sport of armchair data philosophers.

Myth #3: Data scientists are some sort of nouveau fad that will soon fade.

The catch-all term "data scientist" has been around for years, and the various advanced analytics specialties that fall under it – statistical analyst, data miner, predictive modeler, and others – are even older. In the past few years, analytics professionals have increasingly used the term "data scientist" to refer to the convergence of these heretofore distinct disciplines with newer roles – such as behavioral analysis, sentiment analysis, and graph analysis – that have become super-hot in the era of digital channels and social media. The steady growth in data scientist job listings, professional forums, and academic curricula over the past several years is undeniable. This is no fad.

Myth #4: Data scientists are all Ph.D. statisticians and/or particle physicists who failed to make tenure.

For sure, many data scientists acquired their quantitative skills and learned their first statistical modeling tools in college. But many used them to pursue degrees in substantive disciplines such as business administration, economics, finance, and engineering. Many of the data scientists you'll encounter in the working world are in fact business domain specialists, not math-addicted "quants" or algorithm-fetishist "rocket scientists." They actually know a thing or two about the business problems they're modeling statistically.

Myth #5: Data scientists are just longtime business intelligence (BI) specialists whose employers gave them a fancier title in lieu of a pay raise.

Of course, many longtime BI power users are in fact data scientists of a sort. They are business domain specialists whose jobs involve multivariate analysis, forecasting, what-if modeling, and simulation. Those who wish to go even deeper into segmentation, decision-tree analysis, propensity modeling, predictive analysis, and other data science techniques have already begun to rebrand themselves accordingly. But most of the old-school BI specialists – whose core functions are firmly rooted in historical, descriptive analysis – aren't pretending to be data scientists. However, many know their career development may stall out if they don't stay up to speed on new data-science-relevant topics such as Hadoop, predictive modeling, and graph analysis.

Myth #6: Data scientists aren't scientists in any meaningful sense of the word.

Well, every true scientist must also be a type of data scientist, although not all self-proclaimed data scientists are in fact true scientists. True science is nothing without observational data. Without a fine-grained ability to sift, sort, structure, categorize, analyze, and present data, the scientist can’t bring coherence to their inquiry into the factual substrate of reality. Just as critical, a scientist who hasn’t drilled down into the heart of their data can’t effectively present or defend their findings. Statistical controls are the bedrock of true science, and they are the core responsibility of the data scientist. Likewise, experimental controls are a hallmark of many scientific disciplines. If a data scientist is pursuing knowledge – such as on people's buying behaviors – and if they're confirming their findings through statistical controls and real-world experiments, they're a scientist, plain and simple.

Myth #7: Data scientists need fancy, expensive, frighteningly complex statistical power tools to get their work done.

This is categorically untrue. Fundamentally, the job of the data scientist is to look for hidden patterns. They can accomplish this through user-friendly advanced visualization tools, through self-service search-driven BI tools, through interactive data exploration tools, and through other approaches that don't require a deep mastery of statistical analysis. The market for cost-effective exploratory BI tools has many vendors, including IBM Cognos. And the power-user business analysts of the world can find extraordinary insights from modeling features embedded in an ordinary spreadsheet application.

Myth #8: Data scientists simply pour gobs of data into Hadoop clusters, slap a little Pig and MapReduce on the problem, and then – voila! – mind-blowing insights spontaneously spew forth.

Oh, brother! The data scientist will be the first to tell you that Hadoop is just another platform for deep exploration. In this capacity, Hadoop is not different in kind from enterprise data warehouses, traditional data mining platforms, and other platforms for in-database analytics. None of these is a magic Ouija board through which the big data spirits speak to us mere mortals. Hadoop and other data science platforms are simply the analytical workbenches upon which data scientists conduct investigations into deep data.

Myth #9: Data scientists are just analytics junkies who mainline big data all day and couldn't care less about business applications.

No way. If you spend time with any real-world data scientist, they'll bend your ear discussing how they tackled a specific business problem, such as reducing customer churn, targeting offers across channels, or mitigating financial risks. Generally, they will only discuss the underlying data, models, or algorithms they used to work these problems if you specifically ask about them. And no self-respecting data scientist will claim they must have big data – in the petabyte, real-time, multi-structured sense – to extract every valuable insight. Most data scientists are not nerds. They know most business people regard all this Big Data lingo as confusing jargon. The only people who wax poetic about the data are those who don't have to sweat the details of drilling it for value.

Myth #10: Data scientists don't have any operational responsibilities that require them to come down from their ivory towers.

That used to be the case. However, as next best action and real-world experiments become ubiquitous in modern business, the data scientist is evolving into the role that stokes, tweaks, and fuels this operational engine. As I've said elsewhere, data scientists experiment continuously by deploying new predictive models, business rules, and orchestration logic into next-best-action-powered applications. They test and tweak the analytic-centric models at the heart of 24x7 digital engagement channels and agile business processes.

These prevailing myths obscure the fundamental reality: data scientists are at the heart of the big data economy. Yes, they often have superpowers, but not in a mythical sense. Their prowess comes from high-performance tools, platforms, practices, and teams.

The superpowerful data scientist is a transformative figure in modern business. Anjul Bhambhri, vice president of big data products at IBM, put it best. The data scientist, she said, is "part analyst, part artist." The people who excel in this role are those who are "inquisitive, who can stare at data and spot trends. It's almost like a Renaissance individual who really wants to learn and bring change to an organization."


Read some of James' other posts on data scientists