Data Scientist: Strike a Balance Between Quantitative & Qualitative Exploration

Big Data Evangelist, IBM

Life is stubbornly qualitative on every level. But we wouldn’t be modern and scientific if we didn’t try to constantly reduce it to numbers that we can calculate, manipulate and extrapolate.

Even when we try to parse the mess into discrete entities and interactions that we can analyze scientifically, the sheer messiness of reality often endures. The top scientists face this problem all the time as they poke around the frontiers of their disciplines.

Discovering meaningful patterns in a messy problem domain is what the best data scientists do exceptionally well. But it’s not without risk.

And often that risk stems from the data scientist’s excessive professional pride, or, rather, from confidence that their standard approach for exploring complex data sets is perfectly suited to every new problem they may encounter.

Hubris is an occupational hazard in all professions. For data scientists and subject-matter experts of all stripes, the chief risk is thinking you understand the problem space better than you actually do, and then building a statistical model that includes only the variables and data sufficient to confirm your thinking. After all, the universe is filled with spurious correlations, and you can prove anything with numbers. Data science is not true science if it’s just using fancypants statistical analysis in the service of Colbertian “truthiness.”
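The ease of finding spurious correlations is simple to demonstrate. The following minimal sketch (an illustration of the general phenomenon, not anyone's actual workflow) generates a purely random "target" series, then searches a few thousand equally random candidate series for the one that correlates with it best. With enough candidates, a seemingly strong relationship appears by chance alone:

```python
import random

random.seed(0)

n = 30                 # a short series, like many real-world yearly datasets
num_candidates = 2000  # many unrelated variables to search through

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# A target series with no real structure at all.
target = [random.gauss(0, 1) for _ in range(n)]

# Search unrelated random series for the strongest apparent correlation.
best = max(
    abs(pearson([random.gauss(0, 1) for _ in range(n)], target))
    for _ in range(num_candidates)
)

print(f"strongest 'relationship' found by pure chance: r = {best:.2f}")
```

Every series here is pure noise, yet the winning candidate typically shows a correlation strong enough to look publishable. That is exactly the trap: if you pick your variables to confirm a hunch, the numbers will happily oblige.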

The more accomplished you are, the more you need to cultivate deep humility in order to stay sharp and agile in your chosen profession. No matter what field you’re talking about, the best experts are painfully aware of how limited their expertise truly is and are continually skeptical of their own thinking. The best data scientists are those who recognize the limitations and biases that their own personal approaches for exploring deep data sets may bring to their work.

Recently, I came across a great article that described two types of data scientists, “positivists” and “interpretivists,” each with distinct approaches for exploring complex problem domains. The article’s author, John Weathington, draws the dichotomy as “quantitative” vs. “qualitative” researchers, or “deductive” vs. “inductive” reasoning. He describes this as a philosophical divide between different camps of professional data scientists.

In the best of all possible worlds, mixed data-science methods, balancing quantitative and qualitative approaches, would be the way to go. Quantitative approaches are best when you have a strong hunch about where the solution to a tricky statistical modeling problem might lie, whereas qualitative approaches are best when you're still groping in the dark. However, Weathington says it may be hard to achieve this balance in practice, because it usually involves encouraging operational collaboration among professionals with very different propensities and working styles.

That’s a valid point. But even if you could get the camps to collaborate harmoniously in operational data-science environments, the deck may be stacked against qualitative data scientists. Their inclination is to engage in open-ended data explorations, a propensity that may stoke the misconception that they are lazy and unfocused. To their bosses’ chagrin, they may feel little inclination to produce discrete business deliverables on a regular basis.

By contrast, quantitative data scientists are more likely to adopt a cut-and-dried working rhythm consistent with traditional development practices. They generally prioritize automating their core data-preparation and regression duties, iterating and refining a steady flow of fresh statistical models, and feeding deliverables with clocklike regularity into production business applications. Even if their model-driven insights are less insightful than those of their qualitative counterparts, the quants are easier to measure, productivity-wise.

Business people are often more comfortable with the clockpuncher who produces mediocre, unoriginal work than with the supposed genius who takes forever to produce something concrete. It’s likely that the quantitative/positivist data scientists would be perceived as more reliable in a traditional operational setting, even if they contribute nothing noteworthy to business success.

Yes, there are brilliant data scientists of both orientations: positivist and interpretivist. But if the former gain the upper hand in your business data-science practice, you may be crowding out the brilliantly intuitive researchers that you truly need to find disruptive insights buried somewhere inside the sprawl you call “big data.”
