Skills that every data scientist needs in today’s insight economy

Big Data Evangelist, IBM

Smart people are everywhere. They can generate a never-ending stream of fresh insights when conditions are right.

In the Insight Economy, the conditions for ongoing data-driven discovery are here. Smart people tap into big data analytics, machine learning and statistical modeling to uncover emerging trends and see the future more clearly. The key role attracting the best and brightest is that of the data scientist.

What is a data scientist?

Put simply, these professionals are the core developers in the big data era. The analytic algorithms, predictive models and other artifacts they build are powering everything from e-commerce recommendation engines to the mobile experiences that drive modern life.

How do data scientists accomplish their practical magic? For starters, they must be resourceful because so few of them are available. The shortage of qualified data scientists in the labor market is real. Salaries for top-notch data scientists continue to grow, as reported in this recent Bloomberg BusinessWeek article, reflecting the paucity of qualified individuals on the market. Given the skills deficit, it’s not clear how organizations will recruit and grow all the talented people necessary to support their growing needs in this area.

To some degree, this shortage has encouraged rampant inflation of the data scientist job title, as people with varying degrees of competency in statistical analysis seek out more job opportunities, regardless of whether they’re truly qualified for them. The hefty salaries are an incentive for people to enter this hot new field, as is the “sexiest job of the 21st century” hipness quotient. And we can’t overlook the sheer intellectual challenge of data science, which appeals to anybody who wants a career that exercises their mind on all levels.

Fresh blood continues to enter this field as data science undergoes democratization. In the process, the new breed of data scientists is transforming businesses everywhere. We can argue about whether a true data scientist must have academic credentials. But no one doubts that credentials mean little if you can’t actually do the work. You can call yourself a data scientist in good conscience only if you have mastered the methodology.

Overcoming the learning curve

There’s a significant (some might say scary) learning curve awaiting anybody who seriously wants to enter this field. Many people let their fear of math keep them from getting that degree, cracking open the books, glancing at the journals or paying close attention when data scientists are speaking. Data scientists must navigate a thicket of statistical algorithms and techniques. It’s not enough to have a passing familiarity with regression modeling, for example, because that’s not the only statistical approach in the data scientist's toolbox. Besides, there are several ways to regress variables, none of which is perfectly suited to every modeling scenario. Choosing the right modeling approach is often a creative exercise that demands expert human judgment.
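To make the point about regression concrete, here is a minimal Python sketch that fits the same data two ways. The data set, the single outlier and the function names (`ols_fit`, `theil_sen_fit`) are invented for illustration, and the two estimators (ordinary least squares and the Theil-Sen median-of-slopes method) are just two of many possible choices, not a recommendation.

```python
from itertools import combinations
from statistics import mean, median


def ols_fit(xs, ys):
    """Ordinary least squares: minimizes the sum of squared vertical errors."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def theil_sen_fit(xs, ys):
    """Theil-Sen: median of all pairwise slopes, which resists outliers."""
    points = list(zip(xs, ys))
    slopes = [
        (y2 - y1) / (x2 - x1)
        for (x1, y1), (x2, y2) in combinations(points, 2)
        if x2 != x1
    ]
    slope = median(slopes)
    intercept = median([y - slope * x for x, y in points])
    return slope, intercept


# Hypothetical data: y = 2x + 1, except the last point, which is an outlier.
xs = list(range(10))
ys = [2 * x + 1 for x in xs]
ys[-1] = 60

ols_slope, ols_intercept = ols_fit(xs, ys)
ts_slope, ts_intercept = theil_sen_fit(xs, ys)

print(f"OLS:       slope={ols_slope:.2f}, intercept={ols_intercept:.2f}")
print(f"Theil-Sen: slope={ts_slope:.2f}, intercept={ts_intercept:.2f}")
```

On this toy data the single outlier pulls the least-squares slope above 4, while the Theil-Sen fit recovers the underlying slope of 2 and intercept of 1. Neither is "right" in general: least squares is more efficient when the errors are well behaved, and the robust fit pays for its stability with more computation. Deciding which trade-off fits the problem is the kind of judgment the paragraph above describes.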

You don’t need a Ph.D. in statistics to be a data scientist. What you do need are curiosity, intellectual agility, statistical fluency, research stamina, scientific rigor and a skeptical nature. You must also be articulate, because no one will accept the validity of the patterns you surface if you can’t explain clearly how you built your model, what variables and data you used or what the results truly mean in the context of either a business problem or a scientific endeavor.

Foundation of a data scientist's skills

What I’ve sketched are the aptitudes and skills I would look for if I were hiring a data scientist. An autodidact could conceivably fit all of these criteria. However, enterprise big data initiatives thrive if all data scientists have been trained and certified on a common curriculum with the following foundation:

  • Paradigms and practices: This involves data scientists acquiring a grounding in core concepts of data science, analytics and data management. Data scientists should easily grasp the data science lifecycle, know their typical roles and responsibilities in every phase and be able to work in teams and with business domain experts and stakeholders. Also, they should learn a standard approach for establishing, managing and operationalizing data science projects in the business.
  • Algorithms and modeling: Here are the areas with which data scientists must become familiar: linear algebra, basic statistics, linear and logistic regression, data mining, predictive modeling, cluster analysis, association rules, market-basket analysis, decision trees, time-series analysis, forecasting, machine learning, Bayesian and Monte Carlo statistics, matrix operations, sampling, text analytics, summarization, classification, principal components analysis, experimental design, unsupervised learning and constrained optimization.
  • Tools and platforms: Data scientists should master a basic group of modeling, development and visualization tools used on your data science projects, as well as the platforms used for storage, execution, integration and governance of big data in your organization. Depending on your environment, and the extent to which data scientists work with both structured and unstructured data, this may involve some combination of data warehousing, Hadoop, stream computing, NoSQL and other platforms. It will probably also entail providing instruction in MapReduce, R and other newer open-source programming models and languages, in addition to SPSS, SAS and any other established tools.
  • Applications and outcomes: A major imperative for data scientists is to learn the chief business applications of data science in your organization, as well as ways to work best with subject-matter experts. In many companies, data science focuses on marketing, customer service, next-best offer and other customer-centric applications. Often, these applications require that data scientists know how to leverage customer data acquired from structured survey tools, sentiment analysis software, social media monitoring tools and other sources. Plus, every data scientist must understand the key business outcomes—such as maximizing customer lifetime value—that should be the focus of their modeling initiatives.

Classroom instruction is important, but a curriculum that is completely devoted to reading books, taking tests and sitting through lectures is insufficient. Hands-on laboratory work is paramount for a truly well-rounded data scientist. Enterprises need to make sure their data scientists acquire certifications and degrees grounded in real-world experience with statistical models that address actual business issues.

You can get much of the requisite expertise if you plug into the right community of data scientists and use that resource to boost your own knowledge and skills. An active community takes many forms, but long-lasting ones tend to share common traits: they are purposeful, open, disciplined and caring. At the core of these traits is the challenge of communicating and sharing experiences.

Register for IBM Insight 2015 in Las Vegas, October 25–29, and connect with the data scientists who are driving the Insight Economy.