The Best Data Scientists Cluster Around the Biggest Big-Data Challenges

Big Data Evangelist, IBM

Few people become data scientists to get rich. If that’s their overriding goal, they tend to gravitate toward entrepreneurship and, possibly, partnerships with well-established rockstar data scientists (who, at that stage in their careers, probably do expect to get filthy rich beyond their wildest dreams).

People become data scientists for many reasons, and intellectual stimulation is high on the list. Considering how many of them have PhDs and went to leading universities, what you often see are professionals who could have just as easily stayed in academia (assuming that they could have found suitable employment opportunities there).

Smart people generally cluster around leading-edge challenges. The world is full of data scientists who are either just starting out in their careers or are between jobs. The best way to attract them to your projects is to give them something really exciting to wrap their minds around. If you’re only offering them your most boring projects (in other words, the ones that don’t allow them to stretch their minds and spawn insanely disruptive new solutions), don’t expect them to return your calls.

Okay, so you have something seriously cool to lure the best and brightest to your big-data initiative. So how do you loop them into what you’re contemplating? You might try crowdsourcing.

Already, we see data-science crowdsourcing work its magic through business models such as Kaggle and TopCoder, which pool the world’s data-science expertise in wide-ranging development, investigation and exploration of analytics- and data-infused business problems. Indeed, open-source communities are where much of the fresh action in data science is happening, and where many newbies—lacking credentials and experience—are using the competitive challenge to ramp their skills to the next level.

Crowdsourcing data scientist expertise on a moment’s notice is often as easy as engaging the smart people in online communities and, if budgets permit, hiring them for consulting projects. Quite often, the best freelance data scientists already maintain a prominent online presence, promote their best work far and wide, and collaborate on a wide range of challenging projects (not all of which pay the bills). They simply crave engagement with their peers and a chance to gain greater recognition for their accomplishments.

I recently came across an excellent article on how to find and keep the best data scientists in your organization’s big-data practice. The authors discussed their company’s successes in retaining the best and brightest through a variety of approaches that center on a single principle: fostering a collegial environment for ongoing research, collaboration and creativity. Key elements of this approach include:

  • Instituting a scientific advisory board that involves invited experts from academia;
  • Conducting regular data-science competitions where teams can win awards for tackling tough analytic challenges;
  • Putting job candidates through rigorous interview processes where they must defend their research theses;
  • Offering advanced training courses and opportunities for self-improvement;
  • Allowing data scientists to participate in professional conferences;
  • Encouraging data scientists to publish in external journals and other channels;
  • Organizing regular gatherings that encourage data scientists to talk, present their work, and learn from each other;
  • Letting data scientists pursue their curiosities and research agendas with minimal interference;
  • Offering data scientists the opportunity to collaborate in a steady stream of new, challenging projects in which they can develop new skills and experiment with new approaches.

Crowdsourcing can fit organically into this approach if you allow your data scientists to engage with anybody anywhere who stimulates their thinking and, possibly, contributes to their learning and research. The crowdsourced experts may in fact be looking for connections into a professional data-science operation such as yours, so that they can explore career opportunities. Even if they aren’t looking for a job, they may simply want to engage with your best and brightest because they’re starving for peer recognition and a like mind to bounce ideas off.

Some data scientists gnash their teeth over the phenomenon of crowdsourcing, saying it dilutes the field with shallow, unproven talent. Quite often, it’s these same professional data scientists who bemoan the fact that the common denominator skillset for their field seems to be trending downward.

I think these trends (data-scientist crowdsourcing and skillset dilution) are inevitable but, net-net, all for the best. As data science continues its push into all aspects of our lives, it’s best not to set the criteria for data scientists so high that they exclude the many impassioned “amateurs” from contributing to the best of their abilities. The business world needs data scientists of all stripes: seasoned, credentialed professionals, the self-taught, and all shades in between.

Let’s put it this way: to claim that you’re a data scientist is a bit like identifying yourself as a writer. You’re essentially saying you’re capable of engaging in the activity at some level of competency, but are not necessarily good enough to land on bestseller lists or win Pulitzer Prizes. But, if you produce extraordinary work that solves leading-edge data-science problems in business, science, or other fields, you should be recognized, honored and compensated, regardless of whether the old guard recognizes you as a “true” data scientist.

Word cloud image: justgrimes