Why data science is a team sport

Solution CTO, IBM

Analytics talent may exist in your organization, but is it the right analytics talent for the problem you are trying to solve? One of the most frequent requests I get is to write more on best practices and lessons learned. I had a chance recently to work with a super smart and creative client team that led to this blog post. The core idea is a pretty simple one. In the same way a fit-for-purpose architecture is deployed, consider using fit-for-purpose people too. 

The idea behind fit-for-purpose architectures is that no single technology fits all use cases. Instead, the use case needs to drive the technology selection. Ideally, a variety of ways to get a workload done are available, so select the solution whose DNA matches the DNA of the problem. The ability to store data may be common across the IBM DB2 database, IBM BigInsights streaming analytics and the Cloudant distributed database as a service (DBaaS) technologies, but the way they work and think about data is vastly different.

The same is true about people, and this underappreciated point really jumped out at me during conversations with this particular customer. We were talking about partnering on solving some pretty gnarly optimization problems that are extremely computationally expensive because of how many degrees of freedom the solution has. As part of the conversation, I got to meet some of the team’s math, operations research, data science and PhD staff. We’re awfully lucky for the chance to work with this kind of client organization. But as the conversation moved from the operations research optimization part of the problem to other business problems, people involved in the conversation began losing interest.

After all, the team of operations data science professionals could do the same math in their sleep that the other teams needed help with, but the team just wasn’t interested in doing the work. This situation is surprisingly common. People tend to have roles and responsibilities that normally keep them fully busy, which is worth pointing out because the C-level executives often have a very different understanding of this dynamic. C-level executives commonly say they have the skills needed to do the job, so why does the work not get done? I’ve posted commentary previously about common capacity constraints, including what I call the not-enough-hours-in-the-day problem, which remains the key problem, but there’s more to it than just time. 

As is the case with fit-for-purpose architectures, the DNA of the problem needs to be matched with the DNA of the people being asked to do the data science work. For the aforementioned customer, while some of their operations people could solve the marketing problems quickly, clearly they were not the right fit for those marketing problems.

Why? Well, for starters, the operations research team didn’t think like marketers. Its members prioritized their own space as being worth their time, and they were not excited by work outside of optimization. The right fit is not just the sheer capability of a person or the team’s skill set; success with advanced analytics and data science requires careful planning to ensure the right people fit the problem you’re trying to solve.

Data science is, after all, a team sport, and that is one of the reasons why service delivery teams use a fractional resource model in their data science. Don’t assume that similar-looking data science and math skills you may have access to can be readily plugged into any problem, even if the team is technically qualified to solve it.

Operations people are going to want to do operations-related things, and marketing people are going to want to do marketing-related things. Does this reality mean you can’t cross-train people or that you are backing off center of excellence (COE) best practices for rotating people into data science teams? No, but it does mean that you probably need to do a bit more thinking on the softer side of approaching these math problems.

Data scientists and other analytics professionals are always hungry for power tools to expand their productivity and unlock fresh insights. Apache Spark is an emerging tool that builds on your investments in Hadoop, predictive analysis, stream computing, and other power tools. Hungry for more information on Spark? Get started learning more about Spark today, and register for Spark Summit in San Francisco, California, June 15-17, 2015. For further details, check out IBM BigInsights 4.0, an enhanced solution with Spark support.