Going beyond data science toward an analytics ecosystem: Part 1
Mount a strategic, practical, and sustainable approach for big data analytics
Big data analytics has emerged as a significant tool for business, government, and society to leverage the oceans of available data by using very powerful, relatively cost-effective analytics technology. Faced with a challenging economic climate and an accelerating pace of change, many organizations hope to use advanced analytics tools and data to surf these relentless waves of change. A rising number of organizations have therefore already begun to formulate their big data strategy and build their capabilities.
Achieving the full potential of big data analytics requires not just data, tools, and infrastructure, but also the quantitative skills to traverse the huge mountains of data. One key challenge organizations face is recruiting people with those skills. The skills shortage in big data analytics is significant and is predicted to escalate. Some estimate the shortage to be in the hundreds of thousands in the US alone.1 In particular, many organizations are unable to fill the data scientist role that they have deemed so critical for big data analytics.
Data scientists have contributed immensely to the development of the big data analytics phenomenon. However, as organizations increasingly embark on this journey, they need an approach that is more strategic, cost-effective, and sustainable than the prevailing one. The approach outlined in this three-part series recognizes that a successful initiative depends on more than a single talent. It requires many roles (business and technical, internal and external), all working collaboratively within a common vision, culture, and architecture. In short, it requires an analytics ecosystem.
The emergence of the data scientist
The data scientist role has personified the big data analytics phenomenon and captured our imagination. Jeff Hammerbacher and D.J. Patil, who at that time were at Facebook and LinkedIn, respectively, coined the data scientist title in 2008.2 The role can also be seen as an extension of the Wall Street quantitative analysts, or “quants,” of the 1990s: highly intelligent, curious, mathematical individuals applying methods from eclectic fields to new problems. Data scientists arose out of necessity in early data-driven companies such as Google, Yahoo, and Amazon. Back then there were no analytics tools and no big data platforms, but there certainly were large and rapidly growing mountains of data that, thanks to Moore’s Law, could now fortuitously be analyzed. Early data scientists fashioned their own tools, developed their own algorithms, and even conducted academic-style research. As is typical of nascent technologies at the peak of inflated expectations, the role has been described using many superlatives that make it sound spectacular. For example, John W. Foreman, chief data scientist at MailChimp.com, said data science “can call presidential races” and “reveal more about your shopping habits than you’d dare tell your mother.”3 Although no one doubts the enormous value of data science and the need to train more data scientists, the role should be critically examined so that big data analytics can be democratized by making its practices more cost-effective and sustainable.
Challenges data scientists bring to organizations
In a recent interview in ZDNet magazine, Andrew Nusca queried a prominent analytics vendor’s chief executive officer (CEO): “Data scientists! They’re in demand. They’re rare. They’re expensive. Business leaders think they need them, even if they’re not sure what they do. What gives?”4 That data scientists are very hard to find and expensive is not the only problem. The unbalanced, almost exclusive focus on the role has diverted attention from some key aspects required to establish successful and sustainable big data analytics capabilities. Some organizations have confused the skills with the individual. While the combination of mathematical, statistical, and coding skills is vital for big data analytics, these skills can be acquired and developed across a team5 and not just within a single individual. There is no doubt that waiting for the ideal candidate has caused significant, possibly unnecessary, delays in starting the big data analytics journey in some organizations. In some cases the wait and the disappointment may cause organizations to postpone the pursuit entirely. Further, some organizations that thought themselves lucky to hire a data scientist discovered they needed much more than one individual to realize and scale the benefits of big data analytics. The following issues arise from the unbalanced, exclusive focus on the role of the data scientist:
- Diverting attention from other success ingredients: The erroneous belief that hiring a data scientist is all that is needed to harness the value of data can divert attention from the need for an evidence-based culture and data-driven mind-set.6 It also draws attention away from the need for an integrated, collaborative team, a modern big data analytics platform, and a sound architecture.
- Delays and lost opportunities: Waiting for the well-suited candidate can delay or postpone the big data analytics journey unnecessarily, and with it the ability to reap the benefits. The reality is that only a very small number of individuals can match the profile of the prominent data scientists who led the big data analytics revolution. In addition, the dearth of certification programs means there is no reliable way to assess the quality of candidates.
- Vulnerability to the loss of a single individual: Even when organizations eventually hire a person with the right skills and abilities, the probability of losing this individual is high because of the gap between the supply of candidates and the demand for the role. Without the appropriate organizational structure and knowledge sharing, losing this single individual can jeopardize the whole analytics effort.
- Creating a big data analytics program in isolation: Although some analysts advise introducing big data analytics into the organization separately from other data warehouse systems, the program must still intersect and eventually integrate with other business-as-usual systems and processes. Big data analytics has to operate on all available data from all sources, including operational and external data sources.
- Ignoring the possibility of training existing staff: It is sometimes tempting to hire a fully developed data scientist rather than develop the role in-house from individuals with promising skills and attitudes.
- Contributing to the continuing mystification of big data analytics: As Arthur C. Clarke said, “Any sufficiently advanced technology is indistinguishable from magic,”7 which is what big data analytics appears to be to many people. However, the time has come to go beyond mystification and lab experiments toward predictable, engineered outcomes.
- Not taking advantage of recent technology developments: In the early days when there were no big data analytics platforms or tools, organizations needed the multitalented, self-sufficient superstar data scientist who created tools, developed algorithms, visualized data, and told a compelling story. A superstar data scientist is still highly desirable—if you can get one. But with the limited supply of data scientists, let alone superstars, the magic needs to be made available to a wide audience. Fortunately, there is now a class of big data analytics tools that enable end users to conduct advanced analytics and visualization—for example, IBM project NEO.8
- Not preparing for scaling big data analytics to an enterprise-wide program: Even when an organization succeeds in conducting experiments or a pilot, there is a need to scale up the big data analytics program to other areas or departments. Scaling the program definitely requires more than a data scientist; it needs an integrated team, governance, and a sound architecture.
Parts 2 and 3 of this series address the elements that can make big data analytics initiatives successful and delve into the details of core, extended, and external analytics ecosystems. In the meantime, please share any thoughts or questions in the comments.