Patting down the pachyderm: Big data prognostications for 2015

Big Data Evangelist, IBM

Elephants are astonishingly intelligent creatures. Long ago, on a family vacation to Indonesia, I had the pleasure of seeing a troop of trained elephants perform close-up in an audience-interactive show. As I witnessed one of the animals crouch down around my intrepid firstborn, I was relieved to see that it was smart enough to follow its trainer’s instructions, sensitive enough to the boy’s presence and agile enough to execute the entire maneuver like the professional performer it was.

As 2014 draws to a close, the proverbial elephant that we call “big data” is smarter, more sensitive and more agile than ever. It’s got a much more varied array of advanced analytics riding on its broad back. More than that, it’s performing these amazing feats as a team player within a growing troop of innovative creatures, including Hadoop, NoSQL, in-memory and relational databases. Together they accomplish far more than any one performer could possibly pull off individually.

In trying to describe big data’s agile elephants, we industry prognosticators are in much the same situation as the proverbial blind men who touched the creature from every angle but couldn’t agree on a common picture of what it was. I’ve already reviewed prognostications in Information Week, Forbes, Data Science Central, KDNuggets and in the sprawling community I like to think of as the “Hadoop-o-sphere.” I’ve even revisited my own big-data predictions from a year ago, and my own Internet of Things forecasts for good measure.

It’s clear that we’re all patting down the same pachyderm. Actually, it’s a growing pack of them, with no clear-cut order of dominance. Some observers think one of the species—Hadoop—is the only one worth mentioning; others, self included, know otherwise. And we know that the beasts themselves—the storage and processing platforms—are not the whole story of big data. You can’t understand where this arena was in the year gone by or where it’s heading unless you observe the larger picture of adoption trends, disruptive applications, development methodologies and best practices.

Here now are my impressions of the big data space in 2014 and what lies ahead in 2015 and beyond. I write all these in the presumptive “we” to signify that I’m speaking (I like to think) on behalf of all the blind big data pundits who are desperately groping the varied appendages of this brainy behemoth:

  • The elevation of the chief data officer. We saw an uptick in 2014 in enterprise interest in establishing a chief data officer (CDO). As I noted in this recent blog, it’s still not clear whether the responsibilities of a CDO, as usually defined, differ enough from those of a chief information officer (CIO) to justify creating a new job title. Nevertheless, I agree that, whatever you call this role, organizations everywhere need a senior executive who oversees the application of the organization’s precious data resources to strategic business uses. If nothing else, this executive will be in charge of monetizing any business data that has resale value, and will also be responsible for managing the data scientists who build the analytics for extracting value, monetized or otherwise, from the data. I predict that, in 2015, the CDO (or equivalent position) will assume greater responsibility for managing the data-science development teams who build and maintain the most mission-critical big data applications.
  • The democratization of data science. We saw growing recognition in 2014 that data scientists are the core developers of the big data revolution. As the year wore on, fewer industry observers seriously believed that data scientists are semi-mythical unicorns. And even fewer will harbor any illusions that this extremely demanding profession is as “sexy” and glamorous as it has been made out to be. As we roll into 2015, we will continue to see widespread anxiety around the shortage of skilled and professional data scientists. I predict that in 2015 organizations everywhere will tackle the data scientist shortage head-on by enrolling their analytics professionals in college-level courses, training institutes, massive open online courses (MOOCs) and other resources. As I noted in this blog, data science skills will soon be everywhere, and analytics professionals can withstand the inevitable commoditization of the profession by continuing to deepen their statistical analysis skills through formal education. I also predict that the best data scientists will differentiate themselves in 2015 and beyond by deepening their ability to construct sound business narratives that contextualize their statistical analyses.
  • The deepening of cognitive computing. We saw artificial intelligence fully embrace the 21st century in the year gone by, in the form of cognitive computing. This set of technologies, which essentially applies artificial intelligence to big data, is the heart of IBM Watson. Cognitive computing relies on machine learning, deep learning, artificial neural networks, natural language processing and other statistical approaches for automatically detecting correlations and other patterns in big data that might otherwise go unnoticed. As I noted in this blog, we have entered the cognitive systems era, in which processing logic is derived from data rather than needing to be programmed or hard-wired into applications. I predict that in 2015 there will be a boom of investment in deep-learning technologies to power in-line streaming analytics for media and entertainment, real-time surveillance, video over IP and other important applications.
  • The rise of data curation. We saw big data become a mass-market phenomenon in 2014, thanks to self-service guided analytics tools such as IBM Watson Analytics. Now every user, not just data scientists, can easily leverage the full power of big data, advanced analytics and cognitive computing in every aspect of their lives. What this brings to the forefront is the issue of how any of us can possibly sift through it all when trying to find what data-driven insights might be most relevant to our circumstances. As I noted in this recent blog, curation is becoming a critical governance function for extracting the value of big data assets. Curation addresses the data quality criterion of relevance, and curators (typically subject-matter experts) might be regarded as being responsible for a “single version of what’s worthy of your consideration.” I predict that in 2015 many established enterprise data-stewardship practices will broaden to incorporate curatorial functions under their broad mandate.
  • The opening of data, applications and expertise. Openness is the core architectural principle of all data, analytics, application, platform and business initiatives in the global community. Open-source initiatives are transforming all big-data platforms, as Hadoop continues to evolve in countless directions, NoSQL startups and innovations come fast and furious and open languages such as R entrench themselves more deeply into developers’ core repertoires. In 2014, we saw open-data initiatives flourishing around the world, many of which are focused on bringing greater government transparency, stimulating economic development and fueling the democratization of data-driven decision making. Open marketplaces for data science expertise have taken hold and expanded their membership. More data scientists have opened up their schedules to deliver expertise on a pro bono basis to projects for social betterment. In 2015, I predict that more organizations will turn to open-source communities to recruit their big data development teams, assemble their big data platforms, tools and libraries and populate their data repositories with social listening data, reference graphs and other “for the taking” assets. However, few users will take “all open source” big data strategies to their logical extreme, given that competitive advantage comes from building up core competencies and assets (skills, data, platforms, applications and so on) that their competitors can’t easily match. Also, the dizzying diversity of purely open-source solutions will continue to place a premium on solution providers that can integrate the best of open source with the best of their own full-featured information management solutions. IBM Watson, which incorporates Hadoop, is a good example of those synergies.
  • The hybridization of big-data analytic infrastructures. The evolution of big data environments into heterogeneous architectures, where no one platform is optimal for all roles, continues its steady march toward ubiquity. As I observed well over a year ago, big data is evolving into a hybridized paradigm under which Hadoop, massively parallel processing (MPP) enterprise data warehouses (EDW), in-memory columnar, stream computing, NoSQL, document databases and other approaches support scalable analytics in the cloud. In 2014, we saw new platforms gain significant traction in this hybridized marketscape, especially Apache Spark, which tantalizes with its convergence of Hadoop, streaming, in-memory and graph analysis. Though some industry observers think Spark will push Hadoop out of the picture, I doubt that will happen, given Hadoop’s strong commercial momentum and because a hybridized cloudscape can easily leverage the strengths of both environments, which overlap considerably. Likewise, in-memory analytics architectures have come into next-generation cloud-based data warehousing platforms such as IBM dashDB, thereby dissolving the boundaries among these approaches. Hybridization is also well under way among in-motion and in-memory analytics architectures, streaming and Hadoop architectures and transactional and analytical database architectures. In 2015, I predict that this ongoing hybridization will accelerate to address the leading-edge data and analytic requirements of cloud computing, mobile computing, the Internet of Things, streaming media and social networking.

In terms of the mass-market cultural milieu surrounding big data, 2015 will be indistinguishable from 2014. Anxiety surrounding the potential of algorithmic decision automation to put people out of work (in other words, Luddism in the age of big data) will remain. Likewise, privacy and security concerns surrounding the application of big data analytics to personally identifiable information will not abate. Organizations will need to address these concerns as they weave big data into their data management practices and, more broadly, into their business strategies.

When discussing the power of big data in modern life we must always return to the elephant analogy. Given that big data is increasingly all data, retained indefinitely, powering everything you do and every customer engagement, we must acknowledge that big data never forgets; customers have every reason to be concerned about whether you’re safeguarding their data and their privacy.

The more things change in big data in the coming year, the more this core imperative will remain the same.