Big Data’s Legacy is Agility

VP Strategy, Silicon Valley Data Science, Editor-in-Chief, Big Data Journal

While much of today’s focus on big data is on “smart data” and the power of analytics, the most lasting benefit big data will bring to business worldwide is agility.

The tools of big data found their genesis in the data-driven startups of Silicon Valley: Google, Yahoo!, Facebook and Twitter, companies whose very products are formed out of data, and who routinely experiment with, create and destroy new features based on data.

Working at “web scale” meant these companies had to break out of products that couldn’t scale, or would be too expensive to scale, and so they relied on open source, commodity hardware, and a lot of invention. What they created may not have had the enterprise bells and whistles, but it afforded them the flexibility to underpin massive growth and success. And so began the era of big data, of which we are still in the early days.

When software is the agent of business growth, the agility found by the web giants is a must. The enablers of agility in big data are twofold: in the organizational nature of connecting business right to the data, and in the tools themselves.

Organizational agility

The essence of a successful data-driven business is experimentation. “Navigating the competitive landscape” is an apt metaphor. That landscape is ever changing, and any preconceived route is likely to fail. While every organization will benefit from understanding its own operation better—the key things that “business intelligence” brings—that alone doesn’t make you a data-driven business, any more than the dashboard alone enables you to drive a car. Big data allows you to see further, and ask more questions, than you ever could before. Reaping the benefit from the answers is strongly connected to how easily you can translate that insight into action.

Turning the insight into action starts with experimentation: to use data to get a better outcome, you have to try it and see if your model of the world is right. Experimentation becomes possible when the cost of change becomes cheap: when you can either simulate your business, or affect it in the real world by automated means. Those things enable operational agility.

Agility also demands a different mindset in management: the acceptance of Rumsfeldian unknowns—the things you don’t know you don’t know—and the understanding that waste is an essential accompaniment to innovation. This ripples down right to the way you put teams together: whether your data scientists are isolated in a hallowed hall, in which case they can likely only tell you about things you know already, or working closely with the people who own the business problems, where experimentation becomes feasible. A data-driven product team must be cross-functional, and able to react rapidly when new information comes to light. The whole business must accept that it’s on a journey through changing and uncharted waters, not taking the bus.

Technical agility

It’s no good having an agile mindset if the tools you use are optimized for a different worldview. Fortunately, most of the tools that define big data were created in an atmosphere of iteration and experimentation. There are two big contributors towards this agility: the NoSQL movement, and Hadoop itself.

NoSQL in its essence really means “no predefined schemas.” Database schemas inevitably become very tied to the way the applications using them work, and changing them can be expensive, especially when dependent applications are already up and running. NoSQL databases, which allow the rapid evolution of the shape of the data inside them, in turn permit a rapidly evolving application ecosystem. Their use does not preclude a later formality on data structure, perhaps for stability, validation and quality reasons, but they rather permit early development to progress rapidly and with a low cost of change.
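To make the “no predefined schemas” idea concrete, here is a minimal sketch in plain Python. It is not the API of any particular NoSQL database: the in-memory `collection`, the `insert`, `referrer_counts` and `validate` helpers, and the field names are all hypothetical, chosen only to show records changing shape without a migration, with validation added later as an optional layer.

```python
from typing import Any

# Hypothetical in-memory stand-in for a document collection.
# A real NoSQL store would persist these, but the shape-free idea is the same.
collection: list[dict[str, Any]] = []


def insert(doc: dict[str, Any]) -> None:
    """Store a document as-is; no schema is checked at write time."""
    collection.append(dict(doc))


# Early in the application's life, documents have two fields.
insert({"user": "ana", "email": "ana@example.com"})

# Later, a new feature starts recording a referrer. No migration is needed:
# new documents simply carry the extra field.
insert({"user": "bo", "email": "bo@example.com", "referrer": "newsletter"})


def referrer_counts(docs: list[dict[str, Any]]) -> dict[str, int]:
    """Readers tolerate older documents by supplying a default."""
    counts: dict[str, int] = {}
    for doc in docs:
        key = doc.get("referrer", "unknown")
        counts[key] = counts.get(key, 0) + 1
    return counts


# "Later formality": once the shape settles, validation can be layered on.
REQUIRED_FIELDS = {"user": str, "email": str}


def validate(doc: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the document passes."""
    problems = []
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in doc:
            problems.append(f"missing field: {field}")
        elif not isinstance(doc[field], expected_type):
            problems.append(f"wrong type for field: {field}")
    return problems
```

The point of the sketch is the ordering: the application evolves first, and the rules about structure arrive afterwards, when stability and quality start to matter more than speed of change.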

The bigger contributor to technical agility is the different mentality Hadoop permits. The old way of data warehousing involves a priori data cleaning and validation: selecting what was important and structuring it before storing it. Inherent in that approach, made necessary by restricted resources, was the hope that “unknown unknowns” would not rear their head. When they did, a turnaround of several months was the expected outcome before changes could be made, e.g. a new field added to a report.

Hadoop allows those decisions about structuring and cleaning data to be made “just in time,” and it makes it easy to rethink them should new requirements be encountered. Massively cheaper storage and processing power mean we can now store everything, and open-source software means we can afford to process on many CPUs. A new enterprise architecture that embraces this will mean that the assumptions that are inevitably made in creating systems today won’t be prohibitively expensive to reverse tomorrow.
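The “just in time” approach is often called schema-on-read. The sketch below illustrates it in plain Python rather than with Hadoop itself: the `RAW_EVENTS` sample, its field names, and the helper functions are all hypothetical. Raw records are kept exactly as they arrived, and structure is imposed only when a question is asked, so a new question needs a new function rather than a reload of the data.

```python
import json
from typing import Any, Iterable, Iterator

# Hypothetical raw log lines, stored untouched. Nothing was cleaned or
# dropped at write time, including the malformed third line.
RAW_EVENTS = [
    '{"ts": "2014-01-05T10:00:00", "page": "/home", "ms": 120}',
    '{"ts": "2014-01-05T10:00:02", "page": "/buy", "ms": 340, "country": "DE"}',
    "corrupted line",
]


def read_events(raw_lines: Iterable[str]) -> Iterator[dict[str, Any]]:
    """Parse at read time, skipping lines that are not JSON objects."""
    for line in raw_lines:
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # cleaning decision made here, and easy to revisit
        if isinstance(event, dict):
            yield event


def mean_latency_by_page(raw_lines: Iterable[str]) -> dict[str, float]:
    """The original question: average response time per page."""
    totals: dict[str, tuple[int, int]] = {}
    for event in read_events(raw_lines):
        if "page" in event and "ms" in event:
            total, count = totals.get(event["page"], (0, 0))
            totals[event["page"]] = (total + event["ms"], count + 1)
    return {page: total / count for page, (total, count) in totals.items()}


def events_by_country(raw_lines: Iterable[str]) -> dict[str, int]:
    """A later question nobody anticipated, answered from the same raw data."""
    counts: dict[str, int] = {}
    for event in read_events(raw_lines):
        country = event.get("country", "unknown")
        counts[country] = counts.get(country, 0) + 1
    return counts
```

Had the `country` field been discarded during an up-front cleaning step, the second question would have required a change to the loading pipeline. Because the raw lines were kept, it only requires a new reader.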

Finally, should you think I’m an anarchist nutcase, a word on governance. The “wild west” of big data may allow many new pioneers to flourish, but without structure and rule of law, they’ll be limited in their growth as they become unmanageable. Data security, data provenance, and dependency management all matter, and the big data tool ecosystem is starting to grow and embrace these aspects of data. The key is that we reinvent them without killing the essence of agility.


The downside of the big data hype is that many legacy vendors have rebranded themselves as big data, bringing perhaps the “big,” but certainly not the other aspects that enable a data-driven attitude to business. (This isn’t always through want of their trying: I asked one large BI vendor what their biggest problem was, and they said that it was that customers tended to silo their analytics departments, disconnecting them from the rest of the business, and thus consigning them to dashboard and report creation.)

Business should acknowledge that it’s not just new tools such as Hadoop that have created data-driven success, but an organizational attitude. The role of IT has split in two: one half resources infrastructure, and the other is enmeshed with product and business development. These two halves require different management mentalities. They may have computing in common, but their needs are quite different.

Where big data spreads successfully, so will the working styles and management attitudes to data that made the web giants so successful.

More from Edd

Listen to Edd talk more about Big Data as Rocket Fuel in this podcast.