Developers once wrote application code and just “threw it over the wall” to IT operations, which then had to ensure that those applications performed well in the production environment. This was always a less-than-optimal approach, but it became untenable as the business began to depend more and more on lots of fresh code getting rolled out into production quickly and with a high degree of confidence. So IT organizations are now embracing a set of best practices known as DevOps that improve coordination between development and operations.
A similar challenge now exists with big data. Instead of developers writing application code, however, we now have data scientists designing analytic models for extracting actionable insight from large volumes of diverse, rapidly changing data sets. The problem is that no matter how awesome those analytic models may be, they don’t do anyone any good if they can’t be quickly executed in the production environment.
DataOps, the set of best practices that improve coordination between data science and operations, has therefore become a critical discipline for any IT organization that wants to survive and thrive in a world where real-time business intelligence is a competitive necessity.
- Reason 1: Speed Counts. Business opportunities often have short shelf-lives. In many cases, they may be as fleeting as a website visit or a phone call. So, for all practical purposes, slow results can be no results.
Also, business intelligence is increasingly being delivered in the form of mobile apps that salespeople, marketers and executive decision-makers consume in real time as they head into meetings or hop onto planes. They expect that if Google can give them answers in a fraction of a second, their BI apps should be able to do so as well.
- Reason 2: The Cloud Is Not a Panacea. It’s awesome that lots of relatively inexpensive processing capacity is available on-demand from cloud service providers. But not every big data performance issue can be solved by spinning up a bunch of Hadoop VMs.
In fact, big data performance bottlenecks are often caused by front-end data intake and transformation issues. All the cloud-based analytic processing capacity in the universe won’t help you with these bottlenecks.
- Reason 3: Big Data Workloads Are Diverse. Big data is not just one single thing. One day, it’s predictive analytics. The next day, it’s mobile data serving. The day after that, it’s transaction processing.
If your infrastructure can’t handle all these different types of workloads reliably, and in real time, you won’t be able to give your business (and your customers) what they need, when they need it. Only by embracing DataOps can you keep your infrastructure tightly aligned with your data ingest-and-output needs.
Yes, data science is an exceptionally important discipline today. But that data science is only useful insofar as it can be efficiently and reliably executed. And for that to happen, you need DataOps.