Is 2012 the year of Agile Analytics? Recent publications show growing interest in the application of Agile methods to analytics:
- Ken Collier, an Agile pioneer, tackles analytics in his aptly named new book Agile Analytics.
- A quick Google search surfaces a number of recent blogs and articles.
- Curt Monash recently published an excellent two-part blog on the subject.
Here on thinking.netezza.com, we’ve commented in the past on techniques that contribute to Agile Analytics, such as in-database analytics, open source analytics and tighter integration with commercial packages like SAS. In addition, we’ve commented on some of the barriers to agility, such as limitations of the PMML standard.
In this blog, we’ll review how Agile Analytics differs from conventional approaches, examine the key drivers of current interest and identify some of the business practices that enable it.
What is Agile Analytics?
Agile Analytics is an approach to predictive analytics that emphasizes:
- Client satisfaction through rapid delivery of usable predictions
- Focus on model performance when deployed “in market”
- Iterative and evolutionary approach to model development
- Rapid cycle time through radical reduction in time to deployment
The Agile approach focuses on the client’s end goal: using data-driven predictions to make better decisions that impact the business. In contrast, conventional approaches to predictive modeling (such as the well-known SEMMA model: Sample, Explore, Modify, Model, Assess) tend to focus on the model development process, with minimal attention given to either the client’s business problem or how the model will be deployed.
Since Agile Analytics is most concerned with how well the predictive model supports the client’s decision-making process, the analyst evaluates the model based on how well it serves this purpose when deployed under market conditions. In practice, this means that the analyst evaluates model accuracy in production together with score latency, deployment cost and interpretability – a critical factor when building predictive analytics into a human process. Conventional approaches typically evaluate predictive models solely on model accuracy when back-tested on a sample, a measure that often overstates the accuracy that the model will achieve when deployed under market conditions.
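As a small sketch of this evaluation mindset (all of the data and numbers below are hypothetical; in practice the “production” pairs would come from tracking deployed model scores against observed outcomes), a back-tested accuracy figure can be checked against the accuracy the same model actually achieves in market:

```python
# Sketch: compare back-tested accuracy with in-market accuracy.
# All data here is hypothetical, for illustration only.

def accuracy(predictions, outcomes):
    """Fraction of predictions that matched the observed outcome."""
    hits = sum(1 for p, y in zip(predictions, outcomes) if p == y)
    return hits / len(outcomes)

# Back-test: model evaluated on a held-out historical sample.
backtest_pred = [1, 0, 1, 1, 0, 1, 0, 1, 1, 0]
backtest_true = [1, 0, 1, 1, 0, 1, 0, 1, 0, 0]

# Production: the same model scored on live records after deployment.
prod_pred = [1, 1, 0, 1, 0, 0, 1, 1, 0, 0]
prod_true = [1, 0, 0, 1, 1, 0, 0, 1, 0, 0]

backtest_acc = accuracy(backtest_pred, backtest_true)    # 0.9
production_acc = accuracy(prod_pred, prod_true)          # 0.7

print(f"back-test accuracy:  {backtest_acc:.2f}")
print(f"production accuracy: {production_acc:.2f}")
```

The gap between the two numbers is exactly what the conventional back-test-only evaluation fails to surface.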
Agile analysts stress rapid deployment and iterative learning; they assume that the knowledge produced from tracking an initial model after it is deployed enables enhancements in subsequent iterations, and they build this expectation into the modeling process. An Agile analyst quickly develops a predictive model using fast, robust methods and available data, deploys the model, monitors the model in production and improves it as soon as possible. A conventional analyst tends to take extra time perfecting an initial model prior to deployment, and may pay no attention to in-market performance unless the client complains about anomalies.
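A minimal sketch of that monitor-and-improve loop (the threshold, window and hit data are hypothetical): track accuracy over recent scored records and flag the model for a refresh iteration as soon as in-market performance slips.

```python
# Sketch: flag a deployed model for refresh when its recent
# in-market accuracy drops below a tolerance band around the
# accuracy observed at deployment. Numbers are hypothetical.

def needs_refresh(recent_hits, deployed_accuracy, tolerance=0.05):
    """True if accuracy over recent scored records has slipped
    more than `tolerance` below the accuracy at deployment."""
    recent_accuracy = sum(recent_hits) / len(recent_hits)
    return recent_accuracy < deployed_accuracy - tolerance

# 1 = prediction matched the observed outcome, 0 = it did not.
last_week = [1, 1, 0, 1, 1, 0, 1, 0, 1, 0]   # 60% accurate

print(needs_refresh(last_week, deployed_accuracy=0.80))
```

In an Agile shop this check runs continuously; a conventional shop often has no equivalent of it at all.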
Reducing cycle time is critical for the Agile analyst, since every iteration produces new knowledge. The Agile analyst aggressively looks for ways to reduce the time needed to develop and deploy models, and factors cycle time into the choice of analytic methods. Conventional analysts are often strikingly unengaged with what happens outside of the model development task; larger analytic teams often delegate tasks like data marshalling, cleansing and scoring to junior members, who perform the “grunt” work with programming tools.
What’s Driving Interest in Agile Analytics?
A combination of market forces and technical innovation drive interest in Agile methods for analytics:
- Clients require more timely and actionable analytics
- Data warehouses have reduced latency in the data used by predictive models
- Innovation directly impacts the analytic workflow itself
Business requirements for analytics are changing rapidly, and clients demand predictive analytics that can support decisions today. For example, consider direct marketing: ten years ago, firms relied mostly on direct mail and outbound telemarketing; marketing campaigns were served by batch-oriented systems, and analytic cycle times were measured in months or even years. Today, firms have shifted that marketing spend to email, web media and social media, where cycle times are measured in days, hours or even minutes. The analytics required to support these channels are entirely different, and must operate at a digital cadence.
Organizations have also substantially reduced the latency built into data warehouses. Ten years ago, analysts frequently worked with monthly snapshot data, delivered a week or more into the following month. While this is still the case for some organizations, data warehouses with daily, intra-day and real-time updates are increasingly common. A predictive model score is as timely as the data it consumes; as firms drive latency from data warehousing processes, analytical processes are exposed as cumbersome and slow.
Numerous innovations in analytics create the potential to reduce cycle time:
- In-database analytics eliminate two of the most time-consuming tasks: data marshalling and model scoring
- Tighter database integration by vendors such as SAS and SPSS enables users to achieve hundred-fold runtime improvements for front-end processing
- Enhancements to the PMML standard make it possible for firms to integrate a wide variety of end-user analytic tools with high-performance data warehouses
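To make the in-database idea concrete, here is one hedged sketch (the table, columns and coefficients are invented for illustration): rather than extracting rows to score inside an analytic tool, a fitted model’s coefficients can be compiled into a SQL expression that the warehouse evaluates where the data lives.

```python
# Sketch: compile a fitted linear model into a scoring SQL statement
# so the warehouse scores rows in place. The coefficients and the
# customer_features table/columns are hypothetical.

coefficients = {"tenure_months": 0.012, "recent_purchases": 0.3}
intercept = -1.25

def scoring_sql(table, coefs, intercept):
    """Build a SELECT that computes the linear score for every row."""
    terms = [f"{w:+g} * {col}" for col, w in coefs.items()]
    expr = f"{intercept:+g} " + " ".join(terms)
    return f"SELECT customer_id, {expr} AS score FROM {table};"

print(scoring_sql("customer_features", coefficients, intercept))
```

Scoring then runs at database speed over the full table, with no data movement between systems.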
All of these factors taken together add up to radical reductions in time to deployment for predictive models. Organizations used to take a year or more to build and deploy models; a major credit card issuer I worked with in the 1990s needed two years to upgrade its behavior scorecards. Today, IBM Netezza customers who practice Agile methods can reduce this cycle to a day or less.
What Business Practices Enable Agile Analytics?
We’ve mentioned some of the technical innovations that support an Agile approach to analytics; there are also business practices to consider. Some practices in Agile software development apply equally well to analytics as any other project, including the need for a sustainable development pace; close collaboration; face-to-face conversation; motivated and trustworthy contributors; and continuous attention to technical excellence. Additional practices pertinent to analytics include:
- Commitment to open standards architecture
- Rigorous selection of the right tool for the task
- Close collaboration between analysts and IT
- Focus on solving the client’s business problem
More often than not, customers with serious cycle time issues are locked into a closed, single-vendor architecture. Lacking an open architecture to interface with data at the front end and back end of the analytics workflow, these organizations are forced into treating the analytics tool as a data management tool and decision engine; this is comparable to using a toothbrush to paint your house. Server-based analytic software packages are very good at analytics, but perform poorly as databases and decision engines.
Agile analysts take a flexible, “best-in-class” approach to solving the problem at hand. No single vendor offers “best-in-class” tools for every analytic method and algorithm. Some vendors, like KXEN, offer unique algorithms that are unavailable from other vendors; others, like Salford Systems, have specialized experience and intellectual property that enables them to offer a richer feature set for certain data mining methods. In an Agile analytics environment, analysts freely choose among commercial, open source and homegrown software, using a mashup of tools as needed.
While it may seem like a platitude to call for collaboration between an organization’s analytics community and the IT organization, we frequently see customers who have developed complex processes for analytics that either duplicate existing IT processes, or perform tasks that can be done more efficiently by IT. Analysts should spend their time doing analysis, not data movement, management, enhancement, cleansing or scoring; but surveyed analysts typically report that they spend much of their time performing these tasks. In some cases, this is because IT has failed to provide the needed support; in other cases, the analytics team insists on controlling the process. Regardless of the root cause, IT and analytics leadership alike need to recognize the need for collaboration, and an appropriate division of labor.
Focusing the analytics effort on the client’s business problem is essential for the practice of Agile analytics. Organizations frequently get stuck on issues that are difficult to resolve because the parties are focused on different goals; in the analytics world, this takes the form of debates over tools, methods and procedures. Analysts should bear in mind that clients are not interested in winning prizes for the “best” model, and they don’t care about the analyst’s advanced degrees. Business requires speed, agility and clarity, and analysts who can’t deliver on these expectations will not survive.
Organizations seeking to compete with analytics need to bear in mind that while investments in technology are necessary, they are not sufficient for high-performance analytics; business practices must change as well. The Agile Analytics approach offers an excellent framework for accelerating change. In the coming months, we will continue to blog about enablers and barriers to Agile Analytics, and demonstrate how IBM Netezza customers benefit from these methods and practices.