Healthy Data: Getting with the Program

Distinguished Engineer, IBM

In my last post, we reviewed what we mean by "healthy data" for Smarter Analytics. In this post we'll talk a bit about an overall program for achieving healthy data.  

Getting to healthy data is a bit like getting to a healthy body - a problem many of us are all too familiar with. It is an ongoing process of watching what we ingest and making sure we use the energy effectively. While this analogy is by no means perfect, it provides a useful framework. 

The first step is, of course, to establish some goals. This is perhaps the most important part of any improvement program. Getting to healthy data is no different. We can start by exploring a few key questions:

  • What are you trying to achieve?
  • How healthy are you now?
  • What are some reasonable steps?

Goals and targets are hard. We all know that unrealistic goals are rarely achieved. If your aim is to run a marathon and you can barely run a mile, then setting a goal of running a marathon in a week is not terribly realistic. You need an honest assessment of how healthy you are today. From there, seek expert advice and carefully plan a program of diet and exercise to achieve your goal over time.

The same thing is true for achieving healthy data. If, for example, we are trying to establish healthy data in support of a customer analytics project - and like many organizations, we have several sources of customer information - we may start by selecting a single source system and then extending to others.  

Profiling the data in a system helps us understand the kinds of issues that may be lurking and establish a baseline for improvement. This baseline helps us determine the gap between today's realities and where we ultimately want to be. We'll revisit this topic in more detail in subsequent entries - but I think it is pretty clear that establishing a good set of incremental goals is one of the keys to success.
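
To make the idea of a baseline a little more concrete, here is a minimal sketch in Python. It assumes a handful of made-up customer records and looks at only one kind of issue, missing values. The field names, the sample records and the profile_completeness helper are all hypothetical, and a real profiling effort would examine much more than completeness.

```python
from typing import Dict, List, Optional

# A record maps a field name to a value that may be missing (None) or blank.
Record = Dict[str, Optional[str]]


def profile_completeness(records: List[Record], fields: List[str]) -> Dict[str, float]:
    """Return, for each field, the fraction of records with a non-blank value."""
    if not records:
        return {field: 0.0 for field in fields}
    result: Dict[str, float] = {}
    for field in fields:
        # Treat None, empty strings and whitespace-only strings as missing.
        filled = sum(1 for record in records if (record.get(field) or "").strip())
        result[field] = filled / len(records)
    return result


# Hypothetical sample from a single source system.
customers: List[Record] = [
    {"name": "Ann Lee", "email": "ann@example.com", "postal_code": "10001"},
    {"name": "Bo Chen", "email": "", "postal_code": "94105"},
    {"name": "Cy Diaz", "email": None, "postal_code": "  "},
    {"name": "Di Patel", "email": "di@example.com", "postal_code": None},
]

baseline = profile_completeness(customers, ["name", "email", "postal_code"])
for field, ratio in baseline.items():
    print(f"{field}: {ratio:.0%} complete")
```

The numbers that come out of a run like this are the baseline: the gap between them and where we want to be is what the incremental goals need to close.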

Once the goals are established, we use them to put together a plan - Diet and Exercise. Diet is about selective consumption of information and exercise is about preparing, transforming and managing the information needed for our analytics. Again, future entries will delve into this further.

The final step is to track our progress. As we all know, actually measuring, monitoring and reporting on key metrics often provides significant incentives for improvement of any kind. It encourages a level of ongoing awareness that allows us not only to notice what is happening but also to spot the opportunities to change. Getting to healthy data is no different. We need to carefully choose what we want to measure - perhaps it's completeness of customer records or maybe it's valid addresses - so that we can focus our efforts and measure our results. As our results start to improve, we go through the cycle again and again, addressing and monitoring new issues until we achieve our goals. This creates a virtuous cycle of setting goals, making the changes to achieve those goals and measuring the results.
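
The measuring part of that cycle can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the metric names, the target values, the measurements for each cycle and the gaps_to_target helper are all invented for the example.

```python
from typing import Dict, List


def gaps_to_target(measured: Dict[str, float], targets: Dict[str, float]) -> Dict[str, float]:
    """Return how far each metric falls short of its target (0.0 once the target is met)."""
    return {
        name: max(0.0, round(target - measured.get(name, 0.0), 4))
        for name, target in targets.items()
    }


# Hypothetical targets and one set of measurements per improvement cycle.
targets = {"complete_records": 0.95, "valid_addresses": 0.90}
history: List[Dict[str, float]] = [
    {"complete_records": 0.70, "valid_addresses": 0.60},
    {"complete_records": 0.85, "valid_addresses": 0.80},
    {"complete_records": 0.96, "valid_addresses": 0.88},
]

for cycle, measured in enumerate(history, start=1):
    gaps = gaps_to_target(measured, targets)
    open_items = [name for name, gap in gaps.items() if gap > 0]
    status = "goals met" if not open_items else "still working on " + ", ".join(open_items)
    print(f"Cycle {cycle}: {status}")
```

Reporting the remaining gap each time around, rather than just the raw measurement, keeps the focus on what still needs to change.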

That is the process for achieving healthy data - sounds simple? Well, the devil is always in the details - and we'll spend the next few posts discussing some of what seems to work - as well as what doesn't. Getting to a healthy body takes realism, hard work, and perseverance. Getting to healthy data is no different.