Selecting a First Big Data Project

Where to start and what to look for in big data projects

Solution CTO, IBM

The recommendations in this series for selecting, staffing, and planning a first big data project are based on many years of experience working with a wide variety of customers in several industries. We won’t focus on specific technologies. Instead, we will examine the organizational dynamics and lessons learned from how these projects go in real life, inside existing, often very busy IT infrastructures. Every customer is different, so please take these as general guidelines rather than hard-and-fast rules. Please note that this series is not about general project management. We assume that you have adequate project management discipline in place, and we’re simply going to look at the dynamics that apply to your big data project.

Know what your compelling drivers are

The first, most obvious question is “Why do this at all?” There should be a compelling use case, a competitive driver, a cost driver, or some other identified issue where the application of big data technologies is in the critical path to solving the problem. Typical drivers include the information type (for example, under-utilized structured information sources) or the volume of information (for example, retention of IP logs). In any case, you need to identify exactly why you are pursuing this path.

One of the most important things to look for is a compelling ROI (return on investment). That is, you should be able to put a value on the cost of the problem before you plan a solution. When calculating the problem’s cost, remember to add the cost of the effort you will put into solving it, both from a technology and a labor point of view. You can then compare that to the “after” state to determine the overall value of the project.

The first phase of a project may not, in and of itself, be ROI positive. What matters is that you have a line of sight through the completion of the entire use case of the project. It may be the second or third application of the technology that flips the switch to a positive ROI.

A good example of such a line of sight comes from one of our clients, who uses the natural language analytics capabilities of InfoSphere BigInsights to understand email correspondence. Through the BigInsights analysis, our client can identify problems in customer satisfaction before they manifest in an unhappy customer leaving the firm. To begin this project, our client selected a subset of its data to analyze; in this case, the data represented just one region of the country. Once we analyzed the data from that region, it did, in fact, show a positive ROI. However, we didn’t base the model on achieving positive ROI from a single region. The project’s ROI plan was based on running all of the email across the client’s entire U.S. footprint through the system, not just one region.

Staking out a path to ROI and getting agreement on it keeps everyone focused on what is practical. We are not saying that pure experimentation, or going after R&D projects first as a way of understanding new technologies, is an invalid approach. What we are saying is that it is important to differentiate between experimentation and your first “business” project. Do not confuse how you learn with how you implement something that ultimately has to go into production.

In future columns, we will cover:

  • Selecting people before technology
  • Planning for success
  • What the production version looks like
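
To make the ROI line of sight discussed above a little more concrete, here is a minimal Python sketch of how one might track cumulative value across project phases. It is only an illustration: the `Phase` structure, the function names, and every figure are hypothetical and are not taken from any client engagement. It simply subtracts each phase’s effort cost (technology plus labor) from the problem cost that phase is assumed to remove, and reports the first phase at which the running total turns positive.

```python
from dataclasses import dataclass
from itertools import accumulate


@dataclass
class Phase:
    name: str
    effort_cost: float           # technology plus labor spent on this phase (hypothetical)
    problem_cost_avoided: float  # part of the problem's cost this phase removes (hypothetical)


def cumulative_net_value(phases):
    """Running total of (cost avoided - effort cost) after each phase."""
    return list(accumulate(p.problem_cost_avoided - p.effort_cost for p in phases))


def first_positive_phase(phases):
    """Name of the first phase where cumulative net value is positive, or None."""
    for phase, net in zip(phases, cumulative_net_value(phases)):
        if net > 0:
            return phase.name
    return None


# Illustrative figures only.
phases = [
    Phase("pilot region", effort_cost=500_000, problem_cost_avoided=200_000),
    Phase("second region", effort_cost=150_000, problem_cost_avoided=300_000),
    Phase("national rollout", effort_cost=250_000, problem_cost_avoided=900_000),
]

for phase, net in zip(phases, cumulative_net_value(phases)):
    print(f"{phase.name}: cumulative net value {net:,.0f}")
print("ROI turns positive at:", first_positive_phase(phases))
```

With these made-up numbers the first two phases are cumulatively negative and only the third turns the total positive, which is the kind of path worth agreeing on before the first phase starts.
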
This article was originally published on the Smarter Computing Blog.


