Using Hadoop in the cloud to fail fast

Portfolio Marketing Manager, IBM

Snow is a data and application architect who loves helping customers with their data architectures. With more than 20 years of working with business stakeholders at all levels, Snow’s IT experience spans the banking, insurance, manufacturing, retail and telecommunications industries as well as the government sector. He is currently focused on IBM Cloud Data Services and emerging technologies such as big data streaming architectures. In his spare time, Snow leads and contributes to an open source project providing executable examples for IBM BigInsights for Apache Hadoop. The examples kick-start BigInsights projects, allowing you to move at warp speed on your big data use cases. The project can be found on the GitHub website.

Andrea: Thanks for joining us today. Let’s start with a quick introduction and what you’re working on currently.

Chris: I’ve been working in IT now for about 20 years, with a number of different companies large and small, and on a number of different projects as well. Some are very agile; some are very waterfall. Basically, I’ve got tons of experience implementing software projects. My current focus is on big data solutions, and my biggest drive is helping customers to understand what data they have, what data they could have and what they can do with that data.

Recently, you and I were having a chat, and you mentioned that you were working on a mobile application. I was wondering if you could tell me more about it: the project history, the customer you’re working with and what problem or opportunity the customer was after. Also, what were your thought processes as you developed the app?

Sure. The project grew out of work I’ve been doing with several customers over time. I see a lot of customers working to better understand how to get the most out of their services and how to integrate them. The key thing is that having any useful service is good, but having two or more services talking to each other has the potential to double, or more than double, the business benefit, the user benefit or both.

In one use case, I’m using IBM Bluemix platform services such as IBM BigInsights, which is the IBM Apache Hadoop-as-a-service offering, and Cloudant, one of the IBM database-as-a-service technologies. Cloudant is a great technology for ingesting and storing data for systems of engagement. Systems of engagement are those systems that engage with end users. They need to be able to scale massively to handle the most demanding web and mobile workloads.

But then how do we work with that data on the back end? Acquiring lots of engagement data is fantastic, but we generally need to be able to do something with that data. We’ve got technologies such as dashDB, which is really good for some use cases such as analyzing the data from Cloudant. If we want to pull reports from the data or do some basic machine learning, we can work with just these technologies.

Generally speaking, however, you need more power and more tools to be able to really get dirty with the data, such as combining it with unstructured social media data. Which brings us to the mobile application I’m working on. The app allows users to share their data, and Cloudant takes care of that sharing. BigInsights does the back-end analytics on the data and pushes the insights back into the app for a rich user experience.

Integrating data for engaging applications 

And doing all this on the same platform saves time, which is smart. Well done. I’m hearing more frequently about businesses learning to do more with previously unused data, and I’m hearing that Hadoop and Apache Spark work well in combination to support advanced and richer analytics.

Yes, loads of use cases such as this one exist in which there are opportunities to interact with customers and make an application more engaging—including doing something serious with all that back-end data. Personal finance is a great example.

Let’s say you’ve got a married couple that uses an app to track finances by entering information on what they spend in the shared app experience. After setting budgets, the couple uses the app on a daily basis to enter transactions. The app provides the visibility into how well they are keeping to their budget.
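As a rough illustration of the budget tracking described above, here is a minimal Python sketch. It is not code from the app: the category names, amounts, user names and the `budget_status` helper are all hypothetical, and the transactions are plain dictionaries standing in for whatever documents the app actually stores.

```python
from collections import defaultdict

# Hypothetical monthly budgets per category (illustrative values only).
budgets = {"groceries": 400.0, "dining": 150.0}

# Hypothetical transactions entered by either partner in the shared app.
transactions = [
    {"user": "alex", "category": "groceries", "amount": 120.50},
    {"user": "sam", "category": "dining", "amount": 90.00},
    {"user": "alex", "category": "dining", "amount": 75.25},
    {"user": "sam", "category": "groceries", "amount": 60.00},
]


def budget_status(budgets, transactions):
    """Return the spent and remaining amounts for each budget category."""
    spent = defaultdict(float)
    for tx in transactions:
        spent[tx["category"]] += tx["amount"]
    return {
        category: {
            "spent": round(spent[category], 2),
            "remaining": round(limit - spent[category], 2),
        }
        for category, limit in budgets.items()
    }


status = budget_status(budgets, transactions)
for category, figures in status.items():
    flag = "over budget" if figures["remaining"] < 0 else "on track"
    print(f"{category}: spent {figures['spent']:.2f}, "
          f"remaining {figures['remaining']:.2f} ({flag})")
```

Both partners’ entries land in the same totals, which is the point of the shared experience: the couple sees one view of how well they are keeping to the budget.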

Then on a monthly basis, we use BigInsights to analyze the spending with external data such as weather data. This analysis can open up a whole new set of insights such as how the weather may be driving their spending and provide some suggestions for less-impulsive buying. These opportunities and this thinking are behind the application I’m working on.
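To make the idea of analyzing spending alongside weather data concrete, here is a small Python sketch. It only illustrates the join-and-compare step on a few made-up records; it does not use BigInsights or Spark, and the dates, amounts, weather labels and the `average_spend_by_weather` helper are assumptions invented for the example.

```python
from statistics import mean

# Hypothetical daily spending totals, keyed by ISO date.
daily_spend = {
    "2016-03-01": 20.0,
    "2016-03-02": 65.0,
    "2016-03-03": 15.0,
    "2016-03-04": 70.0,
    "2016-03-05": 25.0,
}

# Hypothetical external weather observations for the same dates.
weather = {
    "2016-03-01": "dry",
    "2016-03-02": "rain",
    "2016-03-03": "dry",
    "2016-03-04": "rain",
    "2016-03-05": "dry",
}


def average_spend_by_weather(daily_spend, weather):
    """Join spending to weather by date and average the spend per condition."""
    groups = {}
    for day, amount in daily_spend.items():
        condition = weather.get(day)
        if condition is None:
            continue  # skip days that have no weather observation
        groups.setdefault(condition, []).append(amount)
    return {condition: mean(amounts) for condition, amounts in groups.items()}


averages = average_spend_by_weather(daily_spend, weather)
for condition, amount in sorted(averages.items()):
    print(f"{condition}: average daily spend {amount:.2f}")
```

A gap between the two averages would only be a hint, not proof, that weather drives spending. A real monthly analysis would need far more data and a proper statistical treatment.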

I like what I’m hearing. I can think of a few ways that the apps I use could be improved with back-end data integration or even apps that could talk to each other. What kind of challenges are you encountering in building the app?

The first challenge I’m tackling is making sure that I can fail fast. With any kind of sophisticated information system, when you couple two or more systems together, one of the things you have to focus on early in the project is the integration piece. That is normally where the risk is, so it is the area you want to get right first. What you don’t want to do is spend weeks installing and configuring on-premises software (self-administering Hadoop or Spark), then try to connect it to Cloudant, only to run into problems moving the data or getting the integration right. In the end, you may find that your plan doesn’t actually work. This kind of slow failure is common. You might spend up to a full month of development and administration effort for nothing.

With BigInsights on Cloud, you can skip all that up-front effort. You can instead get on Bluemix, spin up environments and within ten minutes or less start working. This approach means that I’ve leapfrogged a bunch of risk and am getting right to the important stuff: how am I going to integrate BigInsights and Cloudant, and what data do I need to move between the services? How am I going to leverage machine learning capabilities with Spark on that data?
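One way to test the integration piece first is a small smoke test that reads a few documents and checks that they have the fields the analytics side expects. The Python sketch below is an assumption-laden illustration, not the project’s code: the account name, database name, required fields and helper functions are hypothetical, and authentication is left out. It relies only on the CouchDB-style `_all_docs` endpoint that Cloudant exposes, and the fetch function is injectable so the check can be tried without a live service.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen


def all_docs_url(account, database, limit=5):
    """Build a URL for the CouchDB-style _all_docs endpoint."""
    return (
        f"https://{quote(account)}.cloudant.com/{quote(database, safe='')}"
        f"/_all_docs?include_docs=true&limit={int(limit)}"
    )


def fetch_json(url):
    """Fetch and decode JSON from a URL (no authentication in this sketch)."""
    with urlopen(url, timeout=10) as response:
        return json.load(response)


def smoke_test(account, database, required_fields, fetch=fetch_json):
    """Return a list of problems; an empty list means the check passed."""
    payload = fetch(all_docs_url(account, database))
    rows = payload.get("rows", [])
    if not rows:
        return ["database returned no documents"]
    problems = []
    for row in rows:
        doc = row.get("doc", {})
        missing = [field for field in required_fields if field not in doc]
        if missing:
            problems.append(f"{row.get('id')}: missing {missing}")
    return problems


def fake_fetch(url):
    """Stand-in for the live service, returning a canned response."""
    return {"rows": [
        {"id": "tx1", "doc": {"_id": "tx1", "category": "dining", "amount": 12.5}},
        {"id": "tx2", "doc": {"_id": "tx2", "category": "groceries"}},
    ]}


problems = smoke_test("example-account", "transactions",
                      ["category", "amount"], fetch=fake_fetch)
print(problems)
```

Swapping `fake_fetch` for the default `fetch_json` would point the same check at a real database, which is the moment the plan either works or fails quickly.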

Already, I am focused on the things that really matter. I don’t have to care about spinning up clusters, Hadoop administration or Spark administration, and that is a huge benefit of BigInsights on Bluemix. I can figure out in as little as five minutes if my plan is going to work. If my idea wasn’t sound, or I hadn’t thought it through properly, I’ve not lost time. In the old days, I would have lost weeks standing up my environment to test out my idea.

Failing fast for fast innovation

Failing fast could be called a form of iteration, couldn’t it? Failing faster to iterate faster to innovate faster. What are your top tips about leveraging BigInsights and other Bluemix data and analytics services to support development and innovation?

The Hadoop-as-a-service offering is terrific. I can spin up a virtualized, basic BigInsights cluster in minutes to test my ideas, and I have the reassurance that when I really need to focus on maximum performance I can move to an enterprise BigInsights cluster that runs on bare metal. BigInsights is ever so productive because it is part of a whole offering of data and analytics application services.

What I’m getting at here is that I am able to integrate these multiple products from IBM on Bluemix—BigInsights with Spark and Cloudant—and feel confident that the architecture will be solid. I know that IBM has a big focus on making sure that the integration will be strong and all the working pieces are going to work well together. The same goes with work I’ve done using Spark as a service and BigInsights; I’ve been moving data around between all these tools. And I’ve come into contact with guys from the IBM Spark Technology Center as well—real experts. They give me a lot of confidence that the stuff I am working on, the stuff I am trying to do as an end user, as an end developer, has got a lot of smart people behind it focused on my success.

That’s really terrific to hear. I think you said you’d be publishing your mobile application in a tutorial; is that right? And where can someone go if they want to learn more about our IBM BigInsights on Cloud offering?

Yes, that's right. You can find my other work samples on the GitHub website. And visit BigInsights on Cloud for a closer look at that offering.