CIO Insights: Putting data to work to drive business transformation
Over the past few weeks, I've had the pleasure of discussing data and analytics, migration to cloud, digital transformation and a lot more with our top IBM execs. We talked at length about what is important to CIOs in today’s business. Recently, I spoke with Rob Thomas, VP of development for analytics, and the subject of business transformation led us to a discussion of a data maturity curve. Here I’ll share Rob’s take on that and more.
Rob Thomas, VP of development for analytics, IBM. He brings extensive experience in management, business development and consulting in the high technology and financial services industries. Prior to joining IBM, Mr. Thomas worked as an equity research associate at Merrill Lynch and Wheat First Securities.
Nancy Hensley, director of offering management, IBM Analytics.
Rob is one of the many speakers at World of Watson. Be there to listen to him live!
Nancy Hensley: Hi, this is Nancy Hensley, Director of Offering Management for IBM Analytics and I'm here with Rob Thomas, Vice President of Development for IBM Analytics. Good morning, Rob. How are you doing today?
Rob Thomas: Great, Nancy. Thanks for having me.
Nancy Hensley: Thank you. So I wanted to ask you a few questions for our CIOs who are listening to this podcast. I noticed a few times in your talks that you show a maturity curve for analytics. Could you tell us a little about what CIOs should be prioritizing in their agendas in order for them to get to that far side of the maturity curve?
Rob Thomas: Sure. Let's level set for a minute on the maturity curve. The whole idea of the maturity curve is that companies are at very different stages in their maturity around the topic of big data and analytics and how they're using it to transform their business. So on one side -- the relatively immature side -- of the curve would be people that are really just using new tools and techniques around data to reduce cost. And there's a lot of value in that. And companies can use that to then, you know, create savings that can be invested somewhere else.
But in my mind that's really not the endgame or the endpoint for the promise that exists around big data and analytics. So if you move along that curve and you become less focused on simply saving money, that's when you move into things like, "So how are you going to actually transform your business? What are you going to do in terms of line of business imperatives, line of business applications? How do you get to a new level of insight that impacts your daily business processes and ultimately the relationship that you have with clients?"
And so our focus and the methodology that we've built is around how we help clients move from left to right on that curve to get much further along in terms of big data maturity. And what we've found is it's not just about the tools that you use or the platforms that you develop on. It's also very much a cultural thing. And so helping clients move along that curve - that's why I use the phrase methodology. There are repeatable patterns for people that can make that leap in terms of the cultural transformation that's required. And that's really where we are focused.
Nancy Hensley: Excellent. I agree. Those are some great points. So let's shift thoughts here a little bit and let's talk about open source. Because IBM's made some pretty big statements here, which is, you know -- for example -- Spark. We had a very large launch last year, we continue to make great progress. What advice would you give CIOs when it comes to really embracing open source in their architectures?
Rob Thomas: We love open source. A main focus of ours in the last few years has been to move a big component of everything that we do in terms of how we develop products to working in open source communities, contributing code and ultimately building our product on top of open source spaces. So that is a strategic imperative for us.
And the reason we do that is -- one -- you get the benefits of a broader community. Two is you bring a much more open architecture to clients, which we think gives them a lot more flexibility and avoids the notion of any type of lock-in. And three is it helps us move very fast. One of the advantages of a broad community is that you can move really fast.
Now, as I've worked with clients what I see is many clients end up in a tough spot if they go too aggressively down that path. It requires a totally different skillset in the organization. You know, open source by definition is often immature. I mean -- just to give you an example -- the Hadoop project alone has 20,000 open JIRAs or, you know, requests, 10,000 defects. So it requires either, you know, an intense amount of skills development within an enterprise to deal with that complexity or it requires a partner like IBM that can help you through that journey.
So I would say it's two things. One is I think every CIO should look at this as a tremendous opportunity to remake their infrastructure and to do it in a way that will drive a lot of speed. But there also needs to be a healthy skepticism or caution about how fast you can move along that journey, and how you can choose the best partners to help you get there. Because it is not an easy world. Open source is the Wild West. And the Wild West has always presented opportunities and potential risks.
Nancy Hensley: I totally agree. And I think that the opportunity of being able to move at the speed of the community and not be limited by the skills within your own organization is pretty large. And I agree on your cautions. In fact, one of the biggest concerns CIOs have is the management and integration of open source. You know, do you think this stuff is enterprise ready? What should CIOs be concerned about in particular with managing and integrating these capabilities?
Rob Thomas: I think the key thing is - so one is I do think it's ready for primetime. That being said -- so I'll just take Hadoop again for a moment because I mentioned it before -- there are almost 30 different projects within Hadoop. And so you can't really say - anybody that says Hadoop is mature; that's a bit of a naïve statement, because there's, you know, almost 30 sub-projects within Hadoop. And some of them are incredibly mature and some are much less mature.
So point one is really understand what you're getting into. This does require a different level of study and preparation and skill for any organization. But what we've found with things like Spark is it can really transform the way an organization does analytics. And many components of Spark are very mature. And there are other pieces -- such as, you know, Spark SQL -- where we're doing a lot of contributions to help mature that.
But all of this stuff - it's an evolution. It evolves over time. And the key thing is really to have an independent advisor as any CIO goes on this path, to make sure that they end up where they want to be on this journey.
Nancy Hensley: I agree. So what do you think is the hottest thing in the open source world right now? Is it Spark or machine learning or all of that stuff?
Rob Thomas: Spark - we - you know, we called Spark the analytics operating system because it is - it's the thing that we've always looked for in open source that was less about storing or moving data and more about how you get insight out of data. So yes, I think that's incredibly hot. The main use case for Spark that I see is around machine learning. That's why we contributed a machine learning engine and optimizer to Apache, which will eventually make its way into the Spark project, we believe.
Because machine learning is really about how you begin to automate the process of analytics. And as new data flows in, your model becomes smarter and then it can be reused. It really changes the heavily manual nature of analytics in building models to be something that's much more automated. So yes, I believe machine learning will become mainstream in the next three years. And the companies that lead this era will be the ones that are aggressively adopting that and using it to drive their decision making.
I'll give you an example of one retailer that I had done some research on. Stitch Fix is in retail women's fashion. And because of the way they collect data and they run machine learning algorithms on that corpus of data, they get highly personalized recommendations for every one of their clients. And the impact of that is they sell through 90 percent of their inventory in any given quarter. That is unheard of in the retail world.
Nancy Hensley: Wow, that's amazing.
Rob Thomas: Most retailers sell 30 to 40 percent of their inventory. They're selling through 90 percent. Why is that? That's because of machine learning on a large corpus of data that can enable them to make better recommendations and make better decisions in buying through the supply chain. That's the kind of impact that can be transformative with things like Spark, machine learning, open source, and analytics.
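Rob's point that a model "becomes smarter" as new data flows in can be illustrated with a minimal sketch of online learning. This is plain Python and not Spark, and it is not the method used by Stitch Fix or IBM. The class `OnlineLinearModel`, the function `synthetic_stream` and the data itself (noise-free points on the line y = 2x + 1) are all hypothetical, chosen only to show the prediction error shrinking as more observations arrive.

```python
class OnlineLinearModel:
    """Tiny linear model (y ~ weight * x + bias) updated one observation at a time."""

    def __init__(self, learning_rate=0.05):
        self.weight = 0.0
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict(self, x):
        return self.weight * x + self.bias

    def update(self, x, y):
        """Take one stochastic gradient step; return the error seen before the step."""
        error = self.predict(x) - y
        self.weight -= self.learning_rate * error * x
        self.bias -= self.learning_rate * error
        return abs(error)


def synthetic_stream(n_batches):
    """Hypothetical data feed: noise-free points on the line y = 2x + 1."""
    for _ in range(n_batches):
        for x in (0.0, 1.0, 2.0, 3.0):
            yield x, 2.0 * x + 1.0


# The model is never retrained from scratch; each new record nudges it.
model = OnlineLinearModel()
errors = [model.update(x, y) for x, y in synthetic_stream(500)]

print(f"first error: {errors[0]:.3f}, last error: {errors[-1]:.6f}")
print(f"prediction for x=10: {model.predict(10.0):.3f}")
```

The same model object is reused for every new record, which is the sense in which the manual rebuild step drops out of the loop.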
Nancy Hensley: And going back to your comment on that maturity curve, it sounds like some of these are the recipe for getting to that far side of the maturity curve, being able to have that capability, being able to get more predictive and -- as Stitch Fix did -- actually transform your business model to be much more optimized in nature.
Rob Thomas: That's right. Exactly.
Nancy Hensley: Absolutely. Let's talk about Cloud. So IBM's moved a lot of our data -- or all of our data services -- to the Cloud. That's a pretty big investment and commitment. What would you say to CIOs who are thinking about their investments and moving to Cloud, and especially those who feel that they can't move that easily?
Rob Thomas: The discussion has changed a lot in the last six months on this topic. I think at this point most CIOs have accepted the fact that Cloud is the destination that they need to get to. The question is how and at what pace. And so I think I'd say some of the initial concerns I've seen get overcome in the last six months.
Now, one of the biggest mistakes I see clients make when they think about Cloud is they think about taking the current paradigm -- what they have on premise -- and basically lifting and shifting that to the Cloud. And that to me doesn't - it doesn't accomplish anything. Because you don't get the main features of the Cloud in terms of agility and speed if you just take your existing, you know, fairly restrictive on premise deployments and move those in the same form to the Cloud.
Nancy Hensley: Right. Plus, there's a lot of work to do that.
Rob Thomas: Yes, exactly. So our focus is, look, as you move to the Cloud it's no longer about a traditional IT stack. It's about creating a fluid data layer of composable data services. It becomes a much more dynamic environment where, you know, you can really get to the point where you can democratize access to data in the organization.
I think that's what every organization wants, and that's probably the number one thing that every CIO feels from the organization in terms of pressure. But the Cloud is an enabler of that, assuming that you have a composable set of data services like I described. And that's really been our focus in terms of what we're building on the Cloud.
Nancy Hensley: So you mentioned something called a fluid data layer. Can you elaborate a little bit more on that?
Rob Thomas: So think about it this way, the traditional model has been load data into one repository, build BI applications or custom applications on top of that. And it's very rigid, because you have to move data into that single repository in order to get access to it. When I talk about a fluid data layer, this is about ingesting data from every potential source -- whether it's on premise or in the Cloud -- bringing that into the organization. Think of it as a stream of data running through the organization. Cataloguing that data, doing your governance up front, and then building and deploying analytics against that stream of data as needed.
So it's no longer about I'm going to choose one repository, it's about I've got this stream of data and I can persist it into any form factor or any repository as analytics are needed by the organization. That's why Spark is so important, because Spark gives you the ability to do that type of persistence or that processing above the repository layer, which again totally changes the dynamic from a rigid construct to much more of a fluid construct. So that's what we mean. It's about really changing the nature of how data flows into an organization and then how it can be accessed.
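The flow Rob describes (ingest from every source, catalog and govern up front, then persist wherever analytics need the data) can be sketched in a few lines. This is a minimal illustration in plain Python, not Spark and not an IBM product API. Every name here (`Record`, `ingest`, `govern`, `persist`, the source names and the masking rule) is an assumption made for the example.

```python
from dataclasses import dataclass, field
from itertools import chain
from typing import Callable, Dict, Iterable, Iterator, List


@dataclass
class Record:
    source: str                                    # where the record came from
    payload: dict                                  # the raw data
    tags: List[str] = field(default_factory=list)  # governance tags


def ingest(*sources: Iterable[Record]) -> Iterator[Record]:
    """Merge every source into one stream instead of loading one repository."""
    return chain(*sources)


def govern(stream: Iterable[Record], catalog: Dict[str, int]) -> Iterator[Record]:
    """Catalog each record and apply governance before any analytics see it."""
    for record in stream:
        catalog[record.source] = catalog.get(record.source, 0) + 1
        if "email" in record.payload:  # hypothetical masking rule
            record.payload = {**record.payload, "email": "<masked>"}
            record.tags.append("pii-masked")
        yield record


def persist(stream: Iterable[Record],
            sinks: Dict[str, Callable[[Record], bool]]) -> Dict[str, List[Record]]:
    """Route each record to every repository whose predicate accepts it."""
    stores: Dict[str, List[Record]] = {name: [] for name in sinks}
    for record in stream:
        for name, wants in sinks.items():
            if wants(record):
                stores[name].append(record)
    return stores


# Hypothetical sources: one on premises, one in the cloud.
on_prem = [Record("on_prem_crm", {"customer": 1, "email": "a@example.com"})]
cloud = [
    Record("cloud_clickstream", {"customer": 1, "page": "/home"}),
    Record("cloud_clickstream", {"customer": 2, "page": "/cart"}),
]

catalog: Dict[str, int] = {}
stores = persist(
    govern(ingest(on_prem, cloud), catalog),
    {
        "warehouse": lambda r: "customer" in r.payload,
        "clickstream_store": lambda r: r.source == "cloud_clickstream",
    },
)
print(catalog)
print({name: len(records) for name, records in stores.items()})
```

The design choice worth noting is that the repositories are chosen last, by predicate, so adding a new store does not change how data is ingested or governed.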
Nancy Hensley: Absolutely. And I always say that you can go from having a system of insight that we've built over the years, such as a data warehouse, to all of your data being systems of insight for the organization, where you're really democratizing and leveraging all the data that you collect, even some of the dark data where you don't understand what's going on inside of that data. It's all about providing that access in a very fluid architecture.
Rob Thomas: Agreed. Absolutely.
Nancy Hensley: Well, thank you very much. These are some great pointers for our CIOs, Rob, and we thank you for being here. Have a great day everyone and thank you for tuning in.
Rob Thomas: All right, thank you Nancy.