Making Data Simple: The 5 areas businesses MUST get right
- Episode 1: Making Data Simple: The big data problem
- Episode 2: Making Data Simple: End of tech companies
- Episode 3: Making Data Simple: A new definition of client care
- Episode 4: Making Data Simple: Will machines take our jobs?
- Episode 5: Making Data Simple: Growth Hacking - Not just for start ups
- Episode 6: Making Data Simple: From 2D to 3D -- Augmented reality data visualization
- Episode 8: Making Data Simple: How data science is helping to improve aviation
- Episode 9: Making Data Simple: Making data fun & easy with Caleb Curry
- Episode 10: Making Data Simple: Data movement at size and scale
- Episode 11: Making Data Simple: Cloud computing, part 1
Al Martin: My name is Al Martin. I am responsible for Analytic Development and Device Tests at IBM. I now live in Overland Park Woods, KU. Any KU Alumni in here? Awesome. Rock Chalk. Any MU, K-States? Alright. I hope you have a mediocre conference. I love the Midwest, others look at it as a flyover state, I like that because that will maintain my 15 minute commute to work. There are viewed like two things they associate a poor Kansas boy with. Anybody want to guess the first one? Barbeque, that’s pretty close to the second one I’ll get through. Anyone?
Man 1: Wizard Of Oz.
Al Martin: I tell you what — I mean if I had a nickel every time that they said, you know, “You’re not in Kansas anymore,” I’d be a rich man. The second one. Cows. They think it’s me and a bunch of cows. I don’t think they think anybody else is here, which leads to barbeque. I’m a little unorthodox in the way I do keynotes. I thought we would have a little teaming exercise as well — with this cow. So I’m going to digress for just a minute, it will all make sense.
In 1968 there was a submarine called the Scorpion. It was in the North Atlantic heading to Newport Beach and it disappeared, nowhere to be found. So it had to be someplace in the Atlantic, on the ocean floor. They called in a Navy officer named John Craven and they said, “Hey, you gotta find this thing.” They brought in all these scientists and mathematicians, you know, to see where they could find it. He said, “Look, anybody want to do some gambling? We have got to find this submarine. We know the last radio call that went out, let’s figure it out.” So they went over all the scenarios and they found that submarine. It was 220 yards away from where they said it would be. So that’s like the manual form of Machine Learning, and it’s in a book called The Wisdom of Crowds that you might want to take a look at.
So there’s another guy named Francis Galton. He was a statistician and he went to a fair, and at the fair, you know, they were giving prizes if you could guess the weight of the ox. He was smart enough, he knew the wisdom of crowds, so he got a ton of people together and he got it within one pound. So my query to you today: who wants to guess how many pounds that cow is right there? Anybody want to throw out some ideas? I got 700, 800, 1,300, 1,500, 927. This is like "The Price is Right." What do you think the average of that is? If we were going to go with the average of that, what would you guess? Every time I do this, you know, we do a little team exercise, and I’m always within 100. So this is kind of the wisdom of crowds. I think it’s the manual form of Machine Learning.
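As a quick sketch of the arithmetic behind that exercise, the five guesses called out above can be averaged in a few lines of Python. The variable names are illustrative only, and the cow's actual weight is not given in the talk, so this computes the crowd's estimate and nothing more.

```python
from statistics import mean, median

# The five guesses called out by the audience in the talk
guesses = [700, 800, 1300, 1500, 927]

# The crowd's estimate: a plain average of every guess
crowd_mean = mean(guesses)

# The median is a common alternative that is less sensitive to outliers
crowd_median = median(guesses)

print(f"Crowd average: {crowd_mean:.1f} lb")
print(f"Crowd median:  {crowd_median} lb")
```

With only five guesses the average and the median can sit fairly far apart. The effect the speaker describes relies on a larger crowd, where individual errors tend to cancel.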
Let me talk about the disruption in the business real quick, and I’m sure you guys are sick of hearing about it. You feel it every day. I get sick of it too, but I figured out the reason I get sick of it is because you have got to constantly be changing, and human nature is anti-change, I believe. You try to get in your comfort zone. So I think it’s important to realize the disruption that the industry is in right now, and I want to talk a little bit about it. I went to Las Vegas last Sunday and the Uber driver says to me, “Look, I don’t get it. The cab industry in Vegas is down 30% and yet they won’t reduce their prices and they don’t even have an app yet.” I said, “Look, you’re describing damn near every big company out there that doesn’t want to make that change.” It’s hard to make that change, right? Airbnb rents more rooms than all the other hotel chains combined and they don’t own one room. That’s pretty impressive.
Amazon, I won’t even talk about technology and Amazon, but they buy Whole Foods, it took three days, the market reacted and the rest of the grocery industry loses $12.8 billion in market cap with everybody just starting to sell. So here in the grocery business, you’re in low margins anyway and you think, you know, they can’t come after us. They’re coming after everybody. And then Pokémon. I don’t get it. I’ve done it. Kodak was around longer than 100 years and it couldn’t make the digital conversion. Blockbuster, you know the story there, Netflix. I could go on and on with these businesses, just in the environment alone.
And then you got Brexit. You can look at our political environment. You got the Falcons blowing a 28-3 lead in the 3rd quarter with two minutes left. I know that some Patriots fans are standing back there. The Cubs win the World Series after 100 years? I think the world has gone crazy. I haven’t even talked about hurricanes or anything else like that, but we are in a period of disruption, and if we all don’t change and continually embrace it and disrupt ourselves, some of us will either not be here next year or we’ll be working for another company. That’s the way it’s going to be.
So there is a poll that was done by The Digital Polls that talks about how much disruption there is to date versus how much disruption there will be in the next 12 months, and look, it’s not going to stop. The three top areas, not like it matters, are financial, healthcare and industrial.
There’s a good book that I recommend, it’s called Thank You for Being Late. Now what he contends is the world is not crazy. In 2007 the iPhone came out, Airbnb, Facebook, Twitter. His position is that 2007 is when technology, software, software eats the world, outdid Moore’s Law, and for many of you that know Moore’s Law, that’s from the 1970’s when the CEO of Intel said we’re going to build a transistor that’s going to correlate to the pace and the range of technology. If you look at that in a physical sense, if you had a 1971 Volkswagen Beetle and it improved at the pace of change of Moore’s Law, like technology has, that thing would run 300 thousand miles per hour, it would cost four cents and you could use it for a lifetime with one gallon of gas.
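To make that pace of change concrete, here is a minimal sketch of compound doubling. It assumes the commonly quoted form of Moore's Law, a doubling roughly every two years, starts at 1971 (the Beetle's model year in the example) and ends at 2017. The two-year period and the end year are assumptions for illustration, not figures from the talk, and the sketch does not try to reproduce the Beetle numbers themselves.

```python
def doublings(start_year: int, end_year: int, years_per_doubling: int = 2) -> int:
    """Whole number of doublings between two years, for an assumed doubling period."""
    return (end_year - start_year) // years_per_doubling


def growth_factor(start_year: int, end_year: int, years_per_doubling: int = 2) -> int:
    """Total multiplier after repeated doubling over the period."""
    return 2 ** doublings(start_year, end_year, years_per_doubling)


n = doublings(1971, 2017)           # assumed end year, for illustration only
factor = growth_factor(1971, 2017)

print(f"{n} doublings, so a {factor:,}x improvement")
```

The point of the sketch is only that a few dozen doublings compound into a factor in the millions, which is why the physical analogy sounds so absurd.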
Now, back up to 2007. His contention is that after 2007 we have seen a hockey stick and we’re going to continue to see that. If you ping a server halfway around the world, like in Singapore, and it comes back in half the speed of light, I mean we’re starting to get to a point where we’re limited by physics. So the question is why? The first answer is the Cloud, the plummeting cost of compute. It’s the plummeting cost of storage, and consequently it’s the plummeting cost of the analytic ability to gather, collect and analyze data. Right? So we see a lot of those companies that are able to use data to their advantage.
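The physics limit mentioned above can be put into rough numbers. This sketch assumes a server exactly halfway around the Earth along the surface and a signal travelling at the speed of light in a vacuum. Real networks are slower, since light in optical fiber travels at roughly two thirds of that speed and routes are not straight lines. The constants and function name are illustrative.

```python
SPEED_OF_LIGHT_KM_S = 299_792.458   # speed of light in a vacuum, km per second
EARTH_CIRCUMFERENCE_KM = 40_075     # approximate equatorial circumference


def min_round_trip_ms(one_way_km: float) -> float:
    """Lower bound on round-trip time in milliseconds at the speed of light."""
    return 2 * one_way_km / SPEED_OF_LIGHT_KM_S * 1000


# "Halfway around the world" taken as half the circumference (an assumption)
halfway_km = EARTH_CIRCUMFERENCE_KM / 2
floor_ms = min_round_trip_ms(halfway_km)

print(f"Best possible ping halfway around the world: about {floor_ms:.0f} ms")
```

So a ping measured in a few hundred milliseconds is already within a small multiple of what physics allows, which is the sense in which the limit is being approached.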
Google as a verb was stated as the most important word in the dictionary. Fifteen years later, still 90% of the world’s data cannot be Googled today. That means it’s sitting behind a firewall someplace. You have to get access to get to it, maybe in the deep web, but regardless, it’s 90%. So we have got a lot of opportunity and a long way to go. We need to think about how we’re going to access data differently.
The first one is Economies of Scale, low cost, high volume. You know, you pick your Amazon, you pick Walmart, you pick companies like that. That’s a tough place to get into as a company. Then there’s network effect at broad scale. That’s essentially loyal networks like Twitter, like Facebook, maybe then Amazon, maybe eBay, where you get more value the more people enter your network. There are 45,000 publicly traded companies, 100 million companies across the planet, and I would venture to say that they all must take advantage of data, and that’s how you differentiate yourself moving forward. You don’t have AI if you don’t have ML, Machine Learning. You don’t have Machine Learning unless you have analytics, and you don’t have analytics unless you have data. I mentioned this to somebody yesterday, I was in San Francisco yesterday, and they said, “Oh, big data is becoming an overused word.” It is, but having said that, it’s more relevant than ever.
The easiest way that I find to assess where a company is at and give them a plan is to talk to what I call the maturity curve. I’ve tried a lot of different things and this is the best way. The left two are more your spend money to save money. The right two are spend money to make money. Operations, like I said, is really about cost reduction. It’s your back office systems, ERP systems, more client-server still, and it might be going into the next quadrant, which is data warehousing. That is where you want to get big insights from your operational data.
You have got to transition a lot of data from operations to data warehousing, like a data lake, so that brings ETL into play, it brings business intelligence into play. You know, the opportunity is here. You might be thinking: I’m thinking about what to do, I’m still thinking about Cloud, what I need to do there, and modernizing my infrastructure. I want to give sales reports in real time. I want to give yields in real time. That’s kind of that quadrant.
When you move over into the business side, you go from IT to the business, you go into self-service analytics, and this is where you open up analytics and you essentially empower everybody across the organization to consume the data. Then this gives rise to different personas, like data engineer, data analyst, data scientist, I could go on and on. Typically when you hit around this area you hire a CDO. I’ll be honest with you, a lot of companies I visit have hired a CDO, and this has got to be the toughest job right now because they hire the CDO and then they say, “All right, you’re responsible for data,” and then they’re not going to think about data anymore, and they also don’t empower the CDOs with data scientists or otherwise. They figure, you figure that out, and they’re still resistant to change.
It’s a tough job right now. Half the companies that I visit don’t have a CDO, some of them have just adopted a CDO and they got a long way to go. The trouble for the CDO is not only that you have those problems, you know, you’re going to be assessed on how you transform the business immediately, but also on how you’re going to transform the business in the future, because three years down the line, if you’re still with that company, they’re going to say, “What the hell were you doing for three years?” if you don’t have a good strategy in place by then, or you put a strategy in place that solves just today’s problems but not tomorrow’s. They have got a tough job.
Most clients, on average, reside where the orange dot resides. The good news is that there are a ton of opportunities for everybody in this room, a ton of opportunity. When you’re talking about architecture and you start getting more value, going from the left to the right, you start with the data sources, ingestion, data lake, access, you do your insight, you make it self-service. You have got Governance across the platform, you got security across the platform, you got streams across the platform, that kind of stuff. I think any good business is looking at private Cloud, public Cloud and hybrid.
One, Hybrid Data Management is about structured, unstructured, all kinds of different data types where you collect the data. Then there is Unified Governance. You have got to organize the data, and there is typically some compliance that you gotta look at. You have got to make sure that it’s secure. Data Analytic and Visualization, this is where you democratize data, where you empower every user across the business organization to consume that data.
Machine Learning and Data Science, in my view, should be embedded into all of those different elements. And Open-Source, because Open-Source drives speed and innovation, and it should be part of your plan if it isn’t. Those are the five elements, if I can simplify them or oversimplify them, that I stress to any client that I visit. How we have chosen to do it is what I call a common analytic engine. By that, I mean for any form factor that we have, you can run the same analytics, whether it’s on Cloud, a managed service or otherwise, or hosted behind the firewall. All the same analytics are run.
So essentially you can have an application that will run on any one of those form factors, whether it’s on the Cloud, whether it’s private, whether it’s an appliance, which is our integrated appliance system, whether it’s Db2 proper behind the firewall, or in the Cloud where it’s hosted, compute and store. It can be very hard to sell, even to somebody that’s totally behind the firewall and says, “Hey listen, I’m not moving to the Cloud. The Cloud is free, Open-Source is free, why am I buying this big rack that you want me to purchase behind the firewall?” Now you’re able to say, “Look, this is open.” They need it because that’s all they’re going to do, they kind of know that, but they’re still worried about the future. So listen, you can move this out at your pace. Right? Because you have a common analytic engine. So if you want to take what’s in that appliance, you want to put it as software only and make it go against your own hardware, do it. You want to put it in the Cloud, do it, and still run the same analytic engine. And then finally, hybrid workloads. This kind of goes without saying maybe, but that’s putting operational with analytic in more of an HTAP framework that I think everybody should support.
Everything that I’m talking about is designed, because we just said that 90% of data can’t be Googled, to put analytics to the data, not data to the analytics. So wherever you want your data, you want to bring the analytics to you.
In 2020 they’re talking 50 billion connected devices, a $1.3 trillion spend in IoT, and 40% of the analytics will be done on the edge. So you have got to have technology that can fit within that framework, that can fit on the edge. We have specifically targeted that with the Informix database. An Informix database, you know, is an interactive database, you can go with Open-Source. The difference with Informix is you get enterprise quality, you don’t have to cut down any features to get it on there, whether it is memory or whether it’s storage, and you can make decisions on the other devices, port Spark. I think from Informix that was acquired by IBM, from Informix. So this is important to me and nobody can do it better in IoT, at least as far as I’m concerned.
I also want to talk about the analytic system, because we just released this the 29th and it embodies the strategy that we’re trying to drive and the strategy I’m trying to convey to you. This is an integrated analytic system built for Cloud because, like I said, you can take your workloads, you can take all the software and put it on your own hardware if you choose to do that and you don’t want it in our hardware, the pre-integrated hardware. This is all built on Power, all flash. The performance is unbelievable. It’s got the common SQL engine in it. We have a console in it, administration is extremely simple. We have got Machine Learning embedded into the box. We got Spark embedded into the box, and we have, what I’ll talk to in a minute, The Data Science Experience, where your data science is built into that box, and that was one of our core tenets. I’m pretty proud of it right now, but also, like I said, the point is it embodies our strategy.
Number two is Unified Governance. There is no data if you can’t find it. I’m reading a book right now and another book I would recommend and I don’t know if you’re going to agree with everything in this book, it’s called Sapiens. Over history there have been a ton of full scripts, meaning you have got arithmetic, you have got written scripts that will write poetry, just a full script. Most of them are lost over the course of history because they don’t have cataloging. A lot of civilizations and believe me - they have got to collect taxes somehow so they have got to store a lot of data, right? So big civilization bureaucracies, you have got to have data, they have had data, but they didn’t have a lot of cataloging so a lot of this stuff was lost.
In fact, the civilizations that we know more about than any others are the ones that did have some cataloging, like the Sumerians, Pharaonic Egypt, China, the Incas, and some people out there would say India and Sanskrit. It points to the fact that if you don’t have cataloging, you’re just going to be like history, you’re going to be lost. The library does for books what Unified Governance does for data, it’s as simple as that. So you have got to catalog it so you can find it, that’s behind the firewall. It’s GDPR, it’s putting compliance in place behind the firewall. It’s eDiscovery, it’s records and retention, and then the 2.0 that everybody should be in right now, (unintelligible), you know, goes outside the firewall, goes private Cloud to public Cloud, and that’s tough. It seems simple, but now you have got to do security across public Cloud, that kind of thing. I feel like IBM does that pretty well, and that’s Governance 2.0.
Number three was the Data Analytic and Visualization. The best way I can explain this right - you want to be able to visualize your data immediately, simple UI, very simple. You should have a strategy, you should know where you’re going, where you’re at and where you’re going.
Let’s talk a little bit about Machine Learning. The opportunity that we have in the future here is unbelievable with Machine Learning. So what do you use Machine Learning for and why is it so important? Well - let me back up. I do want to say this — it’s not new. It’s not like it just came out yesterday. I think most people won’t think of it that way, but actually, there was an IBMer in 1959 that created the program to play Checkers and incidentally after he did so the stock went up 15%. I could use that again. Actually we had 5% this week — but 15% immediately because it showed what hardware and software could do at the time. In 1996, we beat the World Champion in Chess and then you may have heard of the 2011 Jeopardy. The reason why this is so important now is because going back to our Cloud slide where we talk about the price of compute plummeting and the price of storage plummeting, it’s made it all possible from where it’s done.
Machine Learning has the potential to diagnose skin cancer before it progresses to the lymphatic system, right? There are five million cases of skin cancer a year. The number one is lung cancer. We do have dermatologists that are skilled in looking at and identifying cancer. So look, you have got to have that skill, but they’re human, they can be wrong. At Oregon University, where they were diagnosing stomach cancer, they got all the best doctors they could get together to do a little experiment. They were going to diagnose some stomach cancer, and the doctors came up with seven elements that you can use to characterize and identify stomach cancer: size, crater, color, I don’t know, all these different things. They narrowed it down and they all agreed on seven, and at the time, kind of in the machine era, I told you a lot of models were created in the 70’s, they said all right, we’re going to create a model, we know it’s not going to be any good, it’s the first time we’ve ever done it. They gave the doctors 96 pictures of stomach ulcers to identify. What they didn’t tell the doctors is they put some of the same pictures in there twice for them to identify.
You gotta keep them honest. The doctors did their deal, they sent that to UCLA. UCLA sends it back, and it’s kind of disheartening because not only did the doctors not agree with their peers, they didn’t agree with themselves. So think about that the next time you go to your doctor and they tell you they know 100%. So the point being, something like Machine Learning, Automated Intelligence, can help in this kind of a situation, to help doctors help all of us make good decisions. The interesting thing, even more than that, is that the simple model they created in the 1970’s did better than all those doctors because it was consistent, it was objective and it went off the seven criteria that they had. That was in the 70’s. Now we have got data and compute.
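The consistency point can be illustrated with a toy scoring rule. This is a minimal sketch and not the model from the study. The talk names only size, crater and color, so the other four cue names, the 0 to 1 rating scale, the equal weights and the 0.5 threshold are all hypothetical.

```python
# Seven cues; only the first three are named in the talk, the rest are placeholders
CUES = ["size", "crater", "color", "cue_4", "cue_5", "cue_6", "cue_7"]


def score(picture: dict) -> float:
    """Equal-weight average of the seven cue ratings (each assumed to be 0..1)."""
    return sum(picture[cue] for cue in CUES) / len(CUES)


def assess(picture: dict, threshold: float = 0.5) -> str:
    """Apply the same fixed rule to every picture: flag it or clear it."""
    return "flag" if score(picture) >= threshold else "clear"


# A made-up set of ratings for one picture
picture = {"size": 0.8, "crater": 0.6, "color": 0.7,
           "cue_4": 0.2, "cue_5": 0.5, "cue_6": 0.9, "cue_7": 0.4}

first_look = assess(picture)
second_look = assess(dict(picture))   # the same picture slipped in a second time

print(first_look, second_look)
```

Whatever its accuracy, a fixed rule like this gives the same answer every time it sees the same picture, which is exactly the property the doctors in the story did not have.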
There are many examples, I’m not going to go through all of them, but to prove that I’m not just all IBM, there is an interesting new company, Opendoor. Pretty interesting. The last time I checked, they weren’t in our zip code. They’re not in all zip codes around the world. Essentially, you can put in your address if you’re going to sell your home and they will instantly give you a bid. And it’s not like a lowball bid. The idea is that, hey, you’re not going to pay a realtor fee, right? And they’ll do Machine Learning so they’ll give you a valid bid, so you’re motivated to sell. I have had friends that have taken this offer. They don’t have to do anything: put it there, get an offer, go. It’s done. That’s Machine Learning. So all you realtors, or realtors on the side or something out there, you’re being disrupted as well. Nobody is untouchable, I promise you. Look, in every field there is going to be disruption that’s going to happen. Self-driving cars.
You know, we talk about healthcare, finance, some of this stuff is scary. So what we’re doing is trying to promote Machine Learning, and we have created what is called The Data Science Experience. This is an IDE, Integrated Development Environment. Baked in are Zeppelin, Jupyter Notebooks, RStudio, dot-com models already out there you can leverage. You can use Scala, you can use Python, whatever you want. Share your notebooks across the team, with other data scientists. Right now there’s like two million data scientists that are out there. You have Filza and Spark, so that’s across all data scientists. I kind of refer to Spark as the analytic engine, and you can get it on private Cloud, the desktop, or the public Cloud. This is our way of creating an environment that’s simple, an easy-to-use environment. Again, embedded in all our products.
The other thing I talked about was Open-Source. We are tripling and quadrupling down on Open-Source. Between us and Hortonworks, we are the top two Apache committers with Apache Atlas, Apache Spark and so on, and we are going to continue. This was in January of this year, it could change, but I hope that we stay at the top. So we have embraced Open-Source for obvious reasons: it drives speed and innovation.
With that, I have given you my view and my industry strategy on the five elements. I do think businesses have to get all five elements right if they’re going to be competitive moving forward. They’re not easy, but we make them more complex than they need to be. To me, it’s about culture change more than anything else. Once you decide or are motivated, and hopefully you’re not motivated by negative impact or marketing events or whatever gets a company moving, if you can get ahead of it, it’s not that difficult. You have just got to decide to do it.
Even in IBM, we’re doing it ourselves. We’re trying to disrupt ourselves by getting support, which I’m also responsible for along with analytics, to continue to use Machine Learning and Watson. You can actually chat with a Watson bot now. The tour guide can see output of Watson, they help them solve issues by surfacing our content that’s out there, so we can learn from that as well and potentially prevent some of the UV issues by putting that content back out there.
So where do you guys go from here? I’m a firm believer in the maturity curve, and that’s a great dialogue to have with your business to see where they are. We have got what we call a Data First method at IBM that’s doing just that. It’s free for most clients. In fact, we have a free Machine Learning Hub in Silicon Valley where your company can go for a couple days. We’ll show you how to get started. You can use some of our models, or maybe you can walk in, walk out, and have, you know, a model that you can apply to your business immediately and look like a hero.
We have also got a Spark Technology Center in downtown San Francisco that you can take advantage of for free. Data First, in most cases it’s free. We have somebody come in, assess your business, draw out the plan, just like I talked about. What we do is a Discovery Workshop. We start wherever you guys are, it depends on where you are in that curve, like I said, and we do it almost entirely for free. The reason is because we feel like that relationship, that partnership, benefits us just the same, right? Where you can reach me: my Twitter handle is AMartin_v. I’m on LinkedIn as well. If you consider yourself an expert and want to join me on a podcast sometime, just hit me up and I’ll bring you in. What’s that?
Thank you very much, appreciate it. Thank you.