Making Data Simple: What's next in the world of data and analytics with Seth Dobrin, part 2
In this episode of the Making Data Simple Podcast, Seth Dobrin, vice president and chief data officer for IBM Analytics, and Al Martin continue their conversation about data in 2018. Find out the six steps to make your enterprise data driven, how machine learning and AI will impact your business and the top tools to use this year. Tell us what you see happening in the world of data in 2018 on Twitter @AskIBMAnalytics.
05.20 Learn more about Machine Learning.
05.30 Connect with Bill Gates on Twitter.
05.30 Learn more about current robots and Bill Gates' plan to make them pay taxes here.
06.20 Learn more about Ginni Rometty here.
07.10 Learn more about Watson Health.
07.30 Connect with Dan Kirsch on Twitter.
07.30 Listen to the Making Data Simple Podcast: Machine Learning for Dummies with Judith Hurwitz and Daniel Kirsch.
10.40 Learn more about GPUs.
11.00 Learn more about the IBM Data Science Experience here.
11.10 Connect with Jean Francois Puget on Twitter.
11.10 Read "What IBM looks for in a data scientist" on VentureBeat by Jean Francois Puget and Seth Dobrin.
11.15 Learn more about coding in R and coding in Scala.
11.25 Learn more about IBM's SPSS.
11.55 Learn more about Apache Zeppelin.
12.00 Learn more about Apache Spark.
12.10 Learn more about DSX Data Science Experience.
12.50 Learn more about Apache Hadoop here.
13.00 Learn more about Spark SQL here.
16.30 Find Sapiens: A Brief History of Humankind by Yuval Noah Harari.
17.20 Find Partially Derivative the Podcast.
18.50 Find a16z the Podcast.
Check out part 1 of the discussion with Seth here.
Ready to dig deeper? Check out our previous podcast episodes of Making Data Simple:
- Episode 1: Making Data Simple: Machine Learning for Dummies
- Episode 2: Making Data Simple: What's next in the world of data & analytics with Seth Dobrin, part 1
- Episode 1: Making Data Simple: The big data problem
- Episode 2: Making Data Simple: End of tech companies
- Episode 3: Making Data Simple: A new definition of client care
- Episode 4: Making Data Simple: Will machines take our jobs?
- Episode 5: Making Data Simple: Growth hacking - not just for start ups
- Episode 6: Making Data Simple: From 2D to 3D -- augmented reality data visualization
- Episode 7: Making Data Simple: The 5 areas businesses MUST get right
- Episode 8: Making Data Simple: How data science is helping to improve aviation
- Episode 9: Making Data Simple: Making data fun & easy with Caleb Curry
- Episode 10: Making Data Simple: Data movement at size and scale
- Episode 11: Making Data Simple: Cloud computing, part 1
- Episode 12: Making Data Simple: Cloud computing, part 2
Al Martin: Hey, folks, welcome to Making Data Simple. This is Al Martin speaking. This week we have a continuation from the previous week, so enjoy.
Yes, that's awesome. Hey, but speaking of, I don't think you went through this very clearly, but I think it was important. You did a blog on the six steps from zero to data science for the enterprise. I think now would be a good time. Can you go over those six steps real quick? If you...
Seth Dobrin: Yes.
Al Martin: ...can remember them?
Seth Dobrin: So you know, I think the first — well, the important piece is this is really about how do you build your data strategy? And I think, I think the important part is that you start with understanding what your data assets are.
So just looking at — around the company — what are the data assets that we care about as an enterprise? And in my experience, you know, that usually revolves around a small handful of things. And you should keep the number small because you really want to rally the whole company around them.
Should be no more than six in my mind. But every company has at least three in common. Every company cares about their customers. So you need to build a customer 360 — and we'll tack 360 on it because it's trendy. So every company needs a customer 360 because they all care about their customer.
Every company has some form of product. So they need a product 360. And every company has employees. Or contractors. So they need a talent 360. So at a bare minimum, you need those three. And then I kind of sweep everything else into what I call a company 360, which is everything that doesn't fall into one of those buckets.
Now if you're a, you know, an oil and gas company or an agriculture company or a trucking company you need a location 360, so you have some geospatial context for everything. And if you're a heavy manufacturing company you probably need some kind of event 360 where you're keeping track of what's going on on your machinery.
And so you can kind of see the number is very finite. And so you take these and you map them out conceptually. And you say, here's where all the data lives that would make up these assets. And you don't actually do any work at this step. This is just a conceptual exercise. And then, absent a use case, so absent an actual decision that a company makes, you don't really do any work around these 360 assets up front.
With the exception of your customers. Because everyone needs to understand their customer. Everyone needs a single version of a customer. And most enterprises today don't have a single version of a customer. Most enterprises have anywhere from five to fifty versions of a customer, unless they've built out this customer 360 asset.
And it's hard to do anything of value if you don't understand your customer, because again — you recall back what I said digital means — the fundamental basis of it is every decision is made through the eye of your customer. If you don't know who your customer is, you can't answer that question. And so that's step one.
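Editor's sketch (not part of the conversation): one way to picture a "single version of a customer" is to collapse records from several systems under one matching key. This is only an illustration under stated assumptions. The record layout, the system names, and the choice of a normalized email as the matching key are all hypothetical; real customer matching is usually far fuzzier than this.

```python
from collections import defaultdict

# Hypothetical records for the same people held in separate systems.
records = [
    {"system": "billing", "name": "Ann Lee", "email": "Ann.Lee@example.com "},
    {"system": "support", "name": "A. Lee", "email": "ann.lee@example.com"},
    {"system": "marketing", "name": "Bob Roy", "email": "bob.roy@example.com"},
    {"system": "billing", "name": "Robert Roy", "email": "BOB.ROY@example.com"},
]


def match_key(record):
    """Normalize the email so trivially different spellings collapse together.

    Assumption: email identifies a customer. Real matching rules differ.
    """
    return record["email"].strip().lower()


def build_customer_view(records):
    """Group source records under one entry per matched customer."""
    customers = defaultdict(lambda: {"names": set(), "systems": set()})
    for record in records:
        entry = customers[match_key(record)]
        entry["names"].add(record["name"])
        entry["systems"].add(record["system"])
    return dict(customers)


customer_view = build_customer_view(records)
for email, entry in sorted(customer_view.items()):
    print(email, sorted(entry["names"]), sorted(entry["systems"]))
```

Here four source records collapse into two customers, each carrying the names and systems it was seen under.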
The next step is to map out the decisions that your company makes. So what's the — what decisions do we make as a company? Decisions being...and when you go around you start having these conversations with your business partners they're going to say again, give me the data I need to support my preconceived notion. Build me this dashboard.
That's not a decision. That's providing data to support their preconceived notions. We're really talking about, okay, what decisions are you trying to make, Al? So when we talked about the customer churn project, you could have said, "Show me all the data that we need to look at what customers are churning."
That's not the question you asked. The question you asked was "Help me predict which customers are going to churn so I can stop churn." So you were looking for a specific thing that you could have your team do to reduce churn.
Al Martin: Sure.
Seth Dobrin: So that's a decision. So we want to reduce churn. So think of that as a decision. So you build out your data assets around that use case. In fact, your team has helped us, IBM as a company, do that by working through that use case.
So they helped hone the customer 360 asset because you need to look at your customers for churn. They've also helped to build out the framework of IBM's product 360 because they were doing it in the context of specific problems. And so we didn't just go and build a product 360. We built out the beginning of the product 360 around this churn model that your team was building.
So that's — step two is mapping out the decision. Step three is what I just said, is kind of going through and starting to act on those decisions. I forgot what the other ones are. I should know. I should have them down pat. So...
Al Martin: That's all right, I understand. I write stuff and then I can't remember even...
Seth Dobrin: Yes.
Al Martin: People will know now that we're, you know, we're just having a conversation. We don't have a lot of stuff in front of us.
Seth Dobrin: Yes.
Al Martin: I don't have it in front of me either, so that's fine. Hey, on a previous podcast I talked about this, and it kind of sounds like what you're saying here too: what I find is that clients are not always defining the right problem before implementing the solution.
Seth Dobrin: Yes.
Al Martin: Do you see that as well?
Seth Dobrin: Well, I see, you know, it goes back to these boil-the-ocean-type solutions. So we want to build a data lake, and what you usually see when you have a customer that says we want to build a data lake is they build a dumping ground for their data. They don't know how they're going to use it.
And so they just put stuff in there. And it's that Field of Dreams type thing. "If we build it, they will come." But there's not really a good definition of what you're going to use it for when you put it in there. So it kind of just gets lost and put in there for no apparent reason other than maybe we want to get out of this other thing that's more expensive. So let's put it in here because it's cheaper.
Al Martin: Hey, let me dive into a few technology questions if you will. So machine learning real quick. So we're going to lose all our jobs. Bill Gates has said the robots are actually going to have to pay taxes at this point in time. True?
Seth Dobrin: No. I think that's false. Are these just yes no answers or can I elaborate?
Al Martin: Yes. I don't know, say whatever you want. We're just having fun at this point. Well there's actually people out there that believe that that's actually what's going to happen, by the way.
Seth Dobrin: Well, and I think that's a misnomer. But I think it's also a misnomer to think that no jobs are going to be impacted. It's just like, go back to the Industrial Revolution. People built automation, and it wasn't just to automate the physical building of processes. The building of physical things was the Industrial Revolution: they streamlined it, they automated it, they mechanized it.
We're going through a similar journey now. And if you look back to the, you know, late 1800s and early 1900s, when we had this Industrial Revolution, people were out of work for the old types of jobs. And this gets back to Ginni Rometty's new-collar jobs.
It's a matter of re-skilling people so they have the tools to survive in this new world and to work in this new world. And so absolutely. Jobs are going to shift from one sector to another. But the — I think the overall job market is probably going to grow because of machine learning.
We're going to create new types of jobs. The work that humans do is going to be more value added. And there are some instances where you apply machine learning and you just automate the heck out of a process. But in most cases it's kind of human in the loop, where the machine learning model cuts down the hundreds, thousands or millions of decisions to something a human brain can actually conceive and work with.
And so that's really what it is. Now, I always use the example of a doctor. God forbid I have cancer, and I show up to my doctor and she uses the Watson Health stuff. I don't want Watson Health telling my doctor, "This is what you do."
I don't want Watson Health prescribing what should be done to me. I want them giving my doctor, "Here's the 10 things that can help Seth," and she and I sitting down and having a conversation about it and deciding what's right. So there's definitely a lot of instances where you want that human in the loop.
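Editor's sketch (not part of the conversation): the human-in-the-loop pattern described here can be pictured as a model that only narrows a large set of options to a shortlist, with a person making the final call. This is a minimal illustration. The candidate list, the scores, and the `choose` callback standing in for the human are all hypothetical, and nothing here reflects how Watson Health works.

```python
import heapq


def shortlist(options, score, k=10):
    """Return the k highest-scoring options for a human to review."""
    return heapq.nlargest(k, options, key=score)


def decide(options, score, choose, k=10):
    """The model narrows the options; the human-supplied `choose` picks one."""
    return choose(shortlist(options, score, k))


# Hypothetical candidates with made-up model scores between 0 and 1.
candidates = [{"id": i, "score": (i * 37 % 1000) / 1000} for i in range(1000)]


def by_score(candidate):
    return candidate["score"]


top = shortlist(candidates, by_score, k=10)

# Stand-in for the human conversation: here the "human" just takes the
# first shortlisted option, but any judgment could go in this callback.
final = decide(candidates, by_score, choose=lambda options: options[0])
print(len(top), final["id"])
```

The design point is that `decide` never returns an answer on its own; it always routes the shortlist through the `choose` callback.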
Al Martin: So augmented...
Seth Dobrin: That's...
Al Martin: ...intelligence is what you're describing.
Seth Dobrin: Augmented intelligence. I'm more specific about that. I like to specify human in the loop so people don't think we're completely taking humans out of the loop.
Al Martin: In a previous podcast, I had guests, Judith Hurwitz and Daniel Kirsch, who were from Hurwitz & Associates. They have written a book called Machine Learning for Dummies, so it's perfect for me, and we had some of these discussions.
And I won't repeat them on the health care because I'm very excited on what machine learning can do for health care. I had an example from a book that I had read. But I do...I want to ask you a — I don't want to relive that podcast for the listeners — but I do want to...I don't think you can ask this enough.
Just to make sure there's clarity. What is your definition and difference between AI, deep learning, machine learning, cognitive? Because everybody uses them interchangeably. And again, as I said in a previous podcast, I do as well sometimes. So I confuse myself. But what is your definition between them?
Seth Dobrin: They're like those Russian dolls.
Al Martin: Yes.
Seth Dobrin: And so, you know, AI and cognitive are — in my mind — pretty much the same thing. I think cognitive extends AI a little bit. Where, you know, it even gets to the concept of — at least IBM's definition of — I should be able to walk up to a black curtain right and I should be able to ask a question in my natural language and get an answer back.
In my mind, that's cognitive and the machine learns from it over time. And so that's cognitive. AI is just the ability for machines to learn. So artificial intelligence. So that's the whole big field. That includes machine learning. So machine learning is a way to execute AI, so that's a subset of it.
And typically, most machine learning models that we think of, and that our customers think of, are what we call supervised machine learning models. And those are where you have data that's already been labeled. So an expert has already said this is what the data says. And then you have the next layer down, which is deep learning.
And deep learning is a type of machine learning. But deep learning is a specific type of machine learning like neural networks and things like that. And so that's why there's confusion, is because deep learning is a subset of machine learning, which is a subset of AI. And so it's kind of like — that's why I said it's like Russian nesting dolls.
Al Martin: No, that's a good analogy. So if ML's a way of implementing AI, what other methods can you implement AI with?
Seth Dobrin: Well, so you know, I lump — even though it's probably not really AI — I lump something called decision optimization and operations research into artificial intelligence.
These are mathematical representations of thought processes that allow you to more efficiently run them. And then decision optimization is when you start thinking about how do I create a next best action? So you have machine learning models, what do I do with that machine learning model?
Do I make a prediction and let someone act on it? Or do I make that prescription and say here's the next best action, so here's what you should do with that prescription. And so that's a type of AI as well.
Al Martin: Okay. Switching a little bit. So you talk about supervised learning. Does the magic happen in unsupervised learning?
Seth Dobrin: Magic can happen in either. Doing supervised learning is just more expensive because you have to have labeled data. It's more expensive up front because you have to have labeled data.
Doing unsupervised learning typically takes a lot more compute power. And that's why we get into things like GPUs and other types of accelerated hardware, because you need more compute to do unsupervised learning. Because it's figuring it out on its own.
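Editor's sketch (not part of the conversation): the contrast between labeled and unlabeled data can be shown on a toy example. This is a minimal illustration, assuming made-up one-dimensional data; the nearest-centroid classifier and the plain k-means loop are techniques chosen for the sketch, not anything the speakers name. The supervised version needs the labels but does a single pass, while the unsupervised version needs no labels but makes repeated passes over the data to find the groups on its own.

```python
from statistics import mean

# Hypothetical 1-D measurements; labels exist only for the supervised case.
values = [1.0, 1.2, 0.8, 5.0, 5.3, 4.7]
labels = ["low", "low", "low", "high", "high", "high"]


def fit_supervised(values, labels):
    """Learn one centroid per label from expert-labeled data (one pass)."""
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    return {label: mean(members) for label, members in groups.items()}


def predict(centroids, value):
    """Assign the label whose centroid is closest to the value."""
    return min(centroids, key=lambda label: abs(centroids[label] - value))


def fit_unsupervised(values, k=2, iterations=10):
    """Plain k-means: find k centroids without labels, by repeated passes."""
    ordered = sorted(values)
    # Start from points spread evenly across the sorted data.
    step = (len(ordered) - 1) / max(k - 1, 1)
    centroids = [ordered[round(i * step)] for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for value in values:
            nearest = min(range(k), key=lambda i: abs(centroids[i] - value))
            clusters[nearest].append(value)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [
            mean(members) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
    return sorted(centroids)


labeled_centroids = fit_supervised(values, labels)
found_centroids = fit_unsupervised(values, k=2)
print(predict(labeled_centroids, 1.1), found_centroids)
```

Note that the unsupervised result is just two group centers; it has no idea that one of them means "low" and the other "high" until someone attaches meaning to them.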
Al Martin: What tools are best to leverage machine learning? Or I mean, drive data science?
And I know that, you know, we could have a shameless plug here for the data science experience that we drive — and you're welcome to elaborate on that — but what tools should the industry seriously consider and which ones are some of your favorites?
Seth Dobrin: Yes. So, you know, in the blog that Jean Francois and I put up on VentureBeat, you know, we kind of defined a true data scientist as someone who does hand coding. So they use R or Python or Scala to...
Al Martin: Yes.
Seth Dobrin: ...write physical code. And I think, so that's one way to do it. And I think that's really the most robust way to do it.
There are tools. You know, there's IBM's SPSS. There's competitor products for that that allow you to do visual modeling. So kind of drag and drop (we call them clickers). And that's valuable. But I think it's a little bit dangerous to do data science if you don't understand the underlying fundamental math.
And so you shouldn't be implementing machine learning models if you don't understand — or have someone around you that understands — the fundamental math. Because you could inadvertently make mistakes that will lead you down a bad path as a company. And so there's clicking tools and there's coding tools. And so, and I think that boils down to it.
Al Martin: So, you know, we've got our own Data Science Experience that is built in Zeppelin — on the Jupyter Notebooks — that, you know, you can use Python, R, and Scala. But it has built in Spark. So Spark, Spark, Spark you hear a lot. Why is Apache Spark so important?
Seth Dobrin: Yes. So yes. So DSX, Data Science Experience, has, like you said, Jupyter Notebooks and Zeppelin Notebooks. To explain what the value of those is a little bit:
Those are ways to document your code and run your code in line in a notebook type environment such that people can understand what you did. So it gets back to applying the scientific method. Part of that is being able to reproduce what you did, and notebooks are a good way to enable you to do that.
But your question was why Spark? So Spark is a way to help accelerate analytics. So one of the reasons that everyone went to Hadoop was because of this whole concept of MapReduce, which really allows you to do some basic analytics in a much more performant way. But there were some limitations to it. And Spark helps to address those limitations.
And Spark has been used as an interface not just for Hadoop but for just about every data technology out there. And in fact, you know, with the advent of something called Spark SQL, it allows users to interact with a wide variety of databases, or, you know, data stores, through a language that they're used to that's been around for decades.
And Spark has the ability to apply machine learning to those requests so that you can do things like query data, and apply machine learning to do some kind of transformation while you're pulling it out. And so Spark is a very valuable tool for doing that. It works across everything like I said.
One of the advantages of using open source tools is that, you know, people are going to use them for a wide array of things. And because it's open source they can build it and publish it however they want. And so it's really a great tool to interact with and do analytics on just about any data store.
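Editor's sketch (not part of the conversation): the MapReduce concept mentioned above can be pictured in plain Python as three steps: map each input record to a key and value, group the values by key, then reduce each group to a result. This is only a single-machine illustration of the idea, using hypothetical log lines; it does not use Hadoop or Spark, which distribute these steps across many machines.

```python
from collections import defaultdict
from functools import reduce

# Hypothetical raw log lines in the form "customer_id,purchase_amount".
lines = ["c1,20.0", "c2,5.0", "c1,12.5", "c3,7.5", "c2,10.0"]


def map_phase(line):
    """Turn one raw line into a (key, value) pair."""
    customer_id, amount = line.split(",")
    return customer_id, float(amount)


def shuffle(pairs):
    """Group all values that share a key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(amounts):
    """Collapse one key's values into a single total."""
    return reduce(lambda total, amount: total + amount, amounts, 0.0)


totals = {
    customer_id: reduce_phase(amounts)
    for customer_id, amounts in shuffle(map(map_phase, lines)).items()
}
print(totals)
```

Because each line is mapped independently and each key is reduced independently, both steps can in principle be spread over many workers, which is the property the distributed frameworks build on.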
Al Martin: You know — kind of going where we started — I just now realized why you are everywhere. Because it seems like you're doing everything. We've done — we've talked to data management. We've talked to governance. We've talked to data science. Is there anything you don't do?
We haven't talked to visualization, I guess. We talked to machine learning. We didn't talk to open source I guess, but are those all under your umbrella? Is there any limit that says, you know, this is what I don't do?
Seth Dobrin: In terms of those things you named? I mean, I do. My team and I do all those things. I talk to clients about all those things. And so my remit at IBM is pretty broad. Which makes my job super fun. Yes.
So what don't I do? I don't know, I don't actually physically do a lot of these things anymore, I just talk about them. And have the team doing it so it gets done well. Which gives me the bandwidth and the ability to talk about it and be involved in all of these broad things, having a great team that executes on all these things for me.
Al Martin: You are the industry expert. We need an industry expert, so that's all good. So, you know, as we're kind of wrapping up here, what should every client be thinking right now?
With respect to data, analytics, cloud (we didn't talk too much about cloud), what's your advice? If you're listening to this podcast right now, and if you're able to oversummarize, what should I be thinking right now?
Seth Dobrin: So I think the two things that I try — when I talk to clients — that I try and hammer on and get them to remember is one, we need to do everything through a use case. Everything we do in data or data science should be done around a use case.
A decision that we're making as a company with the exception of building out your customer assets. So that's number one. The second thing I'd say is that, you know, as IT professionals and data professionals, people talk about legacy as if it's a bad word.
And I think it's important to remember that these things that we talk badly about as legacy and say we want to get rid of are the things that got us here over the last 10, 20, 30, 40 years. And those are the things that earn us the money to give us the freedom to do all this cool stuff that we want to do with open source and cloud and data science.
And so it's not about how do we get rid of the legacy. And it gets back to my having a cloud strategy. And I encourage them not to talk about it as legacy anymore, but talk about it as business critical assets. And the conversation really needs to be, how do I connect these business critical assets to my future state of cloud, whatever that is?
And so, it's less a bad word conversation and more a moving forward conversation, because that's really what it is.
Al Martin: You are a smart dude, Seth Dobrin.
Al Martin: Look, as we're wrapping up here I'd love to have you back, because I could keep going on for questions. And I know I've been a little all over the place, but I think this has been fabulous. But before I finish I want to ask a few questions...
Seth Dobrin: Well I want to ask you a few questions, Al.
Al Martin: Oh boy.
Seth Dobrin: So what's the book that you're reading right now that you want to tell us about that everyone else should read? I've never listened to this podcast before as you can tell.
Al Martin: Yes. The book that I'm reading is Homo Sapiens.
Seth Dobrin: Yes? What's it about?
Al Martin: It is — look how he turns the tables. Should I allow this? I don't think I should allow this. It's about homo sapiens. It's about us. It's a great book, by the way. I would highly recommend it.
It starts with hunter gatherers and it talks to where we are today and the prediction of the future. And I can't pronounce the author's name, so I apologize, but you can put it in the show notes. It's a fabulous book. It's almost a little bit scary.
But I'm not even going to ask you about a book. I'll ask you this, though. Where do you get all your information about what's happening in data analytics and machine learning? Because I can't stay up on current events. Man, it's just too...
Seth Dobrin: Yes, so some of it is from podcasts. I mean I listen to this podcast. I listen to a — there's a bunch of other good podcasts out there. My favorite one actually went off the air, which was "Partially Derivative." I wish those guys would bring it back. But that was a good one.
And so there's a ton of good podcasts out there that give a lot of good basic information. I do a lot of reading of papers and blogs. And a lot of it is through people. I mean, the people - the guys on my team, the guys in, you know, the men and women on my team that are doing the physical work. Talking to them and understanding it. Talking to clients.
Most of what I know about, you know, I'm not an IT person by trade. Most of what I learned was through people on my team that I just sat down and said, "I don't know what I'm doing, help me understand" and they would help me understand.
Al Martin: I'm with you on that. You know, one thing that — I read a lot of blogs. I read a lot of papers, Wall Street Journal on down. And one thing we've done within our group is get rid of PowerPoint. Which, you know, at first, when you're — you know, we — I think all companies tend to overuse PowerPoint.
But what that has done is put all the concepts that we talk about today into a document, which, you know, gives you higher value content, I should say, because you've got to read it and really understand it. And I think it's been a great change for our organization.
But before you ask, "a16z" is the one I listen to. That's the Andreessen Horowitz podcast. I would highly recommend that. Malcolm Gladwell's got a good podcast out there, I would recommend that as well.
Hey, but you're a successful guy. Got to ask you this one. I find that successful people have practices, a cadence, something they do every day. You know, what are your habits that make you successful?
Seth Dobrin: Well, I think the first thing is I wake up and do yoga every day. Or most every day.
Al Martin: Wow, okay.
Seth Dobrin: Hard when you're traveling. So that's important. You know, I try and maintain my work-life balance effectively. You know, I try not to be on email before 7:30 or 8 and I try and be done working by 6. When I'm home — I travel a lot — so when I'm home I want to be with my family. And so that's important.
And you know, I only do a job that I enjoy. And I'll work for a company as long as I enjoy it. And so I think that's really important. Work should be fun. You know, I like, you know, being able to go to work and hang out with my friends. Like you. Where I have a good time, we like it.
We don't just act like this on the podcast. This is how we act when we talk. Same way with all of our other people we work with. And so I think that's really important is to enjoy the people you work with.
Al Martin: So, to put you on the spot, are you having fun?
Seth Dobrin: I am. I am having a lot of fun.
Al Martin: Hey, last question. What do you do for fun? Outside of work. You spend time with your family, I heard that. Anything else you do? You do yoga, I don't know if that's for fun...
Seth Dobrin: That is for fun.
Al Martin: Is it?
Seth Dobrin: Yes. Yes, gardening, yoga. And hanging out with my family.
Al Martin: Awesome. All right man, this has been...
Seth Dobrin: I'm a boring guy.
Al Martin: Oh those are all good things. I like it. Family is important. Hey, this has been fabulous. You've been great. You are a smart guy, I appreciate it. I hope you come back. Hey, and keep listening to the podcast. Thank you so much.
Seth Dobrin: I will. It's a great — it's a fun podcast, so. And it's good to hear people I know talk on the podcast.
Al Martin: Great. Many thanks. We'll keep it up then. Thank you.
Seth Dobrin: All right, thanks Al.
Al Martin: See you then.