Making Data Simple: What's next in the world of data & analytics with Seth Dobrin, part 1
What's next in the world of data and analytics in 2018? In part one of Al Martin's discussion with Seth Dobrin, vice president and chief data officer for IBM Analytics, explore the strategies and people your company needs to disrupt and succeed in the year ahead. Do you or your team members need new credentials to work in data? Seth also discusses what you need in your toolkit to be a data scientist at IBM.
01.40 Read "What IBM looks for in a Data Scientist" by Seth Dobrin and Jean-Francois Puget.
06.00 Learn more about GDPR.
13.00 Learn more about master data management.
13.05 Learn more about unified governance and integration.
13.25 Learn more about machine learning.
14.40 Learn more about cognitive computing.
Hungry for more? Check out our previous podcast episodes of Making Data Simple:
- Episode 1: Making Data Simple: Machine Learning for Dummies
- Episode 3: Making Data Simple: What's next in the world of data and analytics with Seth Dobrin, part 2
- Episode 1: Making Data Simple: The big data problem
- Episode 2: Making Data Simple: End of tech companies
- Episode 3: Making Data Simple: A new definition of client care
- Episode 4: Making Data Simple: Will machines take our jobs?
- Episode 5: Making Data Simple: Growth hacking - not just for start ups
- Episode 6: Making Data Simple: From 2D to 3D -- augmented reality data visualization
- Episode 7: Making Data Simple: The 5 areas businesses MUST get right
- Episode 8: Making Data Simple: How data science is helping to improve aviation
- Episode 9: Making Data Simple: Making data fun & easy with Caleb Curry
- Episode 10: Making Data Simple: Data movement at size and scale
- Episode 11: Making Data Simple: Cloud computing, part 1
- Episode 12: Making Data Simple: Cloud computing, part 2
Al Martin: Hey everyone, thank you for listening to the Making Data Simple series. I just want to let you know that this podcast was recorded in 2017 so any reference to end of year means end of year 2017 (unless otherwise stated). Just want to let you know, thanks again.
Hey, welcome to Making Data Simple. Al Martin here again. With me today, I have Seth Dobrin, chief data officer, IBM Analytics. How are you, Seth?
Seth Dobrin: I'm doing well, Al. How are you?
Al Martin: I'm doing absolutely terrific. You know, I have to say that you've been everywhere lately. Ton of events, keynotes, interviews. So I guess to start out, tell us a little bit about yourself and what you've been up to.
Seth Dobrin: Yes. So like Al said, I'm the chief data officer for our Analytics business unit at IBM. I've been here a week — I mean a year and three weeks now. I came from a client.
In my role here at IBM, I'm really responsible for three things. One is our own internal business unit transformation inside IBM. Another is working with our offering - our product management team to build offerings that as a CDO I would want to buy.
And the third is getting out, sharing my experiences with clients. Which is where I spend a lot of time — like you said — on airplanes flying around the world.
Al Martin: Great. Well, you've also done, you know, ton of speaking engagements lately haven't you?
Seth Dobrin: Yes. Yes, I've been speaking at all sorts of places. Doing some blog posts. Just had one come out on VentureBeat which was — I was really excited about. On topics ranging from data science to GDPR to unified governance in general.
Al Martin: Yes, real good. So let's start at CDO if you would. I, you know, I visit a lot of clients. I say this all the time and you know, a lot of big enterprise clients are companies.
And many of them still don't have a CDO. Which is kind of, I don't know somewhat a little bit alarming to me. But then I — that said, I visited an insurance company as of late and they actually had two in the same division. Which I — like they were doubling down.
So I guess my question to you to start. I guess it's a three-part question. Who needs a Chief Data Officer? And what is the role?
Seth Dobrin: All right, so I'll answer them one at a time. And I may need you to remember — remind me what they are after I go through them. So do I see that? Absolutely.
I see companies that are trying to go through a digital transformation and they don't have a CDO. I see lots of companies that have multiple CDOs. I think -- to be completely honest -- if a company wants to become quote unquote digital — whatever that means and we can talk about what that means if you want in my mind -- they need someone that owns data for them.
And if they're really serious about it — and maybe I'm biased — but they will name a CDO. They need someone with that title, with that weight, with that authority and that buy-in from the most senior executives in the company to go down that journey. Did I answer all three of them?
Al Martin: Yes. Maybe. So I guess, you know, what — could you go back to the definition of what you see chief data officer in terms of the role? And I think you've already answered the other two.
Seth Dobrin: Yes. And so definition of a chief data officer. I mean, I think chief data officers, in my mind, are really responsible for five things.
They're responsible for the data strategy. So how do we get the company to start thinking of data as an asset in the company and stop thinking of it as a digital dropping of applications. What is your data science strategy? And so how are you going to implement data science in the enterprise?
And that's not just building data science asset machine learning models that are deployed as a CSV file. That's building machine learning models that are then integrated into applications and processes.
What's your cloud strategy? And a perfectly legitimate cloud strategy could be we're not going to the cloud for the foreseeable future. But it's got to be an intentional decision. An actual decision. It can't just be a decision of inaction.
And then what is your governance strategy? So how are you going to govern all of these different environments? So even companies that their cloud strategy is we're not going to go to the cloud. They — a public cloud. They probably have a private cloud.
They probably have multiple data centers they're operating on. Have - they definitely have different types of data — structured, unstructured — how do you govern that all proactively? And then the final piece is talent. How do we get the talent pipeline that we need to be the company of the future? And that's both from new talent as well as how do we reskill the talent we have on the ground?
And in my mind, every CDO cares about those five things. And that's from my travels as well. They — depending on the industry — they may care about them in different priority order. Yes. So that's kind of my view of a CDO.
Al Martin: Yes, but I would imagine that most companies in, you know, segments or pillars within a company view data as an asset. But I would imagine one of the biggest challenges is they don't want to give up that asset. Hence the reason maybe they don't want a CDO to come and tell them what to do.
Seth Dobrin: So I don't know that I would agree with that.
Al Martin: No?
Seth Dobrin: I think most companies still view data not as a strategic asset for the company, right? If they did they would — most companies don't even treat data security as much — as strongly as they treat laptop security or phone security, right?
And so I think companies need to start taking a lot more seriously and really start thinking of it as an actual asset. Just like they do their computers, just like they do their physical assets. Whether it's equipment or cars or financial records. I mean, it's just as important as all those things and companies just don't think of it at that level.
Al Martin: Wow. I guess that certainly explains the breaches that we see on a regular basis then.
Seth Dobrin: Absolutely.
Al Martin: Do you think GDPR is going to come in and put the hammer down? Or do you think, you know, are they going to be able to get in front of that? Or is the hammer going to come down after the fact and we're going to see a bunch of chaos as a result of GDPR?
Seth Dobrin: So I think we need to define what GDPR is, because a lot of your listeners probably don't know what that is, right? So there's a new regulation coming out of Europe called the General Data Protection Regulation.
And this is a regulation that is changing what the responsibilities are for companies or anyone who holds data. And it expands the rights of what they call subjects — which is any individual — to control their data and have a say over what companies can and can't do with their data.
And know what the companies, what data companies have about them. And be able to delete data that companies don't need to do business at their request.
It also expands the definition of personal data. So it includes not only things like Social Security number and birth date and address but it includes things like IP address, GPS coordinates. And so I think definitely to your point, Al. It's going to change how companies are thinking about it once they take it seriously.
I think companies in Europe are taking GDPR very, very seriously. It goes into effect — or it starts being enforced — May 25 of 2018.
Al Martin: Do they have enough runway? I mean, like, to your point, you know, if you're a worldwide company — like IBM — you know, we're doing tons and tons of compliance as you might...
Seth Dobrin: No, I think companies in Europe definitely get it. Companies that have a big footprint in Europe definitely get it. I think companies in the US weren't really sure if it was going to apply to them. And, you know, recently there's some guidance that came out from the EU and really solidified that it is going to affect them.
The date's getting closer. And, I guess, listen, right. Companies have tons of regulation they have to deal with already. And many of the larger US companies were trying to get themselves ready for the regulations that go into effect in December, right, at the end of this year?
And so if - they can only do so much. They only have so much bandwidth. Would I have loved for them to pull their head out of the sand six months ago, twelve months ago? Absolutely. And would I - though I think in GDPR it would have been better idea? Absolutely. But I think, you know, they all have priorities. They have limited resources. And I think GDPR is going to be the next thing on their runway.
And it's really important that they show they're making progress towards being compliant. I don't - you know, I'm not a lawyer. I don't play one on TV, I didn't stay at a Holiday Inn Express last night. I think companies just really need to show progress towards being ready for GDPR on May 25.
Al Martin: May 25, I was just going to ask you that. May 25 is a big day, right?
Seth Dobrin: Yes.
Al Martin: It's going to be like the new tax day for EU. So we'll — yes, might as well dive down just one more question on this. What are the largest changes that you see that companies have to make in response to GDPR?
And again, I go back to do they have enough runway given we're at the end of the year now and you've what, got four months after this?
Seth Dobrin: Five months.
Al Martin: Five months?
Seth Dobrin: Yes. So I think companies need to have complete lineage on all of their data that relates to an individual. That's the biggest change. So you need to know where your data is on all people that you're interacting with.
So Al, you know, as an employee IBM needs to be able to tell you — at your request — what data they have on you. As a customer of IBM, they need to tell you — need to be able to tell you what data they have on you. Most companies don't have complete lineage on individuals that they interact with.
I think another thing is, another aspect of it is it changes consent. The concept of consent. And so, you know, we all have — well we don't all have iPhones — but many of us have iPhones. And, you know, we get these updates you know, and they say do you agree?
And you kind of scroll through the 90 pages and you say yes because you want the new update. That can no longer be in the context of GDPR. GDPR needs to be understandable and it's, you know, consent under GDPR needs to be understandable and explicit.
So our — you are consenting for us to do these things, right? Track your location in order to provide this value to you. And I need — I have the ability to opt out. And so it's a good thing and it's a challenging thing. So for many companies managing consent at that granularity's going to be challenging.
But it fundamentally changes companies' relationships with customers, which is really important to becoming digital. Becoming digital is thinking about how your customers do things. How they interact with you, what they want to do, what they think is valuable. GDPR forces that change in relationship with your customers where you need to go to them and you need to say, "Hey Al, I want to understand where you are because I want to provide this value to you," right?
And so now, if I provide some notification to you, you've asked for it essentially. So it allows companies to have this relationship with clients where something that might have been perceived as being creepy before is no longer creepy because I've asked for it. Right? We've had a conversation about it. And so that's a fundamental difference with GDPR. And that's the upside is I think it really makes — it forces that shift in conversation with your clients.
Al Martin: Hey so you've mentioned — and you started off with this — about the definition of becoming digital. And if I wanted more information on that, absolutely I do. Tell me what you see — or what's the definition or whatever, the taxonomy — of becoming digital.
Seth Dobrin: Yes. So I mean fundamentally, becoming digital — you know, from a corporate culture — the most important thing is that everything a company does they do through the eyes of their customer.
So every decision they make, every investment they make they ask the question, "What impact is this going to have on my customer? How is it going to make her life easier? How is it going to make her more productive? How is it going to enhance our relationship with her as a customer?"
So that's the first thing. That corporate culture needs to change where that is the fundamental underlying question of every decision. The second thing that needs to happen is you change your business model.
So most companies that are analog — for lack of a better word, I'll say analog companies — have products and they interact with their customers on a kind of once a year, once every three-year basis. Where some of them will buy something and you'll get money from them, but it's not a recurring revenue.
Truly digital companies offer products as a service and offer them as an outcome. And so, for instance you know Al, here at IBM right we've changed our product portfolio in -- or our mindset around our product portfolio -- where we really don't want our sales people showing up at our client saying "Hey, you want to buy an MDM solution?"
We want our sales people showing up at clients and saying, "Hey, we want to help you unify your governance across all these different environments, across all these different data types." So we are no longer focused on selling products, like master data management, we're focused on selling outcomes like unified governance.
And that is the sign of a digital company. Now, getting there is a journey. It doesn't just happen overnight. And it starts with a data transformation. So really understanding your data, creating data assets, building these assets, understanding; the next step is really your data science transformation.
So this is kind of where you start using machine learning — AI — to drive insights. And then the next step is really where you start building these fundamentally new business models. Now things change at each of those steps. So when you go to the data step and you start having conversations with your business about okay, this is a data journey, we want you to start using your data.
I'm going to, you know, rephrase, paraphrase here. But people you talk to in the businesses are going to say get me the data I need to support my preconceived notions. So that's what they're going to ask for. And that's not ideal. But that's okay, because at least they're starting to use data to make decisions.
When you get into that data science step, that's when they start asking for data to help them make decisions and get insights that they wouldn't have gotten from their preconceived notions. So it's help me get new insights from my data. So that's really the change in your data science transformation step.
And then in the digital nirvana phase, that's where you start using the data and the machine learning models to build new business models to fundamentally change your business.
Al Martin: Well I — the more you talk I think you might actually know your CDO stuff. Hey, so that brings me almost to take a step back relative to everything you just said.
You know, Ginni Rometty, our fearless leader, famously said we're entering into the new era of computing, and that's cognitive computing. And that was in 2015. Now we're in 2018, where are we, I guess? Are we where you would have expected us to be?
And I guess I'm looking at, you know, given everything by example you just said where are we going? I mean where — I hear what you're saying. But are we where you think we should be from a company standpoint? Or are clients confused? Do they need more help now than ever?
Seth Dobrin: You know, I think clients know now that basically every company has two choices today. They can be the company that gets — the company in their industry that gets disrupted. Or they can be the company that does the disruption in their industry.
So, you know, to be the disruptor you need to become digital. And you need to build these new business models because that's what the disruption means. And so most companies that I interact with — and literally it's been hundreds in this last year — all around the world are at some place towards the middle of that journey I just described.
And so most companies are either later in the data transformation stage or early in the data science transformation stage. And so are they aware, I think they should be? I think in some industries, probably because it's harder to transform in highly regulated industries like banking and finance.
Other industries, you know, are further along, because inherently they're science-based industries like agriculture, pharmaceuticals, oil and gas. I think they're farther ahead than the pack because they fundamentally value data because they're typically run by you know, people who grew up as scientists.
Many of those people have Ph.D.s and they've been around data and the hard sciences all their lives. And so it's kind of all over the place. But I think it's, you know, I think they're in a good place on the journey.
But right now it's honestly a fight for your life of the company. I mean, IBM was disrupted. So we were not the ones that led disruption in IT. We were disrupted. We're fighting our way back now. We don't want our clients to be in that same position.
Al Martin: So I mean — to your point — when I visit clients I often have to throw up the maturity curve to figure out exactly where they are in their own disruption. Or, you know, where their change is in a — well your — their innovation is by example you know, you figure a four-quadrant chart.
You know, the two quadrants to the left are the spend money to save money. The two quadrants to the right are the spend money to make money, and if you go from left to right you're talking about operations, cost reductions. Then you go to data warehousing, modernization. Then you start going into make money, that's self-service analytics where it's insight driven.
And then you go to the new business models. Is that kind of the same way you define it? And you figure clients are right in the middle, most of them? Is that what you were trying to say earlier when you said you think they're somewhere in the middle?
Seth Dobrin: I wish I had a whiteboard on this podcast. I'm not as smart as you.
Al Martin: Yes, whatever.
Seth Dobrin: And I can't visualize in too many dimensions. So I try and keep it two. So I just have kind of a single line where people progress from, you know, basically nothing in terms of data all the way to being truly digital.
And so by in the middle I mean they're kind of they've done most of the data transformation part. And they've — they're either almost done with that or they're starting to do the data science. So machine learning, AI type of transformation. So that's what I mean by kind of in the middle.
They're on the cusp of really starting to use machine learning to run their business. I'd say there's a small handful of companies — of large, you know, Fortune 1000 companies — that have truly become digital and started developing new business models. And they're all the companies that get talked about in all the, you know, the business journals that have gone through and done this.
Al Martin: So I want to go into machine learning, I want to go into data science. But before I do, do you have any predictions in terms of where we'll — where we will be headed in 2018? And if you had a crystal ball, what do you think the biggest changes are going to be in the field and in big data in 2018?
Seth Dobrin: So I think the biggest change in — that I hope to see by the end of 2018 is — well, there's really two of them. I think one, I think GDPR is going to help companies realize the value of governance. That it can really be an enabler or even an accelerant of change if you do it right.
So I think that's one thing. So you'll see companies start embracing governance as a way to fundamentally change how their business operates in a good way. And not in a "We're going to stop everything from happening" kind of way which is usually how people think about governance. So that's one thing.
And I think the other thing is, over the course of 2018, companies are going to really start figuring out how to implement data science in the enterprise. So there is a big difference between data scientists doing Kaggle competitions and being successful at those and implementing those types of models in an enterprise. And companies are going to start doing that.
And in fact, we've formed a team. I have a team of some of the top data scientists in the world whose job is to sit down with our clients and help them figure out how to implement machine learning and AI into their enterprises.
Al Martin: You know, it's funny that you mention you have a team. Because...
Seth Dobrin: I had to plug them.
Al Martin: What's that?
Seth Dobrin: I had to plug them.
Al Martin: Well no, good for you. And in — you know, I have — this is kind of a rhetorical question — but should all data — chief data officers — have a team? I go to — you know, the companies that do have them, sometimes they're lone wolves that I see out there. And I wonder how effective that they're actually going to be.
Seth Dobrin: You know, I really think it depends on the personality. So I mean, you know Al, I mean I've been here a year. I've just in the last few months got a team and that was intentional. When I started with IBM you know, Rob Thomas asked me, "What do you need for a team?" and I said I've got to figure out what the hell my job is first.
Because, you know, and what the scope is and what we need. What IBM needs. And so I think, you know, being a chief data officer is about — in my mind — is really comes down to — it's about driving change in the enterprise. And someone who's going to be good at that can do it with a team or do it without a team. You can build networks.
I mean, look how much I use your team to do changes over the last 12 months. They don't work for me, they have no accountability for me. But I convinced you that it was valuable and I convinced them that there was value in it for them. And that's really what you need. All the chief data officers that I've been — seen that have been successful can do it with or without a team.
Al Martin: Great. And you have had a huge influence on our team. Just for the listeners out there, we've got a team that we're doing kind of reactive metrics if you will. Seth comes in and really works with these folks. And I think you turned them into data engineers one way or another. Data scientists maybe even.
I want to get your thought on that because now they are leveraging models and algorithms to present or prevent or prescribe any churn that we may or could have. And we get with our client before that happens. I think that you've done some terrific work there. But that begs the question, where - can you do that? I mean what kind of education do you have to be - to drive data science?
I'm kind of going out of order here, but you went into that and I'm kind of curious. Is that possible? Can you turn existing teams over to that stuff? Do they got to get formal education? What are your thoughts there?
Seth Dobrin: So actually, so Jean-Francois Puget and I just did a blog that was picked up by VentureBeat on what it takes for you to be a data scientist. We kind of lay out in there what we see as requirements.
And one of the requirements is that you have some kind of formal training in a hard science. And that's — but that's not a hardened — that's the only one that's not a hard and fast requirement in our mind.
The reason that I think it's important to have some kind of hard science as background is going through that process teaches you about the scientific method and how to implement it. Because fundamentally what data science is it's about applying the scientific method to solve business problems.
If you don't understand that basic scientific method, you're not going to do data science. You're going to create models that may or may not solve a problem because you're not testing a hypothesis. You don't know what you're doing, you're going all over the place.
And that was really the conversation that helped your team move forward, was me kind of guiding them and saying what's the hypothesis you're trying to test? What is the outcome you're looking for? And can we prove that we're getting the right outcome? And so that's an important piece is to have that.
And so you don't need that training, but it helps. You certainly need people in your organization who have had that training. People can pick that up through trial and error. And they can understand it with guidance of how to do that. But it takes a lot of oversight.
So actually myself and Puget sat down with your team on a regular basis and kind of kept them in those areas of are you answering the fundamental question? Are you addressing this hypothesis that you had? And that was helpful for them.
Everything else you can learn. You can learn how to do machine learning. You can learn how to do, you know, all of these other things. But it's a little bit of a whole mind shift, mindset shift to — if you're not trained in the hard sciences — to use that kind of scientific method of going through that.
Al Martin: Well I believe you. Because — as for my team — they've done amazingly under your guidance. And the accuracy in the models that they have is increasing and it's very strong right now. It's amazing what we can predict.
Hey, but keeping on data science. If you have a CDO must you have data science? I mean, is it one and the same? Do you see it as one and the same? And are companies adopting data science just the way they are CDOs?
Seth Dobrin: I — well, I'm going to answer the question I want you to ask, which is...
Al Martin: OK.
Seth Dobrin: ...should CDOs be responsible for data science in the organization?
Al Martin: Fair enough.
Seth Dobrin: My answer to that is I wouldn't take the job if they weren't. Because I think it's hard to realize value from data without applying advanced analytics to it.
You can get a little bit of value from data but you're really going to see the big-ticket, the big-dollar items for the company show up once you start applying things like machine learning, operations research, decision optimization to that. Because that's when you start getting to running your business differently using data.
And also that's the fun stuff. And so I wouldn't want to take a job without it. Now, there are plenty of CDOs that are responsible more from a governance perspective. And I see those mostly in banking and insurance companies where their job is making sure that the company — that the bank or the insurance company — is compliant with regulations.
Now that's a huge job in those heavily regulated industries. So that's a perfectly legitimate way to set up the organization. And they typically do not have responsibility for the data science in those enterprises.
Al Martin: Hey, nice answer to your own question. Very nice.
Seth Dobrin: Yes, I think...
Al Martin: You could do both.
Seth Dobrin: I tossed myself a softball.
Al Martin: Yes, that's awesome. Hey, but speaking of, I don't think you went through this very clearly, but I think it was important. You did a blog on the six steps from zero to data science for the enterprise. I think now would be a good time, can you go over those six steps real quick? If you...
Seth Dobrin: Yes.
Al Martin: ...can remember them?
Seth Dobrin: So you know, I think the first — well, the important piece is this is really about how do you build your data strategy? And I think, I think the important part is that you start with understanding what your data assets are.
So just looking at — around the company — what are the data assets that we care about as an enterprise? And in my experience, you know, that usually revolves around a small handful of things. And you should keep the number small because you really want to rally the whole company around them.
Should be no more than six in my mind. But every company has at least three in common. Every company cares about their customers. So you need to build a customer 360 — and we'll tack 360 on it because it's trendy. So every company needs a customer 360 because they all care about their customer.
Every company has some form of product. So they need a product 360. And every company has employees. Or contractors. So they need a talent 360. So at a bare minimum, you need those three. And then I kind of sweep everything else into what I call a company 360, which is everything that doesn't fall into one of those buckets.
Now if you're a, you know, an oil and gas company or an agriculture company or a trucking company you need a location 360, so you have some geospatial context for everything. And if you're a heavy manufacturing company you probably need some kind of event 360 where you're keeping track of what's going on on your machinery.
And so you can kind of see the number is very finite. And so you take these and you map them out conceptually. And you say here's where all the data lives that would make up these assets. And you don't actually do any work at this step. This is just a conceptual exercise. And then, absent a use case, so absent an actual decision that a company makes, you don't really do any work around these 360 assets up front.
With the exception of your customers. Because everyone needs to understand their customer. Everyone needs a single version of a customer. And most enterprises today don't have a single version of a customer. Most enterprises have anywhere from five to fifty versions of a customer, unless they've built out this customer 360 asset.
And it's hard to do anything of value if you don't understand your customer, because again — you recall back what I said digital means — the fundamental basis of it is every decision is made through the eye of your customer. If you don't know who your customer is, you can't answer that question. And so that's step one.
Al Martin: Hey folks, as usual, I had more questions than time allowed for, so we split this podcast into two, so part two will be next week. So stay tuned and we'll be right back at you.