#HadoopNext: Accelerate data science with Hadoop


Organizations see Hadoop as an open source, rapidly evolving platform that can collect and economically store a very large corpus of highly variable data and make it available. Yet most organizations are not fully realizing the value of Hadoop, both because scaling a Hadoop environment is complex and because skilled data scientists and developers who can extract valuable insights are in short supply.
Watch the recorded live interview with Rob Thomas, VP of IBM Product Development, and Beth Smith, GM for IBM Analytics Platform.

[0:04] Announcer: Live from the Fairmont Hotel in San Jose, California, it's theCUBE at Big Data SV 2015.

[0:29] John Furrier: OK, welcome back. This is theCUBE, our flagship program. We go out to the events and extract the signal from the noise. I'm John Furrier. We're here with IBM to talk about big data and big data analytics, and we're doing a first-ever CrowdChat simulcast with the live feed with IBM. So, guys, we're going to try this out: go to crowdchat.net/hadoopnext and join the conversation. Our guests here are Rob Thomas, Vice President of Product Development, Big Data and Analytics at IBM, and Beth Smith, General Manager, IBM Analytics Platform. Guys, welcome to theCUBE.

[1:01] Guests: Thank you.

[1:01] John Furrier: Welcome back. So, IBM: obviously we're super excited. Next week is InterConnect, your big show. You've mashed up three shows into a mega show [unclear], and I'm really excited about all the activity, the social lounge and whatnot. But we've been following you guys, and the transformation of IBM is really impressive. You guys get some [unclear] in the press in terms of some of the performance in the business, but it's pumping right now. You guys seem to have great positioning, the story is [unclear], and you have a huge customer base and services. The big data world tends to be startup driven. Over the past few years, in phase one, the big companies came in and started saying, "Hey, this is a big market; our customers are demanding it." So what's your take, as we come into InterConnect next week? What is your perspective on big data? Watson has garnered headlines, from powering toys to Jeopardy to solving huge world problems. That's a big data problem. You guys are not new to big data. So when you look at this big data week here in Silicon Valley, what's your take?

[2:07] Rob Thomas: Sure. So I'll start off, and then Beth, you can add. Our big focus is how we start to bring data to the masses, and we start to think in terms of personas. Data scientists play an increasingly important role around big data and how people are accessing it; then there's the developer community; and then obviously the wider business community, which is the clients that IBM has been serving for years. The announcements that we made this week around Hadoop are really focused on the first two personas: data scientists and how they start to get better value out of Hadoop, leveraging different tools (we'll talk about some of those), and really starting to change the conversation. Hadoop, to me, is about insight. It's not about infrastructure. Infrastructure is interesting, but it's about what you're getting out of it. That's why we're approaching it that way.

[2:50] Beth Smith: Well, IBM has a [unclear] strategy around data, cloud, and engagement, and data is really about using the insights. Like Rob said, it's about the value you can get from the data and how that can be used to transform professions and industries. And when we bring it back to big data and the topic of Hadoop, I think, frankly, it has gotten to a point where clients are really beginning to say it's time to scale. They're seeing the value in the technology, what it can bring, how it gives them some diversity in their data and analytics platform, and they're ready to now scale their workloads as a part of it.

[3:29] John Furrier: So the theme is Hadoop Next. That takes us right to the next point, which is: OK, what's next? Phase one, OK, we get some base positioning and validation; this is a new environment that customers want. So what is next? I mean, we're hearing things like in-memory is hot, Spark is proving it, there's action in in-memory. That kind of says, OK, analytics at the speed of business. You guys are all over that, and we've heard some things from you guys. So how do we get to the next part, where we take Hadoop as an infrastructure opportunity and put it into practice for solutions? What are the key things that you guys see happening, that must happen, for the large customers to be successful?

[4:08] Beth Smith: So I think that actually ties into the announcement we made this week around the Open Data Platform, because that's about getting that core platform to ensure that there's standardization around it, that there's interoperability around it [unclear], and that vendors and clients are coming together to do that and to really enable and facilitate the community to be able to standardize around it. Then it's about the value on top of it and around it. It's about the workloads and what can be brought to bear to extend on that. How do you apply it to real-time streaming? How do you add things like machine learning? How do you deal with things like text analytics? I mean, we have a client situation where the client took four billion tweets and was able to analyze them to identify over 110 million profiles of individuals. Then, by integrating and analyzing that data with the client's internal data sources, about seven or eight different data sources, they were able to narrow that to 1.7 million profiles that matched with at least 90 percent precision. Now they've got data that they can apply to buying patterns. It's about that; it's about going up the stack.

[5:30] John Furrier: [unclear] privacy [unclear], personas, relevant data, personalization. I mean, collective intelligence has been a high concept. We try not to be creepy [unclear]. So that brings us to the next level. You guys talk about cognitive, and you guys also talk about systems of record: [unclear], data warehousing [unclear], data. But now it's systems of engagement: real time, in the moment, an immersive experience, which is essentially the social and/or mobile experience. What does that mean? How do you guys get there? How do you make it so it's better for the users, more secure? I mean, these are hot-button issues that kind of lead us right to that point.

[6:15] Rob Thomas: Let me make a couple of observations on your first question about Hadoop Next. Hadoop is no longer just an IT discussion. That's what I've seen change dramatically in the last six months. I was with the CEO of one of the world's largest banks just three years ago, and the CEO is asking about Hadoop. So there's great interest in this topic. So why would a CEO even care? I think one reason is that people are starting to understand the use cases in play. Beth talked about entity extraction: how you start to look at customer records that you have internally, in your systems of record, to your point, John, and then how you match that against what's happening in the social world, which is more of the engagement piece. So there's a clear use case around that which changes how clients work with their customers. That's one reason. Second is huge momentum in this idea of a logical data warehouse. We no longer think of the data infrastructure as always a warehouse or a database tied to just a relational store. You can have a warehouse, but you can scale in Hadoop, you can provision data back and forth, and you can write queries from either side. That's what we're doing: we're enabling clients to modernize their infrastructure with this type of logical data warehouse approach. When you take those kinds of use cases and then you put the data science tools on top of that, suddenly our customers can develop a different relationship with their customers, and they can really start to change the way that they're doing business.

[7:43] John Furrier: Beth, I want to get your comments. We have the CrowdChat at crowdchat.net/hadoopnext, and commentary is coming in: transforming industries; billions of tweets, killer for customer experience. So, customer experience, and then also the link about data science kicking into high gear. So let's bring that in now. The data science; the logical store makes sense, with virtualization and things moving around; [unclear] cognitive engines that overlay on top of that. Customer experience and data science: what's the interplay? Because this came out in some of the retail [unclear] in New York City. It was great: point of purchase, personalization, customer experience, data science. It's all rolling together. What does that mean? Unpack that for us and simplify it, because it can get complex. It's a big topic.

[8:30] Beth Smith: You know, it is a big topic. So, a couple of different points. First of all, I think it is about enabling the data scientists to be able to do what their specialty is, and the technologies have advanced to allow them to do that. And then it's about them having the data, the different forms of the data, and the analytics at their fingertips to be applied. I think the other point in it, though, is that the lines are blurring between the person that is the data scientist and the business user that needs to worry about how they attract new customers, or how they create new business models, and what they use as a part of that. So I think we're also seeing that blurring. One of the other things that we're trying to do is help the industry around growing skills. We actually have Big Data University. We have, what, 230,000 participants, and it's online, free education. We're expanding that now to, again, go up the stack: to go into the things that data scientists want to deal with, like machine learning, and to go into the things that the business user really wants to be able to capture as part of it [unclear].

[9:48] John Furrier: I want to ask you a product question and/or a [unclear] question. [unclear] at an IBM event [unclear] talked about a big medical example and a number of other use cases. But she made a comment in there about active data. "Active data" is not a new term for the data geeks out there, but when we look at data science, lag is really important. Near real time is not going to make it for airplanes, people crossing the street, mobile devices. Real real time means sub-second latency; speed is really important. So active data is a key part of that. Can you guys talk about passive versus active data and how that relates to computing? Because it's all kind of coming together, and it's not an obvious thing, but she highlighted it in her presentation, because with medical care [unclear] it's in the moment. Is that something you're paying attention to? Is it viable? Is it doable?

[10:36] Rob Thomas: It's certainly viable. I mean, it's a huge opportunity, and I'd say probably the most famous story we have around that is the work that we did with the University of Toronto at the Hospital for Sick Children, where we were using real-time streaming algorithms and a real-time streaming engine to monitor infants in the neonatal care facility. This was a million data points coming off a human body, monitored in real time. So why is that relevant? I mean, it's pretty basic, actually. If you extract the data, ETL it somewhere, load it into a warehouse, and then start to ask what's going on, it's way too late. We're talking about needing to know what's happening at the moment. So it started, as a lot does, in the medical field, with the example that you mentioned, but real time is now going well beyond the medical field, to places from retail at the point of sale and how things are happening there, to even things like farming. So real time is here to stay. We don't really view that as different from what I would describe as Hadoop Next, because streaming, to me, is part of what we're doing with Hadoop and with Spark, which we'll talk about in a bit. So it certainly is a new paradigm for many clients, but it's going to be much more common.

[11:51] Beth Smith: Actually, if I can add, there's a client, North Carolina State University. It's where I went to school, so it's a client that I talk about a lot. In addition to what they do with their students, they also work with a lot of businesses on different opportunities that they may have, and they have a big data and analytics sort of extended education, business education project. As a part of that, they are now prepared to be able to analyze one petabyte in near real time. So the examples that you and Rob talked about, of the real-world workloads that are going to exist where real time matters, are there. There's no doubt about it; they're not going away. And the technology is prepared to be able to handle the massive amount of data and analytics that needs to happen right there in real time.

[12:46] John Furrier: That's a great point. I mean, the flagship examples are kind of like lighthouses for people to look at, for the ships to come into that harbor, if you will, for other customers. Those are the early adopters. Can you guys talk about where the mainstream market is right now? From a services standpoint you've got a great presence in a lot of accounts. Where are the ships coming in? Which harbor? Where are the lighthouses? I see medical is an example. To bring in the mainstream customers, is it the new apps that are driving it? What are the innovations, what are the forces, and what are the customers doing in the mainstream right now? Where are they in the evolution of moving to these kinds of higher-end examples?

[13:22] Rob Thomas: So, I mean, Hadoop: I'd say this is the year of Hadoop, where clients are getting serious about Hadoop. Like I said, it's now becoming a board-level topic, so it's at the forefront right now. I see clients being very aggressive about trying out new use cases. Everybody, really, across every industry, is looking for one thing, which is growth. And the way that you get growth, if you're a bank, is, you know, you're not going to change your asset structure. What you're going to change is how you engage clients and how you personalize offers. If you're a retailer, you're not going to grow by simply adding more stores. That might be a short-term growth impact. You're going to change how you're engaging with clients. So these use cases are very real, and they're happening.

[14:05] John Furrier: So Hadoop is a boardroom discussion, or is it big data [unclear]?

[14:08] Rob Thomas: I've seen it over and over again. Where you see it a lot is companies that are private equity owned. The private equity guys have figured out that there are savings and there's innovation here. In every company I've worked with that has private equity ownership, Hadoop is a boardroom discussion, and the idea is how we modernize the infrastructure. It's because of other forces, though: it's because of mobile, it's because of cloud, that it comes to the forefront. So, absolutely.

[14:39] John Furrier: So, Hadoop: Hadoop is great, batch is great, there's a lot of value [unclear] going on there. It's in the boardroom with private equity because they're cutting edge; they probably [unclear] their investment [unclear] pretty quickly. Speed is critical, right? I would infer that from what's coming from the private equity side. So, speed to value: what does that mean for IBM and your customers? How do you guys deliver that speed to value? That's one of the things that comes out as the premise of all the conversations: [unclear] faster now. So, value in the business: how do you guys deliver it?

[15:05] Rob Thomas: Sure. So there are a lot of different ways to approach that. We believe that, as I said before, it's not just about the infrastructure; it's about the insight. We've built a lot of analytic capabilities into what we're doing around Hadoop and Spark so that clients can get answers faster. We have a session here at Strata this week talking about our new innovations in Big R, which is our R algorithms, the only R algorithms that you can run natively on Hadoop, where your statistical programmers can suddenly start to analyze data and drive that to decision-making, as an example. So we believe that by providing the analytics on top of the infrastructure, you can change how quickly clients are getting ROI. We've got IBM Software, so we've got our Hadoop infrastructure up on the cloud, and anybody can go provision something and get started in hours, which was not the case even a couple of years ago. So speed is important, but the tools and how you get the insight are equally important.

[16:10] John Furrier: What about speed to value from a customer deployment standpoint? Is it the new apps, or is it innovating on existing ones? Which are you seeing?

[16:18] Beth Smith: Well, I think it's both, actually. You talked earlier about systems of engagement versus systems of record. I think at the end of the day, for clients it is really about systems of insight, which is some combination of that, right? We tend to think of systems of engagement as the newer things and newer applications, and we tend to think of systems of record as the older ones, but I think it's a combination, and we see it show up in different ways. So I'll take an example in telco. We have a solution, The Now Factory, and this is about applying analytics in real time to the network and its dynamics, so that, for example, the operator has a better view of what's happening for their customers, their end users. They can tell that an application has gone down and that customers have now switched, all of a sudden, to using a competitive application on their mobile devices. You know, that's different. Is that new applications, or old, or is it the combination? I think at the end of the day it really comes to a combination: the systems of insight.

[17:26] John Furrier: Systems of insight; I'm just going to write that down here in the CrowdChat. So I've got to ask about the holy grail for big data and big data analytics from your perspective, IBM's perspective, and, two, where you guys are partnering. [unclear] acquisition targets [unclear], acquisitions, partnerships. I mean, [unclear] here in the Valley, and in the growth of big data, cloud, mobile, and social, things are coming together. What is the holy grail from your perspective? Systems of insight teases that out; cognitive is a must, as we've heard. So what is the holy grail, and then what is IBM looking for in partnerships within the community, with startups and/or other alliances?

[18:11] Beth Smith: [unclear] So, you know, I think at the end of the day it is about using technology for business value and business outcomes. I really think that's the spirit of it. And so if I tell you why we have, for example, increased our attention and investment around this topic, it's because of what Rob said earlier, when he described the state that clients are now in. So that's what I think is really important there. And I think it's only going to be successful if it's done based on standards, and in something that is in support of heterogeneous environments. I mean, that's the world of technology that we live in, and that's a critical element to that, which leads to why we're a part of the Open Data Platform initiative.

[19:08] John Furrier: So, the piece on analytics. [unclear] R, for example; it was just mentioned in the CrowdChat. Microsoft just bought Revolution Analytics, which is [unclear] R, which is different [unclear]. Is there a land grab going on between the big guys? IBM is a big company. What do you guys see in that kind of area, in terms of acquisition targets?

[19:29] Rob Thomas: Yeah, I think the numbers would say there's not a land grab. I don't think the M&A numbers have changed at a macro level at all in the last couple of years. I'd say we're very opportunistic in our strategy, right? We look for things that augment what we do. And to your question on partnering: when we do an acquisition, it's not only about what that company does, but about how it fits within what IBM already does, because we're going after a rising tide in terms of how we deliver what our clients need. I think some companies make the mistake of thinking that if they have a great product, that's relevant to us. Maybe, maybe not. It's about how it fits in with what we're doing. That's how we look at all of our partnerships, really. We partner with global systems integrators, even though we have one within IBM. We partner with ISVs and application developers. The big push this week, as I described before, is around data scientists. So we're rolling out data science education on Big Data University, because we think that data scientists will quickly find that the best place to do that is on an IBM platform, because it has the best tools, and if they can provide better insight to their companies or to their clients, they're going to be better off.

[20:37] John Furrier: So yesterday I was commenting [unclear], last week and earlier this week, about Twitter. [unclear] people are confused [unclear] Facebook [unclear], and I was saying Twitter has great value. So I was on the side that Twitter is a winner. I love the company; it's misunderstood, certainly. I think in this market, with these waves coming in more and more, there's a misunderstanding, and I want to get your perspective to share with the folks out there. What is that next wave? Because it's confusing out there. You guys are insiders. IBM, I would say, like Twitter, is winning and doing very well. We're close to you guys; we're deeply reporting on IBM, and we can see the momentum and the positioning [unclear], and that's where the outcomes will end up being for customers. But it's still early [unclear]. So what is the big misunderstanding that you think is out there around the market we're in, and what's the next wave? I mean, there are always waves coming in. If you're not out front for the next wave, you're usually driftwood, as the expression goes. So what is that big misunderstanding in this kind of convergence: [unclear], hyper-targeted [unclear], it's all new stuff, huge opportunities, a huge shift and an inflection point [unclear], both going on at the same time. So what's misunderstood, and what's that next wave?

[22:02] Rob Thomas: So let me start with the next wave, and then I'll back into the misunderstanding. The next big wave, to me, is machine learning: how you start to take the data assets that you have and, through machine learning and the application of those types of algorithms, start to generate better insights or outcomes. The reason I think it's the next big wave is that it may be one of the last competitive moats out there. If you think about it, if you have a corpus of data that's unique to you, and you can practice machine learning on that and have that either as data that you can sell or as something to feed into your core business, that's something that nobody else can replicate. So it becomes incredibly powerful. One example I'll share with you: I wanted to bring in my book, but it's actually getting published [unclear], maybe next week. Wiley is publishing a book I wrote, and one example I give is a company by the name of CoStar, which I think very few people have heard of. CoStar is in the commercial real estate business. They weren't even around a decade ago. They have skyrocketed from zero to five hundred million dollars in revenue, and it's because they have data on four million commercial properties out there. Who else has that? Actually, nobody has that kind of reach. So they've got a unique data asset, they can apply things like machine learning and statistics to it, and therefore anybody who wants to do anything in commercial real estate has to start with them. So my point is, we're starting to get to the point where we have some businesses where data is the product. It's not an enabler; it's the actual product. And that's probably one of the big misunderstandings out there: the idea that data is something that serves our existing products or existing services. We're moving to a world where data is the product, and that's the moat.

[23:43] John Furrier: I posted a quote: "Data is the new development kit." And what you're basically saying is that's the competitive advantage: a business user can make an innovative observation about data, and not be a scientist, and change the game. That's what you were saying earlier. Beth, similarly: the next big wave and the misunderstanding. What's your take on it? What are people not getting? What is Wall Street not getting? [unclear] seeing some of the innovation, but what is the general public not getting? We are in a shift and an inflection point. What's the big shift and the misunderstanding?

[24:14] Beth Smith: So I actually agree with Rob. I think folks aren't yet really appreciating, and I guess I would twist it a little bit and say, the insight instead of just the data. They're not realizing what that is and what it's going to give us the opportunity for. You know, I would retire early if I actually could predict everything that was going to happen. But if you think about the mid-to-late '90s and what we would have all thought that the internet was going to allow us to do, compared to what it actually allowed us to do, it's probably like night and day. And I think the time we're in now, when you take data, and you take mobility, and you take cloud, and you take these systems of engagement, and the way individuals actually want to do things, is similar, but almost like on steroids, to what we were dealing with in the mid-nineties or so. And so the possibilities are frankly endless. I think that's part of what people aren't necessarily realizing: that they have to think about that insight, that data that actually has some value to it, in a very different way.

[25:43] John Furrier: There are a lot of disruptive enablers out there [unclear], and identifying which ones will be the biggest is hard. I mean, you get paid a lot of money to do that. If you figure it out, keep it a secret. [unclear] machine learning is now out there; you just shared it, so everyone knows now [unclear] on the inside.

[26:03] Rob Thomas: But not everybody's using it, right? I mean, I think of another example: a company like Intuit has done a great job. They started as a software company; they've become a data company. What I've observed in all these companies is that you can build a business model that's effectively recession-proof, because data becomes the IP in the organization. And so, you know, I think those of us that live in this world think this is well understood. I don't think it's well understood yet.

[26:31] Beth Smith: And you know, when we first started doing big data research and working with thousands of clients around the world, there were six basic use cases. It started, of course, with the customer, the end customer, the customer 360 and that sort of thing, and it went through a number of different things around optimization, et cetera. But the additional one is about those new business models, and that clearly, in the last 12 to 18 months, has become a lot more of what the topic is when I'm talking to clients. And I think we will see that expand even more as we go into the future.

[27:03] John Furrier: There's a lot of activity in the CrowdChat at crowdchat.net/hadoopnext, and I'll mention we're pressed for time. [unclear] So we'll move the conversation to crowdchat.net/hadoopnext. Great thought leadership. I could talk about this stuff for an hour. You guys are awesome; great to have you on theCUBE. There's so much to talk about, and we'll certainly see you at InterConnect. My final question for you guys: what do you see for this week? Real quick, summarize what you expect to see unfold for big data week here in Silicon Valley, at Big Data SV.

[27:38] Rob Thomas: So, as we talked about, I think machine learning is going to be a big topic. I think there will be a lot of discussion around the Open Data Platform that we mentioned before. It's a big move that we made, along with another group, supporting the Apache Software Foundation. I think that's a big thing for this week. But it should be exciting.

[27:55] John Furrier: All right, [unclear]. We're here inside theCUBE, live in Silicon Valley. We'll be right back with our next guest after this short break. I'm John Furrier. This is theCUBE. We'll be right back.