
Solving the IBM Big Data for Social Good Challenge


Overview

More than 1,000 developers have signed up for the IBM Big Data for Social Good Challenge. Have you? See how others are bringing together civic data sets with Analytics for Hadoop and solving big problems with big data.

Transcript

0:00
Hello, Mike Nobles here from IBM. I'm going to take you through a quick example demo.
0:04
IBM's been asked for an example of how do I go from
0:09
end to end for this Big Data for Social Good Challenge.
0:12
So let's jump straight in. What we want to do in this quick video is an overview of the
0:17
demonstration
0:17
that I'm about to provide, and we'll go through some of the details of
0:21
the demonstration as well, in terms of data used and some visualization options.
0:25
And just know that the plan was initially to have this
0:28
demo be fully contained on the web alone,
0:31
so all you need is a web browser to be able to complete the challenge.
0:35
However, we've had some questions about additional environments and how do I get
0:39
things to run from Windows and Linux,
0:41
and so we've included those here at the end with some of the details.
0:44
Let's go through the overview we're going to provide as part of this demo.
0:47
The demo assumes that you're going to move from left to right:
0:51
there is some data that's available on the left-hand side,
0:54
and we want to be able to create these new enhanced applications
0:57
on the right-hand side. And at the bottom there, you see that it would be good to
1:01
have
1:02
big data and analytics infrastructure to support this. So the first thing you're
1:05
going to do
1:06
is go through and create your Analytics for Hadoop service
1:09
by creating an IBM Bluemix account, and then we're going to take
1:13
you through three different samples in this demo of how to move data
1:16
into that service. The first one is going to be pushing from your local machine,
1:20
and there you will use the IBM BigInsights web console,
1:24
specifically the Files tab, to be able to do that. And then
1:28
you'll also be able to do a pull from a remote
1:31
location, and there is an application that we included in the Applications tab
1:36
to be able to access any REST API in order to pull data from a remote
1:40
location
1:41
into your service directly, so it does not have to go through a local
1:45
pull-down or download and then a push from your local machine,
1:48
but you still have that option as the first one. And then there's also a push
1:52
from remote, which allows you to go to any server
1:56
in your environment that supports curl, and then be able to use that
2:00
environment to say: I've got a local file here that I might want to
2:03
push using the BigInsights REST API. So if you've got Twitter data that you've already
2:07
collected,
2:08
or maybe you have a relational database whose data you also want to include
2:12
in your Analytics for
2:13
Hadoop service, you could use this curl mechanism for that as well.
2:18
And on the right-hand side, once we have this data ingested,
2:21
what are my different options for processing that data? In this demo we'll
2:24
just highlight three,
2:26
but there are many other ways that you can actually process this data
2:30
in your Analytics for Hadoop service environment on the cloud.
2:33
The first way, there again, is just using a web browser: we have a spreadsheet
2:37
paradigm we call BigSheets
2:39
that's built into the service, and so just by using
2:42
a web browser you can utilize BigSheets in order to access the files that you
2:46
just ingested
2:47
on the left-hand side. There's also a way to move from BigSheets
2:51
to Big SQL. So suppose you're familiar with SQL
2:54
and wanted to run some analytics and some queries against your data:
2:58
you can create tables from BigSheets,
3:01
you can also create tables directly on the files as well,
3:04
and then look at that SQL result in Eclipse or even Tableau, for example.
3:09
And then for the advanced analytics option, we wanted to make sure that you
3:13
were aware that there are
3:15
machine learning capabilities, as well as the ability to run R
3:18
directly on the service on the data that you've collected from the left-hand side.
3:22
We call that Big R within our service,
3:26
and you can utilize a built-in application, which is the first
3:30
icon there with the R listed on the microscope,
3:33
and then there's also a command prompt option for being able to run
3:36
R as well. And I included ellipses at the bottom of these columns because
3:42
there are more features and functions within the service, for example text
3:46
analytics, that we will not cover
3:47
in this demo. And then there's also an option to use Cognos software, because it
3:52
generates SQL
3:53
and can communicate with Big SQL. So if you're a Cognos
3:56
shop already, feel free to use Cognos, maybe instead of Tableau or in
4:01
addition to Tableau,
4:02
based on what your solution may require. So this is the overall demo flow of what
4:07
we will show,
4:08
and hopefully in the near future we can add Cognos as an example.
4:12
Some other slides here cover the different boxes that you see on this slide;
4:16
I just wanted you to be aware of everything that's available to you. You
4:19
can utilize anything in the
4:20
orange boxes, which are the open source components that we
4:24
include, as well as the dark blue boxes, which are our Quick Start Edition,
4:27
so all these boxes are available for you to utilize
4:31
as part of the challenge in your solution. The light blue boxes
4:35
are the enterprise or production environment, and the green boxes
4:38
are additional features that we provide through other products that will not be
4:42
part
4:42
of what we will provide as IBM for this challenge.
4:45
So BigSheets: as I mentioned, this is just a quick screenshot to show you
4:50
that you can take and load up your data and then use a reader,
4:53
here in the bottom center, to be able to then look at the data
4:57
and then visualize the data with the different types of charts that you see
5:00
on the right-hand side.
5:01
So this is BigSheets, and this is a spreadsheet interface that's all web
5:05
based;
5:06
there's no local client install required in order to use BigSheets for looking
5:10
at the data,
5:11
or doing things like joins and group-bys,
5:14
and there are even a hundred and forty plus functions that we provide
5:18
as part of the BigSheets interface, and we'll show you bits of this in the demo.
5:21
And then, like I mentioned before, the idea in BigSheets is to
5:25
click on the CREATE TABLE button in order to create a table from that BigSheets
5:29
environment,
5:30
and that loads that data to be used by Big SQL and Hive,
5:34
because we share the HCatalog environment between Big SQL and Hive.
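To make that concrete: a table created this way is just the data in Hadoop with a schema registered in the shared catalog. Here is a minimal sketch of the kind of DDL involved, assuming Big SQL's CREATE HADOOP TABLE syntax; the table, column, and path names are illustrative, not the demo's actual schema (the CREATE TABLE button generates the real statement for you):

    CREATE HADOOP TABLE weather_2014_nyc (
        station   VARCHAR(20),   -- weather station identifier
        obs_date  VARCHAR(8),    -- observation date, e.g. YYYYMMDD
        element   VARCHAR(4),    -- measurement type, e.g. PRCP
        obs_value INT            -- measured value
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
    STORED AS TEXTFILE
    LOCATION '/user/myuser/mydemo/weather';  -- the folder holding the CSV files

Because the definition lands in the shared HCatalog metastore, both Big SQL and Hive can then query the same underlying files.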
5:39
And so when we talk about Big SQL, it's basically allowing any application that
5:43
generates SQL
5:45
to come in through a JDBC or ODBC driver and then
5:48
be run in the BigInsights environment, which in your case is the Analytics for
5:52
Hadoop service.
5:53
So this provides the ability for ANSI SQL 2011 compliant queries
5:57
to be run against your data. And the other advantage
6:00
is the fact that we have these Big SQL worker services that run on each
6:05
node in the cluster, which let you just use an HDFS reader
6:09
to read the data without using MapReduce,
6:12
so this should be a much more performant solution for SQL
6:16
than Hive, for example. And then there's an area that we make full use of,
6:21
or that we anticipated you would also make use of,
6:23
which is our Applications area. In the web console you'll see, on the left-hand
6:27
side, some icons for
6:29
applications that have been deployed to the cluster. As you click on one of
6:33
those icons,
6:34
you'll get some questions on the right-hand side
6:37
to fill out in order to run an application, so you have full
6:41
access to run the existing applications.
6:44
Should you find the applications are close to something that you need
6:48
to work, but you want to be able to access the source code
6:51
and change that application and then republish it, you're able to do that with
6:55
the Eclipse environment. You download
6:57
the Eclipse plugins, and the source code for each of the applications is also
7:01
provided for you
7:02
in that Eclipse environment shown here, so you can do things like SQL;
7:07
you can do your Java programming too if you require that, and there again,
7:11
hopefully in the near future we'll be able to show you an example of text
7:14
analytics,
7:14
which is also run from that same Eclipse environment. If you want to do advanced
7:19
analytics, as we call it,
7:20
now using R: there are some R packages, and they're actually
7:24
already deployed on your Analytics for Hadoop cluster. There are also some
7:29
machine learning algorithms that are provided there as well,
7:32
and so we anticipate that you'll use one of the built-in applications
7:36
in the Applications tab that we just looked at as a way to run
7:40
R on the cluster. And here's just a quick example of text analytics
7:44
that we hope to add to this demo shortly, but just to give you
7:48
the idea: I'm going to upload some unstructured data, or just some
7:51
text,
7:52
and you want to be able to go in and pull out different components of the
7:55
text;
7:55
just know that in BigSheets there are built-in functions that use the text
7:59
analytics engine
8:00
to do analytics on that text. And at a high level,
8:04
let's get into exactly what we're going to show as part of this demo.
8:07
So for the demo we had this concept of saying, hey, we want to show
8:11
things from an end-to-end standpoint and show the logistics behind what it
8:16
would take
8:16
to do something simple, which in our mind would be pulling some data
8:20
into the cluster, joining that data together to look for a correlation,
8:24
and then potentially using Big R to actually run some advanced
8:27
algorithms to say: OK, I can see a visual correlation,
8:31
but can you scientifically tell me, by using an R algorithm,
8:35
namely linear regression in R, are these two datasets
8:39
related to each other in a linear fashion, so that I can make some
8:42
further conclusions?
8:43
And so in this case there's a concept that's listed here,
8:46
the data flows of what I just mentioned: we're going to download some data, we're going
8:50
to use some apps to pull some data,
8:51
and then we're actually going to run some BigSheets charts
8:55
as well as an R script application.
8:58
And then towards the end of this video, if you're interested,
9:01
we also include a way to run Tableau
9:04
and Eclipse from your local environment, but that's not a requirement, which is
9:09
why it's listed as optional.
9:10
So let's get straight away to the live demonstration.
9:13
As a companion to this video we also have a document that will
9:19
give you step-by-step detailed instructions on how to replicate
9:23
this demo, because I'm only going to do this demonstration at a very high level.
9:27
There will be some steps that I skip, and hopefully that won't be frustrating,
9:30
because if you go and look at this document you'll actually see that
9:34
there are additional details
9:36
that I leave out of this video, but that will allow you to see
9:40
the details of what it takes to actually produce
9:42
everything that I'm going to show you. Just in the interest of time,
9:45
I will go through and do a single example, but there may be other examples
9:49
listed in here.
9:50
So: how do I set up my Bluemix environment, how do I
9:53
use some different data download and upload options, how do I work with
9:57
BigSheets
9:57
in terms of workbooks and charting, how do I go through and create a Big
10:02
SQL table from the workbook, and how do I deploy
10:06
and then run my ad hoc R script application?
10:09
And then for the optional environments, I also talk about what you need to
10:12
download and how you need to install things
10:15
in order to get things working in your Windows environment, and further
10:19
down we also talk about
10:20
non-Windows environments, which would include your Linux and your Mac, for
10:24
example.
10:25
So we'll refer back to this document as part of the demo,
10:29
and if you get a chance to go through each one of these steps, you don't have
10:32
to go through all 70 pages;
10:34
you can just pick and choose the sections that you want to go through,
10:37
in terms of maybe learning an example of how to do some of these
10:40
activities.
10:42
So step 1 would be going in a browser to bluemix.net,
10:47
and in this regard,
10:48
what you will have is a redirection to console.ng.bluemix.net;
10:52
that's expected. And now what I want to do is go ahead and
10:56
log in.
10:57
I'm assuming you already set up an account on Bluemix, I'm assuming you've
11:00
already signed up for
11:02
the challenge and have gone through those initial steps,
11:06
and so here what I want to do is log into Bluemix
11:10
and sign in with my IBM ID. Once the environment comes up, it reminds me how
11:15
many days I have left
11:16
on my trial. If I scroll down, I actually see the service that I've already
11:21
launched;
11:21
you would click Add Service and bring in this new service if you haven't done so
11:26
already.
11:26
And once I click on that little icon
11:29
to bring me into the environment here, I actually see the ability to launch
11:34
my BigInsights environment. If I scroll down here,
11:38
you'll actually see in the bottom right where your username is,
11:42
and right underneath is my password. Then you also want to make sure
11:45
that you've gone through and taken care of the Social Good Challenge steps;
11:49
in my case, I have already registered and successfully activated that service,
11:52
which gives me additional time on the clock, and it also gives me
11:57
the ability for the machine learning algorithms to be installed
12:00
into my environment. So you want to make sure that you do that first as well.
12:04
Now let's go ahead and launch our environment. So just like I mentioned on
12:08
the last page,
12:09
there was the password; you would need that for your client connections if you
12:12
choose to do that.
12:13
You also need to be familiar with this number
12:17
here as well: this is your instance number, and that'll be
12:20
unique for each person,
12:22
and the password will also be unique for each person as well.
12:25
So we showed the quick slide about the blue boxes and the orange boxes;
12:29
those can be seen here in the cluster status environment, and this just shows
12:32
you
12:33
kind of what's going on within your environment, in your cluster.
12:37
All of the services are listed here, so should something need to be
12:41
restarted, for example, you can come in here and click on a service
12:44
and then
12:45
be able to stop and start that service as necessary. So to show you where we're
12:50
going,
12:50
now let's go to the dashboard environment here, and what I've done is
12:53
I've created a quick
12:55
example of the end result of what we're doing. So here I have my Big Data
12:58
challenge:
12:59
where might I want to end up? Well, we're going to pull down some
13:03
New York City 311 data, and maybe I want to do a quick chart
13:07
of the top 10 agencies that have had 311
13:11
events recorded for them, and I might want to look at those 311 events
13:16
over time. So this is just to show you the visualization capability
13:19
and the ability to take two different datasets, in this case,
13:22
and put them side by side in the dashboard. And so how did we do this?
13:26
We went into Files, and this is the first way, in terms of pushing from a
13:30
local environment:
13:31
we have this ability to go through and say, let me go create a
13:35
folder
13:36
for holding the data that I'm going to use for the
13:39
demo. So I can give my directory whatever name I want, click on
13:44
OK,
13:44
and then underneath there I'm going to go and get some data.
13:47
And the first one I'm going to get is my weather data,
13:50
so I click on Folder to create one for weather, and then I can also come in
13:55
here
13:55
and create another folder if I wanted to, for New York City 311, for example.
14:00
OK, create whatever directory structure you want. I have an example one
14:04
that's already been pre-filled out, where I've pre-
14:06
loaded some of these different files, and so here's an example of information
14:10
that I have there.
14:12
And so now, how did that weather file get there? I have the weather
14:16
data from what I've collected on the web, so let's look at a quick example
14:20
of where I pulled it from. So here on the Big Data for Social
14:24
Good Challenge page there is a link here for data,
14:27
and if I go to that tab and then scroll down,
14:31
I don't find anything for weather, but here is my
14:34
New York City 311 data service, so let's go ahead and click there.
14:39
But you're not excluded from using any other free data that's
14:43
available,
14:43
so I went into Google and did a search for free weather data.
14:48
And so if weather is part of your solution, you can go to this quick links area here
14:53
for the climate data center. I clicked on
14:56
this historical climatology data, and then I clicked on the FTP access link.
15:01
And so here again, for anybody that's sharing public information,
15:05
you can go by year, and I scrolled all the way to the bottom
15:09
and downloaded the 2014 .gz and the 2013 .gz files.
15:14
Because I wanted to be able to see the data in an uncompressed fashion,
15:18
I pulled that down locally and then uncompressed it so that I could upload it.
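A minimal sketch of that download-and-uncompress step from a terminal, assuming the climate archive's by-year layout and file names from the time of this demo (verify the current host, path, and file names before use):

    # pull down the compressed by-year files, then uncompress them for upload
    curl -O ftp://ftp.ncdc.noaa.gov/pub/data/ghcn/daily/by_year/2013.csv.gz
    curl -O ftp://ftp.ncdc.noaa.gov/pub/data/ghcn/daily/by_year/2014.csv.gz
    gunzip 2013.csv.gz 2014.csv.gz   # leaves 2013.csv and 2014.csv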
15:22
That's exactly what I'm going to do here. And so in this window,
15:26
which is under my weather folder, now I want to
15:29
do an upload. There's a nice way to put data into the cluster through the upload
15:33
function:
15:34
I can choose my 2014 .csv,
15:37
which came directly from the weather download, and as soon as I click on OK
15:41
it will start to do the upload into the environment. Once I click on OK,
15:45
in the bottom left you see that Google Chrome, which is what I'm using here,
15:50
will give me the status of what's going on with that upload, and so
15:53
it'll take a little while, just because of my connection speed
15:57
and the size of the file itself. And so, up to two gigs,
16:01
you can use this approach for putting data into the cluster.
16:04
And at this point I can interrupt that by just clicking somewhere,
16:08
but you will not want to interrupt this session, because you want to let that
16:12
entire file
16:13
complete. If you interrupt the session and click somewhere else, like I'm
16:17
going to do,
16:18
we may see an error somewhere later in the demo
16:22
saying, hey, you clicked and interrupted my upload.
16:27
So just keep that in mind: you want to let that complete. We can do a right-
16:31
click
16:31
refresh, and that'll show me the contents, and in this case
16:35
I interrupted, so it's confused about how many bytes are in the file, because it hasn't
16:38
been closed yet, and so forth.
16:40
So just keep that in mind: you want to let that upload finish
16:43
all the way to completion. So in this case I have already done this,
16:46
now for 2013 and 2014, and so through that same process of
16:51
uploading I was able to put up a 945 megabyte file,
16:54
which is all the weather stations that were reported and shared for 2014.
16:58
That's the first set of data. So at this point, what I might want to do is look at
17:03
this data in a spreadsheet format, and if I want to
17:06
ever create a table from that,
17:08
I really need to make my BigSheets collection be based on a folder.
17:12
So in this case I'm going to go through and create a spreadsheet from the folder,
17:16
and it could have multiple files; as long as they're the same format,
17:19
BigSheets will not get confused.
17:21
And so now what I can do is go through and say what type of file this is;
17:25
in this case it's a CSV. The weather data
17:28
people did not include a header row, so I will turn that off
17:32
in the reader. Now I have columns,
17:35
and so now what I can do is save this as
17:38
"demo weather 2014". Right, so there's
17:42
demo weather 2014. As I click on Save, I should now be able to go through and
17:48
look at this information if I wanted to, but the first thing I want to do
17:52
is actually a filter, just for looking at New York City
17:55
precipitation. And this will be one of the first things that you probably want to do
17:59
with any other data that you pull down as well:
18:01
say there's a large dataset and I may only need to look at part of it.
18:05
So in this case I'm going to build a new workbook, and again I name it
18:10
pretty immediately for what I'm working on, which is 2014 New
18:14
York City. So now I have this
18:17
ability to go through and edit it if I want to. This is station name,
18:22
and I can follow this across the way; don't worry if you can't follow all that.
18:27
I just renamed a few of these columns, because all the instructions are there.
18:31
We also have instructions on how to remove some of these other columns that
18:35
we know we're not going to use,
18:37
so I might skinny down my dataset that way as well.
18:40
And now what I need to do is a filter. I'm going to do a filter on
18:44
New York City precipitation only, so my station name
18:48
is... and this is where the document will come into play: I can do a find for
18:53
station name and then go find the one that I've already researched,
18:57
which is right here. And
19:01
I also want the weather type to be
19:05
PRCP.
19:07
So this list is a pull-down, and you'll see the same thing here:
19:10
it's limited to the data that's in the first two thousand rows of your environment.
19:15
So the nicety is that I can select it if I see it,
19:18
or I can just type it in, like we did here, if I don't see it in the pull-down
19:21
list,
19:21
as long as it's one that matches. OK, so just keep that in mind. So now I can do
19:26
this filtering, and because the first two thousand rows don't have anything that
19:31
matches, this is expected.
19:32
So I can save and exit, and then I can go ahead and run
19:36
this dataset. And the nicety here is that this is an asynchronous
19:41
operation:
19:42
the job is actually now running behind the scenes, and here is my
19:46
progress bar. But in this case, in Bluemix I
19:49
am an administrator, so I can go to the Application Status tab
19:52
and actually go look and see what's going on with the job that I just
19:56
submitted.
19:56
And here is my demo weather 2014 New York City;
20:00
if I click anywhere on this row, the bottom also expands
20:04
to show me how many mappers are actually running and how many reducers are
20:07
expected to run once those complete,
20:09
and so forth. So as long as I click on this row,
20:12
it'll update the status of the elements in the bottom, and these items
20:17
automatically update when I do a refresh here, normally every 15 seconds
20:21
as the default.
20:22
So once that job completes, we'll be able to do some additional work with it.
20:26
Like any good demo, we have some things that are already pre-done;
20:30
here I just want to show you what the finished collections and the apps are. I've
20:34
done some filtering by weather,
20:36
and I've downloaded the New York City data. So how did we do that?
20:40
The New York City data came from here: this is where I clicked on the
20:44
link
20:44
on the Social Good Challenge page here,
20:48
and that took me straight to where I can do an export, and I can right-click
20:52
on CSV and copy that link location,
20:55
and by copying that link location I can now utilize it
20:59
in an application on BigInsights.
21:02
And so here, when I first came in, this would be empty;
21:05
I can go to Manage, and I can go ahead and select
21:09
an application, then I can deploy it. Once I've deployed
21:12
it, that makes it able to be run. So there again, the directions for all this are in
21:16
the document.
21:17
And I can come back to my app, click on it on the left,
21:20
and then right-click and paste in the URL. And then, where do I want my output to go?
21:25
I can browse and put in that information;
21:29
where I want it to go is this REST-call NYC folder, even though I
21:34
already have a file there.
21:35
And so now, when I click on Run, it'll actually pull that data directly from
21:39
that REST API
21:40
and land it in my local dataset. If I click on a previous run
21:44
here, it shows me those details: this took about 2700 seconds
21:49
to pull down the four-gig dataset. And then I can click on this
21:53
Output button,
21:54
which will take me directly into Files, where the
21:57
result actually landed. And so this is 311 data
22:00
from 2010 until today that is 4.1 gigabytes in size,
22:05
so it's a nice way to get a larger amount of data pulled straight into your
22:08
cluster,
22:09
and that did not require me to pull it down locally first. And then we do have
22:14
some instructions for if you do have a large dataset
22:16
and want to use the REST API to push that information:
22:20
I used a curl command from my local machine to be able to actually put the
22:24
same 4.1 gigabyte file again into a different folder.
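A minimal sketch of that kind of curl push, assuming a WebHDFS-style REST interface in front of the cluster's file system; the host, port, path, and credentials below are placeholders, and the exact endpoint for your Analytics for Hadoop instance is in the companion document:

    # upload a local CSV into a folder on the cluster over REST
    curl -i -k -u myuser:mypassword -X PUT -L -T 311_data.csv \
        "https://myinstance.example.com:8443/webhdfs/v1/user/myuser/nyc311_curl/311_data.csv?op=CREATE"

The -T flag streams the local file in the request body, and -L follows the redirect that WebHDFS-style APIs issue to the node that actually receives the data.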
22:27
And so we've done some things where we have the 311 data
22:31
just from 2014. I've gone through and done the sheet idea here, the same
22:36
idea with a folder, using CSV again.
22:39
So just to kind of fast-forward to doing some processing:
22:43
we've done some group-bys, we've done some filtering, we've
22:47
combined some things together, and so we've even done a join
22:50
in BigSheets, where you want to wind up with this idea saying: I have
22:55
311 events, and what was the number of events
22:58
and what was the amount of rain? And this allows me to then go through and say,
23:02
looking at a chart,
23:03
I might want to add a line chart, for example,
23:07
and I can fill out some other details;
23:10
I want to look at both total events and weather amount together,
23:14
and I click on... here we then have the first two hundred days of the year
23:19
shown in my chart.
23:20
Now if I click on the green arrow button, it'll actually generate
23:23
that line chart, which will look exactly like the one I've already pre-run here.
23:27
So when this chart initially comes up, I've already run the correlation between
23:30
the two;
23:31
I've done some BigSheets activities to do the joins, and it's in the document
23:34
now, step by step. Then I see, in this area right here, that there's a large amount
23:38
of rain,
23:39
and there's a spike in the activity for 311 events that were recorded on that day,
23:45
so there may be a correlation here. I could report something back to New York
23:48
City and say:
23:50
when it's a really rainy day, here's what you can do to improve the life of your
23:53
citizens.
23:54
Now, I would want to do some supporting analytics to be able to drill down and
23:58
figure out what the details are: is there one particular agency
24:01
that is causing a spike in activity, and so forth. So this is
24:05
really your challenge;
24:06
what I wanted to do was get you into a mode here where we could just pick a
24:10
couple of datasets.
24:12
I pulled them down, I give you instructions for how to join those two
24:15
together,
24:16
and then we see if there's any additional correlation information to be found.
24:19
So starting with this result set, now what I might want to do
24:23
is actually go through and say: well, I've done some BigSheets activities,
24:26
but now I want to see, is there a linear correlation between my total events
24:31
and my weather amount? And I want to use a routine
24:34
in R; in our case it's a machine learning algorithm that we've added to our
24:38
collection,
24:39
called linear regression, in order to do that. I'm going to
24:42
go back to my Applications area. I'm going to have this
24:45
R script app here, called Ad hoc R Script, and we provide an example of a linear
24:50
regression
24:51
in the R script area; you would paste that into the script area
24:55
here and click on Run.
24:57
Prior to that, every data scientist knows that there's some additional work
25:01
that needs to be done
25:02
initially. So even though I showed you that my
25:06
311 count and rain were here,
25:09
I actually needed a dataset that just has those two columns,
25:13
and so here you can navigate to any of these
25:16
datasets to pull things in. I have already done a filter to
25:20
keep just the last two columns, and that's what this "linear 311
25:24
rain" is, and I've exported that data into a CSV
25:28
on my local file system and given it a path.
25:31
Now, if I go look at the export that wound up here in this bigR
25:35
folder, here's what that looks like: just two columns, and I want to run my
25:39
linear regression on that. Let me go back to
25:42
my Applications area, and using that as input I can run my R script.
25:46
And I can easily click on one of these rows to repopulate my script up into the
25:51
upper window
25:51
and run it again. So let me do that,
25:55
and then I'll be right back, because here's my job actually running.
25:59
And the nicety here is that I can track its progress here:
26:03
I can click on the Details button to go watch the job run.
26:06
Another nicety that I want to show you here: if I minimize this
26:10
environment, what I can actually see is that my R script window has a little
26:15
resizing capability here in the bottom right. So should you have a longer
26:19
script and you want to see more of it on-screen,
26:21
feel free to resize that area for your script.
26:24
The reason I don't want to show this actually running in the script
26:27
is that you do have to put your password in this script
26:31
in order to connect to your data source, and so
26:34
I'm just protecting that information. To give you an idea of what the script looks
26:38
like: it's not a very large script at all.
26:41
And that just finished, so let's go look at an example
26:44
of what that was going to run. So if I go to
26:47
step 144 (if I remember that right), here's the actual script.
26:52
We pull in our Big R library, we do a connection, there again with your
26:57
number and your password (you'd replace those in your script),
27:00
we do a quick check to make sure that Big R is connected,
27:04
and then we run a linear regression against our
27:08
311 PRCP precipitation file,
27:12
and then our output goes into that bigR folder again, as linear output.
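If you want to sanity-check the same computation off the cluster, here is a base-R sketch of what the script's regression boils down to, run against the exported two-column CSV. The file name and column names are illustrative, and the real cluster-side script in step 144 uses the Big R package (with your instance number and password in its connect call) rather than plain lm():

    # fit a simple linear regression of 311 event counts on rain amount
    events_rain <- read.csv("linear_311_rain.csv", header = FALSE,
                            col.names = c("total_events", "rain_amount"))
    fit <- lm(total_events ~ rain_amount, data = events_rain)
    summary(fit)$r.squared   # the R-squared value discussed next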
27:16
Let's go see what the result of that looks like. So here I go to my bigR
27:20
folder,
27:21
and I can right-click and do a refresh in here;
27:24
I see my linear output, and here are the stats.
27:28
So for folks that know what this means:
27:31
here is the R-squared value. If my rain and my 311 events were correlated with each
27:37
other, this algorithm would show me something closer to .8
27:40
or higher; in this case it's a very, very low R-squared value,
27:44
and so that means that these two variables are not linearly related to
27:48
each other,
27:48
and so maybe we need to do some other investigation.
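For reference, the statistic being read here is the usual coefficient of determination for the fitted line:

    R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}

where y_i are the observed 311 counts, \hat{y}_i the values predicted by the regression, and \bar{y} their mean. Values near 1 mean the linear fit explains most of the variance; values near 0, as seen here, mean it explains almost none of it.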
27:51
And so that's just showing you how to run some advanced analytics,
27:55
and the actual mechanism for doing that does not require you to do
27:59
anything through a local client, which is nice. So as we wrap up the machine
28:03
learning portion
28:04
of this demo, I just want to point out that step 150
28:07
actually has additional algorithms that are included
28:11
in our machine learning packaging, as well as two different links in step 151
28:15
that will help you with a Big R tutorial, so that you can do
28:19
additional analysis on your data. At this point, that's the bulk
28:22
of the demo, of what I wanted to show that was web-based. So keep in mind,
28:26
everything you've seen up until this point
28:27
has not required anything to be installed locally.
28:31
And so now, if you do have an interest in seeing
28:34
how to do some different analytics work,
28:37
maybe you want to run Eclipse, maybe you also want to
28:40
see what Tableau would look like in terms of a connection,
28:44
or maybe you're running Linux or Mac and you also want to run Eclipse there,
28:47
the next part of this video will take care of showing you how to do that,
28:51
and I've provided those environments through two virtual machines that are
28:55
running here on my local
28:56
environment. The first virtual machine that I have is a Windows 7 64-bit
29:00
environment,
29:01
and I've got Eclipse installed. Again, all the downloads
29:04
and the installation and configuration instructions are all included in the
29:08
document that we've provided,
29:10
where I walk through all the details of how do I download Eclipse,
29:14
how do I install and configure Eclipse, and how do I make sure my connection to my
29:18
cluster
29:19
is working properly. Then you should be able to utilize this environment
29:23
if that
29:23
is your desire. As the
29:26
examples come up, I have my servers listed, and my connection
29:30
is actually made, so I can now go up here to create a project and
29:34
open a SQL file. If I want to run that same join and result against Big SQL,
29:40
where I have created those same tables
29:42
from those BigSheets workbooks, here is the SQL that it would run,
29:46
and I can just go to the Run SQL button. It'll connect over to my
29:51
Bluemix cluster; here I'm running Eclipse from
29:55
my Windows client, and it was taking 10 to 30 seconds to connect.
29:59
And then here is the actual result. So I can double-click
30:03
my SQL Results to maximize that, see what worked, and see what comes back for my
30:07
event date, so my total events and weather amount again;
30:10
this is all done through SQL. Double-click again
30:13
and it returns back to its normal size. And so that's
30:16
Eclipse running SQL against a cluster, and there are instructions for how to install
30:20
and configure
30:21
all the connections there. So let's close that down.
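For a feel of what that query looks like, here is a sketch of the kind of Big SQL join run from Eclipse in this step; the table and column names are illustrative, not the demo's actual schema (the real statements are in the companion document):

    -- total 311 events and rain amount per day, joined across the two tables
    SELECT e.event_date,
           COUNT(*)    AS total_events,
           MAX(w.prcp) AS rain_amount
    FROM   nyc311_2014 e
    JOIN   weather_2014_nyc w
           ON w.obs_date = e.event_date
    GROUP  BY e.event_date
    ORDER  BY e.event_date;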
30:24
And then I'm running Tableau. Since I
30:28
do not have a licensed copy, I am running a 14-day trial,
30:32
so I continue my trial. I've gone through and selected
30:35
Connect to Data, scrolled down here to BigInsights,
30:39
and provided all of my information with which to be able to do that,
30:44
and then I saved that as a workbook.
30:47
And so here is my pre-saved workbook, which will ask me at least for my
30:52
password again.
30:53
So let me go and copy that, and I'll be right back.
30:56
OK, so at this point I've gone and copied my password and then pasted it in.
31:00
I can actually go here and connect on this, and it will run the Big SQL
31:05
query
31:05
that pulls the data back into Tableau, so that I can now interact
31:10
with that data locally. Don't be alarmed that it takes a few seconds,
31:14
because it is connecting out to the cloud,
31:17
and Big SQL is running a join, in this case across those two tables.
31:21
So there again, all the installation, configuration, and setup are in the document.
31:25
Here are my rain events, and then
31:28
here are my 311 events. I can go in and click on
31:32
different ways to look at this data, and maybe this one is a little bit
31:36
more intuitive:
31:37
I'm used to the bar charts, and so here, I believe...
31:41
yes, so there it is, April 30th:
31:45
there is the large rain event, and there are
31:48
more 311 events both that day and the next day. And so that's a
31:52
little bit more telling than what I was looking at in BigSheets.
31:55
So yes, you can keep exploring and do different slices of that information
32:00
in here,
32:00
if that's what you choose to do as part of your solution. But this was just to
32:04
show you that this is possible,
32:05
and that Tableau does support BigInsights directly.
32:09
We can close that down; we're done. Now, I also have a Linux
32:13
environment running here, and I want to do the same type of thing.
32:17
There again, the instructions are included for how do I run Eclipse
32:20
for running SQL in my Linux environment, and it will look and feel very similar to what
32:25
we just did
32:26
in the Windows environment. The difference for the Linux environment is,
32:29
if I've
32:30
stored my passcode, it will ask me for
32:33
my global passcode for my locally saved information. But here again, very
32:37
similarly,
32:38
I can come into my same simple SQL statement, choose which
32:43
cluster I want to connect to; my connection information is here
32:47
for Big SQL on the cloud. And
32:50
I can run the same statement here, and at the bottom
32:54
of Eclipse you'll actually see things run as expected. And there again
32:58
are our results; we can double-click here as well,
33:02
see our connection, and see the actual results for the query that came back
33:07
in Eclipse on Linux. We can close that down,
33:11
and that should complete your
33:14
optional environments. So at this point, let's just head back to the PowerPoint
33:18
to just wrap things up.
33:19
So we've completed the live demonstration, and now we've just got some
33:22
summary information to provide for you, as well as some helpful links
33:25
and resources and potentially next steps. So here's our demo
33:30
overview: we went through and pulled different datasets,
33:34
and used three different mechanisms with which to do that (we did not
33:37
show the curl one in this video, but it is in the document that I wrote and
33:41
referenced), and we did indeed see BigSheets,
33:44
Big SQL, and Big R all in action within this demo,
33:48
running in the browser, running Eclipse and Tableau, and an R
33:51
environment, there again called Big R, as far as running algorithms.
33:55
So at this point I have two more slides and we're done. Here we have some links
33:59
that will be very useful;
34:00
this PowerPoint will be provided as a PDF as well.
34:04
And here is Big Data University, that's IBM's free training.
34:08
If you're curious what people are doing with Hadoop, there's the Powered by Hadoop
34:11
link, which is also very useful,
34:13
and if you want to be able to run BigInsights locally
34:16
and have that in a disconnected environment, you can run a
34:19
virtual machine of BigInsights, the Quick Start Edition,
34:23
which will feel very similar to what you have in your Bluemix cloud
34:26
environment,
34:27
except it'll be local. So it may be a benefit when you want to do some
34:31
work and you're not in a connected environment.
34:33
Our resources page has how-tos, with some Big SQL setup information,
34:37
and if you want to jump straight in to the social media one, that's got the text
34:41
analytics
34:42
example there. If you want to learn more about Watson Explorer or other
34:46
capabilities, the second half
34:48
of this page has the links. So thank you very much for your time;
34:52
hoping this has been useful information, and I hope to talk to you soon.