Recap: Big Data: What Drives You and Where Do You Start?

Big Data Evangelist, IBM

Big data can be intimidating for the uninitiated. For starters, the emphasis on "big" might mislead you into thinking you must hit the high end of all "Vs" – volume, velocity, and variety – in order to justify using the technology.

The truth of big data is that you can, and probably should, start small. Big data is all about using advanced analytics to find deep patterns in data sets of any size, while giving yourself the headroom to add more data from more sources as your needs grow.

This past Tuesday, July 24, IBM hosted a tweetchat on this topic under the hashtag #IBMDataChat. The event discussed business drivers and implementation on-ramps in the strategic push to big data. The chat was moderated by yours truly, professional IBM big data evangelist maniac @jameskobielus.

Pacific Northwest National Laboratory (PNNL) and Trident Marketing were our featured participants, helping us to explore big data business drivers, justification strategies, staffing and skill requirements, technical environment, and key performance indicators. We had many other participants from across the Twittersphere. What follows are the most noteworthy discussions.

Big Data in Smart Grid Research

Ron Melton (@rbm55) is Director of the PNNL Smart Grid Demonstration project (@PNW_SmartGrid) in conjunction with Battelle. Dr. Melton is administrator of the GridWise Architecture Council and a senior technical leader for smart grid research and development projects. He provided a comprehensive picture of their use of big data for smart grid research and a look ahead at their future plans:

  • What was your initial big data project?

Got started with big data in the '90s - global climate change research. Now applying it to smart grid R&D

We have the largest #smartgrid demonstration project in the US.

Operational data collection starts in September. Currently collecting about 11 measurements/second

In the 90's we were collecting 7 gigabytes of data / day. Volumes for that effort are at least 10 times bigger today.

#smartgrid has the potential to swamp that. Our project will generate several TBytes over the 2 year data collection period

With a very small number of active points. Scaled up there will be huge data volumes.

Current objective is to generate understanding of cost/benefit for #smartgrid technology

Using Streams and Netezza to facilitate data capture and then analysis
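As a rough sketch of that staging pattern (plain Python for illustration, not actual InfoSphere Streams code; the `Measurement` fields and the validity check are hypothetical), on-the-fly error checking followed by batching for a bulk warehouse load might look like:

```python
from dataclasses import dataclass
from typing import Iterable, List

@dataclass
class Measurement:
    node_id: str      # which grid node reported the reading
    timestamp: float  # seconds since epoch
    value: float      # e.g. power in kW

def stage_measurements(stream: Iterable[Measurement],
                       batch_size: int = 1000) -> List[List[Measurement]]:
    """Validate readings on the fly and group the survivors into
    batches sized for a bulk load into the analytic warehouse."""
    batches: List[List[Measurement]] = []
    current: List[Measurement] = []
    for m in stream:
        # On-the-fly error checking: drop physically impossible readings.
        if m.value < 0:
            continue
        current.append(m)
        if len(current) == batch_size:
            batches.append(current)
            current = []
    if current:  # flush the final partial batch
        batches.append(current)
    return batches
```

Each batch would then be handed to a bulk loader; in PNNL's pipeline this staging happens in Streams before the data reaches Netezza.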

  • What business opportunities drove you to adopt big data?

we do research, so it isn't so much business opportunity as research necessity

we must deal with large volumes of engineering data - time series - managing and analyzing the data

future complexity is large. May need Streams to detect emergent behavior in our distributed system

#smartgrid effort not linked to climate data. We will mostly do after-the-fact analysis

Must, however, monitor system operations, detect errors, cyber security threats, etc.

  • What type of big data do you deal with, and where is it located?

events, structured - I don't think we have the others.

we deal with both static and dynamic data. Latter - time-series data on a five-minute interval

Static data reported on varying intervals from daily to quarterly with varying formats

We stage our data using Streams and feed it into a Netezza system

We have chosen to move our data into structured format

We will be doing a lot of correlation analysis with the data.
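The correlation analysis Dr. Melton mentions boils down, at its simplest, to computing correlation coefficients between aligned time series. A minimal illustrative sketch (pure Python; the series below are invented):

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series,
    e.g. five-minute interval measurements from two grid nodes."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (std_x * std_y)

# Two hypothetical five-minute load series that move together:
load_a = [10.0, 12.5, 15.0, 14.0, 16.5]
load_b = [20.1, 24.9, 30.2, 28.0, 33.1]
```

A coefficient near +1 suggests the two nodes' loads rise and fall together; in practice this kind of analysis runs in-database over millions of rows rather than in application code.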

  • What were some of the challenges you experienced in locating your big data?

Locating our data is not a challenge. We are finishing implementation of a distributed system of our design.

The nature of the system is that it will generate #bigdata to support the research

I don't think Federated would apply. Loosely coupled system with interoperability spec between the nodes

#bigdata applies now because of the research. As an operational system #bigdata applies in other ways.

scaled up to millions of nodes the economics aspects will require transaction processing and monitoring

  • How did you make the business justification for your initial big data project?

In research the business justification is driven by the research objectives.

Our ability to sense, calculate and overall generate huge volumes of data is always happening in research :)

Economic transactions. #smartgrid is about modernizing the electric power system. Economics is a key.

Our project implements a technique we call "transactive control and coordination"

Our #smartgrid #bigdata efforts are to support our research on the effectiveness of our transactive control technology.

We work with a signal representing the cost of power delivered to any point in the power system

We also deal with a signal describing the expected behavior of distributed energy resources and demand response systems

  • Did you implement big data as an enhancement to an existing data analytics platform, or through an entirely new infrastructure?

We implemented specifically for this project - new infrastructure

We have prior experience with conventional database technology. Not a good fit to large volumes of engineering data.

We have experience with Netezza for other projects - so we evaluated with data from a previous project. Good results

We have also worked with IBM research on #smartgrid for over 10 years. We invited them and Netezza to join project team.

When you are writing a proposal you have the opportunity to form teams!

The real action in #smartgrid is "applications" that do something with data from devices.

In our case the "app" is distributed throughout the system. In other cases it may be drawing from centralized data.

Example of non-distributed #smartgrid apps - see #greenbutton initiative

  • Did you implement big data entirely with in-house personnel, or did you engage professional services partners?

IBM and Netezza partners supply about 75% of the #bigdata expertise. The rest is in house.

We recruit domain experts with advanced degrees and good analytic skills.

We also tap in-house groups of applied statisticians.

With about 4500 staff in a variety of technical disciplines we can usually find the right person

  • Did your personnel need to enhance their technical skillsets and certifications to get started with big data?

For our project - no. We had one training session by Netezza. Now leveraging Netezza where we can.

Relying on IBM Research team for Streams expertise.

Absolutely right on sensors - phasor measurement units for example. Not part of our project but overall very important

Using Streams to stage data to the Netezza. Also, a little bit of on-the-fly analysis, error checking, etc.

Both higher ed and industry. Domain expertise for us is electric power systems and computer system engineering

  • How do you plan to evolve your initial investment in big data?

No plans to evolve. 5 year project.

But, will plan even bigger projects in the future. Will have new requirements, new needs that should involve #bigdata

Yes - plan to transition to production - data requirements will change when research is over.

Direction of change uncertain - depends on scale up opportunities.

Interesting that optimization is a big part of #bigdata for business and #smartgrid

Big Data in B2C Customer Acquisition

Brandon Brown (@bbrownrdu) is CIO at Trident Marketing (@TridentMktg). He discussed how they use big data to drive customer acquisition for B2C brands. He went into considerable detail on their track record and plans for big data in their operations:

  • What was your initial big data project?

We're currently using our Netezza data appliance to predict customer churn as well as inject some #analytics into the sales process.

Analysis of existing customers and their propensity to churn within the first year of acquisition.

We analyzed over 30 different attributes for each customer's initial purchase and modeled them against known factors ... attempting to predict future churn for new customers with similar demographics and sales metrics.

We are a relatively new business in that scope. We began our B2C business in 2005 and have grown 50x in 7 years.

Our largest consumer brand is DirecTV. They are continuing to drive their partners in the pursuit of obtaining a better customer. Consumers that keep the product are, by default, the best customers to have, and require a minimum of effort for retention. We're happy that with the first round of analysis, we were able to cut our churn 50%. Amazing considering the amount of data we had to work with (2 TB).
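Scoring a new customer's churn propensity from attributes of the initial purchase can be sketched as a simple logistic score. Everything below is invented for illustration (the attribute names, weights, and bias are hypothetical; the real model used 30+ attributes trained against historical churn outcomes):

```python
import math

# Hypothetical weights a training run might produce: positive weights
# push toward churn, negative weights toward retention.
WEIGHTS = {
    "promo_discount": 0.9,    # bought on a deep promo: churn risk up
    "contract_months": -0.1,  # longer commitment: churn risk down
    "autopay": -0.8,          # enrolled in autopay: churn risk down
}
BIAS = 0.5

def churn_propensity(customer: dict) -> float:
    """Logistic score in (0, 1): higher means more likely to churn
    within the first year."""
    z = BIAS + sum(w * customer.get(k, 0.0) for k, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))
```

High-scoring prospects could then be routed away from risky marketing channels or toward stronger retention offers.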

  • What business opportunities drove you to adopt big data?

we found that using SAS with 30+ variables was a slow process and could not be real time as we have now.

Churn is the customers that do not retain your product or service for a defined time. For us, that is 1 year for 1 brand.

Retention is only one of the goals of our analytics. Long term, predicting which marketing channel has the best customers and dynamically routing them to the best sales agent is the goal. This presents the best chance we have at acquiring the customer from other competitors.

Now, with the churn under control, we would like to drive a higher quality customer to the sales floor and be able to continue to retain them long term.

  • What type of big data do you deal with, and where is it located?

One consideration for the future for our call center is the ability to mine "big content" and route a caller or chat directly to an advanced agent on the floor for assisting in acquiring a competing brand's product.

We are doing limited mining of Facebook and other social media data.

  • What were some of the challenges you experienced in locating your big data?

Everyone likes/wants "big data", but I sure don't want to be moving big data around the network or spindles. Only with an in-database analytics solution would this even be possible on the current hardware infrastructures.

I have all kinds of data internally, data I can buy, but what matters, and what really drives call volume to us?

Determining which demographics and order details were correlated to the question at hand. i.e. we have the data, but what matters?

  • How did you make the business justification for your initial big data project?

Ours was easy. It was more like a compliance issue with our main brand. Fix your churn, or we'll fix it for you.

i.e. Remove the marketing channel entirely, instead of selectively removing bad customers.

Churn was the first initiative, now moving that functionality to sales, utilizing big data is our next step.

And finally, at the end of a particular call, using analytics to predict another add on product or service for the customer  would be ideal. i.e. Accessing local crime data by zip and selling a security product.

eliminating the marketing expense of a secondary follow up program by predicting products.

The CMO would love a feed from FB/Twitter/etc concerning our competitors brands and negative comments.

  • Did you implement big data as an enhancement to an existing data analytics platform, or through an entirely new infrastructure?

We outsourced the original specs, and with the algorithms in house now, we can quickly build models with our vendor's assistance. Your analytics vendor is the key to success in this space. We used #FuzzyLogix.

Row based database technologies are no match for the future of data and analytics. Appliances like Netezza will rule the large data sets; especially if they are real time.

  • Did you implement big data entirely with in-house personnel, or did you engage professional services partners?

100% outside partners IBM/Netezza and Fuzzy Logix.

  • Did your personnel need to enhance their technical skillsets and certifications to get started with big data?

We are recruiting all the time. Big Data/Large Data Sets are what we consider at TridentMktg to be our competitive advantage.

We're also on the lookout for colleges jumping on the data analytics bandwagon. Our local college, North Carolina State, now has the country's first Masters in Advanced Analytics.

Statistical backgrounds mostly. Candidates that "get it", and why it is important to a technology company like @TridentMktg

I believe there are more programs in the works. Students from the real world all over the globe are signing up here...

I, as a CIO, am not hung up on titles/degrees/certifications as much as real world experience. If you have NEVER worked with large data sets, how would you know how to code algorithms to process them?

  • How do you plan to evolve your initial investment in big data?

Since the original ROI of the system was predicted to be around 18 months, we were able to show an effectual, real-world break even almost immediately. In releasing our fund reserves, we hope to inject this into the future sales process immediately and capitalize on where our competitors are stumbling. Investment $$$ = Future Profit $$$ 10x

Observations from other tweeters with big data expertise

In addition to the featured tweeters, participants on the chat included several industry notables.

Philip Russom (@prussom), research director of TDWI and noted data analytics industry analyst, offered deep commentary on most of the tweetchat questions:

#BigData hot starter projects = RFID in retail; robotics in manuf; sensors in utilities; Web logs in any industry

#TDWI Survey shows customer focus in #BigData use = ID new customer segments; sales opps; define churn; sentiment

#BigData types in most-use order: struc; semistruc; complex (hier); events; unstruc, social, weblogs, spatial

Where store/process #BigData for #Analytics? 2/3rds orgs surveyed use #EDW; 1/4 separate analytic DB; rest Hadoop etc

#TDWI Survey= 70% say #BigData is biz opp, but only if leveraged via #Analytics. Biz Value in AnApps not #BigData alone.

#EDWs offload workloads to edge systems for real time, #Analytics, source data, etc. New edge sys: #Hadoop or equiv

Speakers at #TDWI #BigData summit said skills for #BigData & #DataScience are scarce. Career op for some, lost biz op for others.

John Mancini (@jmancini77), president of AIIM, pointed to a recent big data survey that his organization sponsored (Big Data: Extracting Value from Your Digital Landfill). Among other findings he tweeted relevant to key business drivers, first projects, and obstacles:

61% [of survey respondents] would find it “very useful” to link structured and unstructured datasets; 7% "doing" #big data, 10% planning next 12 mo, 48% planning next 2-3 years

Top use cases we see = Detecting trends and patterns, and content categorization/migration

survey says lack of in-house expertise and expense are the top obstacles


Helena Schwenk (@hmschwenk), an industry analyst covering information management, offered her observations on the hurdles for adopters of big data:

#social intelligence is increasingly imp factor in VoC & CEM initiatives, but firms continue to struggle w/ tech

Analytics skill shortage: some firms are keen to train existing SQL-based BI professionals in new Big Data tech, but it doesn't always work

And, of course, I, your humble tweetchat moderator and IBM big data evangelist, had several observations of my own:

  • Re Trident Marketing, whose primary business driver is customer retention:

@bbrownrdu #IBMDataChat Very smart! To the extent that you can optimize each contact (call, portal visit,etc), you streamline campaign mgt

@bbrownrdu #IBMDataChat Great. Essentially, optimizing the entire call to boost ROI & customer satisfaction in one fell swoop.

@bbrownrdu #IBMDataChat We often call that scenario "next best offer."

  • Re PNNL, whose primary driver is intelligent electrical grid optimization:

@PNW_SmartGrid #IBMDataChat Totally fascinating. Optimization of transactive control tech is essential to "smart" part of "smart grid"

@PNW_SmartGrid #IBMDataChat Event data. I see. Event data analysis is often core of many infrastructure optimization #bigdata initiatives

Continue the discussion & check out these resources

If you're interested in continuing this discussion, engage us in the ongoing #IBMDataChat, or respond inline with comments on this blog.

  • David Pittman’s analysis and the Storify transcript of the tweetchat

  • A recent Forbes article on PNNL's Smart Grid

  • More details on the Smart Grid Demonstration Project

  • An IBM case study on Trident Marketing

  • AIIM's big data survey report

  • Useful big data research documents from TDWI

And finally, for good measure, here's a snazzy infographic to give you a quick overview of big data.