Takeaways from Recent ZDNet TechLines Broadcast Panel on Big Data

Big Data Evangelist, IBM

Big data is everybody’s priority these days, and it’s always exciting to see what different organizations are doing with it.

A few weeks ago, I had the pleasure of participating in a live streaming video panel on business applications of big data in various industries. The broadcast, sponsored by IBM and hosted by ZDNet from the Jerome L. Greene Performance Space in New York City, brought together experts from NASA, Ford, T-Mobile and Archimedes, plus your humble IBM big data evangelist. Moderating the event was ZDNet editor-in-chief Larry Dignan.

What follows are my notes on the chief points the other panelists made, in left-to-right order on the panel. After those, I’ll give the chief point that I, seated at the far end from Dignan, put across:

Optimizing network utilization and customer experience at the same time demands powerful high-velocity analytics

Christine Twiford, T-Mobile manager of network technology solutions, said the wireless carrier makes heavy use of clickstream analytics on customer video-streaming behavior. She said its data scientists use these analytics to support more efficient use of network capacity and a better customer experience. “We learned that people only watch the first 10 seconds of a YouTube video before they decide they’re going to watch it or not. So we only cache 15 seconds, maybe, instead of 30.” She said these analytics offered assurance that the carrier could offer unlimited data plans without overloading its network. T-Mobile can keep only 10 days of clickstream data, she said, so it is trying to process more of this information in flight while relying on smarter sampling approaches. [Watch this video for more from Twiford on T-Mobile's use of big data.]

Deep archival data volumes can be your ace in the hole for business process optimization

Michael Cavaretta, Ford Motor Company’s technical leader for predictive analytics and data mining in its research and innovation group, said the manufacturer has a wealth of historical data that powers many big data projects. “A lot of people have data going back dozens of years on processes, as well as streaming from the vehicles. We’re swimming in it,” he said. “We have a saying in my group: You cannot give me too much data.” Storage constraints are important to consider, he said, but it’s critical not to lose valuable historical data. “People come to me and say ‘Why would we want to store more than 90 days’ worth of data? That’s just silly.’ ‘OK,’ I say, ‘what happened last year?’ That’s the problem.” Going forward, he said, Ford will leverage that historical data, supplemented by streaming real-time information from manufacturing processes.

Patterns in complex data sets can reveal urgent issues requiring immediate action

Katrina Montinola, Archimedes Inc. vice president of engineering, is responsible for healthcare simulation software used in clinical trials. She pointed to the Vioxx story as “the best example of big data affecting healthcare in recent years...It was a large Kaiser Permanente data set that led to the discovery that Vioxx had these adverse effects, and subsequently they pulled it out of the market.” Other drivers of healthcare analytics adoption, she said, are the carrot of government funding and the stick of regulation. “In 2009, the government awarded $60 billion over 5 years to encourage the meaningful use of EMRs (electronic medical records). So a lot of patient data is now coming online. And the Affordable Care Act has a provision in it for Medicare to share in the savings that healthcare organizations get as long as they deliver quality care and reduce costs.”

Open sharing and crowdsourcing of data science talent may unlock potential insights from disparate data sets

Nicholas Skytland, program manager of the Open Government Initiative at the US National Aeronautics and Space Administration (NASA), said advanced visualization tools can democratize data science and perhaps alleviate the talent crunch. He pointed to crowdsourcing as a way to encourage the development of “citizen scientists,” and said that opening access to NASA’s huge, ever-growing data set is key to realizing that dream. The agency is planning missions that will collect 24 terabytes of data a day, he added, and is relying on public cloud services to disseminate that data more openly.

What did I have to offer? Well, various things, but the key point I’m glad I put across is this:

Whole-population analytics is the most revolutionary new exploratory approach enabled by big data

What this means is that, as storage costs plummet and processing power becomes cheap and ubiquitous, you can do deep, continuous analysis against the entire population of data, rather than just the traditional capacity-constrained samples/subsets. Being able to drill into the entire aggregated population of, say, customer data, including rich real-time behavioral data, enables you to do more powerful micro-segmentation, fine-grained target marketing, nuanced customer experience optimization, and next best action. Storing petabytes of data and having it accessible in real time means you can learn a lot about your customers, gaining an “X-ray view” of what’s going on inside their heads (experiences, sentiments, propensities). You can also gain correlated insights on the past, present and future.
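To make the sample-versus-population contrast concrete, here is a minimal Python sketch. It is my own illustration, not something shown at the panel. The customer records, the segment names and the 1% systematic sample are all hypothetical, and the data is constructed so that a rare micro-segment (10 customers out of 10,000) falls between the sampled records.

```python
from collections import Counter


def build_population(n=10_000):
    """Build hypothetical customer records (synthetic, for illustration only).

    One customer in every 1,000 belongs to a rare "satellite" plan segment.
    """
    population = []
    for i in range(n):
        if i % 1000 == 7:
            record = {"id": i, "region": "north", "plan": "satellite"}
        else:
            region = "south" if (i // 100) % 2 else "north"
            record = {"id": i, "region": region, "plan": "standard"}
        population.append(record)
    return population


def segment_counts(records):
    """Count customers in each (region, plan) micro-segment."""
    return Counter((r["region"], r["plan"]) for r in records)


population = build_population()
sample = population[::100]  # 1% systematic sample: ids 0, 100, 200, ...

full = segment_counts(population)   # whole-population view
sampled = segment_counts(sample)    # capacity-constrained view
missed = set(full) - set(sampled)   # segments the sample never saw

print("Segments in the whole population:", dict(full))
print("Segments in the 1% sample:", dict(sampled))
print("Segments missed by the sample:", missed)
```

In this contrived data set the sample sees only the two large segments, while the whole-population count also surfaces the small one. A real sample might or might not catch such a segment. The point is only that counting over every record removes that uncertainty.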

I want to thank the ZDNet and CBS Interactive teams for their excellent hosting and production of this stimulating event. I personally learned a thing or two, which is my core criterion for a worthwhile session of any kind.

Which of these key points resonates most with you? Leave me a comment.