US Federal Government and Big Data
I tuned in to the recent US federal government webcast. I was curious to see what a government big data initiative would look like, as my own government (UK) is unlikely to produce anything equivalent any time soon: they and their opponents are too busy trying to prove that they're not entirely removed from the concerns of the ordinary citizen.
There's been a great deal said in response to the webcast already, so I wasn't sure I wanted to add to it, especially as my first impression was that it was a bit kitchen-sink: a panel of speakers from many different branches of government and academia, not to mention name-checks for other big data users (e.g. NASA) who weren't there. Some of the speakers were describing projects that had been drawn under the big data umbrella rather than conceived out of a new recognition of big data's importance. But as the event went on I warmed to it. The speakers showed a consistent understanding of the challenges and opportunities of big data, and there were some pretty imaginative projects.
The highlight for me was the Stanford online education initiative that Prof. Daphne Koller spoke about. I really liked her idea of analyzing data on how students use the available online material to better understand how people learn. IBM has launched an analogous initiative to provide online training in big data technologies and techniques (Big Data University). I'm not sure whether we're analyzing the learning-experience data in the way that Stanford is, but I'll be finding out. Either way, it is a recognition that big data isn't just about cheaper storage and more scalable computing. It's really about what you do with the data: it's the analytics. And that means two things. It means addressing the skills gap (Steve Lohr of the New York Times, who moderated a panel at the event, wrote about that in February), and it means technology to raise the productivity and increase the effectiveness of data scientists. That's why IBM, unlike some established technology vendors, emphasizes new analytics and data science skills for big data; in part, it's why we're funding Big Data University.
But of course there's the other side of closing a skills gap: making the people who have the skills more productive. Take, for example, the IBM BigInsights product, based on open source Hadoop (good for open standards, good for exploiting a growing skills base), with a set of value-add features I'm not going to go into here (left as an exercise for the reader). A key one is BigSheets, which provides really easy-to-use visualization of analytic result sets. University of Southern California (USC) grad students used BigSheets and text analytics (another critical feature of BigInsights) to understand how the Twittersphere perceives upcoming movie releases. They built their movie predictor quickly and easily, and it shows some great results. They predicted the success of The Hangover Part II in the face of widespread critical and media doubt (fact-based decision making prevailing over intuition-based decision making, even by experts). But maybe the LA Times thought an arthouse movie like The Hangover Part II wouldn't resonate with their readers, so they picked the Harry Potter week in the story above.
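To make the idea concrete, here is a minimal sketch of keyword-based tweet sentiment aggregation, the general technique behind buzz-style predictors. This is purely illustrative: the lexicon, the tweets, and the scoring rule are my own assumptions, not the USC team's actual BigInsights pipeline or data.

```python
# Illustrative sketch of lexicon-based tweet sentiment (NOT the USC method).
# Each tweet gets +1, -1, or 0 from keyword hits; "buzz" is the net
# sentiment as a fraction of all tweets, in the range -1.0 .. +1.0.

POSITIVE = {"hilarious", "awesome", "can't wait", "love", "funny"}
NEGATIVE = {"terrible", "boring", "skip", "awful", "flop"}

def tweet_score(text: str) -> int:
    """Score one tweet: +1 if more positive hits, -1 if more negative, else 0."""
    t = text.lower()
    pos = sum(1 for w in POSITIVE if w in t)
    neg = sum(1 for w in NEGATIVE if w in t)
    return (pos > neg) - (neg > pos)

def buzz(tweets: list[str]) -> float:
    """Average per-tweet score; positive means favorable buzz overall."""
    if not tweets:
        return 0.0
    return sum(tweet_score(t) for t in tweets) / len(tweets)

tweets = [
    "Can't wait for this, looks hilarious",
    "Critics say skip it, sounds awful",
    "Trailer was so funny, I love it",
]
print(buzz(tweets))  # two positive tweets, one negative -> net positive buzz
```

A real system would of course need a much richer lexicon (or a trained classifier), negation handling, and volume weighting, but the aggregate-sentiment-per-title shape is the same.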
As John Holdren, Assistant to the President for Science and Technology, said in his introduction to the event, 'Collecting information is useless without ways to analyze and understand'. And as William Brinkman, Director of the Office of Science at the Department of Energy, added later, 'Our challenge is not high-performance computing, it's high-performance people'. I'll just add: and also high-performance tools.
For more information: