A journey to Hadoop: How my vacation planning can improve government efficiency
I recently took a road trip with my family to one of my favorite regions of the United States: the Four Corners area of the Southwest. We visited the Grand Canyon, Durango, Arches National Park and Bryce Canyon National Park, taking in huge expanses of incredibly shaped sandstone in hues of red and pink, massive canyons and oddly shaped "goblins." It was a wonderful trip.
While I was planning the itinerary, figuring out where to go and where to stay, I did what most people do: I read travel books and researched online. I read descriptions of the various places, looked at photographs, watched videos, used sites like Yelp to check lodging reviews and used a smartphone app to check the temperature forecasts for each stop. Yikes! 112 degrees in Moab!
In short, I used a wide variety of text, photos, videos and social media to determine the best parks, routes and lodging for each part of the trip. Most of us don't think twice about this kind of decision-making; it has become second nature. Whether we're buying a TV, trading in a car or moving to a new city, we rely on these varied data sources to make the most informed choices. Yet none of this is traditional, structured data that can be stored and accessed in rows and columns. Instead, it is what's called unstructured data (such as photos, video and sensor data) or semi-structured data (such as clickstreams and email). Fortunately, a web browser and a search engine give me fairly seamless access to much of it.
But if you think about how most government agencies (and, for that matter, most businesses) gain insight and make decisions, you'll see that most of the time those decisions don't have the benefit of this kind of data. Wouldn't it be useful for a social services agency to be able to query all available data about a person and their programs (including case worker and contact center notes, email and chat transcripts) to help determine the best course of action for a client? Wouldn't a tax or social services fraud investigation be much more effective if the investigator could make connections between people, programs and events using social media, case notes, photographs, email and clickstream data? How about a city public works department having access to citizens' social media posts and other online complaints about potholes, leaky fire hydrants, graffiti or other city issues? Or a police department being able to use all relevant data about people, places and events in its crime investigations? Or a government contact center customer service representative (CSR) having all relevant information at hand to help a caller?
Until the last few years, it hasn’t been possible to efficiently and cost-effectively store these kinds of data in a way that makes it easy to combine with traditional data and also be easily accessed. This changed with the advent of Hadoop: an open source platform for the storage, analysis and retrieval of both structured, semi-structured and unstructured data. Using low-cost, commodity servers, Hadoop enables, for the first time, the creation of a comprehensive data pool, consisting of a huge variety of types of data and the means to query, explore and gain insight from these data together.
Hadoop is only beginning to take hold in government, but some agencies are already making significant use of the technology, particularly in national security and defense, where connecting the dots and resolving identities and relationships require massive amounts and varieties of data. There are numerous examples of government agencies using IBM's Hadoop distribution, InfoSphere BigInsights, to tackle these issues:
- National security: A national security agency has combined Hadoop and MDM (master data management) to find persons of interest in multiple suspect files, a task that requires millions of searches against billions of records daily. The solution facilitates analysis of massive data sets about individuals.
- Government medical: A provincial health bureau in China uses InfoSphere BigInsights to store and analyze radiology images. The system can run very compute-intensive medical imaging algorithms that significantly improve patient care by enabling physicians to draw on other physicians' experience in treating similar cases and to infer the prognosis and likely outcomes of treatments.
- Research and economic vitality: Governments in Belgium and the UK have set up big data and analytics hubs to boost the local economies by enabling researchers, businesses, hospitals, medical companies and others to experiment, model and develop a variety of solutions to benefit citizens and the community.
- Citizen sentiment: Hadoop is being used to help gauge citizen sentiment, attitudes and preferences through analysis of social media. As you might imagine, political campaigns are making great use of this type of analysis.
- Cyber security: Adding InfoSphere BigInsights to cyber security technology, such as IBM's QRadar, makes it more effective at combating cybercrime.
The art of the possible
The use of Hadoop is limited only by the imagination of government agencies and their partners. Over the next few years, Hadoop is going to fuel a major overhaul and modernization of the way government makes decisions, resulting in a more responsive, efficient government.
Is it time for you to begin thinking about how it might improve your agency?
Here are a few links to explore Hadoop further:
- Website: IBM InfoSphere BigInsights
- eBook: Enterprise Class Hadoop and Streaming Data
- YouTube Video: InfoSphere BigInsights for Hadoop Quick Start Edition
- IBM InfoSphere Streams: Redefining Real Time Analytics