Getting past government stereotypes with analytics
Having worked in the federal government for a long time, I began to see the same patterns over and over again. Department of Defense (DoD), Department of Energy (DoE), Department of Homeland Security (DHS), Drug Enforcement Administration (DEA), education, Federal Bureau of Prisons (BOP), housing, Treasury Department, tribes, US Department of Agriculture, US Food and Drug Administration, US Postal Service—you name it, they all fall into the same stereotypes every time:
- Too far behind the commercial sector
- Not taking advantage of data
- Stuck with outdated technology
If these stereotypes are true, they’re not the result of a government that isn’t forward thinking or that doesn’t care. They exist because government processes are never easy: no commercial off-the-shelf (COTS) solution works correctly out of the box, it may not meet security standards, and everything needs to be planned out years in advance.
The long road ahead
Consider one example. A Senior Executive Service (SES) chief information officer (CIO), a visionary at the top, demands that my organization start using analytics. Where do I start? Well, I can start with data. Because of the way government works, I need to allocate funds for programs and projects ahead of time. A necessary evil of this tack is the siloing of data, not just for budgeting and planning purposes, but also for privacy impact, application performance and legal exposure.
The data I have been managing is now spread across the organization, and no one wants to share it. If I’m lucky, I can run with whatever data I do have access to, and I want to build a data warehouse. After I send out a request for information (RFI), everyone on the planet responds with “better, faster, cheaper.” In the end, I select something safe, because no one ever gets fired for buying something safe. The next six months are then spent arguing with the data center over networks, floor space and security.
I then have to pay someone to figure out what my data looks like and how it got that way, so that it can be properly modeled for the database. That process takes six months, but I finally end up with a reasonably good data model. Now someone tells me I have to load it into the database—that process is another six months. I then have to find someone who knows how to write an extract, transform and load (ETL) process, write up a new services contract, get that person cleared and finally I’m starting to get somewhere.
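The ETL process mentioned above is conceptually simple, even when the contracting around it is not: pull records out of a source system, clean and validate them, and write them into the warehouse. A minimal sketch in Python, using an in-memory SQLite database and invented field names purely as stand-ins for a real source system and warehouse:

```python
import csv
import io
import sqlite3

# Extract: read raw records. A CSV string stands in for a real source system.
raw = io.StringIO("id,name,amount\n1, Alice ,100\n2,Bob,oops\n3,Carol,250\n")
rows = list(csv.DictReader(raw))

# Transform: strip whitespace, coerce types, drop records that fail validation.
clean = []
for r in rows:
    try:
        clean.append((int(r["id"]), r["name"].strip(), float(r["amount"])))
    except ValueError:
        pass  # in a real job, bad records would be logged for review

# Load: write the cleaned records into the target table.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE payments (id INTEGER, name TEXT, amount REAL)")
db.executemany("INSERT INTO payments VALUES (?, ?, ?)", clean)

print(db.execute("SELECT COUNT(*), SUM(amount) FROM payments").fetchone())
```

The six-month pain point is rarely this code; it is discovering what the real data looks like, which records are invalid and why, and who gets to decide.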
Then the question arises: “How exactly do you want to use the data?” After figuring out what my question is and getting an answer, I discover that someone else has already answered it, but that person’s answer is different. It turns out a different tool was used on a slightly different version of the data, pulled down to that someone’s workstation. Now I have to justify the discrepancy. My boss asks if I’m using machine learning. I’m not sure, but I doubt it.
Now I’ve got to find one of those data scientist types, whom I’m going to pay a lot of money to recreate everything I just did previously in an arcane statistical language I don’t care to learn. The data scientist comes back to me with results, but tells me it's too complicated to explain how they were obtained.
The boss calls again and says that the question we just paid the data scientist to work on is now irrelevant, and that I should be doing analytics with Twitter. Wait, what? Why?
I dutifully load the Twitter data into the database, only to figure out that there’s not much I can actually do with it there. I make more requests for proposals (RFPs), ask more questions and apply less analytics. Every vendor responds with “better, faster, cheaper.” But don’t worry, we’re going to the cloud.
A problematic pattern
Are you starting to see a pattern here? There are many problems with developing a high-performance analytics organization in the federal government—and I suspect in industry at large. I’d classify these problems as follows:
- Infrastructure and software
- Security
- Agility
- Internal politics
- Specialized knowledge
IBM is helping with the first three problem classifications: infrastructure and software, security and agility. To solve the infrastructure and software problem, there are software appliances, hardware appliances and cloud solutions. If you can get to cloud computing, you can’t beat the model. It’s fast, secure and elastic. Universally, we’re all grappling with shrinking IT budgets, and I think we’ll all end up in the cloud eventually.
Security is always a tough discussion, even with the Federal Risk and Authorization Management Program (FedRAMP), International Organization for Standardization (ISO) 27001, Service Organization Controls (SOC) 2, Federal Information Security Management Act (FISMA) High systems and so on. IBM is going above and beyond federal security requirements with technologies that protect data where it’s most sensitive, obfuscating data where it can and tracking everything about who downloads it, where it came from and who owns it. Unlike other cloud-based environments, I can actually point you to the exact disk drive where your data sits encrypted and separate from other workloads.
With the IBM Bluemix platform, I can spin up a data warehouse in minutes—literally minutes. No tuning or pre-aggregation or database administrator (DBA) work is required. Data marts in minutes—someone should trademark that phrase. Need an Apache Spark cluster? Give me five minutes. Apache Hadoop? MySQL? How about a MongoDB instance? That instance is not even an IBM product, and yet, there it is—in minutes. I pay for it monthly or by how much I use it. Bye-bye, shelfware.
An analytics-driven government
Now, I bet you’re thinking, “OK, but you can’t solve my human problem.” Well, that depends on which human problem you mean. Internal politics, perhaps? I probably can’t help there, but I can help you fail faster.
How about specialized knowledge? Turns out I can do something around line-of-business users. IBM understands technology and analytics. If I leverage that know-how to eliminate most of the busy work of analytics, I can get down to real machine learning without a single line of code. The IBM Watson Analytics platform enables me to drop a spreadsheet onto my web browser, and minutes later I’m developing a predictive fraud model I can share across the organization. It tells me how to fix my data quality, which fields are important, which fields are predictive and how best to move them from point A to point B.
Assume I want to actually get out there and educate myself at BigDataUniversity.com. Huge libraries of tutorials on NoSQL, Spark, Hadoop and more are available. All of them are self-paced, complimentary and real. But if you don’t have time for this approach, ask someone at IBM to help you with a pilot—someone who can stand up the entire secure, cloud-based infrastructure in hours and have a real analytics discussion in days.
Not all the world’s problems can be solved. Even with the previously discussed approach, things can take time. But if we can get to the point where we can try new things quickly, fail fast and only pay for what we use, we’re talking about a much more analytics-driven government.
Government sector expertise
IBM brings together experts in the government sector at the Government Analytics Forum, 5 May 2016, at the W Hotel in Washington, D.C. These experts are on tap to discuss the challenges of applying advanced analytics to enhance mission outcomes, reduce risks and operate more efficiently.