Confidently cull information from cloud-based data

Portfolio Marketing Manager, Information Integration & Governance, IBM

Chatter about the cloud is everywhere: magazines, smartphones, television, websites and so on. We are inundated daily with messages about the cloud and cloud computing. Proponents say the cloud saves time, provides a place to store data, offers a way to manage hard drive space on smartphones and much more. Detractors come up with scary stories of hackers gaining access to bank account numbers, personal photos, and private information.

In the realm of business information technology, the conflicting stories are much the same: Will the cloud save money on balance sheets, or will the cloud expose sensitive data to unwanted prying eyes? Some analysts claim that a substantial number of businesses plan to do a majority of their computing on the cloud before the end of the decade. However, other analysts estimate the number of these organizations to be significantly lower.

As with most innovations in business information technology, the ultimate truth about cloud computing lies somewhere in between. There is little doubt that cloud-based infrastructure offers an immediate opportunity for small organizations to avoid the costly outlay needed for a robust, on-premises computing environment. Data can be found, processed, and managed on the cloud without investing in any local hardware. Large organizations with mature on-premises computing infrastructures can also immediately benefit from the vast array of structured and unstructured data from cloud-based sources, particularly if they can trust the information streaming into their internal systems. Organizations today have a foot in both the cloud and on-premises worlds. In fact, one could easily argue that we already live in a hybrid world.

Emerging hybrid environments

Traditionally, the data lake that organizations managed existed behind their firewalls. With the cloud, most of the information in the data lake resides outside the firewall. Today, organizations are trying to figure out how to manage the new hybrid computing and data management model.

Hybrid simply means a mixture of public cloud and on-premises data sources and computing in support of business operations (see figure). But here’s a secret: no one sets out to plan a hybrid environment; it just happens.

Consider this scenario. A busy marketing executive wants to understand whether a new campaign is influencing customer sentiment. She is in the midst of a large media buy, and really needs to know if the radio and TV ads are having any impact. Given her limited resources, pulling the plug on a bad campaign may save a few million dollars—money that could be reinvested in other marketing activities.

She tasks her analysts with figuring out a way to gauge customer sentiment before the actual sales numbers start coming in. Her team pulls data from third-party sources, such as Twitter and raw retail scanner data, and feeds it into its own analytical tools to create a model of the campaign's impact on customer sentiment. All this research, data collection, data integration, data processing, and data analysis takes place without any IT involvement.

Assuming the executive gets a satisfactory answer, regardless of whether the news is good or bad, she will be pleased by her team’s speed in assembling data and producing information without IT participation. She will return to this approach again in the future.

The organization in this scenario is in hybrid mode. IT is hosting on-premises sales information that is reported on a monthly basis, and marketing is curating data from the cloud to spot trends in customer sentiment changes in near-real time. This process works very well for the marketing executive; however, IT worries about security and scalability. What if sales and operations want to emulate marketing’s approach? Will IT be forced to support these multiple ad hoc projects? The business units want speed and flexibility, while IT wants scalability and security. Can these competing interests be unified?

Owning strategic information

Adopting a hybrid environment does not imply having an IT strategy completely worked out. In fact, cloud-based aspects of the environment may evolve rapidly in response to business priorities. However, even if only a small percentage of data is flowing in from cloud-based sources, IT does need a plan for data integration. IT needs to help the organization ensure it owns the information created from all data and processing, no matter where it is located.

The hybrid infrastructure and decentralized computing are merely means to the ultimate end of creating strategic information assets. Embracing this fundamental notion lends clarity to what IT should be concerned with and, importantly, how IT can more effectively partner with the business users.

How can organizations realize the obvious financial benefits of the cloud while ensuring information culled from cloud sources is secure and trustworthy? The answer is governance. Good hybrid information governance implies several priorities for IT and the business.

  • Broad agreement on what information means: This agreement includes common terminology, policies and plain-language rules for what the business needs and how information is handled.
  • Clear agreement on how owned information assets are maintained and monitored: One example is the use of operational data quality rules and master data management in on-premises systems.
  • Enterprise and departmental standard practices for securing and protecting strategic information assets: Such practices include articulating role-based access to information, creating rules governing how information is shared and protecting sensitive information from third parties.
  • Enterprise data integration strategy that includes lifecycle management: The strategy means architecting how data flows and is assembled into strategic information, and also understanding how that data and information are maintained over time.

These priorities form the foundations of information governance in a hybrid environment. In each case, a blend of process, organizational and technical enablers is needed to make it work. With these pillars in place, organizations have the flexibility to move with speed and confidence.
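
To make two of these priorities more concrete, the following is a minimal Python sketch, not a reference implementation, of how plain-language data quality rules and role-based access to information might be expressed. Every name in it (Rule, RULES, ROLE_PERMISSIONS, the record fields, and the role names) is a hypothetical chosen for illustration and does not come from any particular product.

```python
from dataclasses import dataclass
from typing import Callable

# All names here are hypothetical and exist only to illustrate the idea.


@dataclass(frozen=True)
class Rule:
    name: str                      # plain-language statement of the rule
    check: Callable[[dict], bool]  # returns True when a record passes


# Operational data quality rules, written so business users can read them.
RULES = [
    Rule("customer_id must be present",
         lambda r: bool(r.get("customer_id"))),
    Rule("sentiment must be a number between -1 and 1",
         lambda r: isinstance(r.get("sentiment"), (int, float))
         and -1 <= r["sentiment"] <= 1),
    Rule("source must be 'cloud' or 'on_premises'",
         lambda r: r.get("source") in {"cloud", "on_premises"}),
]


def failed_rules(record: dict) -> list[str]:
    """Return the names of every rule the record violates."""
    return [rule.name for rule in RULES if not rule.check(record)]


# Role-based access: which fields each role is allowed to see.
ROLE_PERMISSIONS = {
    "marketing_analyst": {"sentiment", "source"},
    "data_steward": {"customer_id", "sentiment", "source"},
}


def visible_fields(record: dict, role: str) -> dict:
    """Return only the fields the role may see; unknown roles see nothing."""
    allowed = ROLE_PERMISSIONS.get(role, set())
    return {key: value for key, value in record.items() if key in allowed}
```

In this sketch the same rules apply whether a record arrives from a cloud source or an on-premises system, which is the point of agreeing on them once. An unknown role receives no fields by default, a deliberately conservative choice for protecting sensitive information.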

Charting the strategy’s course

Many organizations may already be operating in a hybrid environment. Cloud-based data and processing services present too much opportunity for business users to ignore, and IT is charged with maintaining the integrity of internal, on-premises transactional and reporting systems. Charting a governance strategy is not something to consider at a future date. It needs to happen now.

This article is an excerpt from the recent ebook, The Truth About Information Governance and the Cloud, IBM Software Group, January 2015.
