Organizations know that insights hidden within data can be the key to uncovering new sources of competitive advantage. Some of that data is proprietary and some is public or quasi-public; some is traditional structured data, and some is unstructured and unformatted. At the same time, the sheer volume of data, and the speed with which new data and data sources become available, is ever increasing. Organizations are under pressure to make better business decisions in less time, with more transparency and less risk. They are also sensitive to "big brother" concerns, so they want to handle data with care. These are the main challenges of the era of big data, and most organizations are dealing with them in one way or another. For that reason, organizations are investing in new analytical technologies and in information-centric roles such as chief data officers and data scientists, all with the ultimate charter to make order from the chaos, find the hidden insights and grow shareholder value, all while protecting consumers.
According to a recent Forbes article, "…years into the era of data scientists, most practitioners report that their primary occupation is still obtaining and cleaning data sets. This forms 80 percent of the work required before the much-publicized investigational skill of the data scientist can be put to use." Put simply, the people organizations hire to make order from the data chaos spend the vast majority of their time figuring out what data they need, how to get it, how to prepare it for analysis and, later, how to defend that analysis. Why? Because without that work, the organization would have no confidence in any information being produced. Defining, assembling, preparing and defending data are therefore necessary prerequisites for having confidence in information. What if a solution could flip that metric, allowing data scientists and chief data officers to spend only 20 percent of their time gathering and preparing data?

Another sticky problem is the simple fact that data can be (and has been) misused, and even stolen. With massive data volumes and velocities, data safety is a real concern. Ultimately, these are the two problems that need to be solved: productivity and safety.
Given the volume, variety and velocity of data, the productivity problem stems from a lack of context for data analysis. Context means having a business framework for determining how and where to use data. Because context is highly situational and variable, there is another critical factor: agility. Agility means having the flexibility to establish and maintain context regardless of the volume, variety and velocity of data. Scaling up should be as easy as scaling down.
Additionally, when organizations use big data, how can they ensure they are using it properly and safely? Appropriate controls and data handling practices need to be in place to help ensure data is safe and secure.
IBM Big Data Governance (IBDG) is a holistic approach for enhancing data productivity and security. This approach is not simply a bundle of technologies; it has key process, organizational and technical components. Nor is IBDG a one-size-fits-all program of bureaucratic procedures and expensive technologies. There are myriad governance tools and techniques, but any given tool is more or less important depending on the type of project, the business objectives and other contextual elements. Read The IBM Agile Information Governance Process whitepaper for more information on IBM's approach to big data governance, and learn more about IBM's Information Integration and Governance solutions.