InsightOut: Enabling a highly collaborative and data-driven organization

IBM Fellow and VP, CTO for Information & Analytics Group, IBM

The first installment of this series discussed the challenges of multispeed IT. Multispeed IT refers to the challenge of making more data accessible and enabling greater self-service and innovation while at the same time maintaining the trust and governance needed to drive the business.

The challenge is real. More organizations than ever are aiming to be data driven by not only enabling analytics more pervasively across the enterprise, but also allowing for more collaboration and interaction with the data. How do they get there? They begin by understanding the different roles in the organization that consume or work with data, their needs, their experience and the challenges that can arise with their IT counterparts.

New modernization strategies are pushing us to focus the design of the analytics architecture from the top down, that is, through the lens of the consumers of the data. This approach is a departure from the past in which we focused on the containers of data and then provided access—a bottom-up design. Today, chief data officers (CDOs), chief information officers (CIOs) and data engineers need to have a clear understanding of how information flows and how interactions enable increased collaboration across the enterprise. This top-down focus is the foundation for IBM’s technical and product strategy because it allows our clients to enable self-service, innovate rapidly and collaborate across the enterprise.

Key data consumers

Throughout its many client engagements, IBM saw a pattern of key roles emerging. Although there are certainly many more consumers of data and insight, these roles are seen as critical in the path to enabling self-service and collaboration. 

  • Knowledge worker and citizen analyst: Business analysts who are subject matter experts (SMEs) in the actual business are demanding a highly agile self-service model that allows them to find information relevant to a topic they are analyzing and discover insight in that data.
  • Data scientist: Typically, data scientists are trained in a quantitative discipline such as statistics, operations research, machine learning, econometrics or an equivalent field. They have a deep understanding of the mathematical and computational methods that can be applied to data to derive insight for the business process. Data scientists discover insight in a rich set of data, and this discovery includes preparing, cleansing and enriching data from different sources, both internally and externally.
  • Application developer: These individuals incorporate actual analytics algorithms, often in the form of scoring functions, into an application that will be integrated with a business process running on a production-level system and data model.
  • Data engineer: As a traditional IT persona who manages the data, data engineers build physical and logical models. They are also responsible for the integration tasks to bring capabilities defined by data scientists, application developers and CDOs into production-level business processes.
  • CDO: As executive-level owners of an organization’s data, CDOs define the logical business-object models and governance rules, including data access policies. They are ultimately responsible for the quality of the data.

New data lifecycle

How do these roles discover, consume and gain insight from data? In this new world that is governed by a high demand for self-service and analytics, we see a new lifecycle of data emerging in which the data flows through the different phases of a data-driven organization.

Phase 1 – Asset discovery: This phase involves shopping for assets relevant to a specific question that is being posed. Assets can include data, transformations that produce information from other information, analytics models, scoring functions, reports and so on.
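Shopping for assets can be made concrete with a tiny catalog lookup. The sketch below is purely illustrative, not an IBM product API: the in-memory catalog, its entries and its tags are all hypothetical, and a real asset catalog would sit behind a governed metadata service.

```python
# Tiny sketch of shopping for assets in a catalog: return assets whose tags
# overlap the terms of the question being posed. Entries are illustrative.

catalog = [
    {"name": "sales_2015",     "kind": "data",   "tags": {"sales", "revenue"}},
    {"name": "churn_model",    "kind": "model",  "tags": {"churn", "retention"}},
    {"name": "revenue_report", "kind": "report", "tags": {"revenue", "forecast"}},
]

def discover(question_terms):
    """Return the names of assets of any kind whose tags overlap the terms."""
    return [a["name"] for a in catalog if a["tags"] & question_terms]

matches = discover({"revenue"})  # both the data set and the report qualify
```

Note that the result deliberately mixes asset kinds (a data set and a report); discovery spans all asset types, not just raw data.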

Phase 2 – Initial access of information: In this phase, information is accessed and first-level cleansing of information is performed to ensure the quality and consistency of information. Data lands either in a general discovery sandbox or a user- or task-specific sandbox.
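The first-level cleansing that happens on the way into a sandbox can be sketched in a few lines. This is a minimal pure-Python illustration, not a product capability; the record schema ("id", "region", "revenue") and the rules (drop duplicates, drop rows with missing required fields, coerce types) are illustrative assumptions.

```python
# Minimal first-level cleansing before landing rows in a discovery sandbox.
# Schema and cleansing rules are illustrative assumptions.

def cleanse(raw_rows):
    """Drop exact duplicates and rows with missing required fields,
    and coerce revenue to a float for type consistency."""
    seen = set()
    sandbox = []
    for row in raw_rows:
        key = (row.get("id"), row.get("region"), row.get("revenue"))
        if key in seen:                      # duplicate record
            continue
        seen.add(key)
        if not row.get("id") or not row.get("region"):
            continue                         # missing required field
        try:
            revenue = float(row["revenue"])  # enforce a consistent type
        except (KeyError, TypeError, ValueError):
            continue                         # unparseable value
        sandbox.append({"id": row["id"], "region": row["region"],
                        "revenue": revenue})
    return sandbox

raw = [
    {"id": "a1", "region": "EMEA", "revenue": "100.5"},
    {"id": "a1", "region": "EMEA", "revenue": "100.5"},  # duplicate
    {"id": "a2", "region": "",     "revenue": "70"},     # missing region
    {"id": "a3", "region": "APAC", "revenue": "n/a"},    # bad type
    {"id": "a4", "region": "AMER", "revenue": "42"},
]
clean = cleanse(raw)  # only the two well-formed, distinct rows survive
```

The output of this step is what lands in the general or task-specific sandbox; deeper integration and enrichment belong to the next phase.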

Phase 3 – Insight discovery: This phase is an iterative process that repeats the following steps until analytics models are validated and re-validated: 

  • Integration and further cleansing of data into a data model representation, which can be, and often is, a simple tabular representation
  • Feature extraction and dimension reduction to further refine data into a form that is conducive to the analytics and online analytical processing (OLAP) task
  • Model development, training and validation against the prepared data set; model development can be automated using algorithms—for example, statistics and machine learning—or performed manually—for example, entity extraction, business rules and so on

Phase 4 – Insight deployment: In this phase, insight that was derived from the discovery phase is deployed into production-level business processes: 

  • When deployed in an operational system, the analytics model is—or the rules that are derived from the model are—applied repeatedly to new, operational data.
  • Monitor the business impact of ad hoc analytics or of the repeated use of the analytics model in an operational system.
  • This impact can be used to trigger a new cycle—for example, when the accuracy of an operational analytics model is decreasing over time.
  • Not every discovery will lead to a deployment—probably only 10 percent of the insight that is discovered is going to be of repeatable value to the overall business process.
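These deployment steps can be sketched together: a scoring function applied repeatedly to new operational data, plus a monitor whose output can trigger a new discovery cycle. The scoring rule and the 0.8 accuracy floor below are illustrative assumptions, not a prescribed operational policy.

```python
# Apply a deployed scoring function to new operational records and monitor
# its accuracy; a drop below a set floor triggers a new discovery cycle.
# The scoring rule and the 0.8 floor are illustrative assumptions.

def score(record):
    """Deployed scoring function: flag records above a fixed threshold."""
    return 1 if record["value"] > 5.0 else 0

def monitor(records, floor=0.8):
    """Compare scores with observed outcomes; return (accuracy, retrain?)."""
    hits = sum(1 for r in records if score(r) == r["outcome"])
    acc = hits / len(records)
    return acc, acc < floor  # True means trigger a new discovery cycle

# Early operational data still matches the model well...
healthy = [{"value": 7.0, "outcome": 1}, {"value": 2.0, "outcome": 0},
           {"value": 9.1, "outcome": 1}, {"value": 1.5, "outcome": 0}]
# ...but later data has drifted away from it.
drifted = [{"value": 7.0, "outcome": 0}, {"value": 2.0, "outcome": 1},
           {"value": 9.1, "outcome": 1}, {"value": 1.5, "outcome": 1}]

acc1, retrain1 = monitor(healthy)  # high accuracy, no action needed
acc2, retrain2 = monitor(drifted)  # accuracy has fallen; re-enter discovery
```

The second call is the "trigger a new cycle" case described above: the operational system keeps scoring, but the monitor routes the model back into the discovery flow.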

Phase 5 – Insight lifecycle management: This phase entails the retirement and archiving of analytics models and, potentially, the associated data models. Every analytics model should have an expiration date, at which point the model is either refreshed or retired.
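The expiration rule can be made concrete with a small registry check. This is a minimal sketch assuming each deployed model carries an expiration date in its metadata; the registry structure, field names and model names are hypothetical.

```python
from datetime import date

# Each deployed model carries an expiration date in its metadata; on or
# after that date it must be refreshed or retired. Names are illustrative.

registry = [
    {"name": "churn_score",  "expires": date(2016, 1, 1)},
    {"name": "upsell_score", "expires": date(2099, 1, 1)},
]

def due_for_review(models, today):
    """Return the names of models whose expiration date has been reached."""
    return [m["name"] for m in models if today >= m["expires"]]

stale = due_for_review(registry, date(2017, 6, 1))  # only the expired model
```

Running such a check on a schedule keeps retirement a deliberate decision rather than something that happens only when a model visibly misbehaves.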

Two flows

One other dimension to this approach is vital. Two distinct flows are associated with the information and analytics lifecycle, and the marriage of these two flows is a significant challenge today.

The first flow, and the one that has really been the main emphasis here, is associated with discovery and is where knowledge workers, data scientists and application programmers will spend the majority of their time. The focus is on discovering interesting insight, building models, creating reports and so forth—and doing so at the speed of business.

Traditional IT comes into play in the second flow, which takes the repeatable, high-value insights, reports and so on and builds them into an operational business process. These systems will have tight service-level agreement (SLA) demands and high governance and regulatory demands.

Role interaction to come

The next InsightOut series installment takes a closer look at each of these key roles. Specifically, it will offer additional depth into how they interact within the flow of information and analytics in organizations. Beyond the interaction aspect, subsequent perspectives will take an even deeper dive into the respective roles in regard to their diverse challenges, functions and flows within a self-service, multispeed IT environment.

Continue your exploration of the technologies available on the IBM analytics technology platform discussed here.