InsightOut: Role-based interaction with the information and analytics lifecycle

IBM Fellow and VP, CTO for Information & Analytics Group, IBM

In a previous blog, we introduced five roles that are integral to the data lifecycle. Before we discuss self-service, let’s broadly sketch how each role interacts with data, uncovering and deploying insights as it explores organizational data. For more information about these roles, keep an eye out for future blogs dedicated to each role individually.

Citizen analysts/knowledge workers

A knowledge worker is primarily a subject-matter expert (SME) in a specific area of business—for example, a business analyst focused on risk or fraud, a marketing analyst aiming to build out new offers or someone who works to drive efficiencies into the supply chain. These users do not know where or how data is stored, nor do they know specifically how to access data in the myriad of repositories that store it. They do not know how to build an ETL flow or a machine learning algorithm—nor do they want to know. They simply want to access information on demand, driving analysis from their base of expertise without depending on individuals who have deeper technical expertise.

These users begin with a problem—a hypothesis or question—or perhaps data in which they hope to uncover insights. They must first identify data, or assets created around data, such as reports, that will be relevant to their task—increasingly referred to as shopping for data. After identifying information of interest, such a user must then access the information, combining and shaping the data into a form amenable to the task at hand. This task, which often involves provisioning raw information into a data lake or a sandbox environment, is called data preparation, or data wrangling.

After preparing the data, the user can begin searching for insights in the data itself. In a cognitive world, users can interact with the information by asking questions in natural language or the language of business. Cognitive capabilities can also help guide them to insights by offering predictive indicators or by visualizing the data for inspection. This is the magic of cognitive computing—when technology not only provides users with data to help them make informed decisions but also guides them to the insights.

Knowledge workers who need unusually deep analytics capabilities sometimes engage with data scientists, often during the discovery phase or after task completion, when data scientists build on their work to create a model as part of an ongoing process. Knowledge workers may also work with data engineers to set up a temporary environment in which to explore information not previously integrated into their work, so that any further discoveries can be put into production as well. Finally, they may ask data engineers to help them find and integrate information into a starting sandbox environment, or even to use the results of their work to build a data model and data integration pipeline for deployment into a production environment.

Data scientists

Data scientists work in much the same ways as knowledge workers, also focusing on discovery and model development, but their depth of technical knowledge and the detail of their work stand in sharp contrast. Although data scientists have less domain knowledge than knowledge workers do, they can convert real business problems into data science problems, using analytical techniques to uncover insights and business-driven solutions. Not surprisingly, then, their interactions with data after finding and shaping it are fundamentally different from those of knowledge workers.

Much like knowledge workers, data scientists start with the shopping experience. Unlike their counterparts, however, they are looking not only for data but also for analytics assets—models, algorithms and the like. (Note, though, that as analytics becomes ever more pervasive and daily more consumable, knowledge workers and even application developers may also begin discovering and identifying reusable analytics assets.) Data scientists can perform both simple and complex data preparation tasks, including feature extraction and dimension reduction for data refinement, implementation of analytics models and machine learning development, and model training and tuning. Accordingly, data scientists spend the majority of their time gathering and preparing data.

Data scientists help bridge real business and mathematical models, looking behind the data to derive insights and develop business-driven solutions using analytical techniques and problem decision support. Data scientists also manage the lifecycles of analytics models, especially those deployed into production systems in support of business processes. Most such models, however, should come with a date of expiry, especially those not equipped with a feedback loop designed to heighten the model’s intelligence over time—and setting such a date is also a data scientist’s job.
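
As a rough illustration of the expiry idea, the following minimal Python sketch flags a deployed model for review once its expiry date has passed. The `DeployedModel` record, its field names and the `needs_review` helper are hypothetical, invented here for illustration; they are not part of any IBM product or of the process described in this blog.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class DeployedModel:
    """Hypothetical registry record for a model running in production."""
    name: str
    deployed_on: date
    expires_on: Optional[date]        # None means no expiry date was set
    has_feedback_loop: bool = False   # True if the model keeps learning from outcomes


def needs_review(model: DeployedModel, today: date) -> bool:
    """Return True when the model should be revalidated, retrained or retired."""
    if model.expires_on is None:
        # Assumption: a model with neither a feedback loop nor an expiry date
        # is flagged immediately so that someone sets a date for it.
        return not model.has_feedback_loop
    return today >= model.expires_on


churn = DeployedModel("churn-score", date(2016, 1, 15), date(2016, 7, 15))
print(needs_review(churn, date(2016, 8, 1)))   # True: the expiry date has passed
```

A real deployment would more likely keep such dates in a model registry or catalog than in application code; the point is only that the expiry date is explicit and can be checked automatically.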

Data scientists often collaborate with citizen analysts to aid self-service discovery, handing back models that can be used in that discovery. Similarly, they work with data engineers to deploy analytics algorithms and applications into production systems.

Application developers

Application developers are responsible for making analytics algorithms actionable within a business process, generally supported by a production system. Beginning with the analytics algorithms built by citizen analysts or data scientists, they work with the final data model representation created by data engineers, building an application that ties into the overall business process. Application developers also enforce governance rules, preserving trust in the application.

In what should come as no surprise, developers also take part in a shopping experience, searching for assets that they can compose into the application they are building. When deploying an application, they often collaborate with data engineers to ensure the presence of a dependent data model and of the integration flows needed to produce that model.

Data engineers

Data engineers develop data structures and models to the specifications set by knowledge workers and data scientists and, by doing so, act as the central nervous system for the movement of data in the organization. Moreover, they develop data integration tasks to populate data models, facilitating data quality analysis on source systems with an eye to monitoring and driving quality improvements to the system. They also enable data archiving as analytics models and applications retire and as data becomes stale.

What’s more, data engineers work with the chief data officer (CDO) to establish linkages between business and technical metadata while ensuring that governance rules are enforced. Data engineers also build data integration tasks to populate the core landing area of a discovery zone while building consumable integrated views of data for the knowledge workers in that zone, producing final integration flows that support production-level systems.

Chief data officers

The CDO uses data and analytics to drive business value. Indeed, CDOs use metadata as their data, allowing them to catalog, measure and assess the quality, value and usefulness of available data and analytics from the organization’s point of view. Thus they drive investment in information governance initiatives, promoting data quality as well as definition of business terms and business objects by selected SMEs. CDOs are also responsible for legal and appropriate use of data and analytics, drawing on their understanding of the regulatory requirements, legal obligations and brand promises that bound an organization’s use of data and analytics.

Thus business value and appropriate operations intersect in the policies, classifications and governance rules defined by CDOs, and CDOs then enhance these definitions in an iterative process through close collaboration with the data owners in the organization as well as those in the knowledge worker and data scientist communities. A CDO’s team thus works with data owners, data engineers and application developers to automate data quality, data protection and information lifecycle management rules, facilitating operations in the organization and among its IT systems. By doing so, they aim to give knowledge workers and data scientists ready access to as much data as possible while mitigating fears of policy violations or insufficient data quality.

A CDO’s team adopts a strong metadata strategy to help it own and drive asset catalog population. Such catalogs are core elements in the shopping experience already described but are also much, much more, containing information on the provenance and lineage of information—including not only the information supply chain flow but also detailed metadata for all steps in this flow: data characteristics, ownership, currency and the like.
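
To make the idea of lineage metadata a little more concrete, here is a minimal Python sketch of a catalog in which each asset records its owner, its currency and the assets it was derived from. The `CatalogEntry` structure, the `lineage` helper and the sample asset names are all assumptions made for illustration; an actual asset catalog would hold far richer metadata.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, List


@dataclass
class CatalogEntry:
    """Hypothetical catalog record: ownership, currency and direct provenance."""
    name: str
    owner: str
    last_refreshed: date
    derived_from: List[str] = field(default_factory=list)


def lineage(catalog: Dict[str, CatalogEntry], name: str) -> List[str]:
    """List every upstream asset of `name`, nearest first, without duplicates."""
    seen: List[str] = []
    queue = list(catalog[name].derived_from)
    while queue:
        current = queue.pop(0)
        if current in seen:
            continue
        seen.append(current)
        if current in catalog:   # sources outside the catalog are listed but not expanded
            queue.extend(catalog[current].derived_from)
    return seen


# Sample entries, invented for this sketch
entries = [
    CatalogEntry("crm_raw", "sales-ops", date(2016, 3, 1)),
    CatalogEntry("web_logs_raw", "marketing", date(2016, 3, 2)),
    CatalogEntry("customer_360", "data-eng", date(2016, 3, 3), ["crm_raw", "web_logs_raw"]),
    CatalogEntry("churn_report", "risk", date(2016, 3, 4), ["customer_360"]),
]
catalog = {entry.name: entry for entry in entries}

print(lineage(catalog, "churn_report"))
# ['customer_360', 'crm_raw', 'web_logs_raw']
```

With records like these, a knowledge worker shopping for a report can see where its data came from, who owns each source and how current it is before deciding whether to trust it.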

Strengthening organizational roles through collaboration

Each of these roles acts and interacts across the data and analytics lifecycle, sharing certain requirements and capabilities with others—for example, shopping for data and analytics assets, trust in data and self-service. However, each role’s interactions with data are dictated not only by the task being executed but also by the skill of the person performing the role. Moreover, although the goals of various roles often overlap, their associated skill sets can be very different. Accordingly, collaboration capabilities across roles are becoming ever more critical, especially when moving discoveries into production.

To learn more, find out how IBM data analytics technologies can help you heighten collaboration among roles, moving data fluidly through its lifecycle.