AI Governance: Drive compliance, efficiency and outcomes from your AI lifecycle

Offering Manager, Watson Data & AI, IBM

Artificial Intelligence has penetrated every industry in some form or another. From powering recommendation engines for consumer products to helping extend credit products in a more efficient manner, AI is becoming an imperative that no C-level executive can choose to delay. Even amid the COVID-19 pandemic in the last few months, we have seen encouraging use of AI for tracking the spread of disease as well as accelerating the discovery of vaccines.

As businesses start to scale the use of AI as a transformative power to innovate and be more efficient, they have to manage the risks that come from it. Specifically, when dealing with sensitive customer data and in regulated industries, governance is a mandatory aspect of operations. However, as AI becomes more prevalent there are new gaps which need to be addressed in governing the lifecycle of data as well as the models trained on those data. At the same time, governance processes should not impede the iterative nature of data science experiments that help build and operate AI applications.

Governing for control

In the data management space, governance processes are important for complying with enterprise and industry regulations. They are also important for protecting sensitive customer data, the loss of which can cause financial as well as reputational damage to a brand. The same concerns extend to AI models, which may exhibit behavior that is unfair or downright harmful to consumers.

While best practices for governing data have matured over the years, we need similar best practices for models. The additional complexity of governing models is that they are frequently retrained; as a result, there are many versions of each model and of the corresponding data sets on which they were trained. The provenance of data and models, along with the associated metadata of any glue code and pipelines, has to be traced and documented for audits. In addition, it is important to document the techniques used to train the model, the hyperparameters used, the metrics from testing phases and so on, in order to provide complete transparency into the model’s behavior. Before models are pushed into production, they have to be validated by an independent group that evaluates the risks to the business. Once in production, they have to be continuously monitored for fairness, quality and drift, and they have to provide easy-to-use explanations of their predictions.

A side effect of this requirement is that data scientists and model operations teams now have to create an extensive set of documents to describe each model. According to Brandon Purcell, Principal Analyst at Forrester, explainable AI isn’t just about explaining each output and how it was determined. It also requires explaining how the AI model was built, what data was used, whether that data can be trusted, whether it was biased and whether it complied with policies and regulations, and ensuring that the model is in production only for its intended use.
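To make the documentation burden concrete, here is a minimal sketch in Python of what an automatically captured audit record for one model version might look like. The `ModelRecord` class, its field names and the `fingerprint` helper are hypothetical and invented for illustration; they are not part of any IBM product API, and the sample values are made up.

```python
import hashlib
import json
from dataclasses import dataclass, asdict


def fingerprint(rows):
    """Return a stable SHA-256 digest of a list of JSON-serializable rows."""
    payload = json.dumps(rows, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


@dataclass(frozen=True)
class ModelRecord:
    """Hypothetical audit record for one trained model version."""
    model_name: str
    version: int
    training_data_digest: str  # ties this version to the exact data it saw
    technique: str
    hyperparameters: dict
    test_metrics: dict
    validated_by: str = ""  # stays empty until an independent reviewer signs off

    def to_json(self):
        """Serialize the record so it can be stored alongside the model."""
        return json.dumps(asdict(self), sort_keys=True, indent=2)


# Illustrative values only; none of these numbers come from a real model.
rows = [{"income": 52000, "approved": 1}, {"income": 18000, "approved": 0}]
record = ModelRecord(
    model_name="credit_risk",
    version=3,
    training_data_digest=fingerprint(rows),
    technique="gradient boosted trees",
    hyperparameters={"max_depth": 4, "learning_rate": 0.1},
    test_metrics={"auc": 0.87},
)
print(record.to_json())
```

Hashing the training data is one simple way to link a model version to the data it was trained on; a real deployment would more likely reference a versioned data set in a catalog.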

One financial services company that we spoke to writes a document of approximately 40 pages for every model that needs to be pushed into production. To be effective, the collection of this information and the documentation itself need to be automated. Apart from documentation, active enforcement of policies and rules is required to ensure that models exhibiting biased behavior do not go into production and lead to unfavorable outcomes. Even before models are developed, these policies and rules should use data quality checks to identify bad data and actively prevent it from being used to build models.
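As an illustration of active enforcement, the following Python sketch shows one way a promotion gate could block a biased model. It assumes a disparate impact ratio (the favorable-outcome rate of one group divided by that of another) as the fairness measure and a minimum ratio of 0.8 as the policy threshold. Both are assumptions chosen for this example, as are all function names; an actual policy would be set by the organization and its regulators.

```python
def favorable_rate(predictions, groups, group):
    """Share of favorable (1) predictions among members of one group."""
    selected = [p for p, g in zip(predictions, groups) if g == group]
    if not selected:
        raise ValueError(f"no predictions for group {group!r}")
    return sum(selected) / len(selected)


def disparate_impact(predictions, groups, unprivileged, privileged):
    """Ratio of favorable rates: unprivileged group over privileged group."""
    privileged_rate = favorable_rate(predictions, groups, privileged)
    if privileged_rate == 0:
        raise ValueError("privileged group has no favorable outcomes")
    return favorable_rate(predictions, groups, unprivileged) / privileged_rate


def may_promote(predictions, groups, unprivileged, privileged, min_ratio=0.8):
    """Policy gate: allow promotion only if the ratio meets the threshold."""
    ratio = disparate_impact(predictions, groups, unprivileged, privileged)
    return ratio >= min_ratio


# Made-up validation predictions: 1 is a favorable outcome, 0 is not.
predictions = [1, 1, 1, 0, 1, 0, 0, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

if may_promote(predictions, groups, unprivileged="b", privileged="a"):
    print("model may be promoted")
else:
    print("model blocked by fairness policy")
```

In this made-up sample, group "a" receives favorable outcomes three times as often as group "b", so the gate blocks the model. The same pattern could be applied earlier in the lifecycle, with a data quality check standing in for the fairness measure.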

Coordinated, automated tracking of models and their associated data and metadata across the lifecycle is thus imperative, both to mitigate business risks from unwanted model behavior and to be prepared for compliance audits.

Governing for efficiency and outcomes

While the importance of data governance as a practice is clear from the perspective of compliance and risk management, in the AI lifecycle it can also help to improve process efficiency.

Data science practitioners experiment with a large number of models before converging on a few that will drive the required business outcomes. Through that process, they access, combine and transform a large amount of data. During the hyperparameter optimization phase, they experiment with a wide variety of model training parameters. Reproducibility of all this information and of these experiments is key to successful deployment, collaboration and further enhancement of the models. Talent is another factor: high market demand causes significant churn in data science teams, and new or less experienced data scientists may need to take over these models and deploy or extend them. Without detailed information on the data, the transformations applied to it, the model training experiments and so on, it is impossible for them to be efficient.

Fortunately, metadata captured through the governance process can be used for knowledge management to improve the efficiency of data science teams. You could imagine comparing the outcomes of several different models and then tracing back to the data, modeling techniques and experiments to reproduce a similar process for more models. This not only helps a data science team manager stay sane but also codifies and operationalizes best practices through tools.
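The comparison and trace-back described above can be sketched in a few lines of Python, assuming each experiment has been logged with its data set version, technique, hyperparameters and a single evaluation metric. The `Run` class, the helper functions and every value below are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    """Hypothetical metadata captured for one training experiment."""
    run_id: str
    dataset_version: str
    technique: str
    hyperparameters: tuple  # (name, value) pairs, kept immutable
    metric: float  # higher is better in this sketch


def best_runs(runs, top_n=1):
    """Return the top_n runs ranked by their evaluation metric."""
    return sorted(runs, key=lambda run: run.metric, reverse=True)[:top_n]


def recipe(run):
    """Trace a run back to what a colleague needs in order to reproduce it."""
    return {
        "dataset_version": run.dataset_version,
        "technique": run.technique,
        "hyperparameters": dict(run.hyperparameters),
    }


# Made-up experiment log.
runs = [
    Run("r1", "v1", "logistic regression", (("C", 1.0),), 0.81),
    Run("r2", "v2", "gradient boosted trees",
        (("learning_rate", 0.1), ("max_depth", 4)), 0.87),
    Run("r3", "v2", "gradient boosted trees",
        (("learning_rate", 0.3), ("max_depth", 8)), 0.84),
]

best = best_runs(runs, top_n=1)[0]
print(best.run_id, recipe(best))
```

A new team member could start from the recipe of the best run instead of rediscovering the data set and parameters from scratch.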

IBM is innovating to enhance governance in the AI lifecycle

Over the last couple of years, IBM has been at the forefront of innovation in AI governance. IBM Cloud Pak for Data provides a full stack of components for every stage of the AI lifecycle. It comes with built-in governance tools like Watson Knowledge Catalog as well as purpose-built AI model risk management tools like Watson OpenScale. IBM Research has been at the industry frontier through its work on AI fairness, explainability and standardized documentation of AI models. Through 2020, we will be shipping enhancements to the Cloud Pak for Data platform to further push the boundaries of governance of AI applications.

Register for the upcoming webinar featuring Forrester and IBM titled, AI Governance: Drive compliance, efficiency and outcomes

Accelerate your journey to AI.