Why data science at banks is missing the mark, and how to fix it
The business that gets there first won’t necessarily win the digital and AI game. The winner will be the one that ingrains digital and AI in its business as deeply as possible, starting by applying intelligent data science where it matters most and progressively extending it to every aspect of the business.
In the modern banking environment, consumers are well informed. They expect intuitive, engaging and informative experiences when they bank. Banks need solutions that can help them delight their customers with personalized experiences, empower their workforce to provide differentiated experiences, optimize risk-taking capabilities with AI-enabled insights and transform products and services with data at the core. Applied data science and cloud-native business architecture are both critical to digital transformation in banking.
At the center of this transformation are the data scientist and a supporting team of data engineers and data stewards; depending on the size of the organization, a chief data officer spearheads all these data personas. The biggest challenge data scientists and their teams face is delivering business results: taking insights, models and intelligent applications to production to create business value and show organizations concrete business impact.
Consider the excerpts below from data scientist job descriptions, which came from brainstorming with two of my biggest banking customers in Singapore and Australia, respectively:
“You will be part of a very vibrant and dynamic team at the heart of the new Digital Bank that is innovating the way customers engage with the bank through the most customized experiences possible. We will be expecting you to have extensive experience in the data science and analytics field - developing models, rules and algorithms from structured and unstructured sources and performing deep-dive analysis to derive data-driven decisions.”
“We have a data scientist role available within our Data & Analytics Tribe, where their mission is “to lift the productivity and effectiveness of our Tribes (and beyond) via delivery of high-quality analytical solutions, data assets, tools and insights.” Our data scientists design next-generation data stores and analytic platforms, and they build advanced analytical models to solve complex problems and generate sophisticated insights.”
Clearly there is no single best way to describe the qualities required to meet the heightened expectations businesses have of data professionals.
Great AI needs great data
A pro data scientist will be quick to realize that meeting these expectations requires great data to unlock the value of AI. It’s what I call “great AI needs great data.” One has to have a holistic, integrated view of the business and a blend of technical skills. My yardstick for a pro data scientist is an expert who maintains their own best practices and is fully capable of executing a data science process of arbitrary complexity from business ideation to deployment. The pro data scientist has strong tool and language preferences, can find or craft solutions using available open-source or proprietary libraries, and is willing to use a large variety of technologies to address a specific need, even when they fall outside those preferences.
To be successful, the pro data scientist needs tools that help them:
- Partner with business stakeholders to understand needs and identify opportunities to apply data science by framing the opportunity as a data science problem, formulating hypotheses and choosing techniques for experimentation
- View and understand business data in context while handling problems of data scale and complexity, and identify and create business features for analysis, particularly a target variable
- Manage and switch between compute environments to scale out and scale up compute with appropriate computation assets such as GPUs
- Generate and execute many variations of a problem with ease, running multiple experiments, creating multiple models and other artifacts, and managing and understanding the resulting metrics
- Deploy solutions into production, explaining results along with feature and data lineage
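The steps above can be sketched in code. The following is a minimal, hypothetical example (toy data, invented column names, scikit-learn as an assumed modeling library, not any bank's actual pipeline) of framing a business question as a data science problem, engineering a target variable and comparing metrics across multiple experiments:

```python
# A hedged sketch of the workflow: frame a lending question as a
# supervised problem, create a target variable, run several
# experiment variants and compare their metrics.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Toy stand-in for business data (all values invented).
df = pd.DataFrame({
    "income": [52_000, 31_000, 87_000, 45_000, 29_000, 64_000,
               38_000, 73_000, 41_000, 58_000, 33_000, 90_000],
    "utilization": [0.82, 0.95, 0.20, 0.55, 0.99, 0.35,
                    0.70, 0.15, 0.88, 0.40, 0.93, 0.10],
    "days_past_due": [0, 120, 0, 30, 95, 0, 60, 0, 150, 0, 100, 0],
})
# Feature engineering: derive the target variable from raw data.
df["default_flag"] = (df["days_past_due"] >= 90).astype(int)

X = df[["income", "utilization"]]
y = df["default_flag"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

# Run multiple experiments and keep the metrics for comparison.
experiments = {
    "logreg": make_pipeline(StandardScaler(), LogisticRegression()),
    "random_forest": RandomForestClassifier(n_estimators=50, random_state=0),
}
results = {
    name: roc_auc_score(
        y_test, model.fit(X_train, y_train).predict_proba(X_test)[:, 1])
    for name, model in experiments.items()
}
print(results)
```

In a real engagement the experiment loop would be tracked by an experiment-management tool and the winning model handed off to a deployment pipeline, but the shape of the work is the same.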
Combine data and AI on cloud-native architecture
I propose a shift in thinking. Let’s move away from providing data scientists and data engineers with a set of varied tools and technologies intended as point improvements to their capabilities. Let’s move to a holistic solution that enables data scientists to adopt technologies as they become available, adding incremental value to their processes and workflows while aligning them with business needs and modern technologies.
Enterprise data can be siloed across hundreds of systems such as data warehouses, data lakes, databases and file systems that are not AI-enabled. This means an enormous amount of time is spent combining, cleaning, verifying and enriching the data to get it ready for the model.
AI frameworks such as TensorFlow, PyTorch and scikit-learn don’t do data processing. They assume that datasets are already clean and that pre-built data infrastructure exists to handle the data processing. These technology silos make it very hard for enterprises to succeed in AI without an army of highly sophisticated engineers and data scientists. The ability to use cloud-native architecture, with capabilities to scale up and down in ecosystems of containers and microservices, to deploy machine learning applications is paramount.
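To make the gap concrete, here is a small, hypothetical illustration (invented tables and column names, pandas assumed as the data-wrangling library) of the combining, cleaning and enriching work that the AI frameworks above leave entirely to the enterprise:

```python
# A hedged sketch of pre-model data work: combine two siloed
# sources, clean duplicate and missing records, then enrich the
# result -- all before any AI framework ever sees the data.
import numpy as np
import pandas as pd

# Two "silos": a core-banking extract and a marketing database.
core = pd.DataFrame({
    "customer_id": [1, 2, 2, 3, 4],
    "balance": [1_200.0, 300.0, 540.0, 98.0, np.nan],
})
marketing = pd.DataFrame({
    "customer_id": [1, 2, 3, 5],
    "segment": ["retail", "retail", "premier", "retail"],
})

clean = (
    core.drop_duplicates("customer_id", keep="last")   # verify: one row per customer
        .merge(marketing, on="customer_id", how="left")  # combine silos
        .assign(
            balance=lambda d: d["balance"].fillna(d["balance"].median()),  # clean
            segment=lambda d: d["segment"].fillna("unknown"),
            log_balance=lambda d: np.log1p(d["balance"]),  # enrich
        )
)
print(clean)
```

Even this toy version needs deduplication rules, a join key, imputation choices and a derived feature; at enterprise scale, across hundreds of systems, that is where most of the time goes.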
IBM Cloud Private for Data (ICP for Data) is the first integrated analytics platform to bring data and AI technologies together in a containerized platform. It could become the de facto data processing and AI engine in enterprises today because of its speed, ease of use and sophisticated analytics.
IBM is running a successful pilot in one of the biggest banks in Asia. We’re helping it offer millennials greater access to credit by developing innovative machine learning models that touch all aspects of its lending business. These models process millions of items of semi-structured data along with structured data, then apply a sophisticated data science model to do predictions.
Previously, the teams’ data management tools were poorly integrated with their data science framework, and models took multiple days to deploy because updates from the data team to the usable data were delayed. With ICP for Data, the data science team has seen a significant productivity improvement and five-times-faster time to value for advanced AI products.
AI- and cloud-ready data with IBM Cloud Private for Data
ICP for Data simplifies data preparation for AI by unifying data at massive scale across various sources — cloud storage systems, distributed file systems, key-value stores and data warehouses — using data virtualization technology. It also supports popular AI and machine learning frameworks and libraries such as TensorFlow, PyTorch, Spark and R. ICP for Data helps you train and evaluate machine learning and AI models and operationalize them at scale.
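The idea behind data virtualization can be illustrated with a toy stand-in (this is not ICP for Data’s actual API, and real virtualization pushes queries down to each source rather than copying data as this simplification does): heterogeneous sources are exposed through a single query interface.

```python
# A toy stand-in for data virtualization: expose a relational
# "warehouse" and a flat-file-style extract through one SQL
# interface and query across them as if they were one database.
import sqlite3
import pandas as pd

# Source 1: a relational "warehouse".
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (customer_id INTEGER, balance REAL)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                 [(1, 1200.0), (2, 540.0), (3, 98.0)])

# Source 2: a flat extract, as it might arrive from a file system.
clicks = pd.DataFrame({"customer_id": [1, 2, 3],
                       "web_visits": [14, 3, 27]})

# "Virtual" layer: register the second source in the same engine
# (a real virtualization layer would federate, not copy), then
# run one query spanning both sources.
clicks.to_sql("clicks", conn, index=False)
unified = pd.read_sql(
    "SELECT a.customer_id, a.balance, c.web_visits "
    "FROM accounts a JOIN clicks c ON a.customer_id = c.customer_id",
    conn)
print(unified)
```

The payoff for the data scientist is the single query surface: no hand-written connectors per source, and the joined view is ready for the feature-engineering step that precedes model training.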