Using data across a hybrid environment to train machine learning models

Follow these code patterns and connect to on-premises and cloud data for initial model training and continuous learning

Product Marketing Manager, IBM Data & AI, IBM

Machine learning (ML) is helping businesses derive better insights and optimize their day-to-day operations. Yet an ML model is only as good as the data used to train and continually improve it. With the majority of enterprises already using a hybrid cloud, accessing the domain-specific data you need can be challenging.

To make it easier to access data across a hybrid environment for model training and continuous learning, we’ve developed two code patterns: one for on-premises connections and one for cloud connections.


Optimizing efficiency for the City of Chicago

These code patterns use Watson Studio Model Builder and SQL data in IBM Db2 cloud or on-premises offerings to improve efficiency for clients like the City of Chicago. For large metropolitan areas like Chicago, building inspections are vital for the safety and welfare of residents. However, they are also very time-consuming and require a large workforce because of the sheer number of inspections involved. By using the right data to train a model on building code violations, the City of Chicago can better predict which buildings are most likely to fail inspection and therefore need prioritized attention. In this way, city inspectors can save time and resources while addressing safety concerns faster.

Continuous learning using cloud data

Immediately after a model is trained, it seems fairly safe to assume that it produces an accurate approximation of reality. However, the speed at which change occurs is increasing dramatically. What used to happen over the course of years now happens in days or, in some cases, minutes. Models must be updated constantly to reflect these changes, which is why continuous learning is so important.

Continuous learning is a process in which your machine learning models automatically improve over time as your training data evolves and grows, closing the feedback loop between training data and deployed models. After you configure the triggers for retraining, new models with competing algorithms can be automatically created, trained, evaluated for performance, and conditionally deployed for immediate use by apps, so you do not have to update the deployed models manually.
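The loop described above can be sketched in a few lines of plain Python. This is only an illustration of the idea, not how Watson Studio implements it: the `ContinuousLearner` class, the two toy trainers, the row format, and the row-count trigger are all hypothetical names and choices made for this example.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical row format: (feature, label), e.g. (building age, 1 = failed inspection).
Row = Tuple[float, int]
Model = Callable[[float], int]


def train_majority(rows: List[Row]) -> Model:
    """Toy candidate: always predict the most common label."""
    ones = sum(label for _, label in rows)
    majority = 1 if ones * 2 >= len(rows) else 0
    return lambda x: majority


def train_threshold(rows: List[Row]) -> Model:
    """Toy candidate: predict 1 above the midpoint between the class means."""
    pos = [x for x, label in rows if label == 1]
    neg = [x for x, label in rows if label == 0]
    if not pos or not neg:  # only one class seen so far
        return train_majority(rows)
    cut = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    return lambda x: 1 if x > cut else 0


def accuracy(model: Model, rows: List[Row]) -> float:
    return sum(model(x) == label for x, label in rows) / len(rows)


class ContinuousLearner:
    """Retrains competing candidates when enough new rows have arrived."""

    def __init__(self, trainers: Dict[str, Callable[[List[Row]], Model]],
                 holdout: List[Row], retrain_after: int = 4) -> None:
        self.trainers = trainers
        self.holdout = holdout              # fixed evaluation set
        self.retrain_after = retrain_after  # trigger: number of new rows
        self.training_rows: List[Row] = []
        self.pending = 0
        self.deployed: Optional[Model] = None
        self.deployed_name: Optional[str] = None

    def add_rows(self, rows: List[Row]) -> bool:
        """Store new training data; return True if a new model was deployed."""
        self.training_rows.extend(rows)
        self.pending += len(rows)
        if self.pending < self.retrain_after:
            return False
        self.pending = 0
        return self.retrain()

    def retrain(self) -> bool:
        # Train every candidate and score each one on the holdout set.
        scored = []
        for name, trainer in self.trainers.items():
            model = trainer(self.training_rows)
            scored.append((accuracy(model, self.holdout), name, model))
        best_score, best_name, best_model = max(scored, key=lambda s: s[0])
        # Conditional deployment: replace the live model only if the best
        # candidate strictly beats it on the holdout set.
        current = accuracy(self.deployed, self.holdout) if self.deployed else -1.0
        if best_score > current:
            self.deployed, self.deployed_name = best_model, best_name
            return True
        return False
```

In this sketch the trigger is a simple count of new rows, and a candidate replaces the deployed model only when it scores strictly higher on a fixed holdout set. A real setup would likely use richer triggers and evaluation metrics.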

To try it out, take a look at the code pattern Continuous learning with WML and Db2 Warehouse on Cloud. It demonstrates the native ability of Watson Studio to use hybrid data management solutions, such as Db2 Warehouse on Cloud within IBM Cloud, for model training and continuous learning.

Accessing on-premises data while maintaining security

On-premises data storage, such as in a Db2 database, can give you full control over your data, including its security and integrity. However, maintaining a custom on-premises security posture presents a unique challenge when accessing data for model training or continuous learning.

To overcome this hurdle, Watson Studio uses the Secure Gateway Service to let models securely access and train on your Db2 data sets. By co-locating the lightweight Secure Gateway client with your data, you can establish a secure, persistent connection between your environment and the cloud, and you gain powerful options for implementing custom security policies on both ends of the connection. The code pattern Train a cloud-based machine learning model from Db2 on-premises data shows how this works in practice.
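As a rough illustration of what connecting through such a gateway can look like from the cloud side, the sketch below builds a Db2 connection string that points at the gateway's cloud endpoint instead of the on-premises host. The helper name `build_db2_dsn`, the host name, the port, and the credentials are hypothetical placeholders, and the code pattern itself may configure the connection differently, for example through the Watson Studio interface.

```python
def build_db2_dsn(database: str, host: str, port: int,
                  user: str, password: str, use_ssl: bool = True) -> str:
    """Build a Db2 connection string in the keyword=value; form."""
    parts = [
        f"DATABASE={database}",
        f"HOSTNAME={host}",   # assumption: the gateway's cloud-side host
        f"PORT={port}",       # assumption: the port the gateway exposes
        "PROTOCOL=TCPIP",
        f"UID={user}",
        f"PWD={password}",
    ]
    if use_ssl:
        parts.append("SECURITY=SSL")
    return ";".join(parts) + ";"


# Placeholder values only; substitute the endpoint your gateway reports.
dsn = build_db2_dsn("BLUDB", "gateway.example.com", 15000, "db2user", "secret")

# With the ibm_db driver installed, the connection could then be opened with:
#   import ibm_db
#   conn = ibm_db.connect(dsn, "", "")
```

The training code never needs the on-premises address in this arrangement. It only sees the gateway endpoint, and the gateway client running next to the database forwards the traffic.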

The convergence of machine learning models and hybrid data management presents unique challenges. But using both is vital to developing the apps necessary for an insight-driven business. Take advantage of our code patterns to learn what’s possible when Watson Studio and Db2 are combined. And tune in to our developer webcast showcasing the advantages of using an AI-infused database for development.