Extracting the value of location and time data just got simpler with IBM Cloud Pak for Data

Portfolio Lead, Watson Studio Data and AI, Cloud, IBM

Proper use of time series and location data in prediction and optimization can considerably boost the yield of data science and AI initiatives. While location and time data have been available for business use, using them in AI requires scaling them as spatiotemporal functions that can be processed with high performance. This has been a major industry challenge because key geospatial functions are either locked away in database silos or fragmented across many tools. Furthermore, a time series of locational data (that is, a series of location data points indexed in chronological order) has been even harder to incorporate.

Just as we are automating AI lifecycle management with AutoAI and promoting AI model trust and transparency with Watson OpenScale, IBM Research has been tasked with solving this additional, demanding challenge. Spatiotemporal functions implemented as part of Analytic Engines in Watson Studio are now coming to Cloud Pak for Data. This is yet another example of being “Born in IBM Research” and solving a tough, real-world problem for enterprises. To share what was involved in the development and commercialization of spatiotemporal functions, I sat down with Raghu Ganti, a Research Staff Member and Master Inventor at IBM's Thomas J. Watson Research Center, and a member of the IBM Cloud Spatial Team.

Why did the IBM Research team focus on spatiotemporal functions?

The rise of location sensors has been undeniable in various devices, such as smartphones, connected cars, and IoT sensor feeds. Fusing geographic feature data (for example, zip code polygons or address features) with these location sensor data is key to extracting and enriching the data with contextual information. Being able to add a location context to enterprise problems provides valuable insights and integrating it into the AI pipeline creates exciting opportunities.

Why have these capabilities become more important?

Spatiotemporal functionality has become increasingly important in the last ten years due to multiple technologies converging. The key enabling factors are (a) the availability of fine-grained location data (due to more accurate GPS devices, WiFi-enabled SDKs, and smartphones), (b) widespread location sharing (due to social media and other apps), and (c) democratization of big data platforms through data science front-end tools such as Watson Studio. These location signals provide rich insights about an entity and its behavioral patterns. For example, is this entity performing an action such as a financial transaction from an unusual location? The industry has also matured to a point where using this context-rich information across clouds has become much easier.

I hear that this development effort started in IBM Research. Tell me more: how did you get started?

Nearly ten years ago, enterprise-grade spatial functions were locked away mostly in databases such as DB2, Microsoft SQL Server, Oracle, and ESRI ArcGIS server. With the advent of big spatial data generated by moving objects (think smartphones and cars), IBM Research envisioned the need to empower data scientists with geospatial intelligence, thereby making it readily available for consumption in business processes. Our research identified the key technical gaps and created a lean embeddable library that integrates with a wide variety of big data platforms and enables large scale spatiotemporal functions to be performed on data from anywhere in the world.

What were the industry gaps you were trying to fill?

Geospatial intelligence has been traditionally a subset of databases (such as DB2, Microsoft SQL Server, Oracle) and software verticals (such as ESRI ArcGIS server, Google Earth, Google Maps). There was a lack of uniform developer experience for geospatial analytics on public and private cloud. This essentially requires a cloud-native infusion of geospatial intelligence into the stack, starting from object stores to query engines and analytics/machine learning platforms.

Another key technical barrier in the adoption of geospatial analytics by citizen data scientists has been the need to juggle between over 8,000 known planar projections, where points in 3D space are mapped to a 2D projection plane. Our solution eliminates the need for projections, allowing accurate and performant geospatial operations on worldwide data (such as near poles, near anti-meridian, and large geometries). Further, this solution supports efficient large-scale spatiotemporal joins across private and public data sets.
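To make the projection problem concrete, here is a minimal Python sketch. It is not the IBM library or its API; it only contrasts a distance computed directly on a sphere (the standard haversine formula) with a naive flat-grid calculation on raw latitude/longitude values. The function names, the spherical Earth radius, and the sample coordinates are all assumptions chosen for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; assumes a spherical Earth


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two lat/lon points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def naive_planar_km(lat1, lon1, lat2, lon2):
    """Treats degrees as a flat grid; breaks down near the anti-meridian and the poles."""
    km_per_degree = 2 * math.pi * EARTH_RADIUS_KM / 360
    return km_per_degree * math.hypot(lat2 - lat1, lon2 - lon1)


# Two hypothetical points on the equator, 2 degrees apart across the anti-meridian.
west_of_line = (0.0, 179.0)
east_of_line = (0.0, -179.0)

sphere_km = haversine_km(*west_of_line, *east_of_line)
planar_km = naive_planar_km(*west_of_line, *east_of_line)

print(f"on the sphere: {sphere_km:.1f} km")  # a couple of hundred km
print(f"flat grid:     {planar_km:.1f} km")  # tens of thousands of km
```

The flat-grid result is off by two orders of magnitude because it sees longitudes 179 and -179 as 358 degrees apart. Working on the sphere (or, more accurately, an ellipsoid) avoids that class of error without choosing a projection.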

What was the process like for taking these capabilities from research to commercialization?

We have helped implement these capabilities in more than 15 IBM Data and AI solutions, so the process has become much more streamlined. However, it has been a journey to establish the various development operations (DevOps) processes even before DevOps was a “thing.” The library needed to be available in a continuous delivery mode, and updates and bug fixes were being done in an agile mode since the very beginning of its availability (nearly 10 years ago).

A typical sequence of steps for enabling this feature in a Data and AI solution has been to (a) jointly identify the need for this functionality with IBM offering management, (b) integrate with the product by working with a development team, (c) educate the product team and support initial client proofs of concept (PoCs), and (d) maintain a continuous innovation pipeline for new features and emerging applications. This approach gives us an end-to-end, 360-degree view of the technical issues, lets us identify the gaps to be filled for a specific product or client need, and helps us manage business adoption and requirements.

For what use cases do clients use these functions?

There are several use cases, ranging across automotive, shipping, insurance, event management, and environmental protection sectors. For example, the geospatial library enables the monitoring of ship movement near a port to identify illegal activities based on anomalous pattern extraction. Another example is in the automotive industry, where real-time alerts are sent to vehicles based on their context, such as geo-fencing alerts, path-based alerts, and nearby vehicle behavior-based alerts.
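As a rough illustration of the geo-fencing idea mentioned above, the following Python sketch scans a time-ordered track of positions and reports each time the object crosses from outside to inside a circular fence. It is a simplified stand-in, not the geospatial library's API: the function names, the circular fence shape, the 5 km radius, and the sample coordinates are all hypothetical.

```python
import math
from typing import Iterable, List, Tuple

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; assumes a spherical Earth


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two lat/lon points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def geofence_entries(
    track: Iterable[Tuple[int, float, float]],
    center: Tuple[float, float],
    radius_km: float,
) -> List[int]:
    """Return the timestamps at which the track moves from outside to inside the fence.

    `track` is a chronologically ordered series of (timestamp, lat, lon) samples.
    A track whose first sample is already inside the fence counts as an entry.
    """
    entries = []
    was_inside = False
    for timestamp, lat, lon in track:
        inside = haversine_km(lat, lon, center[0], center[1]) <= radius_km
        if inside and not was_inside:
            entries.append(timestamp)
        was_inside = inside
    return entries


# Hypothetical port location and a made-up ship track (timestamp, lat, lon).
port = (1.26, 103.82)
ship_track = [
    (0, 1.40, 103.82),  # well outside the fence
    (1, 1.30, 103.82),  # crosses into the fence
    (2, 1.27, 103.82),  # still inside
    (3, 1.40, 103.82),  # leaves
    (4, 1.28, 103.83),  # re-enters
]

alerts = geofence_entries(ship_track, port, radius_km=5.0)
print(alerts)
```

A production system would work with arbitrary polygons, streaming input, and far larger data volumes, but the core pattern is the same: join a time series of positions against a geographic feature and react when the relationship changes.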

Finally, it’s heartwarming that AI models can now extract patterns indicating potential wildlife poaching. They send real-time alerts to rangers notifying them of poachers within the large wildlife parks that protect endangered animals such as rhinos.

With these capabilities coming to IBM Cloud Pak for Data, what is possible?

Data scientists and developers can now use IBM’s flagship data and AI platform to tap into the power of these innovations in spatiotemporal functions, both for basic geospatial data and to learn and extract patterns and models from complex moving objects. What’s more, organizations can implement these spatiotemporal functions as part of modern apps in a containerized, cloud-native environment. With spatiotemporal data capabilities in IBM Cloud Pak for Data, businesses can put this high-value, context-rich intelligence to work at scale.

Spatiotemporal functions will become available in Cloud Pak for Data later in Q4 2019. Learn more about IBM Cloud Pak for Data and try it for free here. Or you can explore the potential of AI combined with human intelligence with an on-demand webinar on the Future of Work. You can also join us in a live webinar as we kick off our 3-part webinar series on building a Winning with AI playbook.

You can also take a business value assessment to see the ROI of investing in AI today. And download free key learnings from the IBM Data Science Elite Team on implementing AI successfully here.