3 principles for climbing the AI ladder with IBM Governed Data Lake

Director for Watson and AI applications, IBM

Recently, we capped off the first leg of the “Enabling digital business with an IBM governed data lake” roadshows in the Asia Pacific region with our customers and partners.

Each session was jam-packed with clients and partners from various industries, from banking to telecommunications to airlines. Our clients discussed their biggest challenges in becoming cognitive, data-driven businesses.

I was excited to hear about the value that IBM technologies are providing to customers in the region as they lead the way with a digital transformation that is truly changing the face of their businesses. One theme that almost all customers shared was how IBM introduces innovative partnerships, products and methods to get them ready to compete in this new digital world. IBM Governed Data Lake (GDL) is a testament to that innovation in the Asia Pacific region. IBM GDL helps customers build a strong ladder to benefit from artificial intelligence technologies.

We discussed a wide variety of concerns at the event, including:

  • How businesses can make data their competitive advantage and bring in new revenue streams by applying data science and machine learning
  • How businesses are seeing artificial intelligence as the ultimate answer to becoming digital businesses
  • Why an IBM governed data lake is a basic and important first step in bringing AI to the enterprise to create a data-driven business

Here are three common patterns from these conversations, which I am going to try to demystify:

1. A strong data foundation is the first step to realizing the full power of AI for digital business.

Artificial intelligence and machine learning work best when you can access all your data regardless of its location, whether it is in a traditional RDBMS, in Hadoop stores or in NoSQL databases.

You need a robust, flexible data repository that can ingest and persist massive volumes of data and data types, and navigate the challenges of seamless accessibility. Rise above the constraints of space and time to deliver sub-millisecond performance to any user, anywhere. A strong data foundation is key to bringing artificial intelligence to your business to make it more efficient and effective.

A well-engineered, flexible and fit-for-purpose data repository is important. Another key aspect is data accessibility: a common federation engine lets you query data that cannot be moved, for whatever reason, where it already lives. A combination of IBM Big SQL and 100 percent pure open source Hortonworks Hadoop delivers a robust data platform for doing AI and ML on your data.
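
To make the idea of federation a little more concrete, here is a minimal sketch in Python. It does not use IBM Big SQL; it uses the SQLite module from the Python standard library as a stand-in, with two separate in-memory databases playing the roles of a relational store and a data lake. The table names, columns and rows are all hypothetical. The point it illustrates is simply that one engine can join data that lives in two places without copying either side first.

```python
import sqlite3

# One connection plays the role of the federation engine.
# Assumption for illustration only: this is SQLite, not IBM Big SQL.
engine = sqlite3.connect(":memory:")

# Attach a second, separate in-memory database to stand in for another store.
engine.execute("ATTACH DATABASE ':memory:' AS lake")

# "Relational" source: customer master data (hypothetical table and rows).
engine.execute("CREATE TABLE main.customers (id INTEGER PRIMARY KEY, name TEXT)")
engine.executemany(
    "INSERT INTO main.customers VALUES (?, ?)",
    [(1, "Asha"), (2, "Wei")],
)

# "Lake" source: raw transactions (hypothetical table and rows).
engine.execute("CREATE TABLE lake.transactions (customer_id INTEGER, amount REAL)")
engine.executemany(
    "INSERT INTO lake.transactions VALUES (?, ?)",
    [(1, 120.0), (1, 30.0), (2, 75.5)],
)


def spend_by_customer(conn):
    """Join both stores in a single query, leaving each table where it is."""
    return conn.execute(
        """
        SELECT c.name, SUM(t.amount)
        FROM main.customers AS c
        JOIN lake.transactions AS t ON t.customer_id = c.id
        GROUP BY c.name
        ORDER BY c.name
        """
    ).fetchall()


totals = spend_by_customer(engine)
print(totals)
```

In a real deployment the two sides would be different systems entirely, and the federation engine would handle connectivity, pushdown and security; the sketch only shows the shape of the query a data scientist would write.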

2. Robust governance is key to enabling self-service analytics for digital business.

Without proper governance, businesses cannot trust data to make reliable decisions and drive good insights. At a minimum, data governance should encompass strategies for data security, metadata management and data lineage.

IBM Information Governance Catalog (IGC) enables business leaders to understand big data through its meaning, specification, structure and quality assessments. They can define rules to enable self-service analytics that directly impact business results. They can also cleanse and profile data with a robust data quality engine using IBM BigQuality, and integrate big data natively by running on the open source YARN engine using IBM BigIntegrate.
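
As a rough illustration of how metadata, lineage and a governance rule fit together, consider the following Python sketch. It is not the IGC API; the catalog entries, the classification labels and the rule itself (a dataset is open for self-service only if neither it nor any of its sources is restricted) are assumptions made up for this example.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """Hypothetical metadata record for one dataset in the lake."""
    name: str
    owner: str
    classification: str  # assumed labels: "internal" or "restricted"
    upstream: list = field(default_factory=list)  # names of source datasets


# A tiny, made-up catalog keyed by dataset name.
catalog = {
    "raw_cards": CatalogEntry("raw_cards", "payments", "restricted"),
    "raw_web": CatalogEntry("raw_web", "marketing", "internal"),
    "customer_360": CatalogEntry(
        "customer_360", "analytics", "internal", ["raw_cards", "raw_web"]
    ),
    "web_traffic_report": CatalogEntry(
        "web_traffic_report", "marketing", "internal", ["raw_web"]
    ),
}


def lineage(name, catalog):
    """Return every upstream dataset of `name`, nearest sources first."""
    seen = []
    queue = list(catalog[name].upstream)
    while queue:
        current = queue.pop(0)
        if current not in seen:
            seen.append(current)
            queue.extend(catalog[current].upstream)
    return seen


def self_service_allowed(name, catalog):
    """Illustrative rule: allow self-service only if the dataset and all of
    its upstream sources are free of the "restricted" classification."""
    chain = [name] + lineage(name, catalog)
    return all(catalog[n].classification != "restricted" for n in chain)


print(lineage("customer_360", catalog))
print(self_service_allowed("customer_360", catalog))
print(self_service_allowed("web_traffic_report", catalog))
```

The useful property here is that the rule is evaluated against lineage rather than against a single table, so a derived dataset inherits the restrictions of everything it was built from.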

3. Businesses can discover unknown patterns by applying machine learning and data science.

A well-architected data lake should provide an environment to build data science models using open source languages such as R, Python or Scala. Strong integration with open repositories such as GitHub is a must, so that the environment can recommend the best algorithm for your use cases.

IBM Data Science Experience is a tool designed for this hybrid data management world, with the power to bring data science to any business. It bridges the skills gap in the data science space.

A real-world example of data science applied successfully to the vast amount of data contained in a governed data lake is automatic, proactive fraud detection by banks and credit card companies. To detect suspicious activity on customer cards, a rule-based method is not enough, because new patterns of fraud are always emerging. Thus, the credit card company develops machine learning models based on customer demographic data, ATM network data, shopping patterns, behavioural data, and so on, all of which are held in a governed data lake.
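
The difference between a fixed rule and a model that learns from data can be shown with a small Python sketch. To keep it self-contained, the "model" here is only a per-customer statistical baseline (mean and standard deviation of past spend), which is a deliberately simple stand-in for the machine learning models a card issuer would actually train. The spending limit, the threshold, the customer labels and the transaction histories are all hypothetical.

```python
from statistics import mean, stdev

FIXED_LIMIT = 1000.0  # hypothetical static rule: flag anything above this


def rule_flags(amount):
    """Rule-based check: the same limit for every customer."""
    return amount > FIXED_LIMIT


def fit_baseline(history):
    """Learn each customer's typical spend (mean, sample std dev) from history."""
    return {customer: (mean(a), stdev(a)) for customer, a in history.items()}


def learned_flags(customer, amount, baseline, threshold=3.0):
    """Flag an amount that is more than `threshold` standard deviations
    away from that customer's own historical mean."""
    mu, sigma = baseline[customer]
    return abs(amount - mu) > threshold * sigma


# Made-up purchase histories for two very different customers.
history = {
    "student": [12.0, 18.0, 15.0, 20.0, 10.0],
    "executive": [900.0, 1500.0, 1200.0, 1100.0, 1300.0],
}
baseline = fit_baseline(history)

for customer, amount in [("student", 400.0), ("executive", 1400.0)]:
    print(customer, amount, rule_flags(amount), learned_flags(customer, amount, baseline))
```

With these made-up numbers, the fixed rule misses an unusual 400.0 purchase on the student's card and raises a false alarm on a routine 1400.0 purchase by the executive, while the learned baseline gets both right. A production model would draw on many more signals than amount alone, which is exactly why the breadth of data in the lake matters.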

That is just one of the many powerful data science applications possible with an IBM governed data lake. Other use cases include disease detection, insider trading detection, social network analysis, and customer acquisition and churn analysis. IBM is helping its customers use data science to discover consumer preferences, classify consumers based on their purchasing activity, and determine what makes for high-paying customers. This analysis can generate new revenue streams for businesses.

The future of business is algorithmic with IBM Governed Data Lake. We are just at the beginning of the artificial intelligence and machine learning revolution, and IBM governed data lakes provide a ladder to reach that goal.

Across the Asia Pacific region, we are conducting a series of joint governed data lake roadshows. Tweet or reach out to me on LinkedIn to join these sessions and be part of this exciting journey. Explore and learn more about IBM Governed Data Lake here.