Reality and misconceptions about big data analytics, data lakes and the future of AI

Product Marketing Manager for Data Lake & Cloudera Partnership, IBM

With the number of choices surrounding big data analytics, data lakes and AI, it can sometimes be difficult to tell fact from fiction. With more than 40% of organizations expecting AI to be a “game changer,” it’s important to have a complete picture of the capabilities and opportunities available. We’ve explored four claims to help you determine the best course of action for your AI implementation and dodge the myths that could lead you astray.

Data drives AI, promoting organizational innovation and success

Verdict: True

Increasing data volumes are helping drive the successful adoption of AI by providing a better foundation on which to build models for predictive and real-time analytics. According to Statista, the global big data market is forecast to grow to 103 billion US dollars by 2027, more than double the market size in 2018. Organizations are seeing results, with nearly 50 percent of respondents to a McKinsey Analytics survey (Analytics Comes of Age) stating that analytics and big data have fundamentally changed their business practices.

AI, machine learning and deep learning depend on both the quantity and the quality of data to generate models and automated rules for business analytics. Early AI adopters have found success in collecting and deriving meaningful insights from big data and analytics when they focus on overcoming challenges such as:

  • No business case. Managing big data requires a long-term commitment and planning to accommodate future growth. An overall data strategy that includes a clear understanding and articulation of how the organization will be affected is key. Proper planning and communication will facilitate the approvals and buy-in needed to move forward.
  • Wrong technology choices. Selecting the wrong platform or tools can waste time and add significant cost and complexity to implementation and ongoing management. When evaluating new technologies, enterprise-grade governance and security strategies are critical to avoiding legal and regulatory compliance issues.
  • Poor integration. Preparing for big data requires a plan for integrated data management across your organization. This includes data integration with external data sources and any existing data warehouse or data mart.

AI will change business as we know it across industries

Verdict: True

We are moving toward increasingly automated decision making, creating a strong competitive advantage for organizations that leverage AI efficiently and effectively.

Finance – Automated AI using advanced algorithms and models is driving better credit decisions with lower default rates. Factors such as location and buying habits can trigger security mechanisms for proactive fraud detection. AI is powering virtual banks through chatbots and self-help solutions, speeding time to value and reducing call center workloads. 

Healthcare – Aggregated patient data, advanced algorithms, data modeling and predictive analytics are elevating the precision of patient diagnosis and care. AI can increase testing accuracy, speed time in triage and help patients avoid invasive procedures. The biggest opportunity may be in giving doctors time back to spend with their patients.

Manufacturing – Automated collection of sensor data (IoT) facilitates real-time analysis of machinery and operations, resulting in improvements in maintenance and better resource planning. By implementing AI, companies can eliminate unplanned downtime and better anticipate market change. Supply chain optimization is achieved through IoT sensors driving machine learning and through artificial neural networks.

The data lake is not suitable for today’s AI

Verdict: Myth

At times the data lake has fallen out of favor with press and analysts, who have reported that data lakes are turning into data swamps. Yet, as the technology has matured and organizations have gained more experience in how to use it successfully, issues identified early on have become less relevant. What early adopters failed to understand is that the data lake is not a data dumping ground. Data governance, while it can be delayed, is key to keeping the data clean and easily accessible. There is still the need for meta tags, cataloging and establishing user permissions at some point in the process. When built and managed correctly, the data lake provides the flexible infrastructure required for today’s big data and AI initiatives.
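
To make the governance point concrete, here is a minimal sketch of what tagging, cataloging and permission checks could look like in plain Python. The `CatalogEntry` and `Catalog` names, the paths and the users are all hypothetical and purely illustrative; a real deployment would rely on a dedicated catalog and security service rather than an in-memory dictionary.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """Hypothetical metadata record for one data set stored in the lake."""
    path: str
    owner: str
    tags: set = field(default_factory=set)      # meta tags used for discovery
    readers: set = field(default_factory=set)   # users allowed to read besides the owner


class Catalog:
    """Illustrative in-memory catalog; not a real product API."""

    def __init__(self):
        self._entries = {}

    def register(self, entry):
        # Cataloging: record what was stored and who owns it.
        self._entries[entry.path] = entry

    def find_by_tag(self, tag):
        # Discovery: list data sets carrying a given meta tag.
        return sorted(e.path for e in self._entries.values() if tag in e.tags)

    def can_read(self, user, path):
        # Permissions: owners and explicitly listed readers may read.
        entry = self._entries.get(path)
        return entry is not None and (user == entry.owner or user in entry.readers)


catalog = Catalog()
catalog.register(CatalogEntry("raw/call_logs/2020", owner="ops",
                              tags={"call_log", "pii"}, readers={"analyst"}))
catalog.register(CatalogEntry("raw/clickstream/2020", owner="web",
                              tags={"clickstream"}))
```

Even at this toy scale, the idea is visible: data that is registered, tagged and access-controlled can be found and trusted later, which is what separates a lake from a swamp.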

Unlike the traditional data warehouse, where data is transformed before ingestion, the data lake uses the ELT process (extract, load and then transform). Because data is ingested “as is,” a data lake enables the storing, querying and analyzing of additional data types in a more cost-effective environment. The data lake has a key advantage over the data warehouse in accommodating new formats of semi-structured and unstructured data: streaming audio/video, call logs, clickstream, social media and sentiment data. A wider range of data is good for AI, as models can be trained on more robust information, which will hopefully reduce bias and increase accuracy.
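
The ELT idea can be illustrated with a short Python sketch. It is a simplified illustration under stated assumptions, not how any particular data lake product works: the sample events, the `load_as_is` and `transform_on_read` helpers and the list used as a "raw zone" are all hypothetical.

```python
import json

# Hypothetical raw events of mixed shape, as they might arrive from different sources.
RAW_EVENTS = [
    '{"source": "clickstream", "user": "u1", "page": "/pricing"}',
    '{"source": "call_log", "user": "u2", "duration_sec": 340}',
    '{"source": "social", "user": "u1", "text": "Great service!"}',
]


def load_as_is(lines):
    """Extract + load: keep every record verbatim, with no upfront schema."""
    return list(lines)


def transform_on_read(raw_zone, source):
    """Transform: parse and filter only when a specific analysis needs the data."""
    parsed = (json.loads(line) for line in raw_zone)
    return [record for record in parsed if record.get("source") == source]


raw_zone = load_as_is(RAW_EVENTS)

# Two different questions answered later from the same untouched raw data.
calls = transform_on_read(raw_zone, "call_log")
users_seen = sorted({json.loads(line)["user"] for line in raw_zone})
```

Because nothing is discarded or reshaped at load time, records with different fields can sit side by side, and each new analysis can apply its own transformation later.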

Cloudera and IBM are driving the future of AI

Verdict: True

The Cloudera Data Platform (CDP) was recently released and is one of the most comprehensive on-premises platforms, spanning ingest, processing, analysis, experimentation, and deployment. It combines the best of Cloudera Enterprise Data Hub and HDP Enterprise Plus, fusing the latest and greatest open-source data management and analytics technologies. CDP enables Edge-to-AI analytics, providing persona-driven experiences from data engineers to business users to data scientists, and paving the way to new private cloud deployment options. With CDP Data Center, you can enable multi-function analytics using an integrated suite of analytic engines spanning stream and batch data processing, data warehousing, operational database, and machine learning in support of a diverse set of operational and analytical use cases.

Cloudera Data Platform Data Center is now available through IBM. In addition to CDP, we offer other Cloudera products and an ecosystem of IBM solutions, products, multivendor support, and services. Learn more about IBM and Cloudera and how we are driving the future of AI by visiting the IBM and Cloudera partnership page. Or, to get an analyst’s perspective, read Cabot Partners’ recent report detailing the Total Value of Ownership for an IBM and Cloudera architecture.