By 2025, there will be 180 trillion gigabytes of data in the world, compared to only 10 trillion gigabytes in 2015. Of this, 90 percent will be unstructured, which is why many organizations are adopting open source data lake technologies such as Apache Hadoop to handle this expanding volume and
On the second episode of Data Decoded, Seth Dobrin, VP & CDO of IBM Analytics, discusses his role as a Chief Data Officer at IBM and the latest IBM Analytics announcements from Think 2018, from IBM Cloud Private for Data to the launch of the Data Science Elite Team.
The data lake may be all about Apache Hadoop, but integrating operational data can be a challenge. Learn how to deliver real-time feeds of transactional data from mainframes and distributed environments directly into Hadoop clusters and make constantly changing data more available.
Managing enterprise information has always been a good idea. However, with the potential for looming penalties for non-compliance with the General Data Protection Regulation (GDPR), companies are waking up, and some organizations are even seeing GDPR as an opportunity to establish strengthened
Although there are many new and emerging classes of data integration, quality and governance software tools available in the market, many large organizations are coming to the conclusion that they're best served by a single unified enterprise data integration, quality and governance platform that
In the connected world of today’s digital economy, apps, IoT devices, vehicles, appliances and servers are generating an endless stream of event data. This stream of events describes what is happening over time and offers the opportunity to track and analyze things as they happen.
Recently, I had the honor of speaking with a number of the world’s most influential thought leaders in the fields of data science, data analytics, machine learning and digital transformation. This group of prominent data technologists was more than happy to answer a wide variety of questions on
In any successful modern organization, analytics is likely to play a central role in helping decision-makers design and execute effective business strategies. At IBM, as we work with clients across the globe, we’re seeing ever-increasing levels of maturity and confidence in data-driven business
The data lake can be considered the consolidation point for all of the data that is of value across different aspects of the enterprise. A typical data lake is likely to include a wide range of different types of data repositories.
Dwaine Snow is a Global Big Data and Data Science Technical Sales Manager at IBM. He has worked for IBM for more than 20 years, focusing on relational databases, data warehousing, and the new world of big data analytics. He has written eight books and numerous articles on database management, and
For today’s data scientists and data engineers, the data lake is a concept that is both intriguing and often misunderstood. While there are many good resources about data lakes on ibm.com and other websites, there is also a lot of hype and spin. As a result, it can be difficult to get a clear
Building a data lake is one of the stepping stones towards data monetization and many other advanced revenue-generating and competitive-edge use cases. What are the building blocks of a “cognitive trusted data lake” enabled by machine learning and data science?
In many cases, the data lake can be defined as a superset of data repositories that includes the traditional data warehouse, complete with traditional relational technology. One significant example of the different components in this broader data lake is in terms of different approaches to the