Do Not Ignore Structured Data in Big Data Analytics
The important role of structured data when gleaning information from big data
Much of the information that is created today is synthesized from unstructured data, which comes from all flavors of new and dynamic sources such as tweets, social media posts, and mobile device–captured data. The challenge in creating information is to understand how to integrate all this unstructured data with the traditional structured data that is commonly stored in IBM® DB2® for z/OS® databases.
The big data revolution
Most of the business value and opportunities of big data likely reside in the ability to get information from huge amounts of data. I cannot stress enough the clear distinction that exists between data and information. Data becomes information only when organizations, decision makers, and business analysts can synthesize it into something meaningful from which they can derive value that fits the context of their needs. And it appears that these information consumers, to name just a few, can make a lot more sense of big data–derived information when it is organized in a structured fashion rather than an unstructured one.
As I understand it, we need to rationalize the data that we have available before we can get information and business value from it. Today’s technology allows organizations to manipulate huge amounts of data quickly and cost-effectively. This manipulation enables them to capitalize on new sources of information and collect data into a traditional structured repository.
Big data is made up of structured and unstructured data. Many information sources claim that 90 percent of the data is unstructured, while 10 percent of the data is structured. And typically, this structured 10 percent is the portion that matters more than the rest.
Don’t get me wrong. I am very enthusiastic about the recent big data revolution, and I am looking forward to exploring even more of the opportunities that today’s technology makes possible. However, I do believe that traditional structured data has an important role to play.
Part of the DB2 evolution includes listening to market trends and improving the DB2 portfolio of features to cope with new challenges. In my opinion, DB2 is 30-year-young software that helps successfully manage the subtle balance between innovation, state-of-the-art technology, and enterprise-critical reliability.
One real-world big data challenge
As a case in point, I have been involved in a project that aims to consolidate newly generated social media data from very popular sources and merge this data into an existing structured contacts database. There are numerous challenges, but luckily many application programming interfaces (APIs) allow easy extraction of big data, in some cases in almost real time. This easy extraction makes it possible to collect the data and integrate it into existing extract-transform-load (ETL) processes.
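To make the idea concrete, here is a minimal sketch of what such an ETL step could look like. It is not the project's actual code. The post layout, the `contacts` and `social_posts` table names, and all function names are hypothetical, and Python's built-in `sqlite3` module stands in for DB2 for z/OS so that the example runs anywhere.

```python
import json
import sqlite3

# Hypothetical raw posts, shaped like what a social media API might return.
RAW_POSTS = [
    '{"user": {"handle": "@Ana_Ruiz", "email": "ANA@example.com"},'
    ' "text": "Loving the new release!", "created_at": "2013-05-01T10:15:00Z"}',
    '{"user": {"handle": "@bob", "email": null},'
    ' "text": "Hello", "created_at": "2013-05-01T11:00:00Z"}',
]


def flatten_post(raw):
    """Transform: reduce one JSON post to a flat (handle, email, text, created_at) row."""
    post = json.loads(raw)
    user = post.get("user") or {}
    email = user.get("email")
    return (
        (user.get("handle") or "").lstrip("@").lower(),
        email.lower() if email else None,  # normalize case so joins match
        post.get("text", ""),
        post.get("created_at"),
    )


def load_posts(conn, raw_posts):
    """Load: insert the flattened rows into a structured table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS social_posts "
        "(handle TEXT, email TEXT, body TEXT, created_at TEXT)"
    )
    conn.executemany(
        "INSERT INTO social_posts VALUES (?, ?, ?, ?)",
        [flatten_post(raw) for raw in raw_posts],
    )


def posts_per_contact(conn):
    """Combine old and new data: count posts for each existing contact, matched by email."""
    return conn.execute(
        "SELECT c.name, COUNT(p.body) FROM contacts c "
        "LEFT JOIN social_posts p ON p.email = c.email "
        "GROUP BY c.contact_id, c.name ORDER BY c.contact_id"
    ).fetchall()


def build_demo():
    """Create an in-memory database with a small, made-up contacts table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE contacts (contact_id INTEGER PRIMARY KEY, name TEXT, email TEXT)"
    )
    conn.executemany(
        "INSERT INTO contacts VALUES (?, ?, ?)",
        [(1, "Ana Ruiz", "ana@example.com"), (2, "Carl Ng", "carl@example.com")],
    )
    load_posts(conn, RAW_POSTS)
    return conn
```

Calling `posts_per_contact(build_demo())` returns one row per existing contact with the number of matching posts. Matching on a normalized email address is only one possible choice; a real project would likely need fuzzier matching rules and would leave unmatched posts, such as the one without an email here, for later review.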
The challenge is to glean information from this data by looking for correlations, patterns, and trends within the combined data, old and new. The chosen repository is DB2 for z/OS. This option allows us to leverage the 30 years of evolution that DB2 offers in terms of performance, availability, and business value. It provides the basis for getting business insight from information. And it enables us to use everyday tools and draw on in-house, hands-on experience of manipulating data in a traditional format.
A place for structured data
The revolution of today is big, unstructured data. But there is still a predominant place for structured data that we cannot ignore. DB2 for z/OS supports the data that matters in many of today’s organizations and continuously evolves to embrace the technology needed today and tomorrow.
What is the role that DB2 plays in your organization? And what about tomorrow? Share your answers in the comments.