Big Data Adventures

Avoid complexity when integrating big data into enterprise architectures

Product Marketing Manager, IBM

In the movie The Adventures of Tintin, directed by Steven Spielberg, the young journalist Tintin sets out to solve the mystery of the sunken ship, the Unicorn, and its treasure cargo. To solve the puzzle, Tintin needs to find three scrolls that contain clues, each hidden in a different model of the Unicorn. The three scrolls are not useful individually, but when they are put together they reveal the location of the sunken ship and its treasure. Toward the end of the movie, Tintin manages to get his hands on all three scrolls and locate a part of the lost treasure.

How does a movie plot based on a comic book relate to big data? Just like the individual scrolls in the movie, big data sources existing in silos have limited value to organizations. To get the most out of big data, data from different sources should be integrated and used within an enterprise architecture that includes data warehouses, data marts, analytics, and business intelligence (BI) systems. Many organizations need to overcome formidable technical problems before they can leverage big data and gain new insights. In many cases, these problems are fundamental—even traditional—information integration challenges.

Effectively integrating big data

The insight and information organizations can glean from analyzing big data often open fresh opportunities for enhancing business strategy or developing new products or services. However, the Tintin analogy understates how challenging big data can be for many organizations.

Traditional methods of integrating data are no longer adequate because of the sheer volume and complexity of big data. This development creates a new set of requirements that affect key data warehousing and analytics initiatives. An enterprise-class data integration solution that meets the following needs helps organizations handle big data successfully:

  • Meeting current and future performance requirements: Performance is key because big data arrives at a high velocity. Data changes rapidly, and it needs to be fed to various applications in the system quickly so that business leaders can react to changing market conditions as soon as possible.
  • Scaling easily: Scalability is one of the most challenging big data integration requirements. When tackling big data integration, having a product that can achieve data scalability across all architectures with the same functionality is important for organizations.
  • Integrating with Apache Hadoop: A big data integration solution should marry the sophistication of existing enterprise architecture with the Hadoop framework to enable businesses to use the raw computing power of Hadoop for performance-intensive operations.
  • Supporting streaming data: Organizations must have the ability to quickly and easily integrate with systems that support streaming data for projects focused on real-time analytical processing.

As data volume, variety, and velocity grow, the time required for data integration activities increases dramatically, constraining IT from meeting service-level agreements (SLAs) and the needs of internal customers. Therefore, a critical requirement of big data integration is to raise the productivity and efficiency of IT so that it can manage big data as the SLAs specify. Optimized solutions automate data integration and governance and apply them at the point of data creation to help boost end-user confidence in big data.

Efficiently leveraging big data

Recently announced IBM innovations make it easier than ever before for organizations to cross the chasm of hype around big data and effectively use big data for actionable insights. More importantly, these latest IBM innovations enable business users to get the information they need for their own analytical or operational projects.

  • Self-service integration: Recent enhancements to the IBM® InfoSphere® Data Click capability enable self-service access to a growing variety of data in traditional, NoSQL, and big data sources. Rather than building a queue of data requests to IT, business users can initiate data integration on their own, acquiring the data they need to move ahead with their projects.
  • NoSQL and big data integration expansion: Organizations implementing information integration projects can now efficiently leverage a comprehensive range of traditional and big data types, including data from JavaScript Object Notation (JSON), the IBM InfoSphere BigInsights™ analytics platform, and multiple Java Database Connectivity (JDBC)–accessible sources.
  • Automated analysis and validation of big data: Analysts, data scientists, and business users can now effectively manage big data by understanding the content and quality of the data sources. The Hive Open Database Connectivity (ODBC) driver provides native access to Hadoop-based file systems for analysis and ongoing data validation.
  • Real-time Hadoop updates: Keeping data in Hadoop repositories up-to-date with the latest changes in source applications is now automated, as changed data from multiple sources can be replicated directly to Hadoop repositories.
  • Hadoop distributions certification: InfoSphere information integration solutions are now certified with key Hadoop distributions, including Cloudera CDH 4.2, Hortonworks HDP 1.2, and InfoSphere BigInsights 2.0.
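
To make the idea of replicating changed data more concrete, here is a minimal sketch in Python of applying a stream of change records to a target copy. It is an illustration only and does not represent how InfoSphere or Hadoop implements replication; the change-record shape (`op`, `key`, `row`) and the `apply_changes` function are hypothetical names chosen for this example.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def apply_changes(target: Dict[str, Row], changes: List[Dict[str, Any]]) -> Dict[str, Row]:
    """Apply change records, in order, to an in-memory target keyed by record id.

    Each change record is assumed (hypothetically) to look like:
    {"op": "insert" | "update" | "delete", "key": "<id>", "row": {...}}
    """
    for change in changes:
        op = change["op"]
        key = change["key"]
        if op in ("insert", "update"):
            # Upsert: merge the changed columns into any existing row.
            merged = dict(target.get(key, {}))
            merged.update(change.get("row", {}))
            target[key] = merged
        elif op == "delete":
            # Deleting a key that is already absent is treated as a no-op.
            target.pop(key, None)
        else:
            raise ValueError("unknown op: {}".format(op))
    return target


# Target copy that should be kept current with the source applications.
replica = {"c1": {"name": "Ada", "city": "London"}}

# Changed data captured from two hypothetical source applications.
changes = [
    {"op": "update", "key": "c1", "row": {"city": "Paris"}},
    {"op": "insert", "key": "c2", "row": {"name": "Grace", "city": "New York"}},
    {"op": "insert", "key": "c3", "row": {"name": "Temp"}},
    {"op": "delete", "key": "c3"},
]

replica = apply_changes(replica, changes)
```

Only the changed records travel to the target, so the copy stays current without reloading the full source each time. A production system would also have to handle ordering across sources, failures, and schema changes, which this sketch leaves out.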

Seamlessly automating integration

As the hype around big data begins to settle, it is clear that many organizations face several challenges in deriving value from new sources of data. To help maximize big data return on investment (ROI), organizations need to integrate big data sources into their existing information architecture and leverage the raw power of Hadoop for compute-intensive operations.

Big data requirements continue to evolve, and scalability of the big data integration platform is essential for meeting future needs. The InfoSphere portfolio helps deliver scalability and also offers the capability to bring together varied sources of data for trustworthy and actionable insights. In addition, it offers self-service data integration capabilities that go a long way toward raising the productivity of IT and business users and helping them meet their business objectives.

Recent IBM innovations help organizations ingest data from both conventional, internal sources and new, external sources in an agile manner. These innovations enable analysts, data scientists, and line-of-business users to effectively participate in big data initiatives and enhance confidence in data by reducing the complexity around new sources of data.

Please share any thoughts or questions in the comments.
