Starting Your Big Data Lab for a POC

September 16, 2013

Following up on my previous blog post, “6 Steps to Start Your Big Data Journey,” this post addresses the question: how should you start your big data journey?

What is the Big Data Lab?

The Big Data Lab is a dedicated development environment, within your current technology infrastructure, that can be created explicitly for experimentation with emerging technologies and approaches to big data and analytics.

Key Activities within the Big Data Lab:

  • Assemble a selected set of technologies to be evaluated during your 2-3 month evaluation period
  • Test permutations against high value use cases
  • Develop recommendations from the testing scenarios to drive future architecture and usage
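Testing permutations against use cases amounts to enumerating a test matrix. The following is a minimal sketch in Python; the technology options and use case names are hypothetical placeholders, not a recommended shortlist.

```python
from itertools import product

# Hypothetical candidates per lab component; replace with your own shortlist.
candidates = {
    "storage": ["Hadoop HDFS", "existing EDW"],
    "ingestion": ["batch load", "real-time stream"],
}
# Hypothetical high-value use cases.
use_cases = ["audience measurement", "campaign management"]

def build_test_matrix(candidates, use_cases):
    """Return one test scenario per (technology combination, use case) pair."""
    components = sorted(candidates)
    scenarios = []
    for combo in product(*(candidates[c] for c in components)):
        for use_case in use_cases:
            scenario = dict(zip(components, combo))
            scenario["use_case"] = use_case
            scenarios.append(scenario)
    return scenarios

test_matrix = build_test_matrix(candidates, use_cases)
print(len(test_matrix), "scenarios to evaluate")
```

Listing the scenarios up front makes it easier to judge whether the full set fits into the lab's time box, or whether some combinations should be dropped before testing starts.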

What Should be the Big Data Lab’s Objectives?

  • Deliver 2-3 “Quick Wins” to demonstrate the value of these technologies from both an IT and business perspective
  • Create a “Proof-of-Concept” that shows how these technologies can be integrated into your enterprise's existing architecture
  • Develop future-state big data architecture recommendations
  • Deliver low-cost, high-performance, agile BI and data discovery, with a focus on big data technologies
  • Pilot new analytical capabilities and use cases to prove business value and inform a long-term roadmap to compete on analytics
  • Establish a permanent “Innovation Hub” within your architecture and center for big data and analytics skill-building

What Components Should You Consider in Your Big Data Lab?

Each lab component below is followed by its function:

Big Data Storage and Processing

  • Use Hadoop and big data tools as a pre-processing platform for structured and unstructured data before loading into an EDW
  • Use the Hadoop platform for storing and analyzing unstructured and high-volume data
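The pre-processing role can be pictured as a map step that cleans raw records and a reduce step that aggregates them before the load into the EDW. The sketch below is plain Python standing in for that pattern, not Hadoop's own API; the log format and field names are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical raw log lines in "timestamp,user_id,bytes" form; one is malformed.
raw_lines = [
    "2013-09-16T10:00:00,alice,120",
    "2013-09-16T10:00:05,bob,300",
    "corrupted line",
    "2013-09-16T10:01:00,alice,80",
]

def map_record(line):
    """Map step: parse one line and emit (user_id, bytes), or nothing if malformed."""
    parts = line.split(",")
    if len(parts) != 3 or not parts[2].isdigit():
        return []
    return [(parts[1], int(parts[2]))]

def reduce_by_key(pairs):
    """Shuffle and reduce step: sum the values for each key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)

mapped = [pair for line in raw_lines for pair in map_record(line)]
edw_rows = reduce_by_key(mapped)  # cleaned, aggregated rows ready to load
print(edw_rows)
```

On a real cluster the same two functions would run in parallel across many nodes; the point here is only that malformed input is dropped and volume is reduced before anything reaches the warehouse.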

Real-Time Ingestion

  • Use real-time data ingestion into Hadoop
  • Filter data in real-time during collection; ETL high-level data for real-time analysis
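Filtering during collection can be sketched with a Python generator that passes events through one at a time. The event fields (`source`, `severity`, `payload`) and the severity threshold are hypothetical; a lab would substitute its own feed and filter rule.

```python
def filter_stream(events, min_severity=2):
    """Yield only events at or above min_severity, one at a time as they arrive."""
    for event in events:
        if event.get("severity", 0) >= min_severity:
            yield event

def to_summary(event):
    """Keep only the high-level fields needed for real-time analysis."""
    return {"source": event["source"], "severity": event["severity"]}

# Hypothetical incoming events; in a lab this would be a live feed.
incoming = [
    {"source": "web", "severity": 1, "payload": "page view"},
    {"source": "app", "severity": 3, "payload": "checkout error"},
    {"source": "web", "severity": 2, "payload": "slow response"},
]

summaries = [to_summary(e) for e in filter_stream(incoming)]
print(summaries)
```

Because the generator never holds the whole feed in memory, the same shape works whether `incoming` is a short list or an unbounded stream.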

Data Virtualization and Federation

  • Enable near-real-time reporting through the operational data store (ODS) and self-service visualization tools

BI, Reporting and Visualization

  • Provide structured business intelligence reporting and self-service capability
  • Employ visualization tools to make insights operational

Analytics

  • Use predictive analytics and scenario modeling capabilities to improve audience measurement and campaign management
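As a small illustration of predictive analytics and scenario modeling, the sketch below fits a straight line to past campaign results and projects two spend scenarios. The figures are invented for the example, and ordinary least squares is only one simple choice of model, not a recommendation.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical history: campaign spend versus audience reach (arbitrary units).
campaign_spend = [1.0, 2.0, 3.0, 4.0]
audience_reach = [12.0, 14.0, 16.0, 18.0]

slope, intercept = fit_line(campaign_spend, audience_reach)

# Scenario modeling: projected reach at two spend levels not yet tried.
scenarios = {spend: intercept + slope * spend for spend in (5.0, 6.0)}
print(scenarios)
```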

ETL / ELT – Data Integration

  • Develop custom ETL and data modeling to aggregate multiple high-volume data sets in disparate formats
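Aggregating disparate formats usually means mapping each source onto one common schema before combining. A minimal sketch using Python's standard `csv` and `json` modules follows; both sources and their field names are hypothetical.

```python
import csv
import io
import json
from collections import Counter
from itertools import chain

# Two hypothetical sources describing the same thing in different formats.
csv_source = "region,sales\nnorth,100\nsouth,50\n"
json_source = '[{"area": "north", "amount": 25}, {"area": "east", "amount": 10}]'

def extract_csv(text):
    """Extract and transform CSV rows into the common {region, sales} schema."""
    for row in csv.DictReader(io.StringIO(text)):
        yield {"region": row["region"], "sales": int(row["sales"])}

def extract_json(text):
    """Extract and transform JSON items into the common {region, sales} schema."""
    for item in json.loads(text):
        yield {"region": item["area"], "sales": int(item["amount"])}

def aggregate(records):
    """Load step: total sales per region across all sources."""
    totals = Counter()
    for record in records:
        totals[record["region"]] += record["sales"]
    return dict(totals)

sales_by_region = aggregate(chain(extract_csv(csv_source), extract_json(json_source)))
print(sales_by_region)
```

Keeping one extractor per source means a new format can be added without touching the aggregation logic.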

Data Discovery and Exploration

  • Build a discovery environment that allows for the combination of enterprise data with external data sets
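Combining enterprise data with an external data set is, at its simplest, a join on a shared key. The sketch below assumes a hypothetical customer table and a hypothetical external demographics lookup, both keyed by postal code.

```python
# Hypothetical enterprise records keyed by postal code.
enterprise_customers = [
    {"customer_id": 1, "postal_code": "10001", "spend": 250},
    {"customer_id": 2, "postal_code": "94105", "spend": 400},
    {"customer_id": 3, "postal_code": "60601", "spend": 150},
]
# Hypothetical external data set, for example purchased demographics.
external_demographics = {
    "10001": {"median_income": 65000},
    "94105": {"median_income": 110000},
}

def enrich(customers, external, key="postal_code"):
    """Left-join enterprise rows with external attributes; keep unmatched rows."""
    enriched = []
    for row in customers:
        extra = external.get(row[key], {})
        enriched.append({**row, **extra})
    return enriched

discovery_view = enrich(enterprise_customers, external_demographics)
for row in discovery_view:
    print(row)
```

A left join is used so that customers without a match in the external set stay visible, which is often what you want while exploring.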

Data Governance

  • Establish a data governance and change management model to ensure that analytics are embraced across the organization

What’s the Proposed Hadoop Infrastructure for the Big Data Lab?

[Figure: proposed Hadoop infrastructure for the Big Data Lab (Pramanick-Lab1_1.png)]

What’s a Sample Use Case for Your Big Data Lab?

[Figures: sample use case for the Big Data Lab (Pramanick-Lab2.PNG, Pramanick-Lab3.png)]

The Big Data Lab’s research mission is to identify, engineer and evaluate innovative technologies that address current and future data-intensive challenges. To unlock the potential of big data, you need to overcome a significant number of research challenges, including managing diverse sources of unstructured data with no common schema, removing the complexity of writing auto-scaling algorithms, delivering real-time analytics, and finding suitable visualization techniques for petabyte-scale data sets. The Big Data Lab gives you a platform to test your hypotheses and integrate your big data efforts across your organization.