10 can’t miss posts for big data developers: November 2013


Quick: name one difference between data and big data. It’s bigger, of course, and it arrives faster, in more shapes and sizes, which makes it harder to separate trustworthy data from noise. For a big data developer, the trick is finding tools and technologies that extend traditional methods of processing and analyzing data. Check out the "SQL to Hadoop and back again" series for tips. See how Apache Oozie, Apache Avro and the Weka classification system can simplify the job, explore the new StreamsDev site for developer-centric hints and pointers on stream computing, and use the comment section to share your topic suggestions. Enjoy these highlights!

  1. SQL to Hadoop and back again, part two: Leverage HBase and Hive: Find out how to use HBase and Hive to exchange data with your SQL data stores. Although they can look similar from the outside, these systems have very different goals.
  2. SQL to Hadoop and back again, part three: Direct transfer and live data exchange: Learn what makes Sqoop an efficient method of swapping data, enabling live transfers between your SQL and Hadoop environments.
  3. Oozie workflow scheduler for Hadoop: Big data in its raw form rarely satisfies the Hadoop developer's requirements for data processing tasks. Let Apache Oozie help automate the preprocessing of data using different types of reusable workflows.
  4. Analyze Hadoop logs with IBM Accelerator for Machine Data Analytics: Learn how to collect, integrate and analyze Hadoop logs produced by InfoSphere BigInsights with the help of a new log monitoring and analysis function that aggregates log files and stores them over time.
  5. Create a simple predictive analytics classification model in Java with Weka: Real-time classification of data (the goal of predictive analytics) relies on insight and intelligence based on historical data patterns. Learn how to use the Weka classification engine to create a simple classifier for programmatic use.
  6. Explore StreamsDev, your direct channel to the Streams development team: Find all the resources you need to develop with InfoSphere Streams, brought to you by the extended Streams development team. Documentation, product downloads, SPL code examples, help, events, expert blogs: it's all there, plus a direct line to the developers. Get started analyzing data in real time now.
  7. Real-time anomaly detection using the InfoSphere Streams TimeSeries Toolkit: Automate the detection of anomalies in time series data to monitor systems across the domains of cybersecurity, infrastructure, data center management, healthcare and the environment.
  8. Build a sentiment analysis application with Node.js, Express, sentiment and ntwitter: Use Node.js modules to build an app that analyzes public reaction on Twitter. The sample application uses popular Node.js modules and builds a reusable structure for future quick-turnaround applications that analyze large volumes of data through a mobile interface.
  9. Big data serialization using Apache Avro with Hadoop: Share serialized data among applications with Apache Avro, a framework that produces data in a compact, binary format that doesn't require proxy objects or code generation.
  10. Deliver association modeling data mining recommendations with IBM Cognos: Integrate the output and recommendations of an association data mining model into an IBM Cognos reporting environment for easy access by business processes and people.
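A few of the ideas above are worth a closer look. Item 5's Weka example is written in Java against Weka's own API; as a language-neutral sketch of the same train-then-classify pattern it describes (building a classifier from historical data, then labeling new instances), here is a tiny nearest-neighbour classifier with hypothetical toy data:

```python
import math

def train(rows):
    """'Training' for nearest-neighbour is simply storing the labelled history."""
    return list(rows)

def classify(model, point):
    """Label a new point with the class of its closest historical neighbour."""
    def dist(row):
        features, _label = row
        return math.dist(features, point)
    _features, label = min(model, key=dist)
    return label

# Hypothetical historical data: (features, label) pairs,
# e.g. (errors/min, latency ms) -> system status.
history = [((0.1, 20.0), "healthy"), ((0.2, 25.0), "healthy"),
           ((5.0, 400.0), "failing"), ((6.5, 520.0), "failing")]

model = train(history)
print(classify(model, (0.3, 30.0)))   # -> healthy
print(classify(model, (4.8, 390.0)))  # -> failing
```

Weka ships far richer algorithms than this, but the workflow its API follows is the same: fit on historical patterns, then score new data programmatically.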
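Item 7's TimeSeries Toolkit performs anomaly detection inside Streams itself; the core idea, flagging readings that deviate sharply from recent behaviour, can be sketched outside Streams like this (the window size and threshold here are hypothetical, not toolkit defaults):

```python
import statistics

def anomalies(series, window=5, threshold=3.0):
    """Flag indices whose value lies more than `threshold` standard
    deviations from the mean of the preceding `window` readings."""
    flagged = []
    for i in range(window, len(series)):
        recent = series[i - window:i]
        mean = statistics.mean(recent)
        stdev = statistics.stdev(recent) or 1e-9  # guard against zero spread
        if abs(series[i] - mean) / stdev > threshold:
            flagged.append(i)
    return flagged

# A flat sensor feed with one spike:
readings = [10.0, 10.2, 9.9, 10.1, 10.0, 10.1, 42.0, 10.2, 10.0, 9.8]
print(anomalies(readings))  # -> [6], the spike stands out
```

The toolkit applies this kind of windowed modeling continuously over live streams, which is what makes it usable for monitoring cybersecurity, data center and healthcare feeds in real time.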
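Item 8's sentiment module scores text by summing per-word valences from the AFINN word list. A minimal sketch of that scoring idea (with a toy score table standing in for the real AFINN data) looks like this:

```python
# Toy valence table; the real sentiment module ships the full AFINN list.
AFINN_SAMPLE = {"love": 3, "great": 3, "good": 2,
                "bad": -3, "hate": -3, "awful": -3}

def score(text):
    """Sum the valence of each known word; a positive total = positive tone."""
    words = text.lower().split()
    return sum(AFINN_SAMPLE.get(w.strip(".,!?"), 0) for w in words)

print(score("I love this great product!"))  # -> 6
print(score("Awful update, I hate it."))    # -> -6
```

Simple word-level scoring like this is crude but fast, which is why it scales to the tweet volumes the article's Node.js app pulls in through ntwitter.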
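Finally, the compact binary format item 9 describes comes from Avro encoding records against a shared schema, so field names never travel on the wire. As a stdlib-only illustration of that idea (this is a hypothetical layout, not Avro's actual wire format or schema language):

```python
import struct

# A shared "schema": ordered field names plus struct format codes.
# (Illustrative only; Avro uses JSON schemas and its own binary encoding.)
SCHEMA = [("user_id", "q"), ("clicks", "i"), ("score", "d")]
FMT = "<" + "".join(code for _name, code in SCHEMA)

def serialize(record):
    """Pack the fields in schema order; no field names are written."""
    return struct.pack(FMT, *(record[name] for name, _code in SCHEMA))

def deserialize(blob):
    """The reader uses the same schema to reattach the field names."""
    values = struct.unpack(FMT, blob)
    return {name: value for (name, _code), value in zip(SCHEMA, values)}

blob = serialize({"user_id": 42, "clicks": 7, "score": 0.95})
print(len(blob))          # -> 20 bytes, far smaller than the JSON equivalent
print(deserialize(blob))  # round-trips to the original record
```

Because both sides agree on the schema, there are no proxy objects and no generated code to manage, which is the convenience the Avro article highlights.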