Get a move on: Transfer data between BigInsights, dashDB and beyond

Big Data Architect, IBM

There are many ways to process data and integrate it with other services. Several organizations are turning to IBM BigInsights on Cloud, an Apache Hadoop-as-a-service offering on IBM’s SoftLayer global cloud infrastructure. I work on the front line with customers who use BigInsights on Cloud, and I am frequently asked two questions:

  • How do I programmatically perform action X on my data on service Y? Clients asking this question want to know, for example, how to connect to HBase from Java and perform create, read, update and delete (CRUD) operations on their BigInsights on Cloud service.
  • How do I programmatically move data between service Y and service Z? Clients asking this question want to know, for example, how to move data between their BigInsights on Cloud service and their IBM dashDB service.

Answering the data management questions

These questions need to be addressed early and quickly in any project’s lifecycle. Often they are identified and handled during sprint zero, and the answers form the basic skeleton, or plumbing, of the project. Future sprints can then add incremental value efficiently.

And, while figuring out how to perform various actions and to move data across services is tactically necessary, it’s also time-consuming. The process can take a few hours to a few days for exploring options, developing skeleton code and arriving at plausible answers. 

To make this initial setup fast and easy, we recently ran an open source exercise focused on BigInsights. We cataloged our findings in a repository that provides more than 35 working code examples, and the number is growing. You can run them against your own BigInsights cluster and the other services you want to integrate with, which we hope will save you time up front.

To use the examples, you provide the connection details for your BigInsights cluster and for the services you want to connect to. You can then run a single command to see an example working against your own environment. Setting up the sample code and running it can take under five minutes.

Moving data in a widespread use case

In our coding explorations, we found that some of the more popular use cases involve moving data between BigInsights and the dashDB data warehouse. Examples include:

  • Importing data from a dashDB database to the Hadoop Distributed File System (HDFS) using Apache Spark: Spark moves data from dashDB to BigInsights.
  • Exporting data to a dashDB database using Spark: Spark moves data from BigInsights to dashDB.
  • Importing data from a dashDB database using Big SQL: Big SQL moves the data from dashDB to BigInsights. 
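To give a feel for the two Spark cases above, here is a minimal PySpark sketch. It is not taken from the example repository. It assumes dashDB is reached through the DB2 JDBC driver (`com.ibm.db2.jcc.DB2Driver`) on port 50000 with a database named `BLUDB`, that the driver JAR is on the Spark classpath, and that data lands on HDFS as Parquet. The helper names, the Parquet choice and the write modes are illustrative, so check the connection values against your own service credentials.

```python
from typing import Dict

# Assumption: dashDB is accessed with the standard DB2 JDBC driver class.
DB2_JDBC_DRIVER = "com.ibm.db2.jcc.DB2Driver"


def dashdb_jdbc_url(host: str, port: int = 50000, database: str = "BLUDB") -> str:
    """Build a DB2-style JDBC URL (port and database name are assumed defaults)."""
    return f"jdbc:db2://{host}:{port}/{database}"


def jdbc_options(host: str, user: str, password: str, table: str) -> Dict[str, str]:
    """Collect the options that Spark's JDBC data source expects."""
    return {
        "url": dashdb_jdbc_url(host),
        "dbtable": table,
        "user": user,
        "password": password,
        "driver": DB2_JDBC_DRIVER,
    }


def import_to_hdfs(spark, options: Dict[str, str], hdfs_path: str) -> None:
    """Import: read a dashDB table over JDBC and write it to HDFS as Parquet."""
    df = spark.read.format("jdbc").options(**options).load()
    df.write.mode("overwrite").parquet(hdfs_path)


def export_to_dashdb(spark, options: Dict[str, str], hdfs_path: str) -> None:
    """Export: read Parquet files from HDFS and append the rows to a dashDB table."""
    df = spark.read.parquet(hdfs_path)
    df.write.format("jdbc").options(**options).mode("append").save()
```

On a cluster, you would create a `SparkSession`, build the options once with your own host, user, password and table name, and then call `import_to_hdfs` or `export_to_dashdb` with an HDFS path of your choosing. Keeping the connection details in one dictionary mirrors the idea of supplying them once and reusing them across examples.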

The code is now available to make any of these functions easier than ever. Get started by visiting the example repository, or access additional coding samples. The best way to manage and transfer data is always a riddle. Together, we can uncover and share ways to empower open source software with enterprise-grade capabilities.