Spin up Hadoop and Spark clusters in minutes on a reliable foundation

Offering Manager, Watson Data & AI, IBM

We are excited to announce the general availability (GA) of the IBM BigInsights on Cloud Basic plan. In the IBM Bluemix platform, it is listed as the BigInsights Apache Hadoop Basic plan. The service has been available as a public beta for the last three months, and the level of participation during that time was encouraging. The feedback we received has been valuable in improving the quality of the service, and we thank everyone who used the beta and shared their comments.

Ready integration in a Hadoop cluster

The Basic plan under BigInsights on Cloud on Bluemix enables users to spin up and access Hadoop or Apache Spark clusters within minutes, each with a well-suited set of Hadoop ecosystem components. The service is built on the IBM Open Platform, an Open Data Platform initiative (ODPi)-certified Hadoop distribution that includes Apache Spark. These clusters expose Hadoop application programming interfaces (APIs) as public interfaces, which allows ready integration of an external application with the data and runtimes in a Hadoop cluster.
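
As one illustration of what integrating through a public Hadoop API can look like, the sketch below builds a request URL for WebHDFS, the REST interface that Apache Hadoop provides for HDFS. The /webhdfs/v1 prefix and the op query parameter come from WebHDFS itself. The host name, the port 8443 and the /gateway/default prefix are assumptions made for this example, so check the endpoint details shown for your own cluster before relying on them.

```python
from urllib.parse import quote, urlencode


def webhdfs_url(host, hdfs_path, op, port=8443, gateway_path="/gateway/default", **params):
    """Build a WebHDFS REST URL for an operation such as LISTSTATUS or OPEN."""
    # WebHDFS resources live under /webhdfs/v1; the port and gateway prefix
    # are assumptions for this sketch, not values confirmed for the service.
    path = quote("/" + hdfs_path.strip("/"))
    query = urlencode([("op", op)] + sorted(params.items()))
    return "https://{}:{}{}/webhdfs/v1{}?{}".format(host, port, gateway_path, path, query)


if __name__ == "__main__":
    # Hypothetical host name; substitute the endpoint of your own cluster.
    print(webhdfs_url("cluster.example.com", "/user/demo", "LISTSTATUS"))
```

An external application would send an authenticated HTTPS GET to the resulting URL with the cluster credentials. The helper only assembles the address and makes no network call.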

Moreover, integration with the Object Storage service allows data to persist independently of the Hadoop Distributed File System (HDFS) in the cluster, so clusters can be deleted without first moving data elsewhere and without fear of data loss. With this GA release, each cluster can be scaled up to 20 data nodes. Future releases are expected to gradually raise that limit.

The service includes multiple IBM Open Platform (IOP) versions. IOP Version 4.2 offers a stable set of Hadoop and Spark components that became available in 2016 and are being used by a number of customers. IOP Version 4.3 is a technical preview and early release version that includes Apache Spark 2.0. Spark 2.0 introduces breaking API changes compared with Spark 1.6. By spinning up separate clusters with different versions of Spark, or of other Hadoop components, users can test their applications against the newer version while running production code on the stable version. We expect to introduce more updates to the technical preview version over the next few months, eventually leading to its GA release.
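
One well-known difference between the two Spark lines is the SQL entry point: Spark 2.0 introduced SparkSession, while Spark 1.6 applications typically use SQLContext. The sketch below shows one possible way to let the same PySpark application run on both a stable cluster and a technical preview cluster while it is being tested. The helper names are hypothetical and are not part of Spark or of the service.

```python
def spark_major_version(version):
    """Return the major version number from a string such as '1.6.1' or '2.0.0'."""
    return int(version.split(".")[0])


def sql_entry_point_name(version):
    """Name the SQL entry-point class that matches a given Spark version."""
    return "SparkSession" if spark_major_version(version) >= 2 else "SQLContext"


def make_sql_entry_point(sc):
    """Create the SQL entry point for an existing SparkContext `sc`."""
    # Imports are deferred so that only the class that exists on this
    # cluster's Spark version is ever loaded.
    if sql_entry_point_name(sc.version) == "SparkSession":
        from pyspark.sql import SparkSession
        return SparkSession.builder.getOrCreate()
    from pyspark.sql import SQLContext
    return SQLContext(sc)
```

Both objects offer sql() and createDataFrame(), so code written against those calls can stay the same on either cluster. Other breaking changes in Spark 2.0 still need to be checked case by case.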

A big challenge that data engineers and analytics owners in lines of business face is reliably managing and scaling a Hadoop or Spark cluster. Managing clusters generally means relying on IT, even when using a cloud-based service. The Basic plan provides managed clusters with an automated healing capability that restores Hadoop or Spark components when they fail. The service also applies updates proactively, allowing users to focus on their data and analytics applications.

A vision for a reliable foundation

Our vision for the service as we head into 2017 is to provide a reliable foundation for running analytics applications in combination with the IBM Watson Data Platform. A key focus will be introducing Big SQL into this model. We also expect to focus on enhancing scale, reliability and integration with Watson Data Platform tools such as IBM Data Science Experience (DSX) and IBM Watson Analytics, among others, so that data scientists can spin up Hadoop and Spark processing engines on demand for running their models.

Now that you understand the value of the Basic plan, you can learn more and try it, starting on the BigInsights on Cloud web page. As always, we appreciate your feedback about the service. We’ll be happy to respond to your comments or questions here, on Stack Overflow or through Bluemix support.