Spark: Fostering community innovation

Manager of Portfolio Strategy, IBM

Open source technology has become ubiquitous. Open source software provides general access to a product's design and enables universal distribution through free licenses. The concept became popular with the rise of the Internet in the 1990s, driven by the need to massively retool computing and make it available to all.

It is easy to forget that the idea of community innovation has been around for over a century. For example, Henry Ford won a legal battle to openly share patents without exchanging money. By the time the US entered World War II, 92 Ford patents and 515 patents from other companies were being shared among manufacturers. More recently, the US Department of Health and Human Services launched the Genomic Data Sharing (GDS) project to facilitate the translation of research results into knowledge, products and procedures that improve human health.

IBM has a long history with open source, spanning decades of contributions:

  • Linux (#3 contributor), Derby, Geronimo and Jakarta
  • Eclipse, founded by IBM

The IBM software portfolio is built on open source:

  • IBM WebSphere: Apache
  • IBM Rational: Eclipse and Apache
  • IBM InfoSphere: Apache
  • IBM BigInsights for Hadoop and beyond

Today, IBM is fostering community innovation in Apache Spark, which is among the fastest-growing open source projects in history. Spark is built by a wide set of developers from over 200 companies. Since 2009, more than 800 developers have contributed to Spark, and the project's committers come from 16 organizations.

Significant resources in Silicon Valley, the Almaden Research Lab and China work on Hadoop-related engineering and contribute to open source. Today, 60 people in the Spark Technology Center work on Hadoop and Spark, about 200 work on OpenStack and another 100 work on Docker.

Their efforts are paying off. SystemML became publicly available through GitHub on 27 August 2015, and on 4 November 2015 IBM SystemML was officially accepted as an Apache Incubator project. SystemML V0.8.0 is the first binary release of SystemML since its initial migration to GitHub on 16 August 2015, and it represents 320+ patches from 14 contributors since that date. Extensive updates have been made to the project in several areas, including application programming interfaces (APIs), data ingestion, optimizations, language and runtime operators, new algorithms, testing and online documentation.

SystemML is a machine learning platform developed at the Almaden Research Lab. It is designed to make expressing machine learning algorithms simpler and faster.

Take your Spark journey to the next step. IBM invites you to a free 3-month trial of IBM Analytics for Apache Spark and IBM Cloudant. Use Spark in the cloud to conduct fast in-memory analytics on your Cloudant JSON data. Sign up today and also receive free SaaS Startup Advisory Services to help you accelerate your time to results.