Machine learning as a fluid intelligence harvesting service

Big Data Evangelist, IBM

Let’s face it: development staffs are stretched thin enough already. Big data analytics teams don’t have enough hours in the day, or enough budget dollars, to get every data-driven app developer and every data scientist up to speed on every machine learning algorithm, tool and technique. And even if they could all go back for intensive training on today’s latest practices in these areas, that knowledge would be out of date by the time they received their certifications. 

Developers are only human. Their skills, time and attention are limited. But data, and the knowledge that can be gained from it, are seemingly unlimited. Even the world’s top data scientists and domain experts have to prioritize their efforts, extracting insights from whatever portion of the vast ocean of information surging around them is most relevant. 

With only so many hours in the day, data scientists and analysts need to leverage every big data acceleration, automation and productivity tool in their arsenals to sift, sort, search, infer, predict and otherwise make sense of the data that’s out there. As a result, many of these professionals have embraced machine learning. 

Fundamentally, machine learning is a productivity tool for data scientists. As the heart of systems that can learn from data, machine learning allows data scientists to train a model on an example data set and then rely on algorithms that automatically generalize from that example and keep learning from fresh data feeds. 
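To make the “train on examples, then generalize to fresh data” idea concrete, here is a minimal sketch in plain Python. It uses a nearest-centroid classifier, chosen here only because it is short; it is not the method behind any particular MLaaS offering. The function names and the tiny labeled data set are hypothetical.

```python
from statistics import fmean


def train_centroids(examples):
    """Learn one centroid (mean feature vector) per label from labeled examples."""
    grouped = {}
    for features, label in examples:
        grouped.setdefault(label, []).append(features)
    # zip(*rows) turns a list of feature vectors into per-feature columns.
    return {
        label: tuple(fmean(column) for column in zip(*rows))
        for label, rows in grouped.items()
    }


def predict(centroids, features):
    """Assign a fresh, unseen record to the label whose centroid is nearest."""
    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))

    return min(centroids, key=lambda label: squared_distance(centroids[label]))


# Hypothetical example data set: (feature vector, label) pairs.
training = [
    ((1.0, 1.2), "low"), ((0.8, 1.0), "low"),
    ((5.0, 4.8), "high"), ((5.2, 5.1), "high"),
]

model = train_centroids(training)
print(predict(model, (0.9, 1.1)))  # a record the model never saw during training
print(predict(model, (4.9, 5.0)))
```

The model is nothing more than the learned centroids. Nobody writes an explicit rule such as “if the first feature exceeds 3, say high”. That rule emerges from the examples.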

Ideally, the latest and greatest machine learning algorithms would be services that you access on demand. Data scientists, and developers who simply need the algorithmic fruits of data science, usually prefer to execute rapidly on their projects and show results right away. If asked, they’d probably say they would rather have all the principal machine learning algorithms, models, execution platforms and application programming interfaces (APIs) available to them as needed than assemble them from scratch and manage them in-house. 

Machine learning is the heart of cognitive computing as a service. As such, machine learning services should be a central element of any full-featured platform-as-a-service (PaaS) solution for big data analytics. As noted in a recent InfoWorld article, “As the major cloud providers open up those capabilities to all developers, the stage is set for a new wave of applications that will be much more intelligent than before.” 

That article notes that IBM offers strong machine-learning-as-a-service (MLaaS) capabilities. A separate InfoWorld article from late last year went into considerable detail on IBM’s offerings in this area. In particular, it reported on the rollout of MLaaS APIs on the IBM Watson Developer Cloud platform, which is accessible through the IBM Bluemix development platform. Most, but not all, of the MLaaS offerings discussed in that article have natural language processing (NLP) at their core: language identification, machine translation, concept expansion, message resonance, question and answer, relationship extraction, user modeling and visualization rendering. 

Clearly, this solution set only scratches the surface of cognitive computing. Expect to see further offerings from IBM that deepen the MLaaS capabilities available to any developer through Bluemix. Competitors such as Amazon Web Services (AWS) and Microsoft have since announced their own MLaaS offerings, making this a hugely important arena of competitive differentiation.

In addition, expect MLaaS to be deployed in every app development initiative that requires data science approaches. MLaaS can boost productivity by uncovering hidden patterns that even the best data scientists may have overlooked. These value points derive from machine learning’s core function: enabling analytics algorithms to learn from fresh feeds of data without constant human intervention and without explicit programming. In many ways, MLaaS can be the return-on-investment (ROI) capstone of big data initiatives, because machine learning algorithms can grow highly effective as data scales in volume, velocity and variety. Without MLaaS capabilities that can dynamically respond to myriad concurrent data streams in the cloud, the human race risks drowning in its own big data.
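What “learning from fresh feeds of data without constant human intervention” can look like at the smallest scale is sketched below. This is online (incremental) learning: a linear model updated by stochastic gradient descent, one record at a time, as records arrive. It is an illustrative stand-in, not a description of how any IBM, AWS or Microsoft service works internally. The class, the learning rate and the simulated feed (records following y = 2x + 1) are all hypothetical.

```python
class OnlineLinearModel:
    """Linear model y ~ w*x + b, refined one record at a time via stochastic gradient descent."""

    def __init__(self, learning_rate=0.05):
        self.w = 0.0
        self.b = 0.0
        self.lr = learning_rate

    def predict(self, x):
        return self.w * x + self.b

    def update(self, x, y):
        # Gradient of the squared error 0.5 * (prediction - y)**2 with respect to w and b.
        error = self.predict(x) - y
        self.w -= self.lr * error * x
        self.b -= self.lr * error
        return error


def record_stream(n):
    """Hypothetical data feed: noise-free records following y = 2x + 1, arriving in sequence."""
    for i in range(n):
        x = (i % 10) / 10.0
        yield x, 2.0 * x + 1.0


model = OnlineLinearModel()
for x, y in record_stream(5000):
    # No retraining from scratch: each arriving record nudges the parameters.
    model.update(x, y)

print(round(model.w, 2), round(model.b, 2))
```

The design choice worth noting is that the model never stores the stream. It keeps only its two parameters, which is what lets this style of learning keep pace with data that grows in volume and velocity.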

Machine learning is a core element of Apache Spark. If you’re interested in going deeper into Spark, join us at Spark Summit 2015, from Monday, June 15 at 7:00 a.m. through June 17 at 6:00 p.m. PT, in San Francisco, California. Spark Summit 2015 brings the Apache Spark community together to hear from leading production users of Apache Spark, Spark SQL, Spark Streaming and related projects. In addition, find out where project development is headed, and learn how to use the Spark stack in a variety of applications. Register for Spark Summit 2015 today.