Machine-milled insights and the limits of algorithmic automation in cognitive computing

Big Data Evangelist, IBM

We all want the proverbial "big brain" to part the clouds and reveal the blinding truth. That’s not necessarily a religious statement—it’s also a way of characterizing an increasingly inflated popular perception of the promise of automated cognition.

Machine learning, the core component of the cognitive revolution, has begun to assume “big brain” proportions in the popular mind. Most of us regard machines as things that exist primarily to automate processes that had previously been 100 percent manual.

Machine learning certainly fits this perception, inasmuch as these algorithms can extract data-driven insights far faster and more comprehensively than any human using mostly manual methods. If you’re fond of futurism, you might jump to the conclusion that machine learning could someday automate all data-driven analysis. Indeed, many people are starting to indiscriminately label every last advanced analytic approach as “machine learning,” per my recent discussion here, contributing to the false impression that it’s an unstoppable force bent on some sort of world domination.

Advances in automation have always stoked fears of mass unemployment. Some now argue that advances in machine learning will erode the need for business analysts and data scientists. After all, why hire skilled people to do data sifting by hand when machine-learning algorithms can do it as well or better, generating never-ending streams of data-driven insights entirely on their own? And if you’re of an apocalyptic bent, it’s just a short conceptual step over the precipice that separates us all from the future of “Big Brother,” “The Matrix” or name-your-pet dystopia.

If that’s your worldview, please take a deep breath and relax. The robots are not taking over. Machine learning is not a sinister tool for enslaving the minds of the masses. In fact, it’s not even necessarily a tool for automating every last process associated with knowledge discovery and insight extraction. And it’s not even a machine, per se, so much as a framework for defining the statistical, mathematical and logical steps for identifying predictive variables, correlations and other patterns within deep data sets. Think of the fabled “Turing Machine,” which was simply a thought experiment for gauging the fuzzy boundary between human cognition and its algorithmic simulations.

Above all, machine learning is a tool for making humans more productive. That’s its role in Watson Analytics, a premier cognitive computing decision-support tool for human analysts of all stripes. And that’s the thrust of my argument in this recent post on "Cognitive computing and the indelible role of human judgement," in which I spell out the many ways in which machine learning will accelerate, not suppress or replace, human creativity, judgment and initiative in the cognitive computing era.

Even at its core, machine learning is not always about total automation. Indeed, machine learning methods vary in their degree of automation. The core approaches are called supervised, semi-supervised and unsupervised learning, and these names allude to the degrees of automation and manual effort entailed by each. As I discussed in this post, supervised learning involves data scientists feeding in an a priori training data set that instantiates the "known" pattern that machine learning algorithms will attempt to find in the data set being analyzed. That’s what the astonishing performance of face, voice and other biometric applications of machine learning depends on: the ability to compare a new instance of each pattern to a training set of known faces, voices and so forth.
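To make that concrete, here’s a deliberately tiny sketch of supervised learning, not drawn from Watson Analytics or any real biometric system: a one-nearest-neighbor classifier that labels a new sample by comparing it against a labeled training set, the "known" patterns described above. The data and names are hypothetical.

```python
# Toy supervised learning sketch (illustrative only): classify a new
# sample by finding the closest example in a labeled training set.

def nearest_neighbor(training_set, sample):
    """Return the label of the training example closest to `sample`.

    training_set: list of (feature_vector, label) pairs.
    sample: feature vector to classify.
    """
    def distance(a, b):
        # Squared Euclidean distance between two feature vectors.
        return sum((x - y) ** 2 for x, y in zip(a, b))

    _, label = min(training_set, key=lambda pair: distance(pair[0], sample))
    return label

# Hypothetical biometric-style data: 2-D feature vectors with known labels.
training = [((0.1, 0.2), "alice"), ((0.9, 0.8), "bob")]
print(nearest_neighbor(training, (0.15, 0.25)))  # prints "alice"
```

The "supervision" here is simply the labeled training pairs; without them, the algorithm has nothing to match against.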

In contrast, unsupervised learning involves an algorithm’s search for patterns in the data without the benefit of a "known" (a priori training) data set. For example, machine learning models may sift through petabytes of security event logs looking for patterns consistent with cybercrime in progress, without any specific training data on all the myriad patterns such activity may take. Unsupervised learning is useful when the potential range of unknown patterns is so large that it’s best to automate the tedious process of flagging suspicious activity so that it may be escalated for further investigation using other techniques.
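A minimal unsupervised sketch of that log-sifting idea (illustrative only, nothing like a production security tool): flag hours whose event counts deviate sharply from the overall pattern, with no labeled training data at all. The data here is invented.

```python
# Toy unsupervised anomaly detection: flag values whose z-score
# (distance from the mean in standard deviations) exceeds a threshold.
# No labeled examples of "attacks" are needed -- only the data itself.

import statistics

def flag_anomalies(counts, threshold=2.0):
    """Return indices of counts more than `threshold` std devs from the mean."""
    mean = statistics.mean(counts)
    stdev = statistics.pstdev(counts)
    if stdev == 0:
        return []  # all values identical: nothing stands out
    return [i for i, c in enumerate(counts)
            if abs(c - mean) / stdev > threshold]

# Hypothetical hourly login-failure counts; the spike gets escalated
# to a human analyst for further investigation.
hourly_failures = [3, 5, 4, 6, 5, 4, 250, 5, 4]
print(flag_anomalies(hourly_failures))  # prints [6]
```

Note that the algorithm only flags candidates; deciding whether index 6 is an attack or a misconfigured batch job remains a human judgment call, which is exactly the point of the paragraph above.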

But even when it involves unsupervised methods, machine learning is never entirely automated. Data scientists must still prepare the data sets, specify the algorithms, execute them and interpret the results. The process of extracting insights and applying them within the context of particular data-driven applications is still inherently a creative, exploratory process that demands human judgment. Crowdsourcing the evaluation of unsupervised machine learning results, as is often done with CAPTCHA tests, doesn’t change this fundamental imperative. Automating the execution of the algorithms themselves may be the least important aspect of the overall process.

This recent InfoWorld article on machine learning essentially makes that same point. In it, author Serdar Yegulalp quotes a machine learning expert who highlights the many human decision points that contribute to successful implementation of an unsupervised model. "One of the biggest truths of machine learning of any sort is that your model design, that is, the features you are extracting in order to feed the prediction engine, is of greater importance than the actual algorithms that are being used."
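The quoted point about model design can be demonstrated in a few lines. In this toy example of my own (not from the InfoWorld article), the same trivial "learner," a single threshold on one feature, fails or succeeds depending entirely on which feature a human chooses to extract from the raw data.

```python
# Feature design beats algorithm choice: the same one-threshold learner
# applied to a raw feature vs. a human-engineered feature.

def best_threshold_accuracy(values, labels):
    """Best accuracy a single threshold on one feature can achieve."""
    best = 0.0
    for t in sorted(set(values)):
        acc = sum((v >= t) == bool(l)
                  for v, l in zip(values, labels)) / len(labels)
        best = max(best, acc, 1 - acc)  # allow either threshold direction
    return best

# Hypothetical data: points labeled 1 if inside the unit circle, else 0.
points = [(0.1, 0.2, 1), (-0.3, 0.1, 1), (0.2, -0.4, 1),
          (1.5, 0.1, 0), (-1.4, 0.2, 0), (0.0, 1.8, 0)]
labels = [p[2] for p in points]

raw_x = [p[0] for p in points]                    # naive feature: raw x
radius_sq = [p[0]**2 + p[1]**2 for p in points]   # engineered: x^2 + y^2

print(best_threshold_accuracy(raw_x, labels))      # barely beats chance
print(best_threshold_accuracy(radius_sq, labels))  # separates perfectly
```

The algorithm never changed; a human’s insight that "distance from the center" is the meaningful feature did all the work, which is precisely the expert’s claim.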

Clearly, machine learning won’t automate data scientists out of a job any time soon. Machine learning models need to be constantly crafted, refined and trained by human experts in order to extract insights reliably.

Of course, extraction of data-driven insights can be automated to the nth degree through machine learning. But the trick is selecting the right machine-learning algorithm, training it on the right data, and interpreting the results with the right amount of expert judgment.

Those are all very human processes, of course, performed by data scientists. These processes may be further automated through various tools, including machine learning. But there will always be an inextricable kernel of manual effort at their heart.

Data scientists are humans to the core, and many gladly offer their services for the larger causes of interest to all humanity. A great example of this is the recent Big Data for Social Good initiative. Watch this recent live chat with the teams and participants from the challenge to learn why these passionate data scientists chose their projects, hear some of their challenges and delve deeper into the story behind their applications.