The Ground Truth in Agile Machine Learning

Distilling knowledge effortlessly from big data calls for collaborative human and algorithm engagement

James Kobielus, Big Data Evangelist, IBM

Machine learning has a critical dependency on learned humans. Without a baseline set of training data labeled by one or more human experts, many machine-learning algorithms can’t get off square one. They search for data patterns that are consistent with those previously tagged and flagged by a human in the know. This well-established machine-learning approach is called supervised learning.


Maintain a footing in ground truth

One of the interesting terms of art in this field is the concept of ground truth, as discussed in the article, “From Artificial and Computational Intelligence to Machine Learning.”1 The author describes ground truth as a gold standard to which the learning algorithm needs to adapt. It involves a tutor that tells the student (that is, the machine-learning algorithm) what to learn. According to the author, the tutor is normally a human expert who labels the data examples to be categorized by the adaptive classifier, which is that same algorithm. By contrast, the two other main approaches to machine learning, unsupervised learning and reinforcement learning, eschew the notion of ground truth and attempt to automate the distillation of knowledge from data that no human tutor has pre-tagged.
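To make the tutor-and-student relationship concrete, here is a minimal sketch of supervised learning in Python. It is not the cited author's method: the nearest-centroid classifier, the function names, and the tiny labeled data set are all assumptions chosen for illustration. The human-assigned labels play the role of ground truth, and the algorithm simply adapts to them.

```python
from collections import defaultdict
from math import dist


def train_centroids(examples):
    """Average the feature vectors that share each human-assigned label."""
    sums = {}
    counts = defaultdict(int)
    for features, label in examples:
        if label not in sums:
            sums[label] = [0.0] * len(features)
        for i, value in enumerate(features):
            sums[label][i] += value
        counts[label] += 1
    return {
        label: [s / counts[label] for s in total]
        for label, total in sums.items()
    }


def classify(centroids, features):
    """Return the label whose centroid lies closest to the new example."""
    return min(centroids, key=lambda label: dist(centroids[label], features))


# Hypothetical ground truth: (features, label) pairs tagged by a human expert.
labeled = [
    ((1.0, 1.2), "normal"),
    ((0.8, 1.0), "normal"),
    ((5.0, 5.5), "anomaly"),
    ((5.2, 4.9), "anomaly"),
]

centroids = train_centroids(labeled)
prediction = classify(centroids, (4.8, 5.1))
print(prediction)
```

Without the `labeled` list there is nothing for the classifier to adapt to, which is the dependency on the human tutor described above.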

In some broader sense, the epistemological notion of ground truth could apply to any machine-learning approach, if taken to mean the prior understanding of what sorts of patterns the algorithm is trained to search for. The truths being distilled from the data are those consistent with what domain experts (the tutors) or quantitative experts (for example, statisticians and mathematicians) consider meaningful.

For supervised learning, they are patterns flagged by human experts. With unsupervised learning, they are the patterns surfaced by clustering and related structure-discovery approaches: for example, k-means clustering, mixture models, hierarchical clustering, hidden Markov models, blind signal separation, self-organizing maps (SOMs), adaptive resonance theory (ART), and so on. And with reinforcement learning, they are patterns consistent with whatever algorithmic behavior maximizes an a priori criterion expressed as a cumulative reward function.
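As one illustration of the unsupervised case, the following is a minimal sketch of k-means clustering in plain Python. The data points, the function name, and the simplistic seeding (the first k points become the initial centers) are assumptions made for brevity. Note that no labels appear anywhere: the only prior understanding is the quantitative one built into the algorithm, namely that points close to each other belong together.

```python
from math import dist


def k_means(points, k, iterations=10):
    """Plain k-means: assign points to the nearest center, then recompute centers."""
    centers = [list(p) for p in points[:k]]  # simplistic seeding: first k points
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(centers[i], p))
            clusters[nearest].append(p)
        for i, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous center
                centers[i] = [sum(axis) / len(members) for axis in zip(*members)]
    return centers, clusters


# Hypothetical unlabeled observations: no human has tagged any of them.
points = [
    (1.0, 1.0), (5.0, 5.0), (1.2, 0.8),
    (5.2, 4.8), (0.8, 1.2), (4.8, 5.2),
]

centers, clusters = k_means(points, k=2)
for center, members in zip(centers, clusters):
    print(center, members)
```

The two groups emerge from the geometry of the data alone. Deciding whether those groups mean anything is still left to a domain expert.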

Over time, I think that more machine-learning challenges are going to be addressed by blends of these learning approaches. The cited blog article refers to these approaches as “hybrid intelligent systems.” In many frontier areas of advanced signal intelligence and cognitive computing, domain experts may have little confidence in their own a priori understanding of the key variables and relationships to search for. In those cases, they’ll defer to the mathematical ground truth patterns revealed through unsupervised and reinforcement learning models.
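The reinforcement learning case can be sketched in a similar spirit. The example below is an epsilon-greedy multi-armed bandit, which is only one simple instance of reinforcement learning. The payout probabilities, parameter values, and function name are hypothetical. No tutor tells the algorithm which arm is best; it learns from the cumulative reward it collects.

```python
import random


def run_bandit(reward_probs, steps=2000, epsilon=0.1, seed=0):
    """Epsilon-greedy bandit: mostly exploit the best estimate, sometimes explore."""
    rng = random.Random(seed)
    estimates = [0.0] * len(reward_probs)  # running mean reward per arm
    pulls = [0] * len(reward_probs)
    total_reward = 0
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(len(reward_probs))  # explore a random arm
        else:
            arm = max(range(len(estimates)), key=lambda i: estimates[i])  # exploit
        reward = 1 if rng.random() < reward_probs[arm] else 0
        pulls[arm] += 1
        # Incremental update of the running mean for the chosen arm.
        estimates[arm] += (reward - estimates[arm]) / pulls[arm]
        total_reward += reward
    return estimates, pulls, total_reward


# Hypothetical payout probabilities, hidden from the learner.
estimates, pulls, total_reward = run_bandit([0.2, 0.5, 0.8])
best_arm = max(range(len(estimates)), key=lambda i: estimates[i])
print(best_arm, pulls, total_reward)
```

The only a priori criterion here is the reward signal itself. A hybrid system might combine this kind of learner with human-labeled examples where experts do have confidence in their labels.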


Engage in collaborative pattern identification

This concept relates to some thoughts I blogged about several months ago on the topic of next-best expert.2 In the era of big data and cloud computing, we need to stretch our notion of creativity to reflect the increasingly coequal and codependent collaboration of human experts and machine experts. The smartest expert may not always be a person. We need to cast our net broadly to include the expert contributions of unsupervised and reinforcement learning algorithms. They may be able to see emerging patterns in petabytes of data faster than a human expert.

The overarching ground truth is that no expert—human or machine—can identify all relevant patterns in all data at all times. To distill knowledge efficiently and continuously from the burgeoning big data universe, we need agile engagements within and among teams that may include shifting rosters of humans and machine-learning algorithms.

Please share any thoughts or questions in the comments.

1 “From Artificial and Computational Intelligence to Machine Learning,” by Aureli Soria Frisch, Neuroscience, tCS, and EEG blog, February 2013.
2 “Next-Best Expert: Collaboration of People and Machines on Big Data Analytics,” by James Kobielus, Big Data & Analytics Hub blog, January 2014.

