Use deep learning to filter big data for the otherwise unknowable

Big Data Evangelist, IBM

Ah, yes, grasshopper, true knowledge comes when you realize how blind you've been.

Deeply grasping any knowledge domain is sort of like reaching a zen state of enlightenment. The more deeply you possess it, the more completely aware you are of the limits of your understanding. As the philosophers have said since the dawn of time, great wisdom develops hand-in-hand with great humility. While others might stand in awe of your seeming genius, all you sense is how little you truly know.

The hallmark of the true expert is a growing appreciation for all of the unknowns (the known and unknown ones alike) that hedge your understanding from all angles. If this seems to echo a famous statement by then-US Department of Defense (DoD) Secretary Donald Rumsfeld, you aren't imagining it. But the insight didn't begin with Rummy.

Interestingly, prominent data scientist Kirk Borne reports having a similar epiphany in the late 1990s, and bringing it to the George W. Bush administration's attention when he was consulted in the immediate 9/11 aftermath. As Borne states in this recent interview regarding the unknown unknowns, "the biggest potential for discovery by far are the things that you never expected to find in the data."

[Image: black hole. Courtesy of Wikimedia Commons and used with permission.]

A data scientist uses machine learning (ML) to find heretofore unknown correlations and other patterns in fresh data. ML is adept at finding both the known unknowns and the unknown unknowns through the power of supervised learning and unsupervised learning methodologies, respectively. The key distinction is the presence or absence of an a priori, or known, pattern in the data being analyzed. In supervised learning, data scientists feed the algorithm an a priori training data set that instantiates the known pattern it will attempt to find in the data being analyzed. In unsupervised learning, algorithms search for patterns in the data without the benefit of any such training set.
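To make the distinction concrete, here is a toy sketch in plain Python (not any particular library's API; the points and labels are invented for illustration). The first half is supervised: a nearest-centroid classifier learns from a labeled training set. The second half is unsupervised: a crude two-means clustering discovers the same grouping with no labels at all.

```python
import math

def centroid(points):
    """Mean of a list of 2-D points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

# --- Supervised: a labeled training set instantiates the known pattern ---
train = {"low": [(1, 1), (2, 1), (1, 2)],    # known class "low"
         "high": [(8, 9), (9, 8), (9, 9)]}   # known class "high"
centroids = {label: centroid(pts) for label, pts in train.items()}

def classify(point):
    """Assign the label of the nearest class centroid."""
    return min(centroids, key=lambda lbl: dist(point, centroids[lbl]))

# --- Unsupervised: two-means discovers groups with no labels at all ---
def kmeans2(points, iters=10):
    c1, c2 = points[0], points[-1]           # crude initialization
    for _ in range(iters):
        g1 = [p for p in points if dist(p, c1) <= dist(p, c2)]
        g2 = [p for p in points if dist(p, c1) > dist(p, c2)]
        c1, c2 = centroid(g1), centroid(g2)
    return g1, g2

data = [(1, 1), (2, 1), (1, 2), (8, 9), (9, 8), (9, 9)]
print(classify((2, 2)))        # supervised: matches the known "low" pattern
g1, g2 = kmeans2(data)
print(sorted(g1), sorted(g2))  # unsupervised: the groups emerge from the data
```

Both halves find the same structure, but only the supervised half can name it; the unsupervised half can surface groupings nobody thought to label in advance.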

But in many problem domains the boundary between known and unknown patterns isn't as clear-cut as it may seem at first glance. In some problem domains, such as sentiment analysis in natural language processing, the known patterns may be convoluted in nature, involving hierarchical and non-linear relationships among overlapping levels of language representation (such as syntax, grammar, semantics, situation, intention and conversational context).

When the known patterns of NLP (as defined by computational linguists and other experts) interact in so many unknown ways, highly specialized ML algorithms can automate the process of discovering the interactions. These algorithms and approaches often fall under the heading of deep learning.

As Wikipedia describes it, "deep learning algorithms [such as neural networks] are based on distributed representations....The underlying assumption behind distributed representations is that observed data is generated by the interactions of many different factors on different levels. Deep learning adds the assumption that these factors are organized into multiple levels, corresponding to different levels of abstraction or composition."
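A classic toy illustration of why multiple levels matter is XOR: no single linear level of threshold units can represent it, but two stacked levels can, with the first level extracting intermediate factors and the second composing them. The sketch below uses hand-set weights (not a trained network) purely to show that layered composition:

```python
def step(x):
    """Hard-threshold activation."""
    return 1 if x > 0 else 0

def layer(inputs, weights, biases):
    """One level of representation: each unit fires on a weighted pattern."""
    return [step(sum(w * x for w, x in zip(ws, inputs)) + b)
            for ws, b in zip(weights, biases)]

def xor_net(a, b):
    # Level 1: two intermediate factors, "a OR b" and "NOT (a AND b)"
    hidden = layer([a, b], [[1, 1], [-1, -1]], [-0.5, 1.5])
    # Level 2: compose the factors -- fires only when both hold, i.e. XOR
    return layer(hidden, [[1, 1]], [-1.5])[0]

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", xor_net(a, b))
```

The non-linear pattern (XOR) is "unknowable" to any one level in isolation; it only becomes representable once the levels are stacked, which is the assumption deep learning scales up to many levels and millions of factors.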

The power of deep-learning algorithms is that, applied to NLP, they can speed up the discovery of hierarchical or non-linear data patterns that are not just heretofore unknown but would be, without high-performance computing (HPC) and big data at our disposal, "practically unknowable."

By the latter, I'm referring to the fact that computational linguists are only human, and that, even if there were millions of them reviewing every single scrap of a massive text corpus all day every day, they would probably overlook much intelligence that is either buried deep in textual nuance or that is scattered as clues across myriad documents.

That objective (to use deep learning to wrestle the practically-unknowables down to knowables) seems to be the impetus behind a two-year-old US Defense Advanced Research Projects Agency (DARPA) initiative called Deep Exploration and Filtering of Text (DEFT).

As described in this recent article, DEFT aims to "analyze textual data at a scale beyond what humans could do by themselves....[DEFT is designed to enable] more efficient…processing [of] text information and…[greater] understanding [of] connections in text that might not be readily apparent to humans....[D]efense analysts [would be able] to efficiently investigate…more documents, which would enable discovery of implicitly expressed, actionable information within those documents."

DARPA's ability to deliver on this grand promise is still unproven. However, the range of deep-learning ML approaches included under DEFT is truly impressive. A partial list includes separate functional modules to detect anomalies, disfluency, ambiguity, vagueness, causal relations, person-relations, semantic equivalences, entailments and redundancies in textual corpora.

What's fundamentally unknowable is whether the deep patterns being sought (such as what's in people's innermost thoughts) can be gleaned from text alone. That's essentially what the DEFT initiative is aiming for: using multilayered ML to get inside the heads of terrorists and others who might pose a threat to national security. NLP alone is probably insufficient, because so much of what's on people's minds never gets articulated in written form. In addition, you would need video, speech, gesture, image and geospatial analytics to round out a 720-degree view of people and groups in their natural element.

Another unknowable is exactly how much deep-learning horsepower would be necessary to deliver "five 9s" confidence in this 720-degree portrait of the human heart. When what you're looking to predict and prevent is the behavior of specific individuals on specific occasions, rather than of broad cohorts in general scenarios, you are venturing into big-data terra incognita. Even if we disregard the privacy implications of that ambition and the fact that today's deep-learning state of the art is nowhere near that precise, you would probably need to throw more HPC and big data at the problem than exists on planet Earth right now.

If you're passionate about technology, true wisdom means understanding the limits of what it can do, even as its powers expand astronomically.