Machine learning molds the material world

Big Data Evangelist, IBM

Innovating is a tough, risky endeavor.

When fabricating brand-new things in the real world, innovators usually work in steps. Typically, each step is progressively more detailed and more expensive than the one before. When what you’re attempting is unprecedented, a stepwise approach helps to identify showstopper issues early on. In this way, the innovator can avoid spending money and time fleshing out unfeasible concepts.

Ideas are a dime a dozen. It can be hard to distinguish brilliant concepts from crackpot schemes unless they’re put to the test early on. That’s why engineers prefer to build models, prototypes and proofs of concept to see whether their bright ideas might actually work in practice.

However, moving from the conceptual stage to prototyping can get tricky as the number, range and complexity of ideas grow. If you’re doing R&D, you need heuristics to help narrow down the list of candidates to those that are most promising to test further in lab environments. More than that, you need analytics that can automate those idea-triage heuristics at scale. That’s because the candidate notions to test may be practically infinite. This is especially true in physics, chemistry, biology and the material sciences where alternative molecular configurations can quickly overwhelm the researcher through the sheer force of combinatorial explosion.

Computational modeling has revolutionized all branches of the physical sciences, engineering and design. Leading-edge work in these fields is pushing new computational frontiers at nano scales. Computation-centric methods allow researchers to model, simulate and assess a much wider array of options far more rapidly than old-fashioned physical techniques.

However, the incredible productivity of computational prototyping carries a downside: far more candidate molecules can be simulated than can reasonably be assessed by human researchers. The bottom line is that when you build bigger haystacks, you need more powerful tools for finding the golden needles that may be buried deep within.

This recent article highlights the conundrum: “Out of the approximately 10^60 possible molecules that can be formed by linking tens or hundreds of atoms together, how do chemists identify which ones could be useful as new solar cell materials or pharmaceutical drugs?” The article cites a Harvard chemistry professor who uses computers to explore what he calls “chemical space.” This refers to the essentially endless array of molecules that can be created by linking atoms together into different shapes. The professor and his team are sorting through millions of “virtual molecules” in a quest to identify promising new materials for solar energy generation and storage.

Most interestingly, the article also highlights a powerful “needle-finding” solution. Through the IBM World Community Grid, the Harvard researchers are running machine-learning algorithms in the cloud to accelerate recognition of the most promising molecules. This is just the sort of challenge for which machine-learning algorithms are well-suited: pattern recognition against complex, multistructured, high-dimensional data sets. As the article notes, it often comes down to identifying a molecule’s shape, which often determines such properties as how it bonds and interacts with other molecules.
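To make the needle-finding idea concrete, here is a minimal sketch of how a learned model can triage a large candidate pool. It is not the Harvard team's method: the three-number "descriptors", the `expensive_simulation` scoring function and the nearest-neighbor predictor are all hypothetical stand-ins chosen for illustration. The pattern is the point: run the costly evaluation on a small sample, learn from it, and use the cheap predictions to shortlist the rest.

```python
import random


def expensive_simulation(descriptor):
    """Hypothetical stand-in for a costly simulation or lab test.

    Returns a score that is highest near an arbitrary 'ideal' descriptor.
    """
    x, y, z = descriptor
    return -((x - 0.7) ** 2 + (y - 0.2) ** 2 + (z - 0.5) ** 2)


def knn_predict(train, query, k=5):
    """Predict a score as the mean score of the k nearest labeled examples."""
    nearest = sorted(
        train,
        key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], query)),
    )[:k]
    return sum(score for _, score in nearest) / len(nearest)


def triage(candidates, n_labeled=200, n_keep=20, seed=0):
    """Score a small random sample the expensive way, then rank the rest cheaply."""
    rng = random.Random(seed)
    labeled_idx = rng.sample(range(len(candidates)), n_labeled)
    train = [(candidates[i], expensive_simulation(candidates[i])) for i in labeled_idx]
    labeled = set(labeled_idx)
    unlabeled = [c for i, c in enumerate(candidates) if i not in labeled]
    # Rank the unevaluated candidates by predicted score, best first.
    ranked = sorted(unlabeled, key=lambda c: knn_predict(train, c), reverse=True)
    return ranked[:n_keep]


# Synthetic candidate pool: 5,000 made-up descriptors, each three numbers in [0, 1).
pool_rng = random.Random(42)
candidates = [
    (pool_rng.random(), pool_rng.random(), pool_rng.random()) for _ in range(5000)
]
shortlist = triage(candidates)
print(len(shortlist), "candidates shortlisted for detailed testing")
```

In this toy setup only 200 of the 5,000 candidates ever get the expensive treatment. A real pipeline would swap in genuine molecular descriptors and a stronger model, but the haystack-shrinking logic would look much the same.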

When what you’re attempting to manipulate are complex organic molecules, predicting their shapes, and hence their behavior, can be especially difficult. Half the battle is simply to visualize those shapes in real-world laboratory specimens. As this article discusses, researchers turn to advanced technologies such as high-resolution 3D confocal microscopy to capture digital images. They can leverage these images for computer-automated analyses of the human genome. In so doing, they can glimpse interactions among cell shapes, microtubule organizations and cell-cycle progressions one gene at a time. The value of this for fine-grained genetic engineering is obvious.

Though this latter article doesn’t mention predictive analysis or machine learning, it’s clear that data-driven algorithms could play an important role in automating identification of the most promising and probable interactions among genomic, proteomic and other organic molecules. A physics professor in the article states that they get a massive data set that consists of “100 slices and they’re all megapixel images.” Without fast algorithmic analysis it would be extraordinarily tedious to manipulate all of these complex molecular interactions and collate the data across them.

Clearly, it’s just a matter of time before humans can, from the vast palette of options, model viable new life forms in the computer before they actually materialize in a physical laboratory or fabrication facility. Just as important, we’ll be able to mold the material fabric of our own bodies, using data-driven computation to sift through diverse molecular options for micro-crafting replacement tissues, more effective pharmaceuticals and less invasive treatments.

Throughout the computational life sciences, organically perfect personalization will be the end goal. In the foreseeable future, machine-learning-driven algorithms will sift through the dizzying complexities of our bodies’ molecular fingerprints. Once scientists have identified those life-enhancement options that fit our unique physiologies to a T, they’ll be able to mold biomedical remedies, grafts, transplants and prosthetics that blend seamlessly with the tissues that sprang from our own DNA.

If you think this scenario sounds far-fetched, you’re not following the news. Some may find it unsettling, shades of “Brave New World” and all that, but for those of us (self included) who will soon be entering senior-citizen territory and who don’t relish the inexorable decline of our physical persons, it’s actually kind of reassuring.