Big Data Analytics Helps Researchers Drill Deeper into Multiple Sclerosis

Big Data Evangelist, IBM

Multiple sclerosis (MS) is a chronic neurologic disorder that afflicts many people in the prime of their lives. The biomedical research community has ramped up its use of big data analytics to illuminate the myriad factors that contribute to the onset and progression of MS.

On April 26, IBM announced that the State University of New York (SUNY) Buffalo is using tools from our Netezza portfolio and from our big data analytics business partner, Revolution Analytics, for its ongoing MS research initiative. We have recently published blog posts on the effort by Steve Hamm, Mike Kearney, and yours truly.

On May 10, IBM held #IBMDataChat on Twitter, in which participants in this effort discussed how they're applying big data analytics to research, diagnosis, treatment, and, hopefully someday soon, a cure for MS. It was a very lively and informative discussion. Here are the high points.

Shawn Dolley, IBM VP of Big Data Healthcare & Life Sciences, moderated the discussion. He also tweeted his insights on several key points, including the critical need for federation in biomedical research where scattered, disparate data sets must be combined and analyzed:

@shawndolley federated data a huge issue in healthcare, esp when multi-structured and so plentiful

@shawndolley My own experience with #pharma, hospitals, even payers, are trying to leverage multiple public & private data sets federated

One of our key MS experts on the chat was Dr. Murali Ramanathan. He is SUNY Buffalo Professor of Pharmaceutical Sciences and Neurology, Director of Graduate Studies, and Co-Director of the Data Intensive Discovery Initiative. He discussed the big data analytics challenges that confront MS researchers. Most noteworthy is the challenge of analyzing the myriad factors that contribute to MS, as well as the interactions among them:

@M_Ramanathan We are working on genome wide and next-gen sequencing data sets & larger set of enviro factors

@M_Ramanathan Risk factors are important for prevention & finding cause. Understanding progression important for stopping disease

@M_Ramanathan We are particularly interested mechanisms of progression of MS.Related directly to managing the disease.

@M_Ramanathan We don't know cause of MS. We want analysis to provide insight into how known risk factors work and to find cause

@M_Ramanathan Data from collaborative team the Baird MS Center, the Buffalo Neuroimaging Center and New York MS Consortium

@M_Ramanathan Many challenges in MS basic and clinic biomedical research are data intensive. Big data analytics is critical.

@M_Ramanathan The Interaction analysis we r doing is 1 of major computational challenges my group has faced.

@M_Ramanathan 256TB Appliance. few users, big compute + big data jobs. Interaction analysis creates data^n

@M_Ramanathan research is multi-disciplinary. We use computation and modeling extensively to complement clinical research

@M_Ramanathan We can look at a very large number of combinations. Data availability limits ability to many gene interactions

@M_Ramanathan We have been developing information theory methods and metrics for identification and search.

@M_Ramanathan Known environmental risk factors for MS include the virus tht causes mono, latitude/sun/vitamin D and smoking

@M_Ramanathan We want better methods to find interactions in large data. Challenge: number of possible combinations increases very quickly.

@M_Ramanathan No single genetic or environmental factor completely explains risk 4 developing MS. Interactions imprtnt

@M_Ramanathan My research focuses on the roles of interactions among genetic and environmental factors in #MS.

@M_Ramanathan understand gene-env interactions could help identify lifestyle, diet change to manage #MS better
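To make the idea of an information-theoretic interaction measure concrete, here is a minimal Python sketch. It is not the method Dr. Ramanathan's group uses (the chat did not describe their metrics); it simply computes a standard quantity, the extra information two factors carry about an outcome jointly beyond what each carries alone. The function names and the toy data are hypothetical and invented purely for illustration.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * log2(c / total) for c in counts.values())


def mutual_information(x, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for two equal-length sequences."""
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))


def interaction_gain(a, b, outcome):
    """I(A,B;Y) - I(A;Y) - I(B;Y): information the pair adds beyond each factor alone."""
    joint = list(zip(a, b))
    return (
        mutual_information(joint, outcome)
        - mutual_information(a, outcome)
        - mutual_information(b, outcome)
    )


# Toy, made-up data (not real MS data): the outcome depends only on the
# combination of the two factors, never on either factor by itself.
factor_a = [0, 0, 1, 1]
factor_b = [0, 1, 0, 1]
outcome = [0, 1, 1, 0]

gain = interaction_gain(factor_a, factor_b, outcome)
print(f"interaction gain: {gain:.2f} bits")
```

In this toy case neither factor alone says anything about the outcome, yet the pair determines it completely, which is the kind of pattern a single-factor analysis would miss.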

Another key MS expert on the chat was Tim Coetzee, Ph.D., Chief Research Officer of the National Multiple Sclerosis Society. Like the other featured tweeters, he emphasized the need for data federation and compute-intensive interaction analysis. Coetzee noted how large the historical MS research data set already is and how fast it's growing:

@tim_coetzee federate: we take pharma, gov., & acad. dbs & bring them together in some way to learn about #MS

@tim_coetzee We have 25years of clinicaltrialdata on 1000s patients. We need #bigdata mine those datasets

@tim_coetzee IMO big data as the next frontier for MS research. There are huge clinical trial datasets that need to be analyzed

@tim_coetzee we estimate that 400K people have #MS in US and 2.1 million worldwide.

@tim_coetzee MRI is one datapoint. But now we also have information from clinical trials that needs more analysis.

@tim_coetzee don't have good ways to measure numbers of ppl w/ MS in US. Many believe its underreported

@tim_coetzee The nbrs seem to be steady. We are able to diagnose faster. We are seeing more kids with MS

@tim_coetzee Good question. We don't have good ways to measure numbers of ppl w/ MS in US; not trackd by CDC

David Smith, VP of Marketing at Revolution Analytics, discussed the role of open-source R tools in helping SUNY researchers do deep analysis of the factors and interactions that contribute to MS. He discussed how R and Netezza, used together, provide the horsepower to tackle the most challenging compute- and data-intensive analytics:

@revodavid In research+acad, R is "lingua franca" of stat analysis. Some #'s here:

@revodavid Advanced research (like for #MS) requires latest analytical methods. Open-source lets researchers publish new methods faster

@revodavid W just 1000 factors, if need to test all pairs, thats a million tests to do. That's combinatorial explosion

@revodavid R gives plugability 4 probability distros. Netezza+R gives us speed in combinatorial data space from interactions

@revodavid Like difference btw 1 lab tech running 1000 tests in sequence, vs 100 lab techs all running tests at once.
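The arithmetic behind the "combinatorial explosion" in the tweets above is easy to check. The following Python sketch is an illustration added here, not something shared in the chat; the function names are hypothetical. Counting every ordered pairing of 1,000 factors (1,000 x 1,000) gives the million mentioned in the tweet, while counting each unordered pair once gives roughly half that, and moving to three-way interactions pushes the count into the hundreds of millions. The last function mirrors the lab-tech analogy: the same batch of tests split evenly across parallel workers.

```python
from math import ceil, comb


def interaction_tests(n_factors: int, order: int) -> int:
    """Number of distinct unordered combinations of `order` factors."""
    return comb(n_factors, order)


def rounds_needed(n_tests: int, n_workers: int) -> int:
    """Sequential rounds required when n_workers each run one test per round."""
    return ceil(n_tests / n_workers)


n = 1000
ordered_pairs = n * n                      # every factor paired with every factor
unordered_pairs = interaction_tests(n, 2)  # each distinct pair counted once
triples = interaction_tests(n, 3)          # three-way interactions

print(f"ordered pairs:   {ordered_pairs:,}")
print(f"unordered pairs: {unordered_pairs:,}")
print(f"triples:         {triples:,}")

# Lab-tech analogy: 1,000 tests run by 1 tech versus 100 techs at once.
print(rounds_needed(1000, 1), "rounds with 1 tech")
print(rounds_needed(1000, 100), "rounds with 100 techs")
```

Parallel hardware shortens the wall-clock time for a fixed batch of tests, but it does not slow the growth in the number of tests as the interaction order rises, which is why both raw compute and better search methods matter.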

The clear takeaway from this tweetchat was that progress in biomedical science depends on big data analytics. Without analytical power tools, such as those provided by IBM Netezza and Revolution Analytics, MS researchers can't spot the complex patterns that illuminate the causes of this dread disease. As Tim Coetzee summarized well (in fewer than 140 characters):

@tim_coetzee Collaborations like the one between @M_Ramanathan and IBM will help get us the cure.
