Can Intelligently Harnessing Big Data Fix Healthcare?

How the healthcare industry is well positioned to turn big data into powerful information that enhances insight and care

By Krish Krishnan, President, Sixth Sense Advisors Inc.

In the past few years, a significant debate has emerged over healthcare and its cost. Approximately 80 million baby boomers are approaching retirement,1 and some economists forecast that this trend is likely to bankrupt Medicare and Medicaid in the near future. While healthcare reform and its new laws are igniting a number of important changes, the core issues remain unresolved. Fixing the healthcare system now is critical.

Healthcare spending in the US is projected to grow at an average annual rate of 5.8 percent for the period 2010 through 2020. That rate is 1.1 percentage points faster than the expected growth in the gross domestic product (GDP). Further, healthcare spending is estimated to reach 19.8 percent of GDP by 2020, up from 17.6 percent in 2010. Total healthcare spending is projected to reach USD4.64 trillion in 2020, close to half of which will come from government sources.2
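The projections above can be checked against one another with a few lines of arithmetic. The following is a minimal sketch, not part of the cited projection: it assumes the 5.8 percent rate compounds evenly over the 10 years and that GDP grows at 5.8 minus 1.1 percent. The variable names are illustrative only.

```python
# Figures quoted in the paragraph above (source: footnote 2).
ANNUAL_SPEND_GROWTH = 0.058   # average annual growth in health spending
GDP_GAP = 0.011               # spending grows 1.1 points faster than GDP
SPEND_2020_TRILLION = 4.64    # projected total spending in 2020, USD trillions
SHARE_OF_GDP_2020 = 0.198     # projected spending as a share of GDP in 2020
YEARS = 10                    # 2010 through 2020


def compound(rate: float, years: int) -> float:
    """Growth factor after compounding `rate` annually for `years` years."""
    return (1.0 + rate) ** years


# Assumption: even compounding, so 2010 spending is the 2020 figure discounted back.
implied_spend_2010 = SPEND_2020_TRILLION / compound(ANNUAL_SPEND_GROWTH, YEARS)

# Assumption: GDP grows at the spending rate minus the quoted 1.1-point gap.
implied_gdp_growth = ANNUAL_SPEND_GROWTH - GDP_GAP

# The GDP share shrinks, going backward, by the ratio of the two growth factors.
implied_share_2010 = SHARE_OF_GDP_2020 * (
    compound(implied_gdp_growth, YEARS) / compound(ANNUAL_SPEND_GROWTH, YEARS)
)

print(f"Implied 2010 spending: USD{implied_spend_2010:.2f} trillion")
print(f"Implied 2010 share of GDP: {implied_share_2010:.1%}")
```

Under these assumptions the implied 2010 baseline is about USD2.6 trillion, and the implied 2010 share of GDP comes out near 17.8 percent, close to the 17.6 percent quoted. The small gap is what rounding in the quoted averages would be expected to produce.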

Data rich and information poor

Healthcare has always been data rich. Medical advances have come so quickly in the past 30 years that, along with preventive and diagnostic approaches, they have generated a great deal of data. This tremendous volume of information includes clinical trials, medical literature, notes taken by doctors and pharmacists, patient therapies, and most importantly, structured analysis of the data sets in analytical models.

On the payer side, while insurance rates are skyrocketing, insurance companies are competing hard for wallet share, and it is interesting to observe the strong influence of social media. On the provider side, the small number of physicians and specialists available relative to the growing need for them is becoming a huge problem. Additionally, the practice of obtaining second and third expert opinions to avoid medical malpractice lawsuits has created a need for sharing knowledge and seeking advice. At the same time, however, several laws are being passed to protect patient privacy and data security.

On the therapy side, there are several smart machines capable of sending readings to multiple receivers, including doctors’ mobile phones. The healthcare profession has become successful in reducing or eliminating latencies and has many treatment alternatives, but it doesn’t always know where best to apply them. Treatments that work well for some may not work well for others. Statistics that can point to successful interventions, and where and for whom they worked, or that predict how and where to apply them in a suggestion or recommendation to a physician, are not available.

A lot of data is available, but not all of this data is being harnessed into powerful information. Healthcare remains one of the nation’s data-rich, yet information-poor, industries. It needs to start producing enhanced information at a faster rate and on a larger scale than ever before.

Meaningful information is required before healthcare can reduce costs and deliver significant improvements in outcomes. The challenge is that while the data is available today, the systems needed to harness it have not been.

Big data and healthcare

Big data combines information that has traditionally been available (clinical trials, drug information, insurance claims data, and notes by doctors) with new data generated from forums, machine data, social media, and hosted sites such as WebMD. Big data is characterized by volume, variety, and velocity. In this context, volume refers to data sets of widely differing sizes, ranging from megabytes to multiple terabytes. Variety means that data is available or produced in a range of formats, not all of which are based on similar standards. And velocity refers to the highly unpredictable rate at which data is produced by machines, in notes by doctors and nurses, and in clinical trials, all generated at different speeds.

Over the past five years, a number of technology innovations have emerged to handle the Web 2.0–based data environments, including Apache Hadoop; columnar databases; data warehouse appliances, iteration 3.0 and higher; and NoSQL. There are several analytical models that have been made available, and in early 2014 the Apache Software Foundation released Apache Mahout, a collection of statistical algorithms.

With so many innovations available, organizations can create a powerful information processing architecture that addresses many of the data processing problems healthcare faces today. This approach includes agile analytics, enhanced collaboration, reduced latencies, scalable and available systems, minimized complexity, and usefulness, that is, making the right information available to the right resource at the right time.

Production system flow

How can big data solutions fix the healthcare situation? Several organizations are working on prototype solution flows (see figure). While the diagram shown is not a complete production system flow, there are such models available in small and large footprints.

Figure: Partial prototype solution flow system

In an integrated system, different types of data can be intelligently harnessed using architectures like those used for Facebook or Amazon to create a scalable solution. Using a textual processing engine such as Forest Rim Technology’s textual extract-transform-load (ETL) tool enables small and medium enterprises to write business rules in English. The image, textual, and video data can be processed using any open source foundation tools. Data output from all these integrated processors produces a rich data set and also generates an enriched column-value pair output. This output, along with existing enterprise data warehouse (EDW) and analytical platforms, can be used to generate a strong set of models utilizing analytical tools and leveraging Mahout algorithms.
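The integration step described above, in which processor outputs become enriched column-value pairs that sit alongside existing warehouse data, can be illustrated with a toy example. This is a minimal sketch under stated assumptions, not the prototype in the figure and not how any named product works: the `ColumnValue` shape, the keyword rules, the function names, and the sample patient record are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnValue:
    """One enriched fact emitted by a processor (hypothetical shape)."""
    patient_id: str
    source: str   # which processor produced the pair, e.g. "text" or "device"
    column: str
    value: str


# Hypothetical business rules: a keyword found in a note maps to a column and value.
TEXT_RULES = {
    "shortness of breath": ("symptom", "dyspnea"),
    "metformin": ("medication", "metformin"),
}


def process_note(patient_id: str, note: str) -> list[ColumnValue]:
    """Toy stand-in for a textual processing step: apply keyword rules to a note."""
    lowered = note.lower()
    return [
        ColumnValue(patient_id, "text", column, value)
        for keyword, (column, value) in TEXT_RULES.items()
        if keyword in lowered
    ]


def process_reading(patient_id: str, metric: str, reading: float) -> ColumnValue:
    """Toy stand-in for a machine-data step: wrap one device reading."""
    return ColumnValue(patient_id, "device", metric, f"{reading:g}")


def integrate(pairs, warehouse_rows):
    """Place enriched pairs next to existing warehouse rows, keyed by patient."""
    merged = {}
    for patient_id, row in warehouse_rows.items():
        merged[patient_id] = {"edw": dict(row), "enriched": {}}
    for pair in pairs:
        record = merged.setdefault(pair.patient_id, {"edw": {}, "enriched": {}})
        # Keep every value seen for a column, tagged with the processor it came from.
        record["enriched"].setdefault(pair.column, []).append((pair.source, pair.value))
    return merged


# Made-up sample data for one patient.
pairs = process_note("p1", "Patient reports shortness of breath; continues Metformin.")
pairs.append(process_reading("p1", "heart_rate_bpm", 88.0))
record = integrate(pairs, {"p1": {"age": 67}})["p1"]
print(record)
```

Tagging each pair with its source is one way to keep the combined record auditable: a model built on the output can always trace a value back to the note or device that produced it.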

When metadata-based data integration is applied, different types of solutions can be produced, such as evidence-based statistics, insights derived from clinical trials versus clinical diagnoses, and patient dashboards for disease state management based on machine output. The result is rich, auditable, and reliable information. This information can be used to provide enhanced care, reduce errors, and raise confidence in sharing data with physicians in a social media outlet, which in turn supplies further insights and opportunities. Research notes from doctors that have been dormant can be converted into real data and can act as the foundation of a global search database that supports collaboration and the sharing of research in gene therapies for several diseases.

Providing enhanced cures and improving the quality of care enables the management of patient health in an agile manner. Such a solution is expected to be a huge step in helping reduce healthcare costs and fixing a dysfunctional system.

Eventually, this integrated data can also provide a way to create auditing systems for patients based on insurance claims, Medicaid, and Medicare. It can also help isolate fraud, which is a big revenue leak, and make it possible to predict the population-based spending required in each state from that state’s disease information. Additionally, integrated data is expected to drive metrics and goals that improve efficiency and ratios.

Holistic foundational platforms

While all these goals may be lofty, approaches based on big data can provide a foundational step toward solving the healthcare crisis. There are several issues to confront in the data space, such as compliance, data quality, electronic health record (EHR) implementation, governance, regulatory reporting, and safety. If a consortium following an open source type of approach can be formed at the US Department of Health and Human Services to tackle these problems, much of the associated bureaucracy may be minimized. More vendor-led solution development from the private and public sectors can create unified platforms that may be leveraged to develop this blueprint.

While big data cannot fix healthcare on its own, it can provide a basis for creating a holistic solution. In my personal experience, my team presented a health consortium with a feasible alternative. Perhaps in the future, we will have a global health platform on which much more can be solved than just high costs for healthcare.

Please share any thoughts or questions in the comments.

1 “Baby Boomers – A Healthcare Crisis Nears,” by Heath Atchison, Healthcare. Note: The approximate number of US baby boomers (American citizens born in the period July 1946 to 1964) is based on the approximately 28 percent of the total US population cited in this source; the total US population in 2013 was approximately 316.1 million.
2 “US Health Spending Projected To Grow 5.8 Percent Annually,” by Chris Fleming, Health Affairs blog, July 2011.

Editor’s note: This article by Krish Krishnan, president and chief executive officer (CEO) at Sixth Sense Advisors, Inc., is offered for publication in association with The Big Data Seminar 2015, March 12–13, 2015, in New York, New York, sponsored by Data Management Forum.