Healthcare and big data: Drive insights, optimize costs, innovate care

President, Sixth Sense Advisors Inc.

In the past few years, a significant debate has emerged around healthcare and its costs. There are almost 80 million Baby Boomers approaching retirement, and economists forecast this trend will likely bankrupt Medicare and Medicaid in the near future. While healthcare reform and its new laws have ignited a number of important changes, the core issues are not resolved. It’s critical we fix our system now, or else our $2.6 trillion in annual healthcare spending will grow to $4.6 trillion by 2020—one-fifth of our gross domestic product (GDP).

Data-rich and information-poor

Healthcare has always been data-rich. Medicine has developed so quickly in the past 30 years that, along with preventive and diagnostic developments, we have generated a lot of data: clinical trials, doctors’ notes, patient therapies, pharmacists’ notes, medical literature and, most importantly, structured analysis of the data sets in analytical models.

On the payer side, while insurance rates are skyrocketing, insurance companies are competing hard for wallet share. At the same time, the strong influence of social media cannot be ignored.

On the provider side, the shortage of physicians and specialists relative to the growing need for them is becoming a larger problem. Additionally, the practice of obtaining second and third expert opinions to avoid medical malpractice lawsuits has created a need for sharing knowledge and seeking advice. At the same time, however, several laws are being passed to protect patient privacy and data security.

On the therapy side, there are several smart machines capable of sending readings to multiple receivers, including doctors’ mobile phones. We have become successful in reducing or eliminating latencies and have many treatment alternatives, but we do not know where best to apply them. Treatments that can work well for some do not work well for others. We do not have statistics that can point to successful interventions, show which patients benefited from them, or predict how and where to apply them in a suggestion or recommendation to a physician.

There is a lot of data available, but not all of it is being harnessed into powerful information. Healthcare remains one of our nation’s data-rich yet information-poor industries, and we must start producing better information, at a faster rate and on a larger scale.

Before cost reductions and meaningful improvements in outcomes can be delivered, relevant information is necessary. The challenge is that while the data is available today, the systems to harness it have not been available.

Big data and healthcare

Big data includes both traditionally available information (doctors’ notes, clinical trials, insurance claims data, drug information) and new data generated from social media, forums and hosted sites (for example, WebMD), along with machine data. In healthcare, big data has three characteristics:

  • Volume: The data sizes are varied and range from megabytes to multiple terabytes
  • Velocity: Data from machines, doctors’ notes, nurses’ notes and clinical trials is produced at different speeds and is highly unpredictable
  • Variety: The data is available or produced in a variety of formats but not all formats are based on similar standards

Over the past five years, there have been a number of technology innovations to handle Web 2.0-based data environments, including Hadoop, NoSQL, data warehouse appliances (iteration 3.0 and more) and columnar databases. There are several analytical models that have become available, and late last year the Apache Software Foundation released a collection of statistical algorithms called Mahout. With so many innovations, the potential is there to create a powerful information processing architecture that will address multiple issues that face data processing in healthcare today:

  • Solving complexity
  • Reducing latencies
  • Agile analytics
  • Scalable and available systems
  • Usefulness (getting the right information to the right resource at the right time)
  • Improving collaboration

Potential solutions

How can big data solutions fix healthcare? A prototype solution flow is shown here. While this is not a complete production system flow, there are several organizations working on such models in small and large environments.

Figure 1: Prototype Solution Flow

An integrated system can intelligently harness different types of data using architectures like those of Facebook or Amazon to create a scalable solution. Using a textual processing engine like FRT Textual ETL (extract, transform, load) enables small and medium enterprises to write business rules in English. The textual data, images and video data can be processed using any of the open source foundation tools. Data output from all these integrated processors will produce a rich data set and also generate an enriched column-value pair output. We can use the output along with existing enterprise data warehouse (EDW) and analytical platforms to create a strong set of models utilizing analytical tools and leveraging Mahout algorithms. 
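To make the idea of rule-driven text processing concrete, here is a minimal sketch of how free-text clinical notes might be turned into column-value pairs. It is an illustration only: the rule names, the patterns and the sample note are assumptions made for this example, and it does not represent the rule syntax or behavior of FRT Textual ETL or any other product.

```python
import re

# Hypothetical extraction rules: each maps an output column name to a pattern.
# Illustrative only; these are not the rules or syntax of any real product.
RULES = {
    "blood_pressure": re.compile(r"\bBP\s*(\d{2,3}/\d{2,3})\b", re.IGNORECASE),
    "heart_rate": re.compile(r"\b(?:HR|pulse)\s*(\d{2,3})\b", re.IGNORECASE),
    "medication": re.compile(r"\bprescribed\s+([A-Za-z]+)", re.IGNORECASE),
}


def extract_pairs(note):
    """Return (column, value) pairs for every rule that matches the note."""
    pairs = []
    for column, pattern in RULES.items():
        for match in pattern.finditer(note):
            pairs.append((column, match.group(1)))
    return pairs


# Made-up sample note, used only to exercise the rules above.
note = "Patient stable. BP 128/82, pulse 71. Prescribed metformin for follow-up."
for column, value in extract_pairs(note):
    print(column, "=", value)
```

In a real system the resulting pairs would be loaded next to structured EDW data so that analytical models can use both; production-grade clinical text processing would need far more than a handful of patterns.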

Using metadata-based integration of data and creating different types of solutions—including evidence-based statistics, clinical trial versus clinical diagnosis types of insights, patient dashboards for disease state management based on machine output and so on—lets us generate information that is rich, auditable and reliable. This information can be used to provide better care, reduce errors and create more confidence in sharing data with physicians in a social media outlet, thus providing more insights and opportunities. We can convert research notes from doctors that have been dormant into usable data, and create a global search database that will provide more collaboration and offer possibilities to share genomic therapy research. 

When we can provide better cures and improve the quality of care, we can manage patient health in a more agile manner. Such a solution will be a huge step in reducing healthcare costs and fixing a broken system. 

Eventually, this integrated data can also provide lineage into producing patient auditing systems based on insurance claims, Medicaid and Medicare. It will also help isolate fraud, which can be a large revenue drain, and will create the ability to predict population-based spending based on disease information from each state. Additionally, integrated data will help drive metrics and goals to improve efficiency and ratios.
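As a rough sketch of what such claims-based analysis could look like, the example below totals spending by state and disease and flags unusually large claims for review. The claim records are made up, and the "three times the median" threshold is an arbitrary assumption chosen for illustration; it is a simple screening heuristic, not a fraud detection method.

```python
from collections import defaultdict
from statistics import median

# Made-up claims for illustration: (claim_id, state, disease, amount_usd)
CLAIMS = [
    ("c1", "NY", "diabetes", 1200.0),
    ("c2", "NY", "diabetes", 1100.0),
    ("c3", "NY", "diabetes", 9800.0),
    ("c4", "TX", "diabetes", 1000.0),
    ("c5", "TX", "asthma", 400.0),
    ("c6", "TX", "asthma", 450.0),
]


def spending_by_state_and_disease(claims):
    """Sum claim amounts for each (state, disease) combination."""
    totals = defaultdict(float)
    for _claim_id, state, disease, amount in claims:
        totals[(state, disease)] += amount
    return dict(totals)


def flag_outliers(claims, factor=3.0):
    """Return ids of claims exceeding `factor` times the median for their disease."""
    amounts_by_disease = defaultdict(list)
    for _claim_id, _state, disease, amount in claims:
        amounts_by_disease[disease].append(amount)
    medians = {d: median(a) for d, a in amounts_by_disease.items()}
    return [
        claim_id
        for claim_id, _state, disease, amount in claims
        if amount > factor * medians[disease]
    ]


totals = spending_by_state_and_disease(CLAIMS)
flagged = flag_outliers(CLAIMS)
print(totals)
print(flagged)
```

A flagged claim here is only a candidate for human audit. The same grouped totals could feed population-level spending forecasts once real, governed data is available.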

While all of these are lofty goals, big data-based solution approaches will help create a foundational step toward solving the healthcare crisis. There are several issues to confront in the data space, such as quality of data, governance, electronic health record (EHR) implementation, compliance, safety and regulatory reporting. If a consortium following an open source type of approach can be formed to work with the U.S. Department of Health and Human Services, much of the associated bureaucracy can be minimized. More vendor-led solution developments from the private and public sectors will help spur unified platforms that can be leveraged to create this blueprint.


While big data cannot fix healthcare on its own, it can provide the foundational platform for creating a holistic solution. On a personal note, my team presented a health consortium with a feasible solution. Perhaps in the future, we will have a global health platform where we can solve much more than costs for healthcare.

This article is published in association with the BIG DATA SEMINAR 2015 featuring Krish Krishnan. This event is an intensive, two-day tutorial about big data sponsored by Data Management Forum at the Hotel Pennsylvania in New York City, September 17–18, 2015.