Where Do Clinical Trials Go After They Are Completed?

Ph.D., Big Data Solutions, Healthcare and Life Sciences

I’ve been trying to delve into a topic I normally avoid: clinical trials. But my interest isn’t in how clinical trials are run, or how to design one. I’m more interested in what happens to clinical trial data once the trials are completed.

Do folks actually mine their clinical trial archives for insights?

As I’ve been told, at most pharma companies each clinical trial is a separate data set, with its own formats, structures, and data models. That means there is no easy way to analyze across trials: the data has to be manually normalized and collated. Not fun.

For example, I heard of two large pharmas that merged and decided not to analyze each other's past clinical trials for new insights (or even for duplicated findings).

Furthermore, the data sets are thin: each is narrowly focused on the minimal set of data needed to answer a single question (why that is, is a story for another day). So even if you put a bunch of them together, the data might not amount to much anyway.


This seems like a big data problem to me. Hence, I’ve been wondering if we could build a tool based on some of our big data products to do semantic analysis on sets of clinical trials, perhaps augmented with other data to compensate for the thinness of the data sets. Such a tool would automatically discover patterns in the data and present them in a visually helpful way. It would find relevant patterns across trials that could help trials fail faster, be better designed, or yield additional hidden insights.

What do you think?

Reality check

I’ve been trying to validate this idea. Certainly, I haven’t heard of anyone mining their clinical trial data archives for insights, either to make sure they are not duplicating trials or to better design or guide current and future trials.

But is this because it’s hard to find relevance across clinical trials, because there is no relevance to be found, or because pharma doesn’t give a darn about old clinical trials (it’s easier to look forward than back)?

Absence of proof doesn’t mean proof of absence.

What complicates matters, as I understand it, is that most pharma companies farm out clinical trials to contract research organizations (indeed, one pharma I know doesn’t even keep its archived trials in-house). And the companies that do have the data are usually organized in a way that leaves little interest in cross-project analysis.

I have, indeed, started compiling a list of reasons why this won’t work. But because it’s an interesting big data problem, it’s worth exploring.


Do you think this is something that pharma might want? While this is in theory a big data problem, is it in practice a problem anyone wants solved?

I think there are organizations out there that have clinical trials data archives and want to mine them for insights. I just haven’t met them yet.

Have you?

Photo: U.S. Environmental Protection Agency