Mega-analysis: Big data comes to the aid of medicine

Posted by Michael Willis

David S. Seres, MD is Associate Professor of Medicine and a Clinical Ethicist at Columbia University Medical Center.

Is sodium restriction actually bad for you (probably)? Should we or shouldn’t we do screening mammograms (unsure) and colonoscopies (probably)? Are the benefits of Alzheimer’s drugs worth the side effects and costs (iffy)? Do vaccines cause autism (emphatically, no)? Is eating steak destroying Western civilization (depends who you ask)? All of these are familiar tropes in the press, and they are some of the better-known examples of how we in medicine are constantly faced with life-and-death decisions in a vacuum of usable data. All of them have been the subject of multitudes of studies that more often than not disagree. Equally important is the question that personalized medicine is attempting to address: Does a finding in a particular population in a particular study apply to any given individual?

The gold standard for determining cause and efficacy is the randomized controlled trial (RCT). But RCTs are often impractical, especially when the effect size is small or the control is ethically questionable.

In my specialty, for example, to rigorously determine whether a particular nutrient mix will result in a reduction of infections in the ICU, it would take thousands of subjects being randomized to one of three arms (groups): experimental treatment, comparative treatment, and no feeding.  First, who will fund the tens of millions of dollars such a study would entail?  Second, not feeding sick people would be required to prove effect, but is difficult for an IRB to swallow (pun intended).  So we’re left unsure what to feed our sickest patients.

Because good research is hard to do, literally millions of non-rigorous studies have been published in the medical literature, and we depend on them for practice and public health policy. In these studies, work-arounds, pseudo-randomizations, and observations are the norm, and the individual clinician is left to wade through the scattershot literature to try to conclude what is best for their patient.

Meta-analysis was developed in recognition of the difficulties in weighing published results. It quantifies effect through a stringent statistical methodology for combining the data from multiple studies, so that the data can be analyzed as a whole. As with any such re-analysis, meta-analysis is not considered definitive proof. But it gets us a lot closer to objective, statistical validity.
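To make "combining the data from multiple studies" a little more concrete, here is a minimal sketch of one common pooling approach, the inverse-variance (fixed-effect) method. It is offered only as an illustration: the article does not say which statistical model any particular meta-analysis used, and the effect sizes, standard errors, and the function name `pool_fixed_effect` are all hypothetical.

```python
import math


def pool_fixed_effect(effects, std_errors):
    """Pool study effects with inverse-variance (fixed-effect) weights.

    Each study is weighted by 1 / SE^2, so more precise studies count more.
    Returns the pooled effect and its standard error.
    """
    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, effects)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled, pooled_se


# Hypothetical inputs: three studies reporting the same outcome measure.
effects = [0.30, 0.10, 0.25]      # effect size reported by each study
std_errors = [0.10, 0.20, 0.05]   # standard error of each effect size

pooled, pooled_se = pool_fixed_effect(effects, std_errors)

# Approximate 95% confidence interval under a normal approximation.
ci_low = pooled - 1.96 * pooled_se
ci_high = pooled + 1.96 * pooled_se

print(f"pooled effect: {pooled:.3f} (95% CI {ci_low:.3f} to {ci_high:.3f})")
```

The pooled estimate sits closest to the most precise study, and its standard error is smaller than that of any single study, which is the statistical gain that motivates analyzing the studies as a whole. A real meta-analysis would also have to deal with heterogeneity between studies, for example with a random-effects model.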

Meta-analysis is itself quite limited. A comprehensive meta-analysis is an extremely time-consuming process. Until recently, the largest published meta-analysis included 95 studies (Tran & Weltman, 1985) and was published in JAMA. The digestion and analysis took 3 years to perform, with, as is typical, about another year elapsing during the publication process. Unfortunately, the current trend in meta-analytics is towards performing smaller analyses, which results in “mass production of redundant, misleading, and conflicted systematic reviews and meta-analyses” (Ioannidis, 2016). A recent survey of meta-analyses in the Cochrane Database of Systematic Reviews, the most respected repository of such reviews, showed that 75% of meta-analyses included fewer than 7 studies (Davey et al., 2011).

Over the years, our computational capacity has grown unfathomably. When I was in training, the most powerful computers of the time took up an entire city block, with processors described in terms of 32 bits and clock speeds in megahertz. I now carry a cell phone with capacity measured in gigabytes (one gigabyte is 8 billion bits) and clock speed in gigahertz. That’s a one with a whole lot of zeros after it faster. Unimaginable quantities of data are being processed to perform genomics, microbiomics, and metabolomics, and mysteries such as the makeup of the bacteria in the gut are being unlocked by looking at the myriad DNA sequences of the hundreds of thousands of different individual species of bacteria inhabiting the colon.

With this kind of processing power, several companies have turned their attention to the medical literature, in an attempt to boil it down and provide more definitive answers for practitioners and the public. Medaware Systems has emerged as the leader. Founded by Zung Vu Tran, PhD, the protégé of the inventor of the meta-analysis and now a senior scientist in the field, Medaware Systems employs research scientists to digest every study published in the medical literature, breaking each into hundreds of variables. This allows for meta-analysis at a level of granularity never before seen. Even IBM’s Watson® cannot come close to the level of specific data extraction that Medaware Systems has achieved. Medaware Systems is now well underway in digesting the entire medical literature.

Medaware Systems has recently completed a meta-analysis 2000 times more complex than any previously done. Not only has the company increased the complexity of data extraction, but the speed with which it is already performing analyses is breathtaking. This analysis, the largest meta-analysis known, was performed on 1734 studies, collecting 400 variables for an estimated 12 million data points. Truly a mega-analysis. Medaware Systems digested the data and performed the analysis in 2 ½ months. Compared to the prior 95 studies and 5 variables, that’s 100,000 times faster per data point.
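For readers who want to see how a per-data-point speed comparison like this can be worked out, the sketch below runs the arithmetic on the figures quoted above. The result depends heavily on how data points are counted, so the code tries two counts: the reported estimate of 12 million, and the simple product of studies and variables. Both are assumptions made for illustration, as is the use of the 3-year analysis time alone (without the publication year) for the earlier study; none of this is the company's own calculation.

```python
def data_points_per_month(data_points, months):
    """Throughput: data points digested and analyzed per month."""
    return data_points / months


# Earlier analysis (Tran & Weltman, 1985), using the figures quoted in the text.
prior_points = 95 * 5        # 95 studies x 5 variables
prior_months = 3 * 12        # 3 years of digestion and analysis

# Recent mega-analysis, completed in 2.5 months.
new_months = 2.5
new_point_counts = {
    "reported estimate": 12_000_000,      # "an estimated 12 million data points"
    "studies x variables": 1734 * 400,    # assumes one data point per variable per study
}

prior_rate = data_points_per_month(prior_points, prior_months)

# Ratio of new throughput to prior throughput under each counting assumption.
speedups = {
    label: data_points_per_month(points, new_months) / prior_rate
    for label, points in new_point_counts.items()
}

for label, ratio in speedups.items():
    print(f"{label}: about {ratio:,.0f} times the prior throughput")
```

The two counting assumptions give multiples on either side of the quoted figure, which suggests reading it as an order-of-magnitude estimate rather than a precise ratio.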

Even the best doctors are making decisions in a vacuum.  In one study at a major academic center, only 60% of decisions were evidence-based (Prasad et al., 2013).  Medaware Systems’ driving goal is to democratize the availability of large data analyses by making them available instantly to subscribers.  Once all of the literature has been digested, meta-analysis of any treatment will be available in nanoseconds with the touch of a few buttons. Medical providers, and eventually patients, will be able to access a digest of the entire medical literature with user-friendly desktop applications.

While the overall digestion is ongoing, the company is performing specified analyses, such as the mega-analysis above. They have already demonstrated the efficacy of a medical device for an FDA application. In fact, they were able to educate the device maker on the populations in which the device was most effective, something of which the device maker had not been aware. They have done large analyses for pharma and the nutrition industry, proving on the one hand that an entire class of drugs was ineffective and that information on a drug insert was incorrect, and on the other that there was indeed support for claims of superiority for a feeding product.

To be clear, mega-analysis is a powerful tool, but one distinct from RCTs. Mega-analytics both complements RCTs and compensates for what they cannot do, including assessing the effect of a treatment over tens of thousands of patients in different populations at a very low cost. Further, the results of an RCT are truly valid only in the exact population that was studied. Mega-analytics at the bedside, via collection of outcomes data from electronic medical records for huge numbers of patients, can provide badly needed external validation of internally validated RCTs with real-world evidence. A significant result in an RCT does not mean the same effect will be seen in a different population of patients with a different genetic, genomic or epigenetic make-up, or in a different environment in a different part of the world. Medaware Systems is also adding this function to their prodigious data analytic capacity.

With mega-analytics, efforts at personalized medicine are far more likely to succeed and to allow us to achieve precise application of treatments to the individual patient, maximizing benefit and minimizing risk.


Tran ZV, Weltman A. Differential effects of exercise on serum lipid and lipoprotein levels seen with changes in body weight: A meta-analysis. JAMA 1985; 254(7):919-924.

Davey J, Turner RM, Clarke MJ, Higgins JPT. Characteristics of meta-analyses and their component studies in the Cochrane Database of Systematic Reviews: a cross-sectional, descriptive analysis. BMC Medical Research Methodology 2011; 11:160.

Ioannidis JPA. The Mass Production of Redundant, Misleading, and Conflicted Systematic Reviews and Meta-analyses. The Milbank Quarterly 2016; 94(3):485-514.

Prasad V, Vandross A, Toomey C, Cheung M, Rho J, Quinn S, Chacko SJ, Borkar D, Gall V, Selvaraj S, Ho N, Cifu A. A Decade of Reversal: An Analysis of 146 Contradicted Medical Practices. Mayo Clin Proc 2013; 88(8):790-798.

