When Big Data Changed the Results of Medical Research

Prion protein expressed in E. coli, purified and fibrillised at pH 7. Caption and credit: niaid/Flickr, CC BY 2.0

A patient trains herself as a scientist and, in her quest to understand a disease that has struck her family, makes a breakthrough with the help of an unprecedented scientific process.

In September 2013, an article appeared in The New Yorker about an email its author had received from a young couple on a quest to cure a disease similar to the one described in the author’s book, The Family That Couldn’t Sleep. The book told the exceptional story of an Italian family with a hereditary disease that caused insomnia and eventually led to death around the age of 50. The young couple were Eric and Sonia Vallabh Minikel, a software engineer and his wife, in Boston, Massachusetts.

In 2010, Sonia’s mother had developed a baffling illness. It began with blurry vision and a spike in blood pressure, and over the next few months she became severely demented. She died in December 2010, at the age of 52. Shortly after, an autopsy report showed that she had a genetic disease called fatal familial insomnia (FFI), a rare prion disease that afflicts only one in a million people. Prion diseases, such as the infamous ‘mad-cow’ disease, are progressive neurodegenerative disorders caused by improper folding of the prion protein. Most prion diseases are sporadic and non-genetic, though a few are caused by a mutation in the prion protein gene (PRNP).

FFI is an autosomal-dominant genetic disease, which means that a child has a 50% chance of inheriting the mutation if one parent carries it. As soon as she found out about FFI, Sonia wanted to know if she had inherited the mutation. A test at a genetic lab confirmed that she had.

But rather than let herself be debilitated by the burden of this knowledge, Sonia, along with Eric, set out to understand more about FFI. Fortunately, they lived in Boston, the Mecca of research institutes. They attended lectures and seminars on prion disease at the Massachusetts Institute of Technology; talked to scientists around the world; and worked in a neurogenetics lab at Massachusetts General Hospital and at Harvard University.

Later, they both quit their jobs and enrolled as full-time PhD students at the Broad Institute at Harvard. Their dedication and efforts slowly bore fruit and furthered the understanding of the disease.

Understanding how likely a mutation is to cause disease is the first step toward genetic counselling and eventual treatment. In medical parlance, this likelihood is called penetrance: the proportion of individuals with the mutation who exhibit clinical symptoms or, in other words, the chance of actually developing the disease. For example, if a mutation has 90% penetrance, then 90% of those with the mutation will develop the disease while 10% will not.
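The definition of penetrance given above amounts to a simple ratio. The short Python sketch below is only an illustration: the function name `penetrance` and the cohort of 100 carriers are hypothetical and do not come from the study.

```python
def penetrance(affected_carriers: int, total_carriers: int) -> float:
    """Return the fraction of mutation carriers who develop clinical symptoms."""
    if total_carriers <= 0:
        raise ValueError("total_carriers must be positive")
    if not 0 <= affected_carriers <= total_carriers:
        raise ValueError("affected_carriers must lie between 0 and total_carriers")
    return affected_carriers / total_carriers


# Hypothetical cohort: 100 people carry the mutation and 90 of them fall ill.
print(penetrance(90, 100))  # 0.9, i.e. 90% penetrance
```

In practice the difficulty lies not in the arithmetic but in finding enough carriers to count, which is the problem rare mutations pose.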

In the case of rare mutations, however, it is difficult to establish how likely a mutation is to cause disease simply because there are so few affected individuals or families that not enough data is available. Thanks to a unique process, Eric, Sonia and a team of scientists were able to determine penetrance for a few genetic variants of the prion disease, the results of which were published last week in the journal Science Translational Medicine.

They made use of a large repository of genetic information, the ExAC (Exome Aggregation Consortium) database, created through an international consortium and containing data on about 60,000 individuals, who served as the population control set. The team compared it against 16,000 individuals affected by prion disease, whose genetic data was made available by various researchers (with the permission of those individuals). Later, they repeated the comparative exercise with a larger database: that of a company known as 23andMe, which contained the genotypes of 500,000 individuals.

First, they identified 63 rare genetic variants from the scientific literature that had been reported to cause prion disease. Then, they asked themselves whether these reportedly pathogenic variants were really as rare as expected in the population control-datasets.

They were able to clearly determine the penetrance of 10 of the 63 variants that they studied. Of these, four variants were pathogenic with almost 100% penetrance. Three other variants previously thought to be pathogenic with high penetrance were found to be likely benign. Three additional variants were identified as neither benign nor fully penetrant with estimates ranging from 0.1% to 10%.
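One common way to turn such a case-versus-control comparison into a penetrance estimate is Bayes' rule: the risk of disease among carriers equals the baseline risk of disease multiplied by how much more frequent the variant is among cases than among controls. The Python sketch below illustrates that idea under stated assumptions and is not presented as the authors' exact procedure. The function name `estimate_penetrance`, the carrier counts and the baseline risk of 1 in 5,000 are all hypothetical; only the cohort sizes of 16,000 cases and 60,000 controls come from the article.

```python
from typing import Optional


def estimate_penetrance(
    case_carriers: int,
    n_cases: int,
    control_carriers: int,
    n_controls: int,
    baseline_risk: float,
) -> Optional[float]:
    """Bayes' rule: P(disease | variant) = P(variant | disease) * P(disease) / P(variant).

    Returns None when the variant is absent from the controls, because the
    ratio is then undefined and the data give no finite point estimate.
    """
    freq_in_cases = case_carriers / n_cases            # P(variant | disease)
    freq_in_controls = control_carriers / n_controls   # approximates P(variant)
    if freq_in_controls == 0:
        return None
    # A probability cannot exceed 1, so cap the estimate.
    return min(1.0, baseline_risk * freq_in_cases / freq_in_controls)


N_CASES = 16_000           # cohort size quoted in the article
N_CONTROLS = 60_000        # cohort size quoted in the article
BASELINE_RISK = 1 / 5_000  # ASSUMPTION: illustrative lifetime risk, not a study figure

# Hypothetical variant seen in 16 cases and 6 controls.
variant_seen_in_controls = estimate_penetrance(16, N_CASES, 6, N_CONTROLS, BASELINE_RISK)
# Hypothetical variant seen in 10 cases and never in controls.
variant_absent_from_controls = estimate_penetrance(10, N_CASES, 0, N_CONTROLS, BASELINE_RISK)

print(variant_seen_in_controls)
print(variant_absent_from_controls)
```

With these made-up counts, the first variant comes out at roughly 0.2% penetrance, far too common in the general population to be a reliable cause of disease. The second returns `None`: a variant that appears in patients but never in a large control set is consistent with high penetrance, though the data alone cannot put a number on it.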

Unfortunately, Sonia’s mutation, the one that causes FFI, turned out to be almost 100% penetrant. It had a frequency of less than 1 in 100,000 in 23andMe’s database and was completely absent from the 60,000 individuals in the ExAC database.

But for some, the results of this research brought good news. In a commentary on the paper, Dr. Robert Green, a geneticist at the Brigham & Women’s Hospital, at Harvard Medical School, wrote that thanks to the results reported, he’d had the pleasure of informing an individual, whose mother had died from prion disease, that the variant it had been attributed to was benign, not pathogenic as previously thought.

What next?

While combing the database, Eric and Sonia found three people with mutations who carried only one copy of the gene instead of the normal two. These people showed no signs of prion disease, and one was even in his late 70s. So the duo and their colleagues suspected the existence of genetic variants that could result in a shorter version of the protein, one that wouldn’t cause the disease despite its presence. Such variants were subsequently found (in the N-terminus region of the gene) and were confirmed to be non-pathogenic.

For Eric and Sonia, this finding offers hope of altering Sonia’s genetic destiny. It suggests that reducing the expression of the prion protein could delay the onset of the disease or even prevent it. All this was made possible only by the sharing of large quantities of data and the abundance of tools to process it, concepts encapsulated in the term Big Data.

In India, the Big Data Initiative @ CSA (Computer Science and Automation) at the Indian Institute of Science (IISc), Bengaluru, is one of the few initiatives in this field attempting to bring together researchers from different areas to work on new algorithms, analytics and systems for Big Data problems.

Vijay Natarajan, associate professor at the Department of Computational and Data Sciences at IISc, says, “The results reported in this paper are surely interesting and the data collection must have been a challenge. While the data size may be large for this particular domain, it is not clear if the size or complexity of the data introduced any challenges in the analytics.”

He adds that in India, Big Data will play a role in medicine, biology, and several other fields in the near future. Big Data research requires the confluence of researchers from diverse backgrounds and a key challenge, according to him, will be to bring together such a diverse group to focus on challenging but common problems in science.

  • Jo Chopra McGowan

    What a fascinating story! I just lost a good friend to Creutzfeldt-Jakob disease (the human form of Mad Cow), so this feels especially pertinent. This young couple’s quest reminds me of Lorenzo’s Oil. More power to them and may their research bear much fruit.