Big Data for Complex Disorders: A Case in Point for Schizophrenia

The scientists concluded that patients with schizophrenia were more likely to carry one particular form of a gene, and that the risk of schizophrenia rises with inheritance of that form.

Big Data has done it once again. A recent study on schizophrenia provides a solid clue toward solving the long-standing problem of establishing causality in complex, multi-factorial human diseases. What is so unique about schizophrenia, and what exactly do the new results teach us about the role of genetics in complex human disorders? Let’s examine these questions one by one.

Before turning to the role of genetics in complex disorders like schizophrenia, let’s look at how scientists find links between genes and complex diseases. For the better part of the last decade, scientists have used an approach called genome-wide association studies (GWAS) to find out how particular genes contribute to common diseases such as diabetes. This involves comparing common variants across the genomes of two groups of individuals: one healthy (the controls) and the other affected by the disease (the cases).
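To make the case-control comparison concrete, here is a minimal Python sketch of an allelic association test: counts of the risk allele and the other allele in cases and controls go into a 2x2 table, and a Pearson chi-square test with one degree of freedom asks whether the allele frequency differs between the two groups. The function name and the counts are hypothetical. Real GWAS pipelines usually fit regression models with covariates and test millions of variants at once, which is why they apply a stringent genome-wide significance threshold (conventionally 5 × 10^-8).

```python
import math

def allelic_chi_square(case_risk, case_other, ctrl_risk, ctrl_other):
    """Pearson chi-square test (1 degree of freedom) on a 2x2 allele-count table.

    Rows are cases and controls; columns are the risk allele and the other allele.
    Returns (chi-square statistic, p-value).
    """
    table = [[case_risk, case_other], [ctrl_risk, ctrl_other]]
    total = case_risk + case_other + ctrl_risk + ctrl_other
    row_sums = [case_risk + case_other, ctrl_risk + ctrl_other]
    col_sums = [case_risk + ctrl_risk, case_other + ctrl_other]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Count expected in this cell if the allele frequency were equal in both groups.
            expected = row_sums[i] * col_sums[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    # With one degree of freedom, P(X >= chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

if __name__ == "__main__":
    # Hypothetical counts: 2,000 alleles in cases and 2,000 in controls.
    statistic, p = allelic_chi_square(700, 1300, 600, 1400)
    print(f"chi2 = {statistic:.2f}, p = {p:.2g}")
```

With these made-up counts (700 of 2,000 case alleles versus 600 of 2,000 control alleles carry the risk allele), the statistic is about 11.4.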

Such GWAS are therefore called case-control studies. By looking at the association between an allele (one of the alternative forms of a gene), or a group of alleles, and the disease trait in question, scientists assign an odds ratio. The odds ratio measures how strongly a given allele is associated with the disease trait in a given population: it compares the odds of carrying the allele among cases with the odds among controls. The second type of GWAS design is family-based. Both designs have advantages and disadvantages. For example, family-based designs make it possible to test for the effects of imprinted genes and to tell whether a variant is new or inherited.
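The odds ratio can be computed from the same kind of 2x2 table of allele counts. The sketch below, on hypothetical counts, divides the odds of the risk allele among case alleles by the odds among control alleles and adds an approximate 95% confidence interval using Woolf’s log method. It is one standard textbook estimator, shown under that assumption, not necessarily what any particular study used; the function name is illustrative.

```python
import math

def allelic_odds_ratio(case_risk, case_other, ctrl_risk, ctrl_other, z=1.96):
    """Allelic odds ratio with a Woolf (log-scale) confidence interval.

    z=1.96 gives an approximate 95% interval.
    """
    cells = (case_risk, case_other, ctrl_risk, ctrl_other)
    if min(cells) <= 0:
        raise ValueError("all four counts must be positive")
    odds_cases = case_risk / case_other      # odds of the risk allele among case alleles
    odds_controls = ctrl_risk / ctrl_other   # the same odds among control alleles
    odds_ratio = odds_cases / odds_controls
    # Standard error of log(OR): square root of the sum of reciprocal counts.
    se_log = math.sqrt(sum(1 / n for n in cells))
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, (lower, upper)

if __name__ == "__main__":
    ratio, (low, high) = allelic_odds_ratio(700, 1300, 600, 1400)
    print(f"OR = {ratio:.2f} (95% CI {low:.2f} to {high:.2f})")
```

With hypothetical counts of 700 versus 1,300 alleles in cases and 600 versus 1,400 in controls, the odds ratio is about 1.26; an interval that excludes 1 suggests the allele is associated with the trait in that sample.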

Next, schizophrenia. A devastating psychiatric disorder with high heritability, characterised by delusional thinking, it is a geneticist’s nightmare. Why? First, it is polygenic: its risk is shaped by a combination of many genes. Even though previous GWAS have linked 108 loci (a locus is a specific location in the DNA sequence) to the disease, assigning specific contributions to schizophrenia risk from the genes at those loci has been challenging. Second, many of the schizophrenia-associated variants discovered so far lie in non-coding regions of the genome, which makes it difficult to tie them to specific functions. Third, each allele or gene identified so far changes disease risk only slightly.
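One way to see how many small effects can add up is a polygenic score. In the minimal sketch below, assuming each variant acts additively on the log-odds scale and independently of the others (a simplification that ignores correlation between nearby variants), a person’s score is the sum of their risk-allele counts weighted by the log of each variant’s odds ratio. The function name and the odds ratios are hypothetical.

```python
import math

def polygenic_score(dosages, odds_ratios):
    """Sum of risk-allele dosages weighted by log odds ratios.

    dosages: copies (0, 1 or 2) of the risk allele at each variant.
    odds_ratios: per-allele odds ratio for each variant (hypothetical values).
    Assumes additive, independent effects on the log-odds scale.
    """
    if len(dosages) != len(odds_ratios):
        raise ValueError("need one odds ratio per variant")
    return sum(d * math.log(ratio) for d, ratio in zip(dosages, odds_ratios))

if __name__ == "__main__":
    # 20 hypothetical variants, each with a small per-allele odds ratio of 1.05.
    ratios = [1.05] * 20
    one_allele = [1] + [0] * 19
    many_alleles = [1] * 20
    print(math.exp(polygenic_score(one_allele, ratios)))    # odds multiplier of about 1.05
    print(math.exp(polygenic_score(many_alleles, ratios)))  # 1.05 ** 20, roughly 2.65
```

Because the weights are logarithms, exp(score) is the combined odds multiplier relative to carrying no risk alleles: a single allele with an odds ratio of 1.05 barely moves it, while many such alleles compound.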

Therefore, despite a sizeable body of scientific literature, a causal relationship between any particular gene and schizophrenia pathogenesis had not yet been established. This is where the recent study shows the power of big data, coupled with biology, to reveal specific roles for the genes involved in complex human diseases.

Large-scale GWAS in the past identified the human major histocompatibility complex (MHC) locus on chromosome 6 as showing one of the strongest associations with the disease. The MHC comprises a group of genes, many of which code for cell-surface proteins that help the body’s immune system recognise foreign antigens, hence determining tissue compatibility. In the present study, Dr. Steven McCarroll of Harvard Medical School, his students and his collaborators analysed the genomes of nearly 65,000 people (35,986 healthy controls and 28,799 with the disease) from 40 cohorts in 22 countries, in an effort to find a causal association between a specific form of a gene called C4A (complement component 4A) and synaptic pruning. Synaptic pruning is the process by which the brain systematically eliminates less active synapses.

The McCarroll group’s discovery is significant for several reasons. The human complement factor C4 helps immune cells target and remove pathogens. C4, one of many complement proteins in the body and part of the body’s innate immune system, also helps clear cellular debris upon receiving specific signals from the immune system. It now appears that a specific form of C4 has an additional function in the brain, where it works together with other cells. McCarroll and his team studied the role of the C4 gene within the MHC locus and set out to relate the gene’s function to the risk of schizophrenia.

There are two functionally different forms of the C4 gene, C4A and C4B, each of which comes in a long and a short form (C4A-long or C4AL and C4A-short or C4AS; C4B-long or C4BL and C4B-short or C4BS).

In 674 samples from 245 post-mortem adult brain donors, the researchers found that the expression of C4A and C4B was proportional to their respective copy numbers, and that C4A was expressed at 2 to 3 times higher levels than C4B even after adjusting for differences in copy number between the two genes. They also found that C4A expression was about 1.4 times higher in the brains of 35 schizophrenia patients than in those of healthy controls.
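Adjusting for copy number can be pictured as estimating how much expression each gene copy contributes. The minimal sketch below assumes expression is strictly proportional to copy number, fits a least-squares line through the origin for each gene, and compares the per-copy slopes. The function name and the perfectly proportional toy numbers are hypothetical, not the study’s data, and the study’s own normalisation may well differ.

```python
def per_copy_expression(copy_numbers, expression):
    """Least-squares slope of expression on copy number, with no intercept.

    Under the model expression = slope * copies, the slope is the
    expression contributed by a single gene copy.
    """
    if len(copy_numbers) != len(expression) or not copy_numbers:
        raise ValueError("need paired, non-empty measurements")
    sxy = sum(c * e for c, e in zip(copy_numbers, expression))
    sxx = sum(c * c for c in copy_numbers)
    if sxx == 0:
        raise ValueError("copy numbers are all zero")
    return sxy / sxx

if __name__ == "__main__":
    # Toy samples: C4A copy numbers with expression values, then the same for C4B.
    slope_a = per_copy_expression([1, 2, 3, 4], [2.5, 5.0, 7.5, 10.0])
    slope_b = per_copy_expression([1, 2, 3], [1.0, 2.0, 3.0])
    print(slope_a / slope_b)  # per-copy C4A:C4B expression ratio, 2.5 for these toy values
```

Fitting through the origin encodes the proportionality assumption directly: zero copies should mean zero expression, so the ratio of slopes is a copy-number-adjusted comparison of the two genes.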

The scientists also found associations between schizophrenia risk and the four common haplotype structures (combinations of genetic variants inherited together) spanning the C4 locus: AL–BL, AL–BS, AL–AL and BS.
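Each haplotype label in this notation lists the C4 copies it carries, written as A or B followed by L (long) or S (short). A small parser like the hypothetical one below turns those labels into copy-number counts, the kind of quantity one might then compare with disease status. It assumes the labels are written exactly in this notation, with copies separated by a hyphen or an en dash.

```python
def c4_copy_numbers(haplotype):
    """Count C4A, C4B and long-form copies in a label such as 'AL-BS'."""
    counts = {"C4A": 0, "C4B": 0, "long": 0}
    # Accept either an en dash (as in the article) or a plain hyphen.
    for gene_copy in haplotype.replace("\u2013", "-").split("-"):
        gene_copy = gene_copy.strip()
        if len(gene_copy) != 2 or gene_copy[0] not in "AB" or gene_copy[1] not in "LS":
            raise ValueError(f"unrecognised C4 copy: {gene_copy!r}")
        counts["C4" + gene_copy[0]] += 1
        if gene_copy[1] == "L":
            counts["long"] += 1
    return counts

def genotype_copy_numbers(haplotype_1, haplotype_2):
    """Sum the counts over the two haplotypes a person inherits."""
    first, second = c4_copy_numbers(haplotype_1), c4_copy_numbers(haplotype_2)
    return {key: first[key] + second[key] for key in first}

if __name__ == "__main__":
    for label in ["AL\u2013BL", "AL\u2013BS", "AL\u2013AL", "BS"]:
        print(label, c4_copy_numbers(label))
    print(genotype_copy_numbers("AL-BS", "BS"))  # {'C4A': 1, 'C4B': 2, 'long': 1}
```

Summing over both inherited haplotypes reflects that each person carries two copies of chromosome 6, so their total C4A and C4B copy numbers come from both parents.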

Using immunohistochemistry on sections of brain tissue, together with co-localisation experiments, they showed that C4 protein is expressed by neurons and localises to dendrites, axons and synapses. The researchers then used a mouse model to show a link between the C4 protein and synaptic pruning (elimination), a process previously associated with schizophrenia.

The scientists concluded that patients with schizophrenia were more likely to carry one particular form of the C4 gene, and that the risk of schizophrenia rises with inheritance of that form. The study is a major step toward understanding the biological role of a gene in schizophrenia, as it establishes a causal link between a gene’s function and synaptic pruning.

The study is notable for three reasons. First, it demonstrates how a complex genetic event linked to gene expression can ultimately provide clues to a specific biological function. Second, it establishes the importance and utility of large datasets. Third, it reaffirms the spirit of collaboration and data sharing, exemplified by the involvement of the many stakeholders of the Psychiatric Genomics Consortium, which provided the data. Unbiased genome-wide studies can lead into uncharted territory and unexpected findings, and linking biology with data is key to solving important questions in biology and medicine. We need to be ready to produce, mine and understand large datasets in that context.

Binay Panda is at Ganit Labs, Bengaluru. Besides genes and genomes, Binay is passionate about open science, education and free knowledge dissemination.