Genetic Databases Are Leaving Marginalised People out of Their Data

Calculating the risk for schizophrenia when using UK Biobank data is only accurate for white European populations – leaving, for example, Indians at a disadvantage.

Imagine this: you are a cash-strapped early-career health scientist, looking for your next big project. One day, you get your big break — the chance to study half a million people, and the freedom to focus on virtually any topic you like, from DNA mutations to blue cheese intake. Best of all, this study will cost you virtually nothing.

It’s easy to imagine that organisations like the UK Biobank make anything possible. Biobanks are huge repositories containing health, genetic, and demographic information from volunteers. Researchers look through the vast amount of data to find new health patterns and trends. There are few limits: you can analyse scans of volunteers’ hearts, infer their sexual behaviours, or study their reasoning skills.

Over 850 UK Biobank papers have been published, with new studies appearing in journals constantly. Studies so far have found results which could improve global health, such as a study showing that anyone, regardless of their genetic background, can reduce their risk of dementia with a change in lifestyle.

However, as promising as biobanks might seem, the data may tell only partial or even misleading stories.

Critics of the project argue that research coming out of the UK Biobank will only benefit certain people, and that even then, the usefulness of the health associations it finds is in question.

Compared to the 2011 UK census, Black, Indian, Pakistani and Chinese participants are all underrepresented in the Biobank by at least one third. David Curtis, at University College London, tested whether this under-representation of ethnic minority groups has any impact on schizophrenia genetics research.

He found that calculating the risk for schizophrenia when using Biobank data is only accurate for white European populations. This means that in the future, white people could be offered genetic tests for certain health conditions, while other people could be offered incorrect or no testing at all.

This is because of the complex evolutionary history of humans. Humans who migrated out of Africa and settled in Europe went through bottlenecks that dramatically reduced their genetic diversity, whereas African populations remained large and so retain far more genetic variation. As a result, genetic associations identified mainly in white European volunteers may not carry over to people of other ancestries.

Other researchers are investigating the Biobank’s data as well. Na Cai, a statistical geneticist at the Wellcome Trust Sanger Institute and European Bioinformatics Institute, began thinking about how what gets put into the Biobank affects what conclusions come out of it, similar to Curtis’ study on schizophrenia.

In her study, currently a pre-print posted on bioRxiv, Cai and colleagues decided to focus on major depressive disorder. Depression is one of the most common mental health disorders, and has been a major topic of investigation in genetic association studies.

Because of this, Cai was concerned that researchers might not be investigating depression specifically, but instead looking at the genetics of poor mental health in general.

Cai defined depression in five different ways, using both strict and loose criteria. For example, some people might tell their doctor that they feel depressed, but not meet the specific psychiatric definition of major depressive disorder. She looked to see if the same genetic variants were associated with each different definition of depression.

The results were surprising. She found less of a genetic contribution towards all the “looser” definitions of depression compared to the full assessment used by psychiatrists.

The finding shows that researchers do not have the power in their studies that they assume they do. Previously, it was assumed that it didn’t matter too much if researchers defined depression loosely. The broader definitions might simply capture milder cases of depression, or show less of a genetic association because more people in these groups are misdiagnosed, which dilutes the signal.

However, when the researchers controlled for these factors, nothing changed. The strict psychiatric definition of depression was still genetically distinct from the other versions: more genes were associated with it, and few of those genes overlapped with the ones linked to the looser definitions.

A technician works at a genetic testing laboratory in China. Photo: Reuters

This throws into question whether papers which have found links between depression and genes are coming to the right conclusions. Are they finding a genetic basis for major depressive disorder, or are they showing something else — like the less specific genetic basis for poor mental health in general?

Both Cai and Curtis conclude that we need to rethink how we collect biobank data. Both issues are the result of design flaws present since the UK Biobank’s inception. Cai does not necessarily think all participants need to be assessed by a psychiatrist. She suggests that we use new technologies, such as computer assessments and smartphone behavioural tracking, to diagnose people with clinical depression.

But tackling the lack of diversity in biobank data requires those in charge to recognise that the current design excludes marginalised and hard-to-reach groups.

John Savill, the Chief Executive of the UK Medical Research Council, which provided major funding for the Biobank, was reported by the Guardian as saying, in response to Curtis’ research, “I do not think it is helpful to cast concerns over experimental design as ‘equalities issues’.”

However, David Heel, the Chief Investigator of the East London Genes & Health Project, which aims to improve the health of South Asian people in the UK, thinks that the UK Biobank’s recruitment tactic of mailing a letter meant British-Bangladeshi and British-Pakistani people missed out. When reached via email, Heel said of recruiting volunteers to the project that “a much better response rate comes from a face to face discussion” or “a trusted setting”, such as a doctor’s office.

Curtis also thinks more can be done, but is not optimistic that we can save the UK Biobank from this bias. He said, “It may be too late to try to make the UK Biobank more representative. We may need to look to other initiatives and to look to samples recruited in other countries.”

This article was originally published on Massive Science.