What We're Looking for When We're Looking for the Smallest Set of Genes Essential for Life

Many bacterial genomes carry genes essential to life as well as a lot of chaff. Given a bacterium of a certain type, can we determine the set of genes essential to its life?

There is a man named Craig Venter. He pursues biological research that is outrageous and that often sounds path-breaking and revolutionary, at least at first glance. His first claim to fame among the general populace was his announcement of a parallel, privately funded human genome project, which envisioned patenting the sequences of human genes of relevance to various diseases. Fortunately, this sequence data was never buried under patent protection, but his extraordinary science – including the sequencing of the first bacterial genome in 1995 – nevertheless contributed immensely to the timely closure of the public human genome project.

Venter often makes it to the news. He did in 2010 when he ‘synthesised’, from scratch, the complete 1.1-million-letter-long genetic material of a bacterium belonging to the group called Mycoplasma, transplanted it into a cell of its related species and rebooted the recipient cell in such a way that it was controlled entirely by the DNA of the donor organism. This was a huge technical achievement and one that will inform futuristic efforts at gene therapy and genome engineering, but not necessarily one that significantly enhanced our understanding of the genetic basis of life.

Venter was in the news again a few weeks ago. His group had synthesised the ‘minimal bacterial genome’, transplanted it into a recipient cell and brought it to life. This involved first defining a ‘minimal’ genome, followed by more of the exceptionally high-order technical proficiency that aided his group’s 2010 work. The Verge said that Venter and his group had synthesised a genome of a size “less than the smallest-known naturally-occurring bacterial organism”, and that “the new genome may help provide crucial clues to scientists about which genes are necessary for life.” To Forbes magazine, this work represented the creation, after two decades, of “synthetic bacteria with no extra genes”.

The hype generated by these quotes necessitates a deeper investigation of the concept of a “minimal” genome. Is there one minimal genome or are there many? Is the concept even valid? Let’s discuss what determines the content of a bacterial genome and trace some of the recent history of the study of minimal genomes. The reader should then be able to make an informed decision on whether the representation of Venter’s work in the popular press is hyperbole.

The bacteria’s environmental contexts

Bacteria are single-celled organisms, more numerous (by some distance) than all forms of macroscopic life put together. They are diverse. There are bacteria that play in hot springs, those that relax in cold deserts, those that sit on plant roots and make our life possible, and those that devastate human, animal and plant life. And many more that we cannot list here. This diversity encompasses a large variety of genetic material, with bacterial genomes ranging in size from fewer than 150 genes to well over 12,000 genes. Bacteria with very small genomes are symbionts that compulsorily require the support of other bacteria and/or a larger host to survive. On the other hand, those with larger genomes are cosmopolitan and lead rather complex lifestyles. Venter’s minimal bacterial genome has 473 genes, which makes it among the smallest… but not quite the smallest bacterial genome known.

What determines the size of a bacterial genome? What does the genome contain? About 90% of any bacterial genome comprises genes, which encode proteins. Many of these proteins are involved in metabolism, which is the utilisation of nutrients to produce energy and to make the small molecules that constitute the cell structure or form part of things like DNA and protein. There are genes that actually help assemble these small molecules into larger cell structures such as the cell wall; those that help replicate DNA and orchestrate the division of a parent cell into two daughter cells; and those that help synthesise proteins and RNA, which is often an intermediate in protein synthesis besides performing many other tasks of its own. There are proteins that help regulate these processes. And there are many proteins about which we know nothing, and which could be essential to life, something that Venter’s work reemphasises.

The building of a cell surface, the synthesis of RNA and protein and the replication of DNA and subsequent cellular reproduction are non-negotiable tasks. Every cell does these things and so should a ‘minimal’ cell. And to be able to do so, cells must utilise nutrients to produce energy. Thus, metabolism is equally non-negotiable. But what nutrients to use is a matter of choice and circumstance. These days cows eat a lot of cellulose present in our garbage bins and metabolise it to source energy; the ability of humans to metabolise cellulose is rather limited. A bacterium may live all its life in a single type of habitat, where it sees a single type of nutrient and little else; all that this bacterium needs to do to get by is to encode a few proteins that can convert this single nutrient into energy and other byproducts. Another bacterium may be a globe-trotter and will have to discover and use hundreds of different types of nutrients. To do so, this bacterium will require hundreds of proteins to channel the large variety of nutrients it encounters into energy-generating processes.

Thus, the metabolic component of a minimal cell will, by definition, depend on the environment in which it plies its trade. These environments could be abiotic or biotic, even in symbiosis with other bacteria and organisms belonging to other kingdoms of life, living as a community. And in light of the immense diversity of habitats on earth, this already begins to complicate matters when it comes to defining a minimal genome. Thus, there can be no one “minimal bacterial genome” but many such things, each conditioned to a specific circumstance. In fact, Venter himself acknowledges this in a quote published alongside the article in The Verge: “Every genome is context-specific, and depends on the chemicals in the environment available” to it. “There’s no such thing as a true minimal genome without context.”

Defining the minimum

Despite these obvious problems with defining minimal genomes, that this question has attracted the imagination of scientists and the public alike should surprise nobody. In the last twenty years, several attempts have been made to define the minimal bacterial genome.

One approach is comparative genomics. We know the complete genomes of many bacteria. We should be able to compare them and find out what is common across these genomes. This set of genes should represent the minimal bacterial genome. Then maybe one could get ambitious and find the minimal genome of all life.

The first two bacterial genomes sequenced were fairly small. One was a Mycoplasma, which contains one of the smallest bacterial genomes known to man. This has to be minimal, right? A comparison between these two genomes showed that nearly half the genes found in the small Mycoplasma genome were not to be found in the other genome. Surprise! As more and more newly sequenced genomes were added to the mix, the minimal genome became smaller and smaller. I would be surprised if there are more than a few tens of genes that are similar across all bacteria. And a synthetic bacterium will be hard-pressed to survive on these few genes. Thus, the minimal bacterial genome is quickly becoming small to the point of being farcical and non-existent.
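The shrinking of the shared gene set as genomes are added can be sketched as a running set intersection. This is only a toy illustration: the genome names and gene labels below are invented placeholders and do not describe any real organism, and real comparisons match genes by sequence similarity rather than by identical labels.

```python
from functools import reduce

# Hypothetical gene inventories; labels are placeholders, not real annotations.
genomes = {
    "genome_A": {"g01", "g02", "g03", "g04", "g05", "g06"},
    "genome_B": {"g01", "g02", "g03", "g04", "g06", "g07"},
    "genome_C": {"g01", "g02", "g03", "g08"},
    "genome_D": {"g01", "g02", "g09"},
}

def core_genes(gene_sets):
    """Return the genes present in every one of the given gene sets."""
    gene_sets = list(gene_sets)
    return reduce(set.intersection, gene_sets) if gene_sets else set()

def core_sizes(genomes):
    """Size of the shared set after adding genomes one at a time."""
    names = list(genomes)
    return [
        len(core_genes(genomes[name] for name in names[:i]))
        for i in range(1, len(names) + 1)
    ]

sizes = core_sizes(genomes)
core = core_genes(genomes.values())
print(sizes)          # the shared set can only stay the same or shrink
print(sorted(core))
```

In this made-up data the shared set falls from six genes to two as the fourth genome is added, which mirrors the trend described above without saying anything about the real numbers.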

Through comparative genomics, we learnt that there was a second complicating factor to determining minimal genomes (beyond its depending on context). Although processes like RNA and protein synthesis are essential to all life, the detailed nature of genes involved in these processes is not the same even across bacteria. Eugene Koonin, the famous evolutionary biologist and genome scientist, said in the mid-to-late 1990s and early 2000s that defining a minimal genome will be complicated by “non-orthologous displacement”, by which the same processes are performed by proteins with sequences so distinct that they cannot be recognised as similar. Therefore, even within processes whose universal importance to life is not questionable, defining a universal genetic basis – in terms of a gene sequence – becomes problematic.
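A small sketch may make this concrete. It assumes each gene has been annotated with a sequence family and with the process it performs; the family and process labels are hypothetical. Two genomes can then share a process without sharing the gene family that carries it out.

```python
# Hypothetical annotations: (sequence family, process performed).
genome_x = {("famA", "trna_charging"), ("famB", "dna_replication")}
genome_y = {("famC", "trna_charging"), ("famB", "dna_replication")}

def families(genome):
    """Sequence families present in a genome."""
    return {family for family, _ in genome}

def processes(genome):
    """Processes performed by at least one gene in a genome."""
    return {process for _, process in genome}

def displaced_processes(a, b):
    """Processes both genomes perform, but not through a shared family."""
    shared_families = families(a) & families(b)
    covered = {process for family, process in a | b if family in shared_families}
    return (processes(a) & processes(b)) - covered

shared_by_sequence = families(genome_x) & families(genome_y)
shared_by_process = processes(genome_x) & processes(genome_y)
displaced = displaced_processes(genome_x, genome_y)
print(shared_by_sequence, shared_by_process, displaced)
```

A comparison based on sequence alone would report one shared gene here, although both toy genomes carry out the same two processes.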

A philosophical route through biochemistry

Now, our goal is more modest. Many bacterial genomes carry genes essential to life but also a lot of chaff as a legacy of complicated evolutionary developments. Given a bacterium of a certain type, can we determine the set of genes essential to its life? Modern genome engineering techniques permit this. For a variety of bacteria, we can use both comparative genomics and ‘gene deletion’ techniques to determine the list of essential genes. We have already described what comparative genomics is, and this technique, when applied to a set of genomes derived from closely-related organisms, can do a decent job of identifying genes essential to this branch in the tree of life. In addition, we can experimentally ‘delete’ segments of the chromosome to find out if the deletion affects the organism’s survival or not.

A lot of such work has gone into the model bacterium E. coli. We know that more than 300 genes, out of the 4,500-5,000 genes in its genome, are essential for its survival in the nutrient-rich conditions routinely used for its growth in the laboratory. These deletions are done one at a time, and no one has published a successful attempt to create a minimal E. coli genome with 300 genes that can survive in the lab. And deleting 4,000 non-essential genes together is no easy task. The minimal E. coli genome can probably be synthesised from scratch using the method developed by Venter and his colleagues, but whether it is worth the effort is questionable.
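The logic of a one-at-a-time deletion screen can be sketched with a toy viability model. Everything here is an assumption made for illustration: the gene labels, the mapping of genes to functions, and the rule that a cell is viable when every required function is covered by at least one remaining gene. Real screens measure growth in the lab rather than consult a lookup table.

```python
# Hypothetical mapping of genes to the cellular function each one provides.
gene_function = {
    "g1": "dna_replication",
    "g2": "protein_synthesis",
    "g3": "cell_wall",
    "g4": "cell_wall",     # redundant with g3 in this toy model
    "g5": "lactose_use",   # not needed in the assumed growth medium
}

# Functions the toy cell needs in the assumed laboratory medium.
required_functions = {"dna_replication", "protein_synthesis", "cell_wall"}

def is_viable(genes):
    """Viable if every required function is provided by some remaining gene."""
    provided = {gene_function[gene] for gene in genes}
    return required_functions <= provided

def single_deletion_screen(genome):
    """Genes whose individual deletion makes the toy cell non-viable."""
    return {gene for gene in genome if not is_viable(genome - {gene})}

genome = set(gene_function)
essential = single_deletion_screen(genome)
print(sorted(essential))
print(is_viable(genome - {"g3", "g4"}))  # deleting both redundant genes at once
```

In this model g3 and g4 each look dispensable in a single-deletion screen, yet removing both is lethal. Redundancy of this kind is one possible reason why a list of individually non-essential genes cannot simply be deleted all together.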

Attempts have been made to delete as many as 15-20% of E. coli’s genome, and this toward-minimal E. coli does rather well in the lab. That is not to say that this engineered E. coli will do well in any of the many other circumstances that its natural parent faces in its normal life. For example, a similar analysis in the single-celled yeast (not a bacterium), but performed across hundreds of different growth environments, showed that nearly all genes were essential for survival in at least one condition. Take that!
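The dependence of essentiality on growth conditions can be sketched in the same spirit. The environments, gene labels and required functions below are invented, and the model assumes one gene per function with no redundancy, so a gene is essential in an environment exactly when its function is required there.

```python
# Hypothetical genes and the single function each provides (no redundancy).
gene_function = {
    "g1": "dna_replication",
    "g2": "glucose_use",
    "g3": "lactose_use",
    "g4": "heat_tolerance",
}

# Hypothetical environments and the functions needed to survive in each.
environments = {
    "rich_lab_medium": {"dna_replication"},
    "glucose_only": {"dna_replication", "glucose_use"},
    "lactose_only": {"dna_replication", "lactose_use"},
    "hot_spring": {"dna_replication", "glucose_use", "heat_tolerance"},
}

def essential_genes(required):
    """Genes whose function is required in the given environment."""
    return {gene for gene, function in gene_function.items() if function in required}

per_environment = {name: essential_genes(req) for name, req in environments.items()}
essential_somewhere = set().union(*per_environment.values())

for name, genes in per_environment.items():
    print(name, sorted(genes))
print(sorted(essential_somewhere))
```

Only one toy gene is essential in the rich medium, but every gene is essential in at least one environment, which is the pattern the yeast study reported at a much larger scale.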

Venter’s effort extends these efforts by beginning with the already-small genome of a Mycoplasma, determining what is essential and what is not using a variety of techniques, and then synthesising a genome containing only those genes deemed to be essential. That it should be called “the minimal bacterial genome”, “crucial” to defining genes “necessary for life”, is, in my opinion, reading too much into what is otherwise a really nice piece of work.

In summary, whether there is a minimal bacterial genome, and how it should be defined, is as much a philosophical question as it is one of interest to geneticists. Any answer that is found must take an enlightened view of the diversity of environments that bacteria are found in, including in tight cooperation with compatriots and other hosts, besides an understanding of the evolutionary processes that discover novel genetic solutions to the same biochemical problem.

Aswin Sai Narain Seshasayee runs a laboratory researching bacterial biology at the National Centre for Biological Sciences, Bengaluru. Beyond science, his interests are in classical art music and history.