We analyze the contribution of “comparative genomics”, a branch of genomics that compares the sequences of different organisms in order to deduce their phylogenetic relationships.

Biology has revealed that all living beings descend from a single organism, the “cenancestor”, something similar to a primitive prokaryote that, once formed, diversified through the slow, progressive accumulation of modifications in its incipient genome. From then on, evolution proceeded through changes in that primitive genome, followed by diversification and selection in the different environments our planet has offered for more than 3.5 billion years. Since Darwin (1809-1882), we have known that diversity is the source on which evolution thrives. The Ukrainian-American geneticist Theodosius Dobzhansky (1900-1975) said that “nothing in Biology makes sense except in the light of evolution” [1], to which his most prominent disciple, the Spaniard Francisco José Ayala (1934-2023), added that “everything in evolution makes sense in the light of Genetics” [2]. Now, more than twenty years after the completion of the Human Genome Project and the sequencing of many other species, we are perhaps in a position to say that evolution holds its secrets in the comparison of genomes.

The main driver of evolution is genetic variation in the broad sense, which is produced by various mechanisms in the genome of living beings. The most basic mechanism is mutation: changes, losses or gains of nucleotide bases in the DNA of genes, or larger modifications that affect the chromosomes, all of which are transmitted vertically from parents to offspring. In higher organisms, these mutations recombine and propagate through sexual reproduction and appear in future generations; they may or may not be expressed, and may be selected depending on their phenotypic effects and on the reproductive capacity of their carriers in the environment in which they develop. In lower organisms, such as bacteria and other prokaryotes, in which sexual reproduction does not exist, mutations are joined by genetic variation due to “horizontal transfer”. This consists of a series of parasexual mechanisms through which DNA from other organisms is incorporated into the genome of an organism by various means: “plasmids” (DNA rings that are transferred from one bacterium to another through cytoplasmic bridges); “bacteriophage” viruses (which infect bacteria and transduce DNA fragments from one to another); or “transformation” (uptake of DNA regions from various sources in the environment in which the bacteria live).

Horizontal transfer is the main source of evolutionary variation in bacteria and other prokaryotes. Over the generations, regions of DNA are added to or removed from the initial core genome. These include new genes from other organisms, whose phenotypic effects are subjected to the filter of natural selection, as is the case with genetic variation from recombination in sexually reproducing beings. Genes that help their carriers adapt better to the environment will be selected for, while the rest will continue drifting or will be eliminated, depending on stochastic factors and on whether their effects are neutral or harmful. The classic example, equally valid for bacteria and higher organisms, is the incorporation of genes for resistance to adverse agents, such as a gene that confers resistance to an antibiotic in a bacterium or a gene for resistance to mycosis or bacteriosis in a plant or a higher animal. The organism that incorporates the genetic novelty into its genome will generate descendants capable of surviving when the agent that limits its survival is present, i.e., selection will act in its favor. Organisms that do not carry the resistance gene, by contrast, will not survive in the presence of the harmful agent. Thus, evolution should be understood not so much as a phenomenon of chance, but rather of opportunism, with selection being the determining factor in the success of each genetic variant.

Accordingly, the genome becomes a sensitive indicator of evolutionary events. Among the genetic alternatives that affect biological fitness, the favorable ones will become fixed in the genome of their carriers and the less favorable will disappear. Nevertheless, evolution is slow, because mutations arise slowly and in limited numbers, and because they are not created to solve the problems posed by the environment, but are pre-adaptive and overwhelmingly unfavorable.

Having stated these foundations required to understand evolution in genetic terms, we can analyze the contribution of “comparative genomics”, a branch of genomics that compares the sequences of different organisms in order to deduce their phylogenetic relationships.

Evolution in the pangenome of a prokaryote

A study was recently published in the Proceedings of the National Academy of Sciences of the USA [3] on the evolution of the genome of the bacterium Escherichia coli. In its original sequenced version (completed in 1997), this genome consists of a circular DNA molecule of 4,639,000 bp carrying the information of some 4,300 genes. In addition to this main chromosome, a cell may have one or more plasmids, small rings of between 3 and 12 kb of DNA that replicate independently of the chromosome. Plasmids can grow considerably in size thanks to their modular structure, which is based on sequences that can be integrated and that play an important role in horizontal transfer.

As a result, the genome of bacteria has great intraspecific diversity, so that different strains of the same species often show significant variation in gene content. Maintaining this variation involves both deterministic phenomena, such as natural selection, and stochastic phenomena, such as genetic drift, which consists of a random fluctuation in the presence of genetic variants over the course of generations.
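To make the idea of genetic drift more concrete, the following is a minimal sketch of a textbook Wright-Fisher-style simulation, in which each generation is drawn at random from the previous one with no selection at all. It is not part of the study discussed here; the function name `simulate_drift` and every parameter value are assumptions chosen purely for illustration.

```python
import random


def simulate_drift(pop_size, start_freq, generations, seed=None):
    """Follow the frequency of a neutral variant under random sampling alone.

    Illustrative assumption: a constant population of `pop_size` individuals,
    each of which inherits the variant with probability equal to its
    frequency in the previous generation (no selection, no mutation).
    """
    rng = random.Random(seed)
    freq = start_freq
    trajectory = [freq]
    for _generation in range(generations):
        # Count how many individuals of the new generation carry the variant.
        carriers = sum(1 for _individual in range(pop_size) if rng.random() < freq)
        freq = carriers / pop_size
        trajectory.append(freq)
        if freq in (0.0, 1.0):
            # The variant has been lost or fixed; drift cannot change it further.
            break
    return trajectory


# Hypothetical example: a variant starting at 50% in a population of 20.
trajectory = simulate_drift(pop_size=20, start_freq=0.5, generations=5000, seed=1)
print(f"generations simulated: {len(trajectory) - 1}")
print(f"final frequency: {trajectory[-1]}")
```

In small populations the variant usually ends up lost or fixed by chance alone, which is why stochastic effects have to be separated from selection when interpreting gene content.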

The analysis carried out by the authors of the aforementioned study consisted of comparing the genomes of about 2,300 strains of E. coli, the complete sequences of which are housed in the large genome database of the National Center for Biotechnology Information (NCBI) [4]. The genes present in all the genomes of the collection of strains examined constitute the “core genome”. In each strain, other genes, considered “accessory genes”, are added to this genome. The union of these two sets makes up what has been called the “pangenome”. What the authors set out to analyze are the coevolutionary relationships between families of genes, in order to reveal to what extent deterministic phenomena (selection for beneficial combinations) or stochastic phenomena (random events) have shaped the E. coli pangenome.
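These three definitions translate directly into set operations. The sketch below uses three invented strains and a handful of gene names chosen only as placeholders; it is an illustration of the definitions, not a reproduction of the authors' pipeline.

```python
# Hypothetical toy data: strain name -> set of gene families it carries.
strains = {
    "strain_A": {"dnaA", "gyrB", "recA", "blaTEM"},
    "strain_B": {"dnaA", "gyrB", "recA", "tetA"},
    "strain_C": {"dnaA", "gyrB", "recA", "blaTEM", "tetA"},
}


def partition_pangenome(strains):
    """Split the gene families of a strain collection into core and accessory."""
    gene_sets = list(strains.values())
    pangenome = set().union(*gene_sets)      # every gene seen in any strain
    core = set.intersection(*gene_sets)      # genes present in all strains
    accessory = pangenome - core             # genes missing from at least one strain
    return core, accessory, pangenome


core, accessory, pangenome = partition_pangenome(strains)
print("core genome:    ", sorted(core))
print("accessory genes:", sorted(accessory))
print("pangenome size: ", len(pangenome))
```

With real data the core genome shrinks and the pangenome grows as more strains are added, which is one reason a collection of thousands of genomes is so valuable.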

To that end, they compared the gene content of the different strains in order to investigate whether the presence or absence of specific genes (genetic background) influences the presence or absence of other genes. The content of each contemporary E. coli genome is assumed to be the result of its evolutionary history, in which both vertical and horizontal gene transmission have been involved, and to have arisen through a combination of internal (intragenomic) and external (ecological) fitness effects, in addition to possible stochastic, non-adaptive evolution (genetic drift). It is a very interesting type of analysis, and at the moment it is only possible in species such as E. coli, thanks to the large collection of genomes available.

Before we go any further, it should be noted that this analysis, although it focuses on a bacterial species, allows us to address more general questions about the evolutionary history of species at a level not hitherto explored: that of the genome, which in the words of the Belgian Nobel Prize winner Christian de Duve (1917-2013), is located at the top of the hierarchy of the organization of cells [5].

The authors note the claim of evolutionist Stephen J. Gould, who in his book, “Wonderful Life: The Burgess Shale and the Nature of History” [6], presented a thought experiment in which the “tape of evolution” could be replayed from any point in history. He suggested that since evolutionary paths depend on unpredictable events, if we could replay history, we would not obtain the same outcome each time. However, many recent studies have suggested that this view is too rigid.

In the study carried out in E. coli, the gain or loss of genes in the pangenome was analyzed on the basis of the presence or absence of other genes, using ecological nomenclature. The authors classified the relationships between two specific genes into three categories and assessed whether the link was statistically significant: “mutualistic”, i.e., of apparent mutual benefit; “commensal”, i.e., one gene favors the presence of a second, but not vice versa; or “competitive”, i.e., mutually exclusive in the same genome. Gene associations were studied using gene presence-absence models built with Random Forest, an automated statistical prediction method. The study also assessed how strongly the presence or absence of each gene is related to that of all the others. Without going into further detail, the study gives a very positive picture of the possibilities offered by bioinformatics, which has developed in step with the accumulation of genomic sequence data in recent years.
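The three categories can be illustrated with a deliberately simplified sketch. The study itself relied on Random Forest models and statistical testing; the code below instead swaps in a plain co-occurrence heuristic, comparing how often two genes appear together with what independence would predict. The function name `classify_pair`, the thresholds and the toy presence/absence vectors are all assumptions made for illustration, and no significance test is performed.

```python
def classify_pair(gene_a, gene_b, enrich=1.5, deplete=0.5, high=0.8):
    """Label a gene pair from presence (1) / absence (0) across strains.

    Illustrative heuristic only; the thresholds are arbitrary assumptions.
    """
    n = len(gene_a)
    n_a = sum(gene_a)
    n_b = sum(gene_b)
    n_both = sum(1 for a, b in zip(gene_a, gene_b) if a and b)
    if n_a == 0 or n_b == 0:
        return "no clear association"
    # Co-occurrences expected if the two genes were distributed independently.
    expected = n_a * n_b / n
    if n_both <= deplete * expected:
        return "competitive"            # the genes avoid each other
    if n_both >= enrich * expected:
        a_with_b = n_both / n_a         # share of A-carrying strains that also carry B
        b_with_a = n_both / n_b         # share of B-carrying strains that also carry A
        if a_with_b >= high and b_with_a >= high:
            return "mutualistic"        # each gene is mostly found with the other
        if a_with_b >= high or b_with_a >= high:
            return "commensal"          # the dependence runs in one direction only
    return "no clear association"


# Hypothetical presence/absence vectors across ten strains.
common = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
rare = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
opposite = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
unrelated = [1, 0, 1, 0, 1, 0, 1, 0, 1, 0]

print(classify_pair(common, common))
print(classify_pair(rare, common))
print(classify_pair(common, opposite))
print(classify_pair(rare, unrelated))
```

In the second call the rare gene is always found alongside the common one, while the common gene often occurs without it, which is the asymmetric pattern described as commensal.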

In conclusion, the study found that in the set of 33,138 gene-gene relationships analyzed, mutualistic relationships, where the joint presence of a pair of genes in the genomes is significantly greater than expected, were the large majority (20,915); 1,073 were commensal, whereby one gene, usually the less abundant of the pair, generally depends on the other, while the inverse dependence is much weaker or non-existent; and 288 were cases of competitive genes, which avoid each other.

Evidently, what all this demonstrates is the great functional interaction of the genome as a whole, i.e., the importance of a set of genes in the genome compared to the individual contribution of each gene. One might say, in general, that a gene is not an island within the genome and that evolution gives each genetic novelty more or less opportunity to be maintained or eliminated depending on the rest of the pre-existing genome. This general conclusion does not in any way modify the traditional theory of evolution which, if anything, is enriched by being better able to explain the consequences of modifications caused by horizontal transfer or other parasexual mechanisms that occur in prokaryotes. This is something that could be extended to species with sexual reproduction, although in this case further research will be required.

The study itself has other basic implications. It allows the impact of different genes to be classified by their greater presence in the different pangenomes analyzed. It has been demonstrated that gene pairs that show a mutualistic relationship are independent of the place (locus) they occupy in the genome. Explanations are derived for cases of commensalism or competition depending on the environment in which the strains develop. There are possible reasons to explain why genes predict the presence or absence of other genes, including functioning in a common pathway or process, redundancy, and shared evolutionary advantage in a new environment.

Finally, the rigidity of the Gould hypothesis was demonstrated. The results suggest that it is likely that rewinding the tape back to the start of E. coli evolution would result in hundreds or thousands of predictable events for each replaying of the tape of evolution.

Potential applications of genomics

This research shows that, even though evolution is random at the origin of variation, selection plays an accentuated role in the establishment of the acquired variants. That is, the deterministic model prevails in the acceptance of genetic variants that are incorporated into the core genome. The presence of some genes mostly supports that of others, which opens up certain expectations for biotechnological applications, for example, incorporating antibiotic resistance genes or other genes of applied interest into bacterial strains without the risk of their being eliminated. This type of knowledge is also of great interest in synthetic biology for the synthesis of artificial genomes, for example to create transgenic strains for the production of new drugs, proteins, vaccines or other products.

Nicolás Jouve

Professor Emeritus of Genetics

Member of the Bioethics Observatory

Former member of the Spanish Bioethics Committee



[1] T. Dobzhansky. «Nothing in Biology Makes Sense except in the Light of Evolution». The American Biology Teacher, 35, 1973, 125-129.

[2] F.J. Ayala. «Teleological Explanations in Evolutionary Biology». Philosophy of Science, 37, 1970, 1-15.

[3] A. Beavan, M.R. Domingo-Sananes, J.O. McInerney. «Contingency, repeatability, and predictability in the evolution of a prokaryotic pangenome». Proc Natl Acad Sci U S A. 2024 Jan 2;121(1):e2304934120. doi: 10.1073/pnas.2304934120

[4] E.W. Sayers et al. «Database resources of the national center for biotechnology information». Nucleic Acids Res. 50, D20–D26 (2022).

[5] Ch. De Duve. La vida en evolución. Crítica, Barcelona, 2004, p. 25.

[6] S.J. Gould. Wonderful Life: The Burgess Shale and the Nature of History (WW Norton and Company, 1990).

