Outlining our unique tree of life
Charles Darwin established the foundations of modern evolutionary biology with two fundamental concepts: all species are related to one another through a common ancestor, and natural selection reflects the interplay between hereditary information (in modern terms, genes) and the environment in which species evolve. The pathways of descent of species from a common ancestor are traditionally depicted as a phylogenetic tree. Similarly, the histories of genes can be depicted as trees, but these can differ significantly from the history of species because genes are affected by a variety of evolutionary events such as duplication, loss and lateral transfer.
With EU funding of the GENEFOREST project, scientists set out to develop phylogenetic methods to reconstruct multiple gene trees in the context of a species tree. Their goal was to deliver models applicable to very large datasets through large-scale reconstructions of genomic processes such as gene duplication, transfer and loss (DTL). Such methods, while computationally intensive, make it possible to study complete genomes rather than a handful of genes, and hence to fully reconstruct the history of these genomes. Interestingly, these methods also provide information on the timing of species diversification, even in the absence of fossil data.
As a proof of concept, the probabilistic model named ODT (for Origination, Duplication, Transfer and loss of genes) was used to reconstruct the dated phylogeny of 36 cyanobacterial species using more than 8 000 gene families. Scientists then extended their ODT model to derive the first model of gene acquisition and loss along extinct or unsampled lineages (exODT). This extension promises to allow the exploration of the enormous diversity of life that has gone extinct, but may have contributed to extant genomes through ancient lateral gene transfers.
Scientists also developed the first probabilistic method to simultaneously determine the species tree and all the gene trees that together make up the history of genomes, thereby significantly improving the quality of both types of trees. This program, called PHYLDOG, was used to reconstruct the evolutionary history of 36 mammalian genomes. In the final step, investigators combined the exODT model with other probabilistic models to obtain the amalgamated likelihood estimation (ALE) model. ALE can infer a gene tree with remarkable accuracy from a given species tree and is capable of accommodating up to 100 genomes. The comprehensive large-scale models of genome evolution developed in the GENEFOREST project will have a major impact on the study of phylogenetic species trees, gene trees and their interrelationship.