Mathematical phylogenetics

Developing the mathematical framework and algorithms to untangle the web of life is the central focus of this group.

Many popular software packages like (e.g. BEAST and Dendroscope) that reconstruct phylogenetic trees and networks have their foundations in deep mathematical results. In mathematical phylogenetics, we develop new mathematical tools and algorithms to solve problems related to the reconstruction and analysis of phylogenetic trees and networks.

Such networks are used to represent ancestral relationships of living entities and they have applications in evolutionary biology, linguistics, cancer research, and epidemiology.  While, phylogenetic trees are traditionally used to represent ancestral relationships between organisms, recent investigations into horizontal gene transfer and hybridization, which are processes that result in mosaic patterns of relationships, challenge the model of a phylogenetic tree.

It is now widely acknowledged that graphs with cycles, called phylogenetic networks, are better suited to represent evolutionary histories because they provide a more accurate picture of the relationships between organisms. From a mathematical perspective, phylogenetic networks pose many challenging questions since they are much more entangled than trees and, consequently, make the underlying problems more complicated to address.

Projects in this research area include: 

Calculating the minimum hybridization number under time constraints

fig4 (1)

In the reconstruction of phylogenetic networks from a set of trees, the so-called minimum hybridization number is a popular measure to quantify the extent to which hybridization or horizontal gene transfer has had an impact on the evolutionary history of a set of species.

Computing this minimum number for two phylogenetic trees is an NP-hard optimization problem and a lot of effort has been put into the development of clever characterizations and algorithms for this two-tree problem. However, no characterization is known for a collection of more than two trees.

We considered a variation of the problem that imposes two biologically motivated time-constraints on hybridization and speciation events. While the problem remains NP-hard, we were able to give the first characterization for calculating the minimum hybridization number under these constraints for when the collection of trees is arbitrarily large. The characterization is in terms of cherries (two leaves in a tree that have a common parent) and the existence of a particular type of sequence.

Paper: P. J. Humphries, S. Linz, and C. Semple (2013). Cherry picking: a characterization of the temporal hybridization number for a set of phylogenies. Bulletin of Mathematical Biology, 75:1879-1890.

Researcher: Dr Simone Linz


The study of sequence diversity under phylogenetic models is a modern "classic"

heled_sequence diversity

Theoretical studies of diversity under the Kingman coalescent appeared shortly after the introduction of the coalescent. In a paper published in Theoretical Population Biology I revisit this topic under the multispecies coalescent, an extension of the single population model to multiple populations. 

I derived exact formulas for the sequence dissimilarity of two sequences drawn at random under a basic three parameters multispecies setup:  the species tree birth rate under the pure birth process (Yule), the species effective population size and the mutation rate. I also discuss the effects of relaxing some of the model assumptions.

Paper: Heled, J. (2012) Sequence diversity under the multispecies coalescent with yule process and constant population size. Theoretical population biology, 81(2):97–101

Researcher: Dr Joseph Heled


Computational and geometric aspects of phylogenetic time-trees


It has recently become apparent that Bayesian phylogenetic tree inference methods have fundamental limitations with respect to the data size that can be handled using those methods. These limitations are due to the complexity of tree space. Phylogenetic tree space possess unique geometric properties, in the sense that the geometry of the space is different in many aspect from classical and well-studied geometries. New theory is needed to understand the structure of tree space, to properly estimate the complexity of phylogenetic algorithms, and to develop efficient and at the same time statistically consistent inference methods capable of handling contemporary genomic data sets.

We have developed an efficient approach for statistical analysis of finite sets of phylogenetic time-trees with contemporaneous taxa. This approach uses a novel geometry and algorithms to efficiently estimate various statistics of a posterior sample from the space of ultrametric trees. Furthermore, this work led to the notion of a discrete time-tree. In joint work with Chris Widden and Erick Matsen, we explored a construction of the space of discrete time-trees analogous to the classical NNI space on phylogenetic gene trees. The geometry of this space is (surprisingly) fundamentally different from that of NNI. Our goal now is to find the complexity of the discrete time-tree space. Depending on this complexity, the result might change the way people do Bayesian inference of phylogenetic time-trees.

Researchers: Dr Alex Gavruskin, Prof Alexei Drummond