Tanglegrams can be misleading
Tanglegrams are a visual method to compare two phylogenetic trees with the same set of tip labels. This can be useful for comparing trees produced by different methods on the same alignment, or on different alignments of the same sample set. Tanglegrams work by connecting the matching tips of the two trees with lines, then rotating subtrees to minimise the number of crossings. The algorithm was published in 2011, and continues to be used in a range of publications (for example in genomic epidemiology).
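To make the idea of "rotating subtrees to minimise crossings" concrete, here is a minimal sketch in Python. It is not the published algorithm: it assumes small binary trees written as nested tuples (a made-up representation for illustration) and simply tries every rotation of the right-hand tree by brute force, which only scales to a handful of tips.

```python
from itertools import combinations, product

def leaf_order(tree):
    """Return the tips from top to bottom for a nested-tuple tree."""
    if isinstance(tree, str):
        return [tree]
    return [tip for child in tree for tip in leaf_order(child)]

def crossings(left, right):
    """Count pairs of connecting lines that cross between two tip orders."""
    pos = {tip: i for i, tip in enumerate(right)}
    return sum(1 for a, b in combinations(left, 2) if pos[a] > pos[b])

def rotations(tree):
    """Yield every drawing of a binary tree obtained by flipping subtrees."""
    if isinstance(tree, str):
        yield tree
        return
    first, second = tree  # assumes strictly binary trees
    for a, b in product(rotations(first), rotations(second)):
        yield (a, b)
        yield (b, a)

# Hypothetical toy trees: same topology, drawn in opposite orders.
left = (("A", "B"), ("C", "D"))
right = (("D", "C"), ("B", "A"))

before = crossings(leaf_order(left), leaf_order(right))
best = min(rotations(right),
           key=lambda t: crossings(leaf_order(left), leaf_order(t)))
after = crossings(leaf_order(left), leaf_order(best))
print(before, after)  # 6 0
```

Every line crosses every other in the first drawing, yet a few rotations remove all of the crossings without touching the topology.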
I recently ran into an issue using tanglegrams which I’d like to document here. I constructed phylogenies from the protein alignment of all the alleles of PspA identified in a carriage collection of Streptococcus pneumoniae using RAxML (a maximum likelihood method) and MrBayes (a Bayesian method which produces a posterior of trees). For the latter, one can use the R package treescape to find the median tree in the posterior distribution. I wanted to compare the shape of these two trees to see if there were any obvious areas of uncertainty in the topology (perhaps a more quantitative way to do this would be to look at bootstrap values). I made a tanglegram, from which my initial interpretation was that the trees were very similar:
There is only one clade of swaps (near the bottom), so the trees must be similar? Not necessarily. A game I like to play is manually rotating subtrees to make metadata cluster. Though the topology remains identical, visually the clustering of metadata can change drastically. Even though we know vertical distance is meaningless in a phylogeny, the way this is usually visually displayed can lead us to draw erroneous conclusions.
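The rotation game can be sketched in a few lines of Python. This is an illustration only: the nested-tuple trees, the tip names and the "trait" metadata are all made up. It checks that a rotated tree contains exactly the same clades, while the number of visual blocks of metadata down the tips changes.

```python
from itertools import groupby

def leaf_order(tree):
    """Return the tips from top to bottom for a nested-tuple tree."""
    if isinstance(tree, str):
        return [tree]
    return [tip for child in tree for tip in leaf_order(child)]

def clades(tree):
    """Return every clade as a frozenset of tips; this ignores drawing order."""
    found = set()
    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        tips = frozenset().union(*(walk(child) for child in node))
        found.add(tips)
        return tips
    walk(tree)
    return found

def blocks(tree, trait):
    """Count runs of identical trait values reading down the tips."""
    return len([key for key, _ in groupby(trait[tip] for tip in leaf_order(tree))])

# Hypothetical metadata and trees; 'rotated' flips only the (A, B) subtree.
trait = {"A": "red", "B": "blue", "C": "red", "D": "blue"}
original = (("A", "B"), ("C", "D"))
rotated = (("B", "A"), ("C", "D"))

same_topology = clades(original) == clades(rotated)
print(same_topology, blocks(original, trait), blocks(rotated, trait))  # True 4 3
```

The two red tips sit next to each other only in the rotated drawing, even though nothing about the tree itself has changed.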
Let’s look at the tanglegram above again in more detail:
The swap highlighted in blue is the only event where tip lines cross. This correctly leads us to identify that pspA-31 and pspA-37 are in slightly different places in this part of the phylogeny. However, look in the red box. Though these strains align perfectly, their placement within the phylogeny is quite different (red arrows)! In this case I don’t think the line crossings in the tanglegram have identified what I wanted to see, which is that the ancestral topology of these two trees is different.
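The same effect can be reproduced on a toy example. The sketch below uses two made-up four-tip trees (nested tuples, not the PspA trees) whose tips line up with no crossings at all, and then counts the clades found in only one of the two trees. The clade count is a Robinson-Foulds-style comparison swapped in for illustration; it is not something the tanglegram computes.

```python
from itertools import combinations

def leaf_order(tree):
    """Return the tips from top to bottom for a nested-tuple tree."""
    if isinstance(tree, str):
        return [tree]
    return [tip for child in tree for tip in leaf_order(child)]

def clades(tree):
    """Return every clade as a frozenset of tips."""
    found = set()
    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        tips = frozenset().union(*(walk(child) for child in node))
        found.add(tips)
        return tips
    walk(tree)
    return found

def crossings(left, right):
    """Count pairs of connecting lines that cross between two tip orders."""
    pos = {tip: i for i, tip in enumerate(right)}
    return sum(1 for a, b in combinations(left, 2) if pos[a] > pos[b])

# Hypothetical trees: two ladders built from opposite ends.
tree_1 = ((("A", "B"), "C"), "D")
tree_2 = ("A", ("B", ("C", "D")))

n_crossings = crossings(leaf_order(tree_1), leaf_order(tree_2))
unshared = clades(tree_1) ^ clades(tree_2)  # clades present in only one tree
print(n_crossings, len(unshared))  # 0 4
```

No lines cross, yet the only clade the two trees share is the root containing all four tips.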
While I am not claiming tanglegrams are not useful, without careful attention they may be misleading. A good alternative to tanglegrams, in my opinion, is phylo.io. This allows you to dynamically compare two tree topologies and see how each branch flip affects the arrangement, rather than providing just one static comparison. If you want to try this out on the trees above, the two newick files are attached at the end of this post to copy in. Finally, I would also note that there are various metrics which can provide a quantitative comparison between (many) trees. My favourite of these is the Kendall-Colijn metric implemented in treescape.
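For intuition, here is a rough Python sketch of the topology-only form of the Kendall-Colijn metric (the lambda = 0 case, as I understand the definition): for every pair of tips, record how many edges separate the root from their most recent common ancestor, then take the Euclidean distance between the two trees' vectors. The nested-tuple trees and function names are made up for illustration, branch lengths are ignored, and this is not the treescape implementation.

```python
from itertools import combinations
from math import sqrt

def mrca_depths(tree):
    """Map each pair of tips to the number of edges from the root to their MRCA."""
    depths = {}
    def walk(node, depth):
        if isinstance(node, str):
            return [node]
        groups = [walk(child, depth + 1) for child in node]
        # Tips in different child subtrees have this node as their MRCA.
        for g1, g2 in combinations(groups, 2):
            for a in g1:
                for b in g2:
                    depths[frozenset((a, b))] = depth
        return [tip for group in groups for tip in group]
    walk(tree, 0)
    return depths

def kc_topological(tree_a, tree_b):
    """Euclidean distance between MRCA-depth vectors (same tip set assumed)."""
    d_a, d_b = mrca_depths(tree_a), mrca_depths(tree_b)
    return sqrt(sum((d_a[pair] - d_b[pair]) ** 2 for pair in d_a))

# Hypothetical rooted trees with identical tip order but different topology.
tree_1 = ((("A", "B"), "C"), "D")
tree_2 = ("A", ("B", ("C", "D")))
flipped_1 = ("D", ("C", ("B", "A")))  # tree_1 with every subtree rotated

print(kc_topological(tree_1, flipped_1))  # 0.0
print(kc_topological(tree_1, tree_2))
```

Unlike line crossings, this distance is unaffected by how the trees are drawn: rotating subtrees leaves it at zero, while a genuine change in the ancestral topology makes it positive.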
Update: Michelle Kendall (an author of both the KC metric and treescape) pointed out on Twitter that they have added the function plotTreeDiff() as another way to look at topology changes. I gave this a go:
From the manual: ‘A plot of the two trees side by side. Tips are coloured in the following way:
- if each ancestor of a tip in tree 1 occurs in tree 2 with the same partition of tip descendants, then the tip is coloured grey (or supplied “baseCol”)
- if not, the tip gets coloured pale orange to red on a scale according to how many differences there are amongst its most recent common ancestors with other tips. The colour spectrum can be changed according to preference.’
You can use tipDiff() to see the numeric results directly.