A note on convex characters, Fibonacci numbers and exponential-time algorithms

Article ID	Journal	Published Year	Pages	File Type
4624447	Advances in Applied Mathematics	2017	13 Pages	PDF

Abstract

Phylogenetic trees are used to model evolution: leaves are labelled to represent contemporary species (“taxa”) and interior vertices represent extinct ancestors. Informally, convex characters are measurements on the contemporary species in which the subset of species (both contemporary and extinct) that share a given state forms a connected subtree. Given an unrooted, binary phylogenetic tree T on a set of n ≥ 2 taxa, a closed (but fairly opaque) expression for the number of convex characters on T has been known since 1992, and this is independent of the exact topology of T. In this note we prove that this number is actually equal to the (2n−1)th Fibonacci number. Next, we define g_k(T) to be the number of convex characters on T in which each state appears on at least k taxa. We show that, somewhat curiously, g_2(T) is also independent of the topology of T, and is equal to the (n−1)th Fibonacci number. As we demonstrate, this topological neutrality subsequently breaks down for k ≥ 3. However, we show that for each fixed k ≥ 1, g_k(T) can be computed in O(n) time and the set of characters thus counted can be efficiently listed and sampled. We use these insights to give a simple but effective exact algorithm for the NP-hard maximum parsimony distance problem that runs in time Θ(φ^n · n^2), where φ ≈ 1.618... is the golden ratio, and an exact algorithm which computes the tree bisection and reconnection distance (equivalently, a maximum agreement forest) in time Θ(φ^(2n) · poly(n)), where φ^2 ≈ 2.619.
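The two Fibonacci identities stated in the abstract can be checked by brute force on small trees. The sketch below is only an illustrative sanity check, not the O(n) counting method the abstract refers to: it enumerates all set partitions of the taxa, so it is exponential in n. It assumes that a character can be identified with a partition of the taxa into states, that convexity means the minimal subtrees spanning the blocks are pairwise vertex-disjoint, and that Fibonacci numbers are indexed with F(1) = F(2) = 1. The helper names (`fib`, `caterpillar`, `partitions`, `spanning_vertices`, `count_convex`) are hypothetical, and the caterpillar tree is just one convenient binary topology.

```python
def fib(m):
    """Return the m-th Fibonacci number, assuming F(1) = F(2) = 1."""
    a, b = 0, 1
    for _ in range(m):
        a, b = b, a + b
    return a


def caterpillar(n):
    """Adjacency lists of an unrooted binary caterpillar tree.

    Leaves (taxa) are 0..n-1; internal vertices are n..2n-3.
    """
    adj = {}

    def link(u, v):
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)

    if n == 2:
        link(0, 1)
        return adj
    spine = list(range(n, 2 * n - 2))  # the n-2 internal vertices
    for u, v in zip(spine, spine[1:]):
        link(u, v)
    link(0, spine[0])          # extra leaf at one end of the spine
    link(n - 1, spine[-1])     # extra leaf at the other end
    for leaf in range(1, n - 1):
        link(leaf, spine[leaf - 1])
    return adj


def partitions(items):
    """Yield every set partition of the list `items` as a list of blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in partitions(rest):
        yield [[first]] + part
        for i in range(len(part)):
            yield part[:i] + [[first] + part[i]] + part[i + 1:]


def spanning_vertices(adj, block):
    """Vertices of the minimal subtree connecting the leaves in `block`."""
    root = block[0]
    parent = {root: None}
    stack = [root]
    while stack:  # depth-first search recording parents towards `root`
        u = stack.pop()
        for v in adj[u]:
            if v not in parent:
                parent[v] = u
                stack.append(v)
    used = {root}
    for leaf in block[1:]:
        while leaf not in used:  # walk up until the path joins the subtree
            used.add(leaf)
            leaf = parent[leaf]
    return used


def count_convex(adj, n, k=1):
    """Brute-force g_k(T): convex characters with every state on >= k taxa."""
    total = 0
    for part in partitions(list(range(n))):
        if any(len(block) < k for block in part):
            continue
        seen = set()
        convex = True
        for block in part:
            verts = spanning_vertices(adj, block)
            if seen & verts:  # two states would share a vertex
                convex = False
                break
            seen |= verts
        total += convex
    return total


if __name__ == "__main__":
    for n in range(2, 8):
        tree = caterpillar(n)
        print(n, count_convex(tree, n, 1), fib(2 * n - 1),
              count_convex(tree, n, 2), fib(n - 1))
```

For n = 4, for example, `count_convex(caterpillar(4), 4, 1)` gives 13 = F(7) and `count_convex(caterpillar(4), 4, 2)` gives 2 = F(3). Passing an adjacency dictionary for a different binary topology together with k = 3 is one way to explore the topology dependence mentioned for k ≥ 3.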

Keywords

05C30; 05A15; 05C85; Fibonacci numbers; Algorithms; Counting; Trees; Phylogenetics; Convexity