Article ID: 1148396
Journal: Journal of Statistical Planning and Inference
Published year: 2015
Pages: 8
File type: PDF
Abstract

Phylogenetic trees represent the order and extent of genetic divergence among a fixed collection of organisms. The order of divergence is represented by the tree structure, and the extent of divergence by the branch lengths. Both the tree's structure and its branch lengths are unknown parameters, and the tree is estimated from sequence information sampled at a number of genetic sites. Under the model of genetic Brownian motion, we prove that the maximum likelihood estimator of the tree is consistent as the number of sampled genetic sites becomes large. (Our maximum likelihood estimator treats each site as an independent data point, which differs from concatenating the sites.) Existing consistency arguments either assume a finite parameter space or apply only to models based on transition probability matrices, and they do not carry over here because branch lengths are modeled as continuous. The metric space of Billera et al. (2001) is central to the proof. We conclude with some comments on the role of parametric methods in tree estimation.
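To make the setting concrete, here is a minimal Python sketch. It is not the authors' estimator or proof; it assumes a three-leaf rooted tree, a root value known to be 0, and Brownian motion in which each branch of length t adds an independent N(0, t) increment. Each site is treated as an independent draw of the three leaf values, in the spirit of the per-site estimator described above. For a fixed topology the branch-length MLE has a closed form in the sample second moments; the tree structure is then chosen by comparing maximized log-likelihoods across the three rooted topologies. All function names (`simulate_sites`, `fit_topology`, `ml_tree`) and parameter values are hypothetical, and boundary cases where an estimated length would be negative are simply skipped rather than solved.

```python
import math
import random

# Hypothetical toy setup: three leaves and a rooted tree. A rooted three-leaf
# topology is identified by its "cherry", the pair of leaves that share the
# internal branch below the root.
LEAVES = ("A", "B", "C")
TOPOLOGIES = [("A", "B"), ("A", "C"), ("B", "C")]


def outgroup(cherry):
    """Return the leaf that is not in the cherry."""
    return next(x for x in LEAVES if x not in cherry)


def simulate_sites(cherry, t_int, t_pair, t_out, n, rng):
    """Simulate n independent sites of Brownian motion on a rooted 3-leaf tree.

    Assumption: the root value is known and fixed at 0. Each branch of length t
    adds an independent N(0, t) increment; the two cherry leaves share the
    internal-branch increment.
    """
    i, j = cherry
    k = outgroup(cherry)
    sites = []
    for _ in range(n):
        shared = rng.gauss(0.0, math.sqrt(t_int))
        sites.append({
            i: shared + rng.gauss(0.0, math.sqrt(t_pair[0])),
            j: shared + rng.gauss(0.0, math.sqrt(t_pair[1])),
            k: rng.gauss(0.0, math.sqrt(t_out)),
        })
    return sites


def second_moment(sites, a, b):
    """Sample mean of x_a * x_b over sites (the mean is known to be 0)."""
    return sum(s[a] * s[b] for s in sites) / len(sites)


def fit_topology(sites, cherry):
    """Closed-form MLE of branch lengths for one fixed topology.

    The cherry (i, j) has covariance [[t_int + t_i, t_int], [t_int, t_int + t_j]]
    and the outgroup k is independent with variance t_k. These parameters map
    one-to-one onto the covariance entries, so when the sample second moments
    imply non-negative lengths they are the MLE. Otherwise the optimum lies on
    the boundary, which this sketch does not handle: it returns None.
    """
    i, j = cherry
    k = outgroup(cherry)
    s_ii = second_moment(sites, i, i)
    s_jj = second_moment(sites, j, j)
    s_ij = second_moment(sites, i, j)
    s_kk = second_moment(sites, k, k)
    lengths = {"internal": s_ij, i: s_ii - s_ij, j: s_jj - s_ij, k: s_kk}
    if min(lengths.values()) < 0:
        return None
    n = len(sites)
    # At the MLE the trace term tr(Sigma^-1 S) equals the dimension 3, so the
    # maximized Gaussian log-likelihood needs only the two block determinants.
    log_det = math.log(s_ii * s_jj - s_ij ** 2) + math.log(s_kk)
    loglik = -0.5 * n * (3 * math.log(2 * math.pi) + log_det + 3)
    return loglik, lengths


def ml_tree(sites):
    """Pick the feasible topology with the highest maximized log-likelihood."""
    fits = {c: fit_topology(sites, c) for c in TOPOLOGIES}
    feasible = {c: f for c, f in fits.items() if f is not None}
    if not feasible:
        return None
    best = max(feasible, key=lambda c: feasible[c][0])
    return best, feasible[best][1]


rng = random.Random(0)
sites = simulate_sites(("A", "B"), t_int=1.0, t_pair=(0.5, 0.5), t_out=1.5,
                       n=20000, rng=rng)
cherry, lengths = ml_tree(sites)
print(cherry, {name: round(t, 3) for name, t in lengths.items()})
```

Increasing `n` should tighten the estimates around the simulated lengths, which is the finite-dimensional intuition behind consistency. The result stated in the abstract covers the harder case of a continuous tree space, where topology and branch lengths vary together, and this toy does not attempt that.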
