Parallelized phylogenetic post-analysis on multi-core architectures

Article ID	Journal	Published Year	Pages	File Type
430451	Journal of Computational Science	2010	8 Pages	PDF

Abstract

Bioinformatics is experiencing a rapid and overwhelming accumulation of molecular sequence data, driven predominantly by novel wet-lab sequencing techniques. This trend poses scalability challenges for tool developers. In phylogenetic inference (the reconstruction of evolutionary trees from molecular sequence data), scalability is becoming an increasingly important issue for operations other than the tree reconstruction itself. In this paper we focus on post-analysis tasks in reconstructing very large trees, specifically the step of building (extended) majority rule consensus trees from a collection of equally plausible trees or a collection of bootstrap replicate trees. To this end we present non-parallel optimizations that establish our implementation as the fastest exact implementation in phylogenetics, together with novel parallelized routines that are the first of their kind. Our non-parallel optimizations yield a 50-fold performance improvement over the previous version of our code, and we achieve a maximum speedup of 5.5 on an 8-core Nehalem node for building consensus trees comprising up to 55,000 organisms. We also present a parallel approach for drawing bootstrap support values on a candidate tree, and we experimentally assess this approach to better understand read-only versus read–write parallel hash table accesses on multi-core systems.
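To make the consensus step concrete, the following is a minimal sequential sketch of majority rule bipartition counting with a hash table. It is not the authors' implementation: the function names `normalize` and `majority_rule_splits`, the representation of a tree as a list of taxon sets, and the strict 0.5 threshold are all assumptions chosen for illustration, and the sketch covers neither the extended variant nor any parallelization.

```python
from collections import Counter


def normalize(split, taxa):
    """Return a canonical side of a bipartition (hypothetical helper).

    A bipartition {S, taxa - S} can be written from either side, so we
    always keep the side that excludes a fixed reference taxon.
    """
    ref = min(taxa)
    side = frozenset(split)
    return frozenset(taxa) - side if ref in side else side


def majority_rule_splits(trees, taxa, threshold=0.5):
    """Count bipartitions over all trees and keep the frequent ones.

    `trees` is a list of trees, each given as an iterable of taxon sets
    (one set per non-trivial bipartition). Returns a dict mapping each
    canonical bipartition to its relative frequency, restricted to those
    occurring in more than `threshold` of the trees.
    """
    counts = Counter()  # hash table: bipartition -> number of trees
    for tree in trees:
        # A set comprehension ensures a bipartition counts once per tree.
        counts.update({normalize(s, taxa) for s in tree})
    n = len(trees)
    return {s: c / n for s, c in counts.items() if c / n > threshold}
```

With a threshold of at least 0.5 the retained bipartitions are pairwise compatible, so they define a single consensus tree. The per-bipartition frequencies are also the kind of values that would be drawn as support on a candidate tree when the input is a set of bootstrap replicates.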

Keywords

RAxML; Phylogenetics; Multi-core architecture