Article ID: 430451
Journal: Journal of Computational Science
Published Year: 2010
Pages: 8
File Type: PDF
Abstract

Bioinformatics is experiencing a rapid and overwhelming accumulation of molecular sequence data, driven predominantly by novel wet-lab sequencing techniques. This trend poses scalability challenges for tool developers. In phylogenetic inference (the reconstruction of evolutionary trees from molecular sequence data), scalability is becoming an increasingly important issue for operations other than the tree reconstruction itself. In this paper we focus on post-analysis tasks in reconstructing very large trees, specifically the step of building (extended) majority rule consensus trees from a collection of equally plausible trees or a collection of bootstrap replicate trees. To this end we present non-parallel optimizations that establish our implementation as the fastest exact implementation in phylogenetics, as well as novel parallelized routines that are the first of their kind. The non-parallel optimizations yield a 50-fold performance improvement over the previous version of our code, and the parallel routines achieve a maximum speedup of 5.5 on an 8-core Nehalem node when building consensus trees comprising up to 55,000 organisms. We also present a parallel approach for drawing bootstrap support values on a candidate tree, and we assess it experimentally to better understand read-only versus read–write parallel hash table accesses on multi-core systems.
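To make the two post-analysis tasks concrete, the following is a minimal sequential sketch, not the implementation described in the paper. It assumes trees are given as nested Python tuples of taxon names over a shared taxon set, and it uses a plain dictionary-based counter as the hash table of bipartitions. The function names (`bipartitions`, `majority_rule_splits`, `support_values`) are hypothetical and chosen only for illustration. Building the consensus updates the table (read-write access), whereas drawing support values on a candidate tree only looks entries up once the table is filled (read-only access).

```python
from collections import Counter


def leaves(tree):
    """Return the set of taxon names under a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def bipartitions(tree, all_taxa, reference):
    """Collect the non-trivial bipartitions induced by the internal edges."""
    found = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        clade = frozenset().union(*(visit(child) for child in node))
        # Normalize: always store the side that excludes the reference taxon,
        # so the same split hashes identically in every tree.
        side = all_taxa - clade if reference in clade else clade
        if 1 < len(side) < len(all_taxa) - 1:
            found.add(side)
        return clade

    visit(tree)
    return found


def count_splits(trees):
    """Fill a hash table mapping each bipartition to its number of occurrences."""
    all_taxa = leaves(trees[0])
    reference = min(all_taxa)
    counts = Counter()
    for tree in trees:
        counts.update(bipartitions(tree, all_taxa, reference))
    return counts, all_taxa, reference


def majority_rule_splits(trees, threshold=0.5):
    """Return the bipartitions found in more than `threshold` of the trees."""
    counts, _, _ = count_splits(trees)
    total = len(trees)
    return {s: n / total for s, n in counts.items() if n / total > threshold}


def support_values(candidate, replicates):
    """Look up the replicate frequency of each bipartition of the candidate."""
    counts, all_taxa, reference = count_splits(replicates)
    total = len(replicates)
    return {s: counts[s] / total
            for s in bipartitions(candidate, all_taxa, reference)}
```

In this sketch a bipartition is a `frozenset` of taxon names, which is convenient but memory-hungry; an implementation aimed at tens of thousands of taxa would more plausibly use compact bit vectors as hash keys. Assembling the selected bipartitions into an actual consensus tree, and the extended majority rule variant, are left out.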
