Article ID: 4944163
Journal: Information Sciences
Published Year: 2017
Pages: 29 Pages
File Type: PDF
Abstract
Multiple Sequence Alignment (MSA) is the simultaneous alignment of three or more biological sequences (nucleotides or amino acids). In recent years, considerable effort has been devoted to the development of MSA approaches. In this work, we propose a framework that extracts the biological characteristics of an input set of unaligned sequences and uses this knowledge to select the most suitable aligner and parameter configuration. We refer to it as the Multiple Aligner Framework (MAF). The tuple {Aligner, Configuration} is selected by searching a pre-computed file for the best tuple recorded for a dataset with similar biological characteristics. To create this file, we use multiobjective optimization; specifically, three well-known multiobjective evolutionary algorithms (NSGA-II, IBEA and MOEA/D) have been used. To validate the framework, we have used five popular benchmark suites: BAliBASE 3.0, PREFAB 4.0, SABmark 1.65, OX-Bench and CDD 3.14. After comparing it with well-known aligners published in the literature, such as Kalign2, MUSCLE, MAFFT, T-Coffee, MSAProbs, ProbCons, Clustal Ω and MUMMALS, we conclude that the multiple aligner framework is, on average, the method with the best balance between alignment accuracy/conservation and required runtime.
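The selection step described in the abstract can be illustrated with a minimal sketch. The abstract does not say which biological characteristics are extracted, how similarity is measured, or how the pre-computed file is laid out, so everything below is an assumption made for illustration: the three features (sequence count, mean length, length standard deviation), the scaled Euclidean distance, the in-memory table, and the configuration labels are all hypothetical, and the aligner/feature pairings are invented placeholders rather than results from the paper.

```python
import math
from statistics import mean, pstdev


def extract_features(sequences):
    """Hypothetical feature vector: (sequence count, mean length, length std dev)."""
    lengths = [len(s) for s in sequences]
    return (float(len(sequences)), mean(lengths), pstdev(lengths))


# Hypothetical stand-in for the pre-computed file: each entry maps a feature
# vector to the best {Aligner, Configuration} tuple found offline.
# The values are invented placeholders, not data from the paper.
PRECOMPUTED = [
    ((10.0, 100.0, 5.0), ("MAFFT", "config-A")),
    ((50.0, 400.0, 80.0), ("Kalign2", "config-B")),
    ((5.0, 30.0, 2.0), ("MUSCLE", "config-C")),
]


def select_tuple(sequences, table=PRECOMPUTED):
    """Return the tuple stored for the table entry most similar to the input."""
    query = extract_features(sequences)
    # Scale each dimension by its largest table value so no feature dominates.
    scales = [max(abs(f[i]) for f, _ in table) or 1.0 for i in range(len(query))]

    def distance(features):
        return math.sqrt(
            sum(((q - f) / s) ** 2 for q, f, s in zip(query, features, scales))
        )

    _, best = min(table, key=lambda entry: distance(entry[0]))
    return best
```

Called on a small set of short sequences, `select_tuple` returns the tuple stored for the closest table entry. In the framework itself, the table contents would come from the multiobjective optimization runs rather than being written by hand.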
Related Topics
Physical Sciences and Engineering; Computer Science; Artificial Intelligence