Article ID: 5760381
Journal: Mathematical Biosciences
Published Year: 2017
Pages: 20
File Type: PDF
Abstract
Informative phylogenetic analysis depends on the presence of curated and annotated sequences. This may be complemented by the simultaneous availability of empirical data pertaining to their in vivo function. Confounding sequences, which are similar to more than one functional cluster, can therefore render any categorization ambiguous, subjective, and imprecise. Here, I analyze and discuss the development of a mathematical expression that can characterize a potentially confounding protein sequence. Specifically, statistical descriptors of combinatorially arranged profile HMM scores are computed and evaluated. The resultant data are then incorporated into an index of sequence suitability. The sequence may then be recommended as suitable for inclusion or be excluded altogether. The index is independent of experimental data and can be computed from the primary structure of the protein sequence. It can be used to trim previously grouped sequences, and can either finalize the composition of a training set or reduce the search space of sequences to be tested.
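The abstract does not state the expression itself, so the following is only an illustrative sketch of the general idea, not the author's index. It assumes each query sequence has already been scored against one profile HMM per functional cluster (for example, bit scores from an HMM search tool), computes simple descriptors over every pair of cluster scores, and derives a hypothetical ambiguity ratio. The function names, the ratio, and the 0.8 threshold are all assumptions made for illustration.

```python
from itertools import combinations
from statistics import mean, pstdev


def pairwise_descriptors(scores):
    """Descriptors for every pair of cluster scores.

    `scores` maps a cluster name to the profile HMM score of one
    query sequence against that cluster (hypothetical input layout).
    """
    out = {}
    for a, b in combinations(sorted(scores), 2):
        pair = (scores[a], scores[b])
        m = mean(pair)
        sd = pstdev(pair)  # population standard deviation of the pair
        cv = sd / m if m else float("inf")  # coefficient of variation
        out[(a, b)] = {"mean": m, "sd": sd, "cv": cv}
    return out


def ambiguity_index(scores):
    """Hypothetical index: second-best score divided by best score.

    Values near 1 mean the sequence fits two clusters almost equally
    well, which is the confounding situation the abstract describes.
    """
    ranked = sorted(scores.values(), reverse=True)
    if len(ranked) < 2 or ranked[0] <= 0:
        raise ValueError("need at least two scores and a positive best score")
    return max(ranked[1], 0.0) / ranked[0]


def recommend(scores, threshold=0.8):
    """Return 'exclude' for ambiguous sequences (assumed threshold)."""
    return "exclude" if ambiguity_index(scores) >= threshold else "include"


if __name__ == "__main__":
    query = {"cluster_A": 100.0, "cluster_B": 95.0, "cluster_C": 10.0}
    print(pairwise_descriptors(query))
    print(ambiguity_index(query), recommend(query))
```

A sequence scoring almost equally against two clusters is flagged for exclusion, while one with a clear best match is kept. Like the index in the abstract, the sketch needs only sequence-derived scores and no experimental data.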
Related Topics
Life Sciences; Agricultural and Biological Sciences; Agricultural and Biological Sciences (General)