Article ID: 531147
Journal: Pattern Recognition
Published Year: 2006
Pages: 14
File Type: PDF
Abstract

Profile Hidden Markov Models (Profile HMMs) are currently the methodology of choice for probabilistic protein family modeling. Despite substantial progress, however, the general problem of remote homology analysis is still far from solved. In this article we propose new approaches for robust protein family modeling that consistently exploit general pattern recognition techniques. A new feature-based representation of amino acid sequences serves as the basis for semi-continuous protein family HMMs. This paradigm shift in processing biological sequences allows the complexity of family models to be reduced substantially, resulting in fewer parameters that need to be trained. This is especially favorable when only little training data is available, as is the case in most current tasks of molecular biology research. In various experiments we demonstrate the superior performance of advanced stochastic protein family modeling for remote homology analysis, which is especially relevant for applications such as drug discovery.
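
The abstract does not spell out the model equations, so the following is only a minimal sketch of the textbook semi-continuous HMM emission, not the authors' implementation. It assumes each amino acid position has already been mapped to a real-valued feature vector (the feature extraction itself is not described here) and that all states share one codebook of diagonal-covariance Gaussians, so that each state stores only its own mixture weights: b_j(x) = sum_k c_jk * N(x | mu_k, var_k). All names and numbers (`codebook`, `weights`, `state_a`, `state_b`, the feature values) are hypothetical and chosen for illustration.

```python
import math


def diag_gaussian_pdf(x, mean, var):
    """Density of a diagonal-covariance Gaussian at feature vector x."""
    log_p = 0.0
    for xi, mi, vi in zip(x, mean, var):
        log_p += -0.5 * (math.log(2.0 * math.pi * vi) + (xi - mi) ** 2 / vi)
    return math.exp(log_p)


def semi_continuous_emission(x, codebook, state_weights):
    """b_j(x) = sum_k c_jk * N(x | mu_k, var_k); the codebook is shared by all states."""
    return sum(
        c * diag_gaussian_pdf(x, mean, var)
        for c, (mean, var) in zip(state_weights, codebook)
    )


# Hypothetical shared codebook: K = 2 Gaussians over 2-dimensional features,
# each given as (mean vector, variance vector).
codebook = [
    ([0.0, 0.0], [1.0, 1.0]),
    ([3.0, 3.0], [1.0, 1.0]),
]

# Each state keeps only K mixture weights (each row sums to 1).
weights = {
    "state_a": [0.9, 0.1],
    "state_b": [0.2, 0.8],
}

x = [0.1, -0.2]  # hypothetical feature vector for one sequence position
b_a = semi_continuous_emission(x, codebook, weights["state_a"])
b_b = semi_continuous_emission(x, codebook, weights["state_b"])
```

Because the Gaussians are shared, adding a state in this sketch adds only K weights rather than a full set of density parameters, which is one way the reduction in trainable parameters mentioned in the abstract can arise.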

Related Topics
Physical Sciences and Engineering > Computer Science > Computer Vision and Pattern Recognition