A method of data mining using Hidden Markov Models (HMMs) for protein secondary structure prediction

Article ID	Journal	Published Year	Pages	File Type
6900420	Procedia Computer Science	2018	10 Pages	PDF

Abstract

Predicting the secondary structure of proteins is one of the most studied problems in computational biology, yet the accuracy of predicted secondary structures remains insufficient for practical use. Hidden Markov Models (HMMs) are often used for data mining in bioinformatics, and in this paper we propose an HMM-based algorithmic approach to the prediction problem. We built an HMM that models the prediction of protein secondary structure. Two procedures for estimating the probability parameters were carried out using Maximum Likelihood Estimation (MLE) on protein sequences from a public database (Brookhaven PDB). Finally, we implemented a new prediction approach based on the posterior probability of the hidden regimes. The model appears efficient on single sequences: comparing the first results with the real secondary structure gives a score of 66.6%, which is encouraging for further improvement of the system.
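
As a rough illustration of the two steps named in the abstract (supervised parameter estimation by counting, then prediction from the posterior probability of each hidden state), the following is a minimal sketch and not the authors' implementation. It assumes a three-state alphabet (H = helix, E = strand, C = coil), the 20 standard amino acids as observations, non-empty sequences, and additive smoothing through a hypothetical `pseudo` parameter, so the estimates are smoothed rather than strict MLE. All function names are illustrative.

```python
from collections import Counter

STATES = "HEC"                   # assumed hidden states: helix, strand, coil
AMINO = "ACDEFGHIKLMNPQRSTVWY"   # assumed observation alphabet


def _normalise(counts, keys, pseudo):
    """Turn raw counts into probabilities, adding `pseudo` to every count."""
    total = sum(counts.get(k, 0) + pseudo for k in keys)
    return {k: (counts.get(k, 0) + pseudo) / total for k in keys}


def estimate_parameters(pairs, pseudo=1.0):
    """Count-based estimates from (sequence, structure) pairs of equal length."""
    init, trans, emit = Counter(), Counter(), Counter()
    for seq, ss in pairs:
        init[ss[0]] += 1
        for residue, state in zip(seq, ss):
            emit[(state, residue)] += 1
        for prev, nxt in zip(ss, ss[1:]):
            trans[(prev, nxt)] += 1
    pi = _normalise(init, STATES, pseudo)
    A = {s: _normalise({t: trans[(s, t)] for t in STATES}, STATES, pseudo)
         for s in STATES}
    B = {s: _normalise({a: emit[(s, a)] for a in AMINO}, AMINO, pseudo)
         for s in STATES}
    return pi, A, B


def posterior_decode(seq, pi, A, B):
    """Scaled forward-backward; returns the per-position most probable states
    and the posterior distribution at every position."""
    n = len(seq)
    fwd, scale = [], []
    for i, residue in enumerate(seq):
        if i == 0:
            row = {s: pi[s] * B[s][residue] for s in STATES}
        else:
            row = {s: B[s][residue] * sum(fwd[-1][r] * A[r][s] for r in STATES)
                   for s in STATES}
        c = sum(row.values())
        fwd.append({s: row[s] / c for s in STATES})
        scale.append(c)

    bwd = [None] * n
    bwd[-1] = {s: 1.0 for s in STATES}
    for i in range(n - 2, -1, -1):
        nxt = seq[i + 1]
        bwd[i] = {s: sum(A[s][t] * B[t][nxt] * bwd[i + 1][t] for t in STATES)
                  / scale[i + 1] for s in STATES}

    posteriors = [{s: fwd[i][s] * bwd[i][s] for s in STATES} for i in range(n)]
    path = "".join(max(STATES, key=post.get) for post in posteriors)
    return path, posteriors


def per_residue_accuracy(predicted, observed):
    """Fraction of positions where the predicted state matches the observed one."""
    return sum(p == o for p, o in zip(predicted, observed)) / len(observed)


# Toy, made-up training pairs purely for illustration (not PDB data).
toy_pairs = [("AAAGGG", "HHHCCC"), ("VVVGGG", "EEECCC")]
pi, A, B = estimate_parameters(toy_pairs)
predicted, posteriors = posterior_decode("AAVGG", pi, A, B)
```

Choosing the state with the highest posterior at each position maximises the expected number of correctly labelled residues, which matches a per-residue score such as the one quoted in the abstract. It differs from Viterbi decoding, which returns the single most probable state path as a whole.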

Keywords

Posterior probability; Bioinformatics; Data mining; Computational biology; Hidden Markov models; Supervised learning; Prediction