Article ID: 469109
Journal: Computer Methods and Programs in Biomedicine
Published Year: 2016
Pages: 11 Pages
File Type: PDF
Abstract

• “iSS-Hyb-mRMR” model is proposed for identification of splicing sites.
• Trinucleotide and tetranucleotide composition are used as feature extraction schemes.
• Hybrid space is formed by using TNC and TetraNC spaces.
• Various classification algorithms are analyzed.
• mRMR is utilized to reduce feature space.
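As a rough illustration of how a hybrid space could be built from trinucleotide and tetranucleotide compositions, the sketch below counts overlapping k-mers and concatenates the two normalized frequency vectors. It is only a minimal sketch under stated assumptions: it uses plain k-mer frequencies rather than the pseudo compositions (PseTNC, PseTetraNC) named in the abstract, and the function names `kmer_composition` and `hybrid_features` as well as the example sequence are hypothetical, not taken from the paper.

```python
from itertools import product

ALPHABET = "ACGT"


def kmer_composition(seq, k):
    """Return normalized frequencies of all 4**k k-mers in lexicographic order.

    Assumption: plain k-mer frequencies, not the pseudo composition of the paper.
    """
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    seq = seq.upper()
    total = 0
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # skip windows with ambiguous bases such as N
            counts[window] += 1
            total += 1
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[m] / total for m in kmers]


def hybrid_features(seq):
    """Concatenate trinucleotide (64) and tetranucleotide (256) compositions."""
    return kmer_composition(seq, 3) + kmer_composition(seq, 4)


example = "ACGTACGTAGGTAAGTCC"  # hypothetical sequence, for illustration only
vector = hybrid_features(example)
print(len(vector))  # 320 = 64 + 256
```

The concatenated vector has 320 dimensions, which is the kind of feature space a selection step such as mRMR would then reduce.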

Background and objectives
Gene splicing is a vital source of protein diversity. Because exons are usually interrupted by introns, precisely removing introns and joining exons is a key step in eukaryotic gene expression. Identifying splicing sites through experimental techniques is complicated and time-consuming. With the avalanche of genome sequences generated in the post-genomic age, developing an automatic, robust, and reliable computational method for fast and effective identification of splicing sites remains a challenging task.

Methods
In this study, a hybrid model, “iSS-Hyb-mRMR”, is proposed for fast and accurate identification of splicing sites. Two sample representation methods, pseudo trinucleotide composition (PseTNC) and pseudo tetranucleotide composition (PseTetraNC), were used to extract numerical descriptors from DNA sequences. The hybrid feature space was formed by concatenating PseTNC and PseTetraNC. To select highly discriminative features, the minimum redundancy maximum relevance (mRMR) algorithm was applied to the hybrid feature space. The performance of these feature representation methods was tested using several classification algorithms: K-nearest neighbor, probabilistic neural network, general regression neural network, and fitting network. The jackknife test was used to evaluate performance on two benchmark datasets, S1 and S2.

Results
The proposed predictor achieved an accuracy of 93.26%, sensitivity of 88.77%, and specificity of 97.78% on S1, and an accuracy of 94.12%, sensitivity of 87.14%, and specificity of 98.64% on S2.

Conclusion
The proposed model performs better than the existing methods reported in the literature so far, and it is expected to be useful for research on the mechanism of RNA splicing and for the wider research community.
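The evaluation described above (a jackknife test reporting accuracy, sensitivity, and specificity) can be sketched as a leave-one-out loop around a simple K-nearest neighbor classifier. This is a minimal sketch, not the authors' implementation: the Euclidean distance, the choice of k = 3, the helper names `knn_predict` and `jackknife_metrics`, and the toy data are all assumptions made for illustration.

```python
import math


def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k nearest training samples (labels are 0 or 1)."""
    neighbours = sorted(
        (math.dist(tx, x), ty) for tx, ty in zip(train_X, train_y)
    )
    votes = sum(label for _, label in neighbours[:k])
    return 1 if votes * 2 > k else 0


def jackknife_metrics(X, y, k=3):
    """Leave-one-out evaluation: each sample is predicted from all the others."""
    tp = tn = fp = fn = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        pred = knn_predict(train_X, train_y, X[i], k)
        if pred == 1 and y[i] == 1:
            tp += 1
        elif pred == 0 and y[i] == 0:
            tn += 1
        elif pred == 1 and y[i] == 0:
            fp += 1
        else:
            fn += 1
    accuracy = (tp + tn) / len(X)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # true positive rate
    specificity = tn / (tn + fp) if tn + fp else 0.0  # true negative rate
    return accuracy, sensitivity, specificity


# Hypothetical toy data: two well-separated clusters (0 = non-site, 1 = site).
X = [[0, 0], [0, 1], [1, 0], [1, 1], [5, 5], [5, 6], [6, 5], [6, 6]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
acc, sens, spec = jackknife_metrics(X, y, k=3)
print(acc, sens, spec)  # 1.0 1.0 1.0 on this toy data
```

In practice each row of `X` would be a feature vector extracted from a DNA sequence, and the three numbers correspond to the metrics reported for S1 and S2.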

Related Topics
Physical Sciences and Engineering Computer Science Computer Science (General)