Article ID: 10140719
Journal: Chemometrics and Intelligent Laboratory Systems
Published Year: 2018
Pages: 36 Pages
File Type: PDF
Abstract
DNA-binding proteins play a crucial role in various biological processes, such as the regulation of DNA modification, repair, replication, and transcription. They are also widely involved in the production of drugs, antibiotics, and steroids. Many computational approaches have been developed to identify DNA-binding proteins, but some are time-consuming and expensive, while others are laborious. Developing computational methods that identify DNA-binding proteins with high precision therefore remains a challenging task. In this work, we developed a new computational approach, named DBPPred-PDSD, with improved prediction power for DNA-binding proteins. We employed two datasets and extracted features via Split Amino Acid Composition (SAAC) and the Position Specific Scoring Matrix (PSSM). We then applied the Discrete Wavelet Transform (DWT) to the PSSM to extract dominant features. From these feature spaces, optimal features were selected by Maximum Relevance and Minimum Redundancy (mRMR) and then fused. To obtain highly informative features, we applied Support Vector Machine-Recursive Feature Elimination (SVM-RFE) and passed the selected features to two well-known classifiers, Support Vector Machine (SVM) and Random Forest (RF). With the SVM classifier, our model achieved a higher success rate than existing methods in the literature on three tests: jackknife cross-validation, 10-fold cross-validation, and an independent test.
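To make the first feature-extraction step concrete, the following is a minimal sketch of Split Amino Acid Composition as it is commonly defined: the sequence is split into an N-terminal, a middle, and a C-terminal segment, and the 20 residue frequencies of each segment are concatenated into a 60-dimensional vector. The abstract does not state the segment lengths, so the 25-residue defaults, the function names `composition` and `saac`, and the handling of short sequences are assumptions made for illustration, not the authors' implementation.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def composition(segment):
    """Return the relative frequency of each standard residue in a segment."""
    counts = Counter(segment)
    # Non-standard symbols (e.g. X) are ignored in the normalisation.
    total = sum(counts[aa] for aa in AMINO_ACIDS)
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]


def saac(sequence, n_term=25, c_term=25):
    """Sketch of Split Amino Acid Composition (60 features).

    n_term and c_term are assumed segment lengths; the abstract
    does not specify the values used in DBPPred-PDSD.
    """
    sequence = sequence.upper()
    if len(sequence) <= n_term + c_term:
        raise ValueError("sequence too short for the chosen segment lengths")
    head = sequence[:n_term]
    middle = sequence[n_term:len(sequence) - c_term]
    tail = sequence[len(sequence) - c_term:]
    # Concatenate the three 20-dimensional composition vectors.
    return composition(head) + composition(middle) + composition(tail)


# Toy sequence, purely illustrative (not taken from either dataset).
features = saac("A" * 25 + "C" * 10 + "D" * 25)
print(len(features))  # 60
```

Splitting the sequence before counting preserves some positional information that a single whole-sequence composition would average away, which is the usual motivation for SAAC over plain amino acid composition.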
Related Topics
Physical Sciences and Engineering > Chemistry > Analytical Chemistry