DBPPred-PDSD: Machine learning approach for prediction of DNA-binding proteins using Discrete Wavelet Transform and optimized integrated features space

Article ID	Journal	Published Year	Pages	File Type
10140719	Chemometrics and Intelligent Laboratory Systems	2018	36 Pages	PDF

Abstract

DNA-binding proteins play a crucial role in biological processes such as the regulation of DNA modification, repair, replication, and transcription. These proteins are also widely involved in the production of drugs, antibiotics, and steroids. Many computational approaches have been developed to identify DNA-binding proteins, but some are time-consuming and expensive, while others are laborious. Developing computational methods that identify DNA-binding proteins with high precision therefore remains a challenging task. In this work, we developed a new computational approach, DBPPred-PDSD, which shows more promising prediction power for DNA-binding proteins. We employed two datasets and extracted features using Split Amino Acid Composition (SAAC) and the Position Specific Scoring Matrix (PSSM). We then applied the Discrete Wavelet Transform (DWT) to the PSSM to extract dominant features. From these feature spaces, optimal features were selected by Maximum Relevance and Minimum Redundancy (mRMR) and then fused. To obtain highly informative features, we applied Support Vector Machine-Recursive Feature Elimination (SVM-RFE) to the fused set and supplied the result to two well-known classifiers, Support Vector Machine (SVM) and Random Forest (RF). With the SVM classifier, our model achieved a higher success rate than existing methods in the literature on three tests: jackknife cross-validation, 10-fold cross-validation, and an independent test.
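
The two feature extraction steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not state the segment lengths used for SAAC or the wavelet family and decomposition level used for the DWT, so the 25-residue terminal segments and the single-level Haar wavelet below are assumptions, and the function names `composition`, `saac`, and `haar_dwt` are hypothetical. The mRMR, SVM-RFE, and classifier stages are not shown.

```python
import math

# The 20 standard amino acids, in a fixed order that defines the feature layout.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(segment):
    """Return the relative frequency of each standard amino acid in a segment."""
    n = len(segment)
    if n == 0:
        return [0.0] * len(AMINO_ACIDS)
    # Non-standard residues (e.g. 'X') count toward n but get no feature of their own.
    return [segment.count(aa) / n for aa in AMINO_ACIDS]


def saac(sequence, n_term=25, c_term=25):
    """Split Amino Acid Composition: 20 frequencies for each of three regions.

    The terminal lengths (25 residues each) are an assumption for illustration;
    the abstract does not specify them.
    """
    seq = sequence.upper()
    if len(seq) <= n_term + c_term:
        raise ValueError("sequence too short for the chosen segment lengths")
    head = seq[:n_term]                      # N-terminal region
    middle = seq[n_term:len(seq) - c_term]   # central region
    tail = seq[-c_term:]                     # C-terminal region
    return composition(head) + composition(middle) + composition(tail)


def haar_dwt(signal):
    """One level of the Haar DWT: return (approximation, detail) coefficients.

    The Haar wavelet is an assumed choice. Odd-length input is padded by
    repeating the last value.
    """
    values = list(signal)
    if len(values) % 2:
        values.append(values[-1])
    root2 = math.sqrt(2)
    approx = [(values[i] + values[i + 1]) / root2 for i in range(0, len(values), 2)]
    detail = [(values[i] - values[i + 1]) / root2 for i in range(0, len(values), 2)]
    return approx, detail
```

Under these assumptions, `saac(sequence)` yields a 60-dimensional vector per protein, and `haar_dwt` would be applied to each of the 20 PSSM columns, treated as a signal along the sequence, with summary statistics of the coefficients serving as features.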

Keywords

Split amino acid composition; Discrete wavelet transform; Random forest; Position specific scoring matrix; Support vector machine