Article ID: 7556894
Journal: Analytical Biochemistry
Published Year: 2018
Pages: 8
File Type: PDF
Abstract
DNase I hypersensitive sites (DHSs) are accessible chromatin regions in plant genomes that are hypersensitive to cleavage by DNase I endonucleases. DHSs have been used as markers for the presence of transcriptional regulatory elements. Developing computational methods to identify DHSs is therefore an important complement for discovering potential regulatory elements. Several machine learning approaches have been proposed for DHS prediction, but there is still room for improvement. In this work, a new predictor called pDHS-WE is proposed for predicting DHSs in plant genomes using a weighted ensemble learning framework. Five classes of heterogeneous features were used to represent the sequences, and one random forest (RF) classifier was constructed for each class of features. pDHS-WE was formed by fusing the five individual RF classifiers into an ensemble predictor, with a genetic algorithm used to obtain the weights of the different feature classes. In the experiments, pDHS-WE obtained an accuracy of 88.5%, a sensitivity of 89.1%, a specificity of 88.0%, and an AUC of 0.958, exceeding state-of-the-art methods by more than 2.7%, 2%, 3.5%, and 2.6%, respectively. The results suggest that pDHS-WE may become a useful tool for the analysis of transcriptional regulatory elements in plant genomes.
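The weighted fusion described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the fusion rule, the fitness function, or the genetic algorithm settings, so the weighted average of positive-class probabilities, the accuracy-based fitness, and the names `fuse`, `accuracy`, and `evolve_weights` are all assumptions made for illustration. Each inner list in `probs` stands in for the output of one feature-specific classifier.

```python
import random


def fuse(probs, weights):
    """Weighted average of per-classifier positive-class probabilities.

    probs: list of K lists, one probability per sample (assumed input format).
    weights: K positive numbers, normalized here so they sum to 1.
    """
    total = sum(weights)
    norm = [w / total for w in weights]
    n_samples = len(probs[0])
    return [
        sum(norm[k] * probs[k][i] for k in range(len(probs)))
        for i in range(n_samples)
    ]


def accuracy(probs, weights, labels, threshold=0.5):
    """Fraction of samples whose fused score falls on the correct side of the threshold."""
    fused = fuse(probs, weights)
    hits = sum((p >= threshold) == bool(y) for p, y in zip(fused, labels))
    return hits / len(labels)


def evolve_weights(probs, labels, pop_size=20, generations=30, seed=0):
    """Toy genetic search for fusion weights (hypothetical settings).

    Fitness is accuracy on the supplied labels; the fitter half of the
    population survives each generation and produces children by uniform
    crossover followed by Gaussian mutation of one weight.
    """
    rng = random.Random(seed)
    k = len(probs)
    pop = [[rng.random() + 1e-6 for _ in range(k)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda w: accuracy(probs, w, labels), reverse=True)
        parents = pop[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = [rng.choice(pair) for pair in zip(a, b)]  # uniform crossover
            j = rng.randrange(k)
            child[j] = max(1e-6, child[j] + rng.gauss(0, 0.1))  # mutate one weight
            children.append(child)
        pop = parents + children
    best = max(pop, key=lambda w: accuracy(probs, w, labels))
    total = sum(best)
    return [w / total for w in best]


if __name__ == "__main__":
    # Made-up probabilities from three stand-in classifiers on four samples.
    probs = [
        [0.9, 0.2, 0.7, 0.4],
        [0.6, 0.5, 0.4, 0.3],
        [0.8, 0.1, 0.9, 0.6],
    ]
    labels = [1, 0, 1, 0]
    weights = evolve_weights(probs, labels)
    print(weights, accuracy(probs, weights, labels))
```

In practice the weights would be tuned on validation data separate from the final test set, so the reported performance is not inflated by the weight search itself.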