Article ID | Journal | Published Year | Pages | File Type |
---|---|---|---|---|
486116 | Procedia Computer Science | 2015 | 8 Pages | |
The ethylene response factor (ERF) family is one of the most important gene families related to environmental responses and tolerance in plants. ERF genes are defined by the AP2/ERF domain, which comprises approximately 60 amino acids and is involved in DNA binding. Developing computational tools based on machine learning can enhance rice genome annotation. Machine learning involves the construction and study of systems that learn from data rather than following only explicitly programmed instructions. This study focuses on the development of a prediction tool, ERFPred, for the drought-responsive ERF protein in rice using machine learning algorithms. We used fourteen different feature extraction methods, including amino acid features, dipeptide, tripeptide, hybrid methods and exchange group features. Using the Random Forest classifier, we obtained a precision of 100% for the ERFPred tool. To show that a species-specific tool performs better than the All plant tool, a general tool for plants, two different approaches were used and validated. The results were further compared with those of the sequence similarity search tool PSI-BLAST.
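To make the feature extraction step more concrete, the following is a minimal sketch of two of the feature types named in the abstract, amino acid composition and dipeptide composition. It is not the ERFPred implementation: the function names, the handling of non-standard residues, and the example sequence are assumptions made for illustration only.

```python
from itertools import product

# The 20 standard amino acids in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered residue pairs: AA, AC, AD, ...
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def amino_acid_composition(seq):
    """Return 20 fractions, one per standard amino acid.

    Assumption: non-standard symbols (e.g. X, B, Z) are ignored.
    """
    residues = [r for r in seq.upper() if r in AMINO_ACIDS]
    if not residues:
        raise ValueError("sequence contains no standard amino acids")
    total = len(residues)
    return [residues.count(a) / total for a in AMINO_ACIDS]


def dipeptide_composition(seq):
    """Return 400 fractions, one per ordered pair of adjacent residues.

    Assumption: a pair is counted only if both residues are standard.
    """
    seq = seq.upper()
    pairs = [
        a + b
        for a, b in zip(seq, seq[1:])
        if a in AMINO_ACIDS and b in AMINO_ACIDS
    ]
    if not pairs:
        raise ValueError("sequence contains no standard dipeptides")
    counts = dict.fromkeys(DIPEPTIDES, 0)
    for pair in pairs:
        counts[pair] += 1
    total = len(pairs)
    return [counts[d] / total for d in DIPEPTIDES]


def hybrid_features(seq):
    """Concatenate both compositions into one 420-dimensional vector."""
    return amino_acid_composition(seq) + dipeptide_composition(seq)


if __name__ == "__main__":
    # Hypothetical short peptide, not a real ERF sequence.
    example = "MAAKRGYTP"
    vector = hybrid_features(example)
    print(len(vector))
```

Fixed-length vectors of this kind could then be passed to a random forest implementation, for example scikit-learn's `RandomForestClassifier`, with one vector per protein sequence and a label marking whether the sequence is an ERF protein.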