Article ID: 2820657
Journal: Genomics
Published Year: 2015
Pages: 6 Pages
File Type: PDF
Abstract

• Woods performs fast and accurate functional annotation and classification of proteins.
• Machine learning (Random Forest) and similarity-based (RAPsearch2) approaches have been used.
• It displayed > 96% precision on test and real datasets.
• It performed > 87 times faster than BLAST on the real metagenomic datasets.
• It is useful for the functional classification and annotation of large genomic and metagenomic datasets.

Functional annotation of very large metagenomic datasets is time-consuming and computationally demanding, and is currently a bottleneck for efficient analysis. The commonly used homology-based methods for functionally annotating and classifying proteins are extremely slow. To achieve fast and accurate functional annotation, we have developed an orthology-based functional classifier, ‘Woods’, that combines machine learning and similarity-based approaches. Woods displayed a precision of 98.79% on an independent genomic dataset, 96.66% on a simulated metagenomic dataset, and > 97% on two real metagenomic datasets. In addition, it performed > 87 times faster than BLAST on the two real metagenomic datasets. Woods can therefore be used as a highly efficient and accurate classifier whose high-throughput capability makes it suitable for large metagenomic datasets.
