Robust discriminant analysis and its application to identify protein coding regions of rice genes

Article ID	Journal	Published Year	Pages	File Type
4500309	Mathematical Biosciences	2011	5 Pages	PDF

Abstract

Identification of protein coding regions is fundamentally a statistical pattern recognition problem. Discriminant analysis is a statistical technique for classifying a set of observations into predefined classes, and it is well suited to such problems. Outliers are present in virtually every data set in any application domain, and classical discriminant analysis methods, including linear discriminant analysis (LDA) and quadratic discriminant analysis (QDA), do not work well when the data set contains outliers. To overcome this difficulty, robust statistical methods are used in this paper. We choose four different coding characters as discriminant variables, and the robust discriminant analysis method gives satisfactory results.
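To illustrate why outliers hurt classical discriminant rules, the following is a minimal sketch of a univariate quadratic discriminant rule fitted in two ways: with the sample mean and standard deviation (classical) and with the median and scaled median absolute deviation (robust). It is not the estimator used in the paper, which the abstract does not specify; the median/MAD substitution, the single feature, the toy values, and all function names are assumptions made only for illustration.

```python
import math
import statistics


def classical_fit(xs):
    """Classical location/scale: sample mean and sample standard deviation."""
    return statistics.mean(xs), statistics.stdev(xs)


def robust_fit(xs):
    """Robust location/scale: median and MAD scaled for consistency at the normal."""
    med = statistics.median(xs)
    mad = statistics.median([abs(x - med) for x in xs])
    return med, 1.4826 * mad


def qda_score(x, loc, scale, prior):
    """Univariate quadratic discriminant score (Gaussian log-density plus log-prior,
    dropping the constant shared by all classes)."""
    return math.log(prior) - math.log(scale) - 0.5 * ((x - loc) / scale) ** 2


def classify(x, params, priors):
    """Assign x to the class with the largest discriminant score."""
    return max(params, key=lambda c: qda_score(x, *params[c], priors[c]))


# Hypothetical values of one discriminant variable for two classes.
# The last noncoding value (5.0) is a deliberately planted outlier.
train = {
    "coding": [0.60, 0.62, 0.58, 0.61, 0.59, 0.63, 0.57, 0.60],
    "noncoding": [0.40, 0.42, 0.38, 0.41, 0.39, 0.43, 0.37, 5.0],
}
priors = {"coding": 0.5, "noncoding": 0.5}

classical_params = {c: classical_fit(xs) for c, xs in train.items()}
robust_params = {c: robust_fit(xs) for c, xs in train.items()}

x_new = 0.67  # lies near the coding cluster, far from the bulk of noncoding values
classical_label = classify(x_new, classical_params, priors)
robust_label = classify(x_new, robust_params, priors)
print("classical rule:", classical_label)
print("robust rule:   ", robust_label)
```

On this toy data the single outlier inflates the classical variance estimate of the noncoding class so much that the classical rule assigns the new point to noncoding, while the robust rule assigns it to coding. A multivariate version with four discriminant variables would need robust estimates of each class mean vector and covariance matrix instead of a median and MAD.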

► The identification of protein coding regions is addressed by means of robust discriminant methods.
► The robust discriminant methods are more accurate than the codon usage method.
► The robust discriminant rules perform better than the classical discriminant rules.
► The robust quadratic discriminant method is recommended for identifying protein coding regions of rice genes.

Keywords

Identification