DegPack: A web package using a non-parametric and information theoretic algorithm to identify differentially expressed genes in multiclass RNA-seq samples

Journal: Methods (2014), 9 pages. Article ID: 10825647.

Abstract

Gene expression across the whole cell can be routinely measured with microarray technologies or, more recently, with sequencing technologies. With these technologies, identifying differentially expressed genes (DEGs) among multiple phenotypes is the first step toward understanding the differences between phenotypes. Accordingly, many methods have been developed for detecting DEGs between two groups. For example, the t-test and relative entropy are used to detect differences between two probability distributions. When more than two phenotypes are considered, these methods are not applicable, and other methods such as the ANOVA F-test and the Kruskal-Wallis test are used to find DEGs in multiclass data. However, the ANOVA F-test assumes a normal distribution and is not designed to identify DEGs whose expression is distinct in each phenotype. The Kruskal-Wallis test, a non-parametric method, is more robust but is sensitive to outliers. In this paper, we propose a non-parametric, information-theoretic approach for identifying DEGs. Our method identified DEGs effectively and was shown to be less sensitive to outliers in two data sets: a three-class drought-resistant rice data set and a three-class breast cancer data set. In extensive experiments with simulated and real data, our method outperformed existing tools in how accurately the identified DEGs characterize phenotypes. A web service for the analysis of multiclass data is available at http://biohealth.snu.ac.kr/software/degpack; in addition to the method described in this paper, it includes the SAMseq and PoissonSeq methods.
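To make the contrast between the two multiclass tests concrete, the following is a minimal, standard-library Python sketch (an illustration under stated assumptions, not DegPack's implementation). It computes the one-way ANOVA F statistic and the Kruskal-Wallis H statistic for a single gene measured in three classes, with the Kruskal-Wallis tie correction omitted for brevity. The third function, `rank_mutual_information`, is a hypothetical, generic information-theoretic score (mutual information between the class label and equal-frequency bins of the pooled ranks). It is included only to show how a rank-based, information-theoretic statistic can be formed; it is not the statistic proposed in this paper, which the abstract does not specify. All expression values are made-up toy numbers.

```python
import math
from collections import Counter


def anova_f(groups):
    """One-way ANOVA F: between-group mean square over within-group mean square."""
    pooled = [x for g in groups for x in g]
    n, k = len(pooled), len(groups)
    grand_mean = sum(pooled) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # i..j are 0-based sorted positions
        for m in range(i, j + 1):
            ranks[order[m]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H from pooled ranks (tie correction omitted)."""
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        # Pooled values are laid out group by group, so slice each group's ranks.
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    return 12.0 / (n * (n + 1)) * total - 3 * (n + 1)


def rank_mutual_information(groups, bins=3):
    """HYPOTHETICAL illustrative score, not DegPack's statistic: mutual
    information (bits) between class label and equal-frequency rank bin."""
    pooled = [x for g in groups for x in g]
    labels = [c for c, g in enumerate(groups) for _ in g]
    n = len(pooled)
    # Map each pooled rank to one of `bins` equal-frequency bins.
    binned = [min(bins - 1, int((r - 1) * bins / n)) for r in average_ranks(pooled)]
    joint = Counter(zip(labels, binned))
    label_counts, bin_counts = Counter(labels), Counter(binned)
    mi = 0.0
    for (c, b), count in joint.items():
        p_joint = count / n
        p_indep = (label_counts[c] / n) * (bin_counts[b] / n)
        mi += p_joint * math.log2(p_joint / p_indep)
    return mi


if __name__ == "__main__":
    # Toy expression values for one gene in three classes (made up).
    clean = [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0], [9.0, 10.0, 11.0, 12.0]]
    # Same gene with one class-C sample replaced by an extreme outlier.
    outlier = [clean[0], clean[1], [9.0, 10.0, 11.0, 1200.0]]
    for name, groups in (("clean", clean), ("outlier", outlier)):
        print(name,
              round(anova_f(groups), 3),
              round(kruskal_wallis_h(groups), 3),
              round(rank_mutual_information(groups), 3))
```

In this toy example, inflating one class-C value from 12 to 1200 drops F from 38.4 to about 1.04, masking a clear three-class separation. H and the rank-based score stay the same because the ordering of the samples does not change. An outlier that does reorder samples across classes would still move both rank-based statistics.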

Keywords

RNA-seq; Non-parametric algorithm; Multiclass; Differentially expressed genes