Adaptive clustering for time series: Application for identifying cell cycle expressed genes

Article ID	Journal	Published Year	Pages	File Type
417862	Computational Statistics & Data Analysis	2009	13 Pages	PDF

Abstract

This work addresses the biological problem of identifying the genes that are active during cell division. Cell division ensures the proliferation of cells, a process that is drastically aberrant in cancer cells. The studied genes are described by their expression profiles over the cell division cycle. Commonly, identification is a supervised approach based on an a priori set of reference genes assumed to characterize the cell cycle phases well. Each studied gene is then classified according to its peak similarity to one of the pre-specified reference genes. This classical approach suffers from two limitations. First, there is no consensus among biologists on the set of reference genes to use for identification. Second, the proximity measures used for gene expression profiles are not justified and rely mainly on expression values, regardless of the expression behavior of the genes. To identify gene expression profiles, a new adaptive clustering approach is proposed, with two main contributions. First, it allows the unsupervised selection of a well-justified set of reference genes, which can be compared with the pre-specified ones. Second, it enables users to learn the appropriate proximity measure for gene expression data, a measure that covers proximity in both values and behavior. The adaptive clustering method is compared with a correlation-based approach on public and simulated gene expression data.
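
To make the idea of a proximity measure covering both values and behavior more concrete, the following is a minimal sketch of one possible construction. It is not taken from the abstract: the function names, the sigmoid-shaped weighting, and the tuning parameter `k` are all assumptions chosen for illustration. The sketch scales a value-based distance (Euclidean) by a weight that depends on how similarly two profiles rise and fall over time.

```python
import numpy as np


def temporal_correlation(u, v):
    """Cosine of the angle between the successive differences of two profiles.

    Close to +1 when both profiles rise and fall together, close to -1 when
    they move in opposite directions. (Hypothetical helper for illustration.)
    """
    du, dv = np.diff(u), np.diff(v)
    denom = np.linalg.norm(du) * np.linalg.norm(dv)
    return 0.0 if denom == 0 else float(du @ dv / denom)


def combined_dissimilarity(u, v, k=2.0):
    """Value distance modulated by behavior (assumed form, not the paper's).

    k = 0 ignores behavior and returns the plain Euclidean distance;
    larger k gives more influence to the behavior term.
    """
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    value_dist = np.linalg.norm(u - v)
    # Weight lies in (0, 2): below 1 for similar behavior, above 1 for opposite.
    weight = 2.0 / (1.0 + np.exp(k * temporal_correlation(u, v)))
    return float(weight * value_dist)


# Toy expression profiles over five time points (made-up values).
a = [0.0, 1.0, 2.0, 1.0, 0.0]
b = [1.0, 2.0, 3.0, 2.0, 1.0]  # same shape as a, shifted up
c = [2.0, 1.0, 0.0, 1.0, 2.0]  # opposite shape to a
```

Under these assumptions, `combined_dissimilarity(a, b, k=3.0)` is smaller than the Euclidean distance between `a` and `b`, while `combined_dissimilarity(a, c, k=3.0)` is larger than the Euclidean distance between `a` and `c`. In an adaptive setting, `k` would be the quantity to learn from the data rather than fix in advance.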