Article ID: 495657
Journal: Applied Soft Computing
Published Year: 2014
Pages: 10
File Type: PDF
Abstract

• Ensemble of classifiers integrating prior biological knowledge.
• Relevant gene sets as an informed feature selection technique.
• Biologically plausible interpretation of the ensemble architecture.
• Publicly available datasets covering biclass and multiclass scenarios.
• Model comparison with classical approaches and standard ensemble alternatives.

Since the introduction of DNA microarray technology, there has been increasing interest in its clinical application to cancer diagnosis. However, before advances in microarray-based classification can be effectively translated into clinical practice, problems remain with both model performance and the biological interpretability of the results. In this paper, a novel ensemble model is proposed that integrates prior knowledge, in the form of gene sets, into the whole microarray classification process. Each gene set is used as an informed feature selection subset to train several base classifiers and estimate their accuracy. This information is later used to select the classifiers that make up the final ensemble model. The internal architecture of the proposed ensemble allows both the base classifiers and the heuristics used for classifier fusion to be replaced, which gives the model a high level of flexibility and makes it possible to configure and adapt it to different contexts. Experimental results on different datasets and several gene sets show that the proposal is able to outperform classical alternatives by using existing prior knowledge adapted from publicly available databases.
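The pipeline described in the abstract (train one base classifier per gene set, estimate its accuracy, keep the best ones, fuse their outputs) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the nearest-centroid base classifier, the single hold-out split used to estimate accuracy, the top-k selection rule, majority voting as the fusion heuristic, the function names, the gene set names, and the toy expression values.

```python
from collections import Counter

def centroid_fit(X, y, genes):
    """Fit a nearest-centroid classifier using only the columns in `genes`."""
    sums, counts = {}, Counter(y)
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(genes))
        for j, g in enumerate(genes):
            acc[j] += row[g]
    return {c: [s / counts[c] for s in acc] for c, acc in sums.items()}

def centroid_predict(model, genes, row):
    """Return the class whose centroid is closest on the gene-set columns."""
    def dist(c):
        return sum((row[g] - m) ** 2 for g, m in zip(genes, model[c]))
    return min(sorted(model), key=dist)  # sorted() makes ties deterministic

def holdout_accuracy(X, y, genes, train_idx, val_idx):
    """Estimate the accuracy of one gene-set classifier on held-out samples."""
    model = centroid_fit([X[i] for i in train_idx], [y[i] for i in train_idx], genes)
    hits = sum(centroid_predict(model, genes, X[i]) == y[i] for i in val_idx)
    return hits / len(val_idx)

def build_ensemble(X, y, gene_sets, train_idx, val_idx, top_k=2):
    """Keep the top_k gene-set classifiers ranked by estimated accuracy."""
    scored = sorted(
        ((holdout_accuracy(X, y, genes, train_idx, val_idx), name)
         for name, genes in gene_sets.items()),
        reverse=True,
    )
    X_tr = [X[i] for i in train_idx]
    y_tr = [y[i] for i in train_idx]
    return {
        name: (gene_sets[name], centroid_fit(X_tr, y_tr, gene_sets[name]))
        for _, name in scored[:top_k]
    }

def ensemble_predict(ensemble, row):
    """Fuse the selected classifiers by simple majority vote (assumed heuristic)."""
    votes = Counter(centroid_predict(m, genes, row) for genes, m in ensemble.values())
    return votes.most_common(1)[0][0]

# Toy expression matrix: rows are samples, columns are genes 0..3 (made-up values).
# Genes 0 and 1 separate the classes; genes 2 and 3 are uninformative.
X = [
    [5.1, 4.8, 2.0, 3.1], [4.9, 5.2, 3.0, 2.9], [5.3, 5.0, 2.1, 3.0], [4.7, 4.9, 3.1, 3.2],
    [1.0, 1.2, 2.0, 3.0], [1.2, 0.9, 3.0, 3.1], [0.8, 1.1, 2.2, 2.9], [1.1, 1.0, 2.9, 3.2],
]
y = ["tumor"] * 4 + ["normal"] * 4

# Hypothetical gene sets, each a list of column indices into X.
gene_sets = {"pathway_A": [0, 1], "pathway_B": [2, 3], "pathway_C": [0, 2]}

ensemble = build_ensemble(X, y, gene_sets, train_idx=[0, 1, 4, 5], val_idx=[2, 3, 6, 7])
prediction = ensemble_predict(ensemble, [5.0, 5.0, 2.5, 3.0])
```

Because the base classifier and the fusion rule are isolated in `centroid_fit`/`centroid_predict` and `ensemble_predict`, either can be swapped without touching the selection step, which mirrors the flexibility the abstract attributes to the proposed architecture.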

