One class random forests

Article ID: 530954
Journal: Pattern Recognition
Published: 2013
Pages: 17
File type: PDF

Abstract

• We study the behavior of a new method for one-class classification, called One Class Random Forest (OCRF), which we recently proposed.
• The OCRF method is based on a random forest algorithm and an original outlier generation procedure.
• OCRF is shown to perform well on various UCI public datasets compared with state-of-the-art one-class methods, and its performance is shown to remain stable in higher dimensions.

One-class classification is a binary classification task in which only one class of samples is available for learning. In preliminary work, we proposed One Class Random Forests (OCRF), a method based on a random forest algorithm and an original outlier generation procedure that makes use of classifier ensemble randomization principles. In this paper, we present an extensive study of the behavior of OCRF. The study includes experiments on various UCI public datasets and a comparison, assessed with statistical significance tests, against reference one-class methods, namely Gaussian density models, Parzen estimators, Gaussian mixture models and One Class SVMs. Our aim is to show that the randomization principles embedded in a random forest algorithm make the outlier generation process more efficient and, in particular, make it possible to overcome the curse of dimensionality. One Class Random Forests are shown to perform well in comparison with the other methods and, in particular, to maintain stable performance in higher dimensions, while the other algorithms may fail.
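The abstract describes OCRF only at a high level. As a rough illustration of the general idea (training a randomized tree ensemble to separate the available target class from artificially generated outliers), here is a minimal pure-Python sketch. It assumes that each tree receives a bootstrap sample of the targets and one random feature subspace, that outliers are drawn uniformly from the bounding box of that sample in that subspace (enlarged by a `margin`), and that the score is the averaged fraction of target samples in the leaves a point reaches. These choices and all names (`fit_one_class_forest`, `target_score`, `margin`, `outlier_ratio`, `subspace`) are hypothetical and not taken from the paper, whose outlier generation procedure may differ. Generating outliers inside each tree's low-dimensional subspace rather than in the full feature space is meant to illustrate the abstract's dimensionality argument: the number of uniform samples needed to cover a box grows rapidly with its dimension.

```python
import random


def gini_weighted(pos, n):
    """Size-weighted Gini impurity, n * 2q(1 - q), for a node with pos targets out of n."""
    if n == 0:
        return 0.0
    q = pos / n
    return n * 2.0 * q * (1.0 - q)


def best_split(X, y, f):
    """Return (weighted impurity, threshold) of the best split on feature f, or None."""
    pairs = sorted(zip((row[f] for row in X), y))
    n, total_pos = len(pairs), sum(y)
    left_n = left_pos = 0
    best = None
    for i in range(n - 1):
        left_n += 1
        left_pos += pairs[i][1]
        if pairs[i][0] == pairs[i + 1][0]:
            continue  # no threshold can separate equal values
        score = gini_weighted(left_pos, left_n) + gini_weighted(total_pos - left_pos, n - left_n)
        if best is None or score < best[0]:
            best = (score, (pairs[i][0] + pairs[i + 1][0]) / 2)
    return best


def build_tree(X, y, depth, max_depth, rng):
    """Grow a CART-like tree; each node tries a random subset of sqrt(k) features."""
    if depth >= max_depth or len(set(y)) == 1:
        return ("leaf", sum(y) / len(y))
    k = len(X[0])
    best = None
    for f in rng.sample(range(k), max(1, int(k ** 0.5))):
        split = best_split(X, y, f)
        if split is not None and (best is None or split[0] < best[0]):
            best = (split[0], f, split[1])
    if best is None:
        return ("leaf", sum(y) / len(y))
    _, f, t = best
    left = [i for i, row in enumerate(X) if row[f] <= t]
    right = [i for i, row in enumerate(X) if row[f] > t]
    if not left or not right:  # guard against float rounding of the midpoint
        return ("leaf", sum(y) / len(y))
    return ("node", f, t,
            build_tree([X[i] for i in left], [y[i] for i in left], depth + 1, max_depth, rng),
            build_tree([X[i] for i in right], [y[i] for i in right], depth + 1, max_depth, rng))


def predict(tree, x):
    """Follow x down to a leaf and return that leaf's fraction of target samples."""
    while tree[0] == "node":
        _, f, t, left, right = tree
        tree = left if x[f] <= t else right
    return tree[1]


def fit_one_class_forest(targets, n_trees=20, subspace=2, margin=0.1,
                         outlier_ratio=1.0, max_depth=6, seed=0):
    """Hypothetical OCRF-style trainer: bagging + random subspaces + per-tree outliers."""
    rng = random.Random(seed)
    d = len(targets[0])
    forest = []
    for _ in range(n_trees):
        feats = rng.sample(range(d), subspace)           # random subspace for this tree
        boot = [rng.choice(targets) for _ in targets]    # bootstrap sample of the targets
        proj = [[row[f] for f in feats] for row in boot]
        box = []
        for j in range(subspace):                        # bounding box enlarged by margin
            lo = min(r[j] for r in proj)
            hi = max(r[j] for r in proj)
            pad = margin * (hi - lo)
            box.append((lo - pad, hi + pad))
        n_out = int(outlier_ratio * len(proj))
        outliers = [[rng.uniform(lo, hi) for lo, hi in box] for _ in range(n_out)]
        X = proj + outliers
        y = [1] * len(proj) + [0] * n_out                # 1 = target, 0 = generated outlier
        forest.append((feats, build_tree(X, y, 0, max_depth, rng)))
    return forest


def target_score(forest, x):
    """Average target fraction over trees; low values suggest an outlier."""
    return sum(predict(tree, [x[f] for f in feats]) for feats, tree in forest) / len(forest)


# Toy usage: 100 target points drawn from a 10-dimensional standard Gaussian.
data_rng = random.Random(1)
targets = [[data_rng.gauss(0.0, 1.0) for _ in range(10)] for _ in range(100)]
forest = fit_one_class_forest(targets)
center = target_score(forest, [0.0] * 10)
far = target_score(forest, [8.0] * 10)
print(f"center={center:.2f}  far={far:.2f}")
```

In this toy run, a point at the center of the target cloud should receive a clearly higher score than a point far outside it. Turning the score into a decision would still require a threshold, for instance a low quantile of the scores of the training targets.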

Keywords

Outlier detection; Random forests; Decision trees; Ensemble methods; One-class classification; Supervised learning