Article ID: 533205
Journal: Pattern Recognition
Published Year: 2016
Pages: 17 Pages
File Type: PDF
Abstract

• We study the extraction of a target population from a dataset contaminated by outliers.
• To this end, we propose a new Fisher type contrast measure.
• We reconsider this problem from the formalism of proximal support vector machines.
• An approximation of the contrast measure is done using a conjugate gradient method.
• No matrix inversion is needed which lowers the computational complexity.

Dufrenois [1] recently proposed a new Fisher-type contrast measure to extract a target population from a dataset contaminated by outliers. Although mathematically sound, that work has shortcomings in both its formalism and its range of application. First, we re-express the problem in the formalism of proximal support vector machines introduced by Mangasarian and Wild [2]. This change is far from trivial, since it yields a formulation well suited to solving the problem. Second, the method's performance relies on the assumption that the densities of the target and outlier populations differ. This assumption can easily prove over-optimistic for real-world datasets, making the method unreliable, at least when applied directly. Third, computing the decision boundary is a time-consuming part of the algorithm, since it requires solving a generalized eigenvalue problem (GEP); the method is therefore limited to medium-sized datasets. In this paper, we propose strategies to overcome these shortcomings and fully benefit from the approach. First, we show that, under some conditions, generating appropriate artificial outliers makes it possible to stay within the constraints of the method and thus broadens its conditions of use. Second, we show that the GEP can be advantageously replaced by a conjugate gradient (CG) solution, significantly decreasing the computational cost. Lastly, the proposed algorithm is compared with recent novelty detectors on synthetic and real datasets.
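The abstract does not spell out the CG formulation, so the following is only a minimal sketch of the textbook conjugate gradient iteration for a symmetric positive definite system A x = b, meant to illustrate why no matrix inversion is needed: each step uses only a matrix-vector product. The function names (`conjugate_gradient`, `make_regularized_gram_matvec`), the ridge-regularized matrix K + ridge * I, and the toy data are assumptions made for illustration; they are not taken from the paper and do not reproduce the authors' contrast measure.

```python
from typing import Callable, List, Optional

Vector = List[float]
Matrix = List[List[float]]


def dot(u: Vector, v: Vector) -> float:
    """Euclidean inner product of two vectors of equal length."""
    return sum(a * b for a, b in zip(u, v))


def conjugate_gradient(
    matvec: Callable[[Vector], Vector],
    b: Vector,
    tol: float = 1e-10,
    max_iter: Optional[int] = None,
) -> Vector:
    """Approximately solve A x = b for symmetric positive definite A.

    A is accessed only through `matvec` (v -> A v); it is never
    inverted or factorized.
    """
    n = len(b)
    if max_iter is None:
        max_iter = 10 * n
    x = [0.0] * n
    r = list(b)            # residual b - A x for the start point x = 0
    p = list(r)            # first search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old ** 0.5 <= tol:   # stop once the residual norm is small
            break
        Ap = matvec(p)
        alpha = rs_old / dot(p, Ap)            # step length along p
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        beta = rs_new / rs_old                 # keeps directions A-conjugate
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


def make_regularized_gram_matvec(K: Matrix, ridge: float) -> Callable[[Vector], Vector]:
    """Return v -> (K + ridge * I) v for a symmetric matrix K.

    Hypothetical helper: stands in for whatever operator the
    actual method would apply.
    """
    def matvec(v: Vector) -> Vector:
        return [dot(row, v) + ridge * vi for row, vi in zip(K, v)]
    return matvec


# Toy usage with made-up numbers: solve (K + I) x = b.
K_toy = [[3.0, 1.0], [1.0, 2.0]]
b_toy = [1.0, 2.0]
x_toy = conjugate_gradient(make_regularized_gram_matvec(K_toy, 1.0), b_toy)
print(x_toy)
```

Because the matrix enters only through `matvec`, the same routine works with sparse or implicitly defined operators. In exact arithmetic CG terminates in at most n iterations for an n-by-n system, and in practice it is often stopped much earlier once the residual is small enough.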

Related Topics
Physical Sciences and Engineering > Computer Science > Computer Vision and Pattern Recognition