Article ID | Journal | Published Year | Pages | File Type
---|---|---|---|---
11002698 | Applied Soft Computing | 2018 | 20 | 
Abstract
Support vector data description (SVDD) is one of the most attractive methods in one-class classification (OCC), especially for novelty detection. SVDD handles classification problems with a large amount of target data and few outlier data. However, the high computational complexity of kernel mapping makes SVDD difficult to apply in practice as the number of target samples increases. In order to reduce the size of the training set, we introduce a method called information entropy based sample reduction for support vector data description (IESRSVDD). In this method, an information entropy value is calculated for the distribution of each data sample: the distance between each pair of samples is used to evaluate the probability of uncertainty of each sample. Samples with higher entropy values are considered to lie near the boundary of the data distribution in kernel space and are likely to become support vectors, so all samples whose entropy values fall below a threshold are excluded. An updated objective function of conventional SVDD is used for sample reduction. The innovative highlights of the proposed IESRSVDD are: (i) reducing the training samples based on information entropy, (ii) introducing sample reduction to SVDD in order to speed up the training process, and (iii) validating and analyzing the feasibility and effectiveness of IESRSVDD. The experimental results show that the proposed method achieves faster training by reducing the scale of the training set: computing time is reduced by 50-75% while classification accuracy is improved.
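The entropy-based screening step described in the abstract can be sketched as follows. Note this is a minimal illustrative sketch, not the authors' implementation: the abstract does not specify the exact entropy formulation, so the conversion of pairwise distances into a probability distribution (a softmax over negative distances) and the quantile-based threshold are assumptions made here for illustration.

```python
import numpy as np

def entropy_sample_reduction(X, quantile=0.5):
    """Illustrative entropy-based sample reduction.

    For each sample, pairwise distances to the other samples are mapped to a
    probability distribution (assumption: softmax over negative distances),
    and the Shannon entropy of that distribution is taken as the sample's
    uncertainty. Samples below the entropy threshold are excluded, keeping
    those more likely to lie near the boundary of the data distribution.
    """
    n = X.shape[0]
    # Pairwise Euclidean distance matrix
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))

    entropy = np.empty(n)
    for i in range(n):
        d = np.delete(dist[i], i)           # distances to all other samples
        p = np.exp(-d)                      # assumed distance-to-probability map
        p /= p.sum()
        entropy[i] = -(p * np.log(p + 1e-12)).sum()

    # Keep only samples whose entropy reaches the threshold; the reduced set
    # would then be passed to SVDD training in place of the full set.
    keep = entropy >= np.quantile(entropy, quantile)
    return X[keep], entropy
```

Training conventional SVDD on the reduced set `X[keep]` rather than the full set is what yields the reported reduction in computing time.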
Related Topics
Physical Sciences and Engineering
Computer Science
Computer Science Applications
Authors
DongDong Li, Zhe Wang, Chenjie Cao, Yu Liu