Pattern selection approaches for the logical analysis of data considering the outliers and the coverage of a pattern

Article ID: 385387 | Journal: Expert Systems with Applications | Published: 2011 | Length: 6 pages

Abstract

The logical analysis of data (LAD) is one of the most promising data mining methods developed to date for extracting knowledge from data. Its key feature is the ability to detect hidden patterns in the data. Because patterns are combinations of conditions on certain attributes, they carry important information for distinguishing observations in one class from those in the other, and LAD uses them to build a decision boundary for classification. Since patterns are robust to measurement errors, using them may also yield more stable performance in classifying both the positive and the negative class. However, LAD builds a classifier by solving a set covering problem, and this step tends to select too many patterns, especially when the data set contains outliers. In the LAD set covering problem, every observation must be covered by at least one pattern, even if it is an outlier. Existing approaches therefore tend to select many patterns just to cover these outliers, which leads to overfitting. Here, we propose new pattern selection approaches for LAD that take into account both outliers and the coverage of a pattern. The proposed approaches can avoid overfitting by building a sparse classifier. We compare the performance of the proposed approaches with that of existing LAD approaches on several public data sets. The computational results show that sparse classifiers built on the patterns selected by the proposed approaches achieve better classification performance than the existing approaches, especially when the data set contains outliers.
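The abstract does not give the exact formulation of the proposed approaches, so the following is only a minimal sketch of the underlying idea under stated assumptions. It replaces the integer set covering model with a simple greedy heuristic and encodes each pattern only by the set of same-class observations it covers. The function name `select_patterns`, its parameters `min_coverage` (discard patterns that cover too few observations) and `max_uncovered` (tolerate a few uncovered observations as suspected outliers), and the toy data are hypothetical illustrations, not the authors' method.

```python
from typing import Dict, List, Set, Tuple


def select_patterns(
    coverage: Dict[str, Set[int]],
    observations: Set[int],
    min_coverage: int = 1,
    max_uncovered: int = 0,
) -> Tuple[List[str], Set[int]]:
    """Greedy heuristic for a LAD-style set covering step (illustrative only).

    coverage:      hypothetical pattern name -> indices of same-class
                   observations the pattern covers.
    min_coverage:  patterns covering fewer observations are discarded first
                   (an assumed way to account for pattern coverage).
    max_uncovered: stop once at most this many observations remain uncovered
                   (an assumed outlier tolerance; 0 = classic full cover).
    Returns the selected pattern names and the observations left uncovered.
    """
    # Drop low-coverage patterns before selection.
    candidates = {p: set(c) for p, c in coverage.items() if len(c) >= min_coverage}
    uncovered = set(observations)
    selected: List[str] = []
    while len(uncovered) > max_uncovered and candidates:
        # Pick the pattern covering the most still-uncovered observations;
        # max() keeps the first pattern (insertion order) on ties.
        best = max(candidates, key=lambda p: len(candidates[p] & uncovered))
        gain = candidates.pop(best) & uncovered
        if not gain:
            break  # no remaining pattern covers any uncovered observation
        selected.append(best)
        uncovered -= gain
    return selected, uncovered


# Toy data (hypothetical): observations 8 and 9 play the role of outliers.
coverage = {
    "P1": {0, 1, 2, 3, 4},
    "P2": {4, 5, 6, 7},
    "P3": {8},  # narrow pattern that exists only to cover observation 8
    "P4": {9},  # narrow pattern that exists only to cover observation 9
}
obs = set(range(10))

print(select_patterns(coverage, obs))                   # (['P1', 'P2', 'P3', 'P4'], set())
print(select_patterns(coverage, obs, min_coverage=2))   # (['P1', 'P2'], {8, 9})
print(select_patterns(coverage, obs, max_uncovered=2))  # (['P1', 'P2'], {8, 9})
```

With the default settings the heuristic enforces full coverage and needs two extra single-observation patterns just for observations 8 and 9; either relaxation keeps the classifier at two patterns and reports 8 and 9 as uncovered.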

► LAD patterns can be used to build decision boundaries for classification.
► The LAD pattern selection procedure tends to choose too many patterns when outliers exist in a data set.
► A new pattern selection procedure is developed that considers both outliers and the coverage of a pattern.
► Computational results indicate that the new procedure can avoid overfitting and construct a sparse, accurate classifier.
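A common way to turn selected LAD patterns into a decision boundary is a discriminant: a weighted vote of the positive patterns that cover an observation minus a weighted vote of the negative patterns that cover it, with the sign deciding the class. The sketch below assumes binarized attributes, equal pattern weights, a zero score treated as "unclassified", and a pattern representation as a dict `{attribute_index: required_value}`. These choices and the names `covers`, `discriminant`, and `classify` are illustrative assumptions, not necessarily the setup used in the paper.

```python
from typing import Dict, List, Sequence

# Hypothetical representation: a pattern is a conjunction of literals on
# binarized attributes, stored as {attribute_index: required_value}.
Pattern = Dict[int, int]


def covers(pattern: Pattern, x: Sequence[int]) -> bool:
    """True if observation x satisfies every literal of the pattern."""
    return all(x[i] == v for i, v in pattern.items())


def discriminant(x: Sequence[int], positive: List[Pattern], negative: List[Pattern]) -> float:
    """Equal-weight discriminant (assumed weights): share of positive patterns
    covering x minus share of negative patterns covering x."""
    pos = sum(covers(p, x) for p in positive) / len(positive) if positive else 0.0
    neg = sum(covers(n, x) for n in negative) / len(negative) if negative else 0.0
    return pos - neg


def classify(x: Sequence[int], positive: List[Pattern], negative: List[Pattern]) -> str:
    """The sign of the discriminant defines the decision boundary."""
    d = discriminant(x, positive, negative)
    if d > 0:
        return "positive"
    if d < 0:
        return "negative"
    return "unclassified"  # assumed convention for a zero score


# Toy patterns over three binary attributes (hypothetical).
positive = [{0: 1, 1: 1}, {2: 0}]
negative = [{0: 0}, {1: 0, 2: 1}]

print(classify([1, 1, 0], positive, negative))  # positive
print(classify([0, 0, 1], positive, negative))  # negative
print(classify([0, 1, 0], positive, negative))  # unclassified
```

In this representation, a sparser pattern set simply means shorter `positive` and `negative` lists, so each selected pattern has a larger say in where the boundary falls.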

Keywords

Pattern selection; Logical analysis of data; Classification; Set covering problem