A Variable Selection Method Considering Cluster Loading for Labeled High Dimension Low Sample Size Data

Article ID	Journal	Published Year	Pages	File Type
489615	Procedia Computer Science	2015	10 Pages	PDF

Abstract

As the information society rapidly develops, increasing importance is placed on dealing with high dimension low sample size (HDLSS) data, in which the number of variables is much larger than the number of objects. Moreover, selecting effective variables for HDLSS data is becoming more crucial. In this paper, a variable selection method considering cluster loading for labeled HDLSS data is proposed. A conventional cluster loading model based on principal component analysis has been proposed previously; however, that model cannot be used for HDLSS data. Therefore, we propose a cluster loading model that uses a clustering result. By using the obtained cluster loading, we can select variables that belong to clusters unrelated to the given discrimination information represented by the labels of the objects. Several numerical examples show a better performance of the proposed method.