Local Partial Least Square classifier in high dimensionality classification

Article ID	Journal	Published Year	Pages	File Type
4947655	Neurocomputing	2017	21 Pages	PDF

Abstract

A central idea in distance-based machine learning algorithms such as k-nearest neighbors and manifold learning is to choose a set of references, or a neighborhood, based on a distance function to represent the local structure around a query point, and to use that local structure as the basis for constructing models. Local Partial Least Square (local PLS), which results from applying this neighborhood-based idea to Partial Least Square (PLS), has been shown to perform very well on regression with small-sample and multicollinear data, but it is seldom used in high-dimensionality classification. Furthermore, the difference between PLS and local PLS with respect to their optimal intrinsic dimensions is unclear. In this paper we combine local PLS with non-Euclidean distances in order to find out which distance measures are better suited for high-dimensionality classification. Experimental results obtained on eight UCI and spectroscopy datasets show that the Euclidean distance is not a good distance function for local PLS classification, especially in high-dimensionality cases; instead, the Manhattan distance and fractional distances are preferred. Experimental results further show that the optimal intrinsic dimension of local PLS is smaller than that of standard PLS.
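
The neighborhood selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `minkowski_distance` and `select_neighborhood`, the toy data, and the choice of p = 0.5 as the fractional exponent are all assumptions made for illustration. The sketch only shows how the choice of distance changes which samples form the neighborhood; fitting a PLS model on the selected neighbors is not shown.

```python
def minkowski_distance(x, y, p):
    """L_p-style distance between two equal-length vectors.

    p = 2 gives the Euclidean distance, p = 1 the Manhattan distance,
    and 0 < p < 1 a fractional distance (illustrative assumption: p = 0.5).
    """
    if p <= 0:
        raise ValueError("p must be positive")
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


def select_neighborhood(query, samples, k, p):
    """Return the indices of the k samples closest to `query` under exponent p."""
    order = sorted(
        range(len(samples)),
        key=lambda i: minkowski_distance(query, samples[i], p),
    )
    return order[:k]


# Toy data (hypothetical): the nearest sample depends on the distance used.
query = [0.0, 0.0]
samples = [[3.0, 0.0], [2.0, 2.0], [10.0, 10.0]]

nearest_euclidean = select_neighborhood(query, samples, k=1, p=2)
nearest_manhattan = select_neighborhood(query, samples, k=1, p=1)
nearest_fractional = select_neighborhood(query, samples, k=1, p=0.5)
```

In this toy case the Euclidean distance picks the sample at (2, 2), while the Manhattan and fractional distances pick the sample at (3, 0), so a local model fitted on the neighborhood would see different training points depending on the distance function.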

Keywords

Distance function