Article ID: 4947655
Journal: Neurocomputing
Published Year: 2017
Pages: 21
File Type: PDF
Abstract
A central idea in distance-based machine learning algorithms such as k-nearest neighbors and manifold learning is to choose a set of references, or a neighborhood, based on a distance function to represent the local structure around a query point, and to use that local structure as the basis for constructing models. Local Partial Least Squares (local PLS), which results from applying this neighborhood-based idea to Partial Least Squares (PLS), has been shown to perform very well on regression with small-sample, multicollinear data, but it is seldom used in high-dimensional classification. Furthermore, the difference between PLS and local PLS with respect to their optimal intrinsic dimensions is unclear. In this paper we combine local PLS with non-Euclidean distance measures to determine which measures are better suited to high-dimensional classification. Experimental results obtained on eight UCI and spectroscopy datasets show that the Euclidean distance is not a good distance function for local PLS classification, especially in high-dimensional cases; Manhattan distance and fractional distance are preferred instead. The results further show that the optimal intrinsic dimension of local PLS is smaller than that of standard PLS.
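The neighborhood selection step described above can be illustrated with a small sketch. It assumes the usual Minkowski form of the distance, where p = 2 gives the Euclidean distance, p = 1 the Manhattan distance, and 0 < p < 1 a fractional distance (which is a dissimilarity measure rather than a true metric, since it violates the triangle inequality). The function names `minkowski_distance` and `select_neighborhood`, the toy data, and the choice p = 0.5 are hypothetical and not taken from the paper; the PLS model that would be fitted on the selected neighbors is left out.

```python
from typing import List, Sequence


def minkowski_distance(x: Sequence[float], y: Sequence[float], p: float) -> float:
    """L_p dissimilarity: p=2 Euclidean, p=1 Manhattan, 0<p<1 fractional."""
    if p <= 0:
        raise ValueError("p must be positive")
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


def select_neighborhood(
    query: Sequence[float],
    samples: Sequence[Sequence[float]],
    k: int,
    p: float,
) -> List[int]:
    """Return the indices of the k samples closest to `query` under L_p."""
    order = sorted(
        range(len(samples)),
        key=lambda i: minkowski_distance(query, samples[i], p),
    )
    return order[:k]


# Toy data (assumed for illustration only).
samples = [[0.0, 0.0], [3.0, 0.0], [2.0, 2.0], [10.0, 10.0]]
query = [0.0, 0.0]

# The same query can receive a different neighborhood under each measure;
# a local model (for example PLS) would then be fitted on those neighbors.
for p in (2.0, 1.0, 0.5):
    print(p, select_neighborhood(query, samples, k=2, p=p))
```

On this toy data the Euclidean neighborhood of the query contains the third sample, while the Manhattan and fractional neighborhoods contain the second, which shows how the choice of distance changes the local training set before any model is fitted.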
Related Topics
Physical Sciences and Engineering > Computer Science > Artificial Intelligence