Article ID Journal Published Year Pages File Type
4605056 Applied and Computational Harmonic Analysis 2015 21 Pages PDF
Abstract

Diffusion Maps (DM), and other kernel methods, are utilized for the analysis of high dimensional datasets. The DM method uses a Markovian diffusion process to model and analyze data. A spectral analysis of the DM kernel yields a map of the data into a low dimensional space, where Euclidean distances between the mapped data points represent the diffusion distances between the corresponding high dimensional data points. Many machine learning methods, which are based on the Euclidean metric, can be applied to the mapped data points in order to take advantage of the diffusion relations between them. However, a significant drawback of the DM is the need to apply spectral decomposition to a kernel matrix, which becomes infeasible for large datasets.In this paper, we present an efficient approximation of the DM embedding. The presented approximation algorithm produces a dictionary of data points by identifying a small set of informative representatives. Then, based on this dictionary, the entire dataset is efficiently embedded into a low dimensional space. The Euclidean distances in the resulting embedded space approximate the diffusion distances. The properties of the presented embedding and its relation to DM method are analyzed and demonstrated.

Related Topics
Physical Sciences and Engineering Mathematics Analysis
Authors
, , , ,