Article ID Journal Published Year Pages File Type
4605333 Applied and Computational Harmonic Analysis 2012 23 Pages PDF
Abstract

Data clustering is a common technique for data analysis. It is used in many fields including machine learning, data mining, customer segmentation, trend analysis, pattern recognition and image analysis. The proposed Localized Diffusion Folders (LDF) methodology, whose localized folders are called diffusion folders (DF), introduces consistency criteria for hierarchical folder organization, clustering and classification of high-dimensional datasets. The DF are multi-level data partitioning into local neighborhoods that are generated by several random selections of data points and DF in a diffusion graph and by redefining local diffusion distances between them. This multi-level partitioning defines an improved localized geometry for the data and a localized Markov transition matrix that is used for the next time step in the advancement of the hierarchical diffusion process. The result of this clustering method is a bottom-up hierarchical data organization where each level in the hierarchy contains LDF of DF from the lower levels. This methodology preserves the local neighborhood of each point while eliminating noisy spurious connections between points and areas in the data affinities graph. One of our goals in this paper is to illustrate the impact of the initial affinities selection on data graphs definition and on the robustness of the hierarchical data organization. This process is similar to filter banks selection for signals denoising. The performance of the algorithms is demonstrated on real data and it is compared to existing methods. The proposed solution is generic since it fits a large number of related problems where the source datasets contain high-dimensional data.

Related Topics
Physical Sciences and Engineering Mathematics Analysis