Article ID: 490477
Journal: Procedia Computer Science
Published Year: 2013
Pages: 10
File Type: PDF
Abstract

Blind choice and parameterization of data mining tools often yield vague or completely misleading results. Interactive visualization enables not only extensive exploration of data but also better matching of clustering/classification schemes to the type of data being analyzed. Multidimensional scaling (MDS), which employs particle dynamics to minimize the error function, is a good candidate for the computational engine of interactive data mining. However, the main disadvantage of MDS is its high memory and time complexity. We developed a novel algorithm, SUBSET, which has lower complexity and is competitive with the best currently used MDS algorithms in terms of efficiency and accuracy. SUBSET employs a reduced dissimilarity matrix whose structure allows efficient use of both multi-core CPU and SIMD GPU processor architectures. Consequently, SUBSET enables visualization of datasets on the order of 10^5 data items on a standard personal computer or laptop. We compare several strategies for reducing the dissimilarity matrix and present typical timings obtained by the respective MDS algorithms on selected multithreaded CPU and GPU architectures.
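
To make the idea of a reduced dissimilarity matrix concrete, the following is a minimal sketch and not the authors' SUBSET algorithm, whose details the abstract does not give. It assumes a hypothetical reduction strategy (each item keeps dissimilarities to `k` randomly chosen partners) and swaps in plain gradient descent on a raw stress function in place of particle dynamics. The names `reduced_pairs`, `stress`, and `embed`, and all parameter values, are illustrative assumptions.

```python
import math
import random


def reduced_pairs(n, k, rng):
    """Pick k random partners for each item (a hypothetical reduction strategy)."""
    pairs = set()
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for j in rng.sample(others, k):
            pairs.add((min(i, j), max(i, j)))
    return sorted(pairs)


def stress(pos, pairs, target):
    """Sum of squared differences between embedded and target distances."""
    return sum((math.dist(pos[i], pos[j]) - target[(i, j)]) ** 2 for i, j in pairs)


def embed(data, k=5, dim=2, steps=300, lr=0.01, seed=0):
    """Embed `data` in `dim` dimensions using only the reduced set of pairs."""
    rng = random.Random(seed)
    n = len(data)
    pairs = reduced_pairs(n, k, rng)
    # Only these dissimilarities are computed and stored: O(n*k), not O(n^2).
    target = {(i, j): math.dist(data[i], data[j]) for i, j in pairs}
    pos = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(n)]
    history = [stress(pos, pairs, target)]
    for _ in range(steps):
        grad = [[0.0] * dim for _ in range(n)]
        for i, j in pairs:
            d = math.dist(pos[i], pos[j]) or 1e-12  # guard against division by zero
            coeff = 2.0 * (d - target[(i, j)]) / d
            for a in range(dim):
                g = coeff * (pos[i][a] - pos[j][a])
                grad[i][a] += g
                grad[j][a] -= g
        # Plain gradient descent step (assumption: stands in for particle dynamics).
        for i in range(n):
            for a in range(dim):
                pos[i][a] -= lr * grad[i][a]
        history.append(stress(pos, pairs, target))
    return pos, pairs, history
```

Because each item contributes at most `k` pairs, memory and per-iteration work grow roughly linearly with the number of items rather than quadratically, which is the kind of saving a reduced dissimilarity matrix is meant to provide. The per-pair loop is also independent across pairs apart from the gradient accumulation, which hints at why such a structure may map well onto multi-core and SIMD hardware.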
