Very Fast Interactive Visualization of Large Sets of High-dimensional Data

Article ID	Journal	Published Year	Pages	File Type
485957	Procedia Computer Science	2015	10 Pages	PDF

Abstract

The embedding of high-dimensional data into 2D/3D space is the most popular way of visualizing data. Despite recent advances in the development of very accurate dimensionality reduction algorithms, such as BH-SNE, Q-SNE and LoCH, their relatively high computational complexity remains an obstacle to interactive visualization of truly large datasets consisting of M ~ 10⁶+ feature vectors of high dimensionality N ~ 10³+. We show that a new variant of multidimensional scaling (MDS), nr-MDS, can be up to two orders of magnitude faster than modern dimensionality reduction algorithms. We postulate that its computational and memory complexities are linear, O(M). At the same time, our method preserves high separability of the data in the 2D/3D target spaces, similar to that obtained by state-of-the-art dimensionality reduction algorithms. We present the results of applying nr-MDS to the visualization of data repositories such as 20 Newsgroups (M = 1.8·10⁴), MNIST (M = 7·10⁴) and REUTERS (M = 2.67·10⁵).
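To give a sense of why linear complexity matters at these dataset sizes, the following back-of-the-envelope sketch compares the number of pairwise distances a classical MDS formulation would have to consider, M(M-1)/2, with a budget that grows linearly in M. It is not the nr-MDS algorithm, which the abstract does not describe; the constant `K_PER_POINT` and all function names are hypothetical and chosen only for illustration. The dataset sizes are the ones quoted in the abstract.

```python
# Dataset sizes (number of feature vectors M) as quoted in the abstract.
DATASETS = {
    "20 Newsgroups": 18_000,
    "MNIST": 70_000,
    "REUTERS": 267_000,
}

# Hypothetical constant: how many distances per point a linear-cost
# scheme might keep. This value is an assumption, not taken from the paper.
K_PER_POINT = 5


def full_pair_count(m: int) -> int:
    """Number of distinct pairs among m points: m * (m - 1) / 2."""
    return m * (m - 1) // 2


def linear_pair_count(m: int, k: int = K_PER_POINT) -> int:
    """Number of distances if each point keeps only k of them: k * m."""
    return m * k


def report():
    """Return (name, M, all-pairs count, linear count, ratio) per dataset."""
    rows = []
    for name, m in DATASETS.items():
        full = full_pair_count(m)
        linear = linear_pair_count(m)
        rows.append((name, m, full, linear, full / linear))
    return rows


if __name__ == "__main__":
    for name, m, full, linear, ratio in report():
        print(f"{name:14s} M={m:>7d}  all pairs={full:.3e}  "
              f"k*M={linear:.3e}  ratio={ratio:,.0f}x")
```

With any fixed budget per point, the gap between the two counts grows in proportion to M, which is why a quadratic number of distances becomes impractical for interactive use well before M reaches 10⁶.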