Extending parallelization of the self-organizing map by combining data and network partitioned methods

Advances in Engineering Software (2015), 7 pages. Article ID: 567957.

Abstract

• Data partitioning and network partitioning parallelization methods are combined.
• Large performance gains are found on small networks.
• The network is parallelized down to the level of a single node dimension.
• Parallelization is maximized to keep the method robust on future hardware.
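The highlight about parallelizing the network down to a single node dimension can be illustrated with the best-matching-unit (BMU) search. The following is a minimal Python sketch, not the paper's CUDA code. It assumes squared Euclidean distance and evaluates serially the (node, dimension) work items that a GPU could assign to separate threads. The function name `bmu_by_node_dimension` is hypothetical.

```python
def bmu_by_node_dimension(x, weights):
    """Return the index of the best-matching unit (BMU) for sample x.

    Each (node, dimension) pair is an independent work item, mirroring a
    hypothetical one-thread-per-node-dimension GPU layout; here the items
    are evaluated serially in plain Python.
    """
    n_nodes, dim = len(weights), len(x)
    # Stage 1: one independent squared difference per (node, dimension) pair.
    partial = [[(x[d] - weights[j][d]) ** 2 for d in range(dim)]
               for j in range(n_nodes)]
    # Stage 2: per-node reduction over dimensions (a parallel sum on a GPU).
    dist = [sum(row) for row in partial]
    # Stage 3: reduction across nodes to locate the smallest distance.
    return min(range(n_nodes), key=lambda j: dist[j])


weights = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
print(bmu_by_node_dimension([0.9, 1.2], weights))  # 1
```

Only stages 2 and 3 need communication between work items. This is why the finest-grained partition still pays off on small maps, where there are too few nodes to keep a GPU busy with one thread per node.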

High-dimensional data is pervasive in fields such as engineering, geospatial analysis, and medicine. Building tools that help people in these fields understand the underlying complexity of their data is a constant challenge. Many techniques perform dimensionality reduction or other “compression” to show views of the data in two or three dimensions, leaving the analyst to infer relationships with the remaining independent and dependent variables. Contextual self-organizing maps offer a way to represent and interact with all dimensions of a data set simultaneously. However, the computation time needed to generate these representations limits their feasibility in realistic industry settings. The batch self-organizing map provides a data-independent training method that allows the training process to be parallelized and therefore sped up, saving the time and money spent processing data prior to analysis. This research parallelizes the batch self-organizing map by combining network partitioning and data partitioning methods, implemented with CUDA on the graphics processing unit (GPU), to achieve significant reductions in training time. Training times were reduced by up to twenty-five times at map sizes where other implementations have shown weakness. These shorter training times make the contextual self-organizing map a viable option for engineering data visualization.
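The batch update that makes data partitioning possible can be sketched as follows. In the standard batch self-organizing map, each node weight becomes a neighborhood-weighted mean of the data, w_j = Σ_i h(c_i, j) x_i / Σ_i h(c_i, j), where c_i is the BMU of sample x_i. This is a minimal Python sketch rather than the paper's CUDA implementation. It assumes a Gaussian neighborhood over map-grid coordinates, squared Euclidean distance, and hypothetical helper names (`bmu`, `partial_sums`, `batch_epoch`).

```python
import math


def bmu(x, weights):
    # Index of the node with the smallest squared Euclidean distance to x.
    return min(range(len(weights)),
               key=lambda j: sum((a - b) ** 2 for a, b in zip(x, weights[j])))


def partial_sums(chunk, weights, grid, sigma):
    """Accumulate batch-SOM numerators and denominators for one data partition."""
    n, dim = len(weights), len(weights[0])
    num = [[0.0] * dim for _ in range(n)]
    den = [0.0] * n
    for x in chunk:
        c = bmu(x, weights)
        for j in range(n):
            # Gaussian neighborhood on the map grid (an assumed kernel choice).
            g2 = sum((p - q) ** 2 for p, q in zip(grid[c], grid[j]))
            h = math.exp(-g2 / (2 * sigma ** 2))
            den[j] += h
            for d in range(dim):
                num[j][d] += h * x[d]
    return num, den


def batch_epoch(data, weights, grid, sigma, n_parts):
    """One batch-SOM epoch with the data split into n_parts partitions."""
    size = math.ceil(len(data) / n_parts)
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    # Each partition only reads the frozen weights, so these calls are
    # independent and could run concurrently.
    results = [partial_sums(ch, weights, grid, sigma) for ch in chunks]
    n, dim = len(weights), len(weights[0])
    new_weights = []
    for j in range(n):
        den = sum(r[1][j] for r in results)
        if den == 0.0:
            # No neighborhood support this epoch: keep the old weight.
            new_weights.append(list(weights[j]))
            continue
        new_weights.append([sum(r[0][j][d] for r in results) / den
                            for d in range(dim)])
    return new_weights


data = [[0.0, 0.0], [0.1, 0.2], [0.9, 1.0], [1.0, 0.8], [0.5, 0.5]]
weights = [[0.0, 0.0], [0.5, 0.5], [1.0, 1.0]]
grid = [[0], [1], [2]]  # coordinates of a 1 x 3 map
serial = batch_epoch(data, weights, grid, sigma=1.0, n_parts=1)
split = batch_epoch(data, weights, grid, sigma=1.0, n_parts=3)
print(max(abs(a - b) for r, s in zip(serial, split) for a, b in zip(r, s)))
```

Because the weights are frozen for the whole epoch, the partitioned result differs from the serial one only by floating-point summation order. Combining this data split with the per-node-dimension BMU search is the general idea behind pairing the two partitioning methods.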

Keywords

Data visualization; High-dimensional data; Parallel computing; Neural network; Self-organizing map; GPU