Improving self-organization of document collections by semantic mapping

Article ID	Journal	Published Year	Pages	File Type
410883	Neurocomputing	2006	8 Pages	PDF

Abstract

In text management tasks, dimensionality reduction is necessary both for computational tractability and for the interpretability of the results produced by machine learning algorithms. This paper describes a feature extraction method called semantic mapping. Semantic mapping, sparse random mapping and PCA are applied to the self-organization of document collections using the self-organizing map (SOM). The behavior of the methods when projecting binary and tf-idf document vector representations is compared. The classification error produced by SOM maps in text categorization of the K1 collection is used to compare the performance of the methods. Semantic mapping produced better document representations than sparse random mapping.
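
As an illustration of the kind of projection being compared, the following is a minimal sketch of sparse random mapping applied to binary and tf-idf document vectors. It is not the paper's implementation: the toy documents, the function names, the `ones_per_term` parameter, and the `tf * log(N / df)` weighting are all assumptions made for the example, and the construction (a few ones placed at random output positions for each term) is one common form of sparse random mapping rather than the exact variant evaluated in the article.

```python
import numpy as np

def binary_vectors(docs, vocab):
    """Binary document-term matrix: 1.0 if the term occurs in the document."""
    index = {w: i for i, w in enumerate(vocab)}
    X = np.zeros((len(docs), len(vocab)))
    for d, doc in enumerate(docs):
        for w in doc.split():
            if w in index:
                X[d, index[w]] = 1.0
    return X

def tfidf_vectors(docs, vocab):
    """tf-idf matrix using tf * log(N / df); this weighting is an assumption."""
    index = {w: i for i, w in enumerate(vocab)}
    tf = np.zeros((len(docs), len(vocab)))
    for d, doc in enumerate(docs):
        for w in doc.split():
            if w in index:
                tf[d, index[w]] += 1.0
    df = (tf > 0).sum(axis=0)          # number of documents containing each term
    idf = np.log(len(docs) / df)       # safe here: every vocab term occurs at least once
    return tf * idf

def sparse_random_matrix(n_terms, n_out, ones_per_term=2, seed=0):
    """Projection matrix with `ones_per_term` ones per term, zeros elsewhere."""
    rng = np.random.default_rng(seed)
    R = np.zeros((n_terms, n_out))
    for t in range(n_terms):
        cols = rng.choice(n_out, size=ones_per_term, replace=False)
        R[t, cols] = 1.0
    return R

# Hypothetical toy collection, used only to exercise the functions.
docs = ["map neural map", "neural text", "text document text"]
vocab = sorted({w for doc in docs for w in doc.split()})

X_bin = binary_vectors(docs, vocab)
X_tfidf = tfidf_vectors(docs, vocab)

R = sparse_random_matrix(n_terms=len(vocab), n_out=3, ones_per_term=2)

# Each document is projected from len(vocab) dimensions down to 3.
Y_bin = X_bin @ R
Y_tfidf = X_tfidf @ R
print(Y_bin.shape, Y_tfidf.shape)
```

The reduced vectors `Y_bin` and `Y_tfidf` would then serve as inputs to a SOM in place of the full document vectors. On a real collection the output dimension would be far larger than 3 while still much smaller than the vocabulary size.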

Keywords

Semantic mapping; Self-organizing map; Dimensionality reduction