Efficiency investigation of manifold matching for text document classification

Article ID	Journal	Published Year	Pages	File Type
534021	Pattern Recognition Letters	2013	7 Pages	PDF

Abstract

•We investigate a methodology for fusion and inference from multiple disparate data sources.
•The methodology separates the training data into domain relation learning training data and classifier training data.
•Domain relation learning training data and classifier training data can come from completely different domains.
•Increasing the domain relation learning training data alone can significantly improve classifier performance.
•We present a comparative efficiency investigation of three manifold matching methods for text document classification.

Manifold matching identifies embeddings of multiple disparate data spaces into a common low-dimensional space, where joint inference can be pursued. It is an enabling methodology for fusion and inference from multiple, massive, and disparate data sources. In this paper three manifold matching methods are considered: PoM, the composition of Multidimensional Scaling (MDS) with a Procrustes transformation; CCA (Canonical Correlation Analysis); and JOFC (Joint Optimization of Fidelity and Commensurability). We present a comparative efficiency investigation of the three methods for a particular text document classification application.
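
As an illustration of the first of these methods, the following is a minimal sketch of the PoM idea, assuming classical MDS for the embedding step and an orthogonal Procrustes fit for the matching step. It is not the authors' implementation: the function names (`pairwise_dist`, `classical_mds`, `procrustes_map`), the synthetic data, and the choice of two dimensions are assumptions made for illustration only, and the sketch omits the out-of-sample embedding that a classifier would need for new documents.

```python
import numpy as np

def pairwise_dist(X):
    """Euclidean distance matrix between the rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def classical_mds(D, dim):
    """Embed an n x n dissimilarity matrix into `dim` dimensions."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n        # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                # double-centered Gram matrix
    vals, vecs = np.linalg.eigh(B)             # eigenvalues in ascending order
    idx = np.argsort(vals)[::-1][:dim]         # keep the `dim` largest
    vals = np.clip(vals[idx], 0.0, None)       # guard against tiny negatives
    return vecs[:, idx] * np.sqrt(vals)

def procrustes_map(X_target, X_source):
    """Orthogonal Q minimizing ||X_source @ Q - X_target||_F."""
    U, _, Vt = np.linalg.svd(X_source.T @ X_target)
    return U @ Vt

# Synthetic stand-in for two "disparate spaces" observing the same
# 30 matched objects (hypothetical data, not from the paper).
rng = np.random.default_rng(0)
Z = rng.normal(size=(30, 2))
theta = 0.7
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
Y = Z @ R + 0.05 * rng.normal(size=(30, 2))    # rotated, noisy second view

# Step 1: embed each space separately with MDS.
X1 = classical_mds(pairwise_dist(Z), dim=2)
X2 = classical_mds(pairwise_dist(Y), dim=2)

# Step 2: align the second embedding to the first with Procrustes.
Q = procrustes_map(X1, X2)
err_before = np.linalg.norm(X2 - X1)
err_after = np.linalg.norm(X2 @ Q - X1)
print(f"misalignment before: {err_before:.3f}, after: {err_after:.3f}")
```

In this sketch the matched objects play the role of the domain relation learning data: they are used only to fit `Q`. Once both spaces share one embedding, a classifier trained on points from one space could in principle be applied to mapped points from the other.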

Keywords

CCA; Efficiency; Classification