Efficient multi-modal geometric mean metric learning

Article ID	Journal	Published Year	Pages	File Type
6939666	Pattern Recognition	2018	30 Pages	PDF

Abstract

With the rapid development of information acquisition, multi-modality data (e.g., text, audio, image and video) is growing quickly in health care, multimedia retrieval and many other applications. Clustering, classification or regression with multi-modality information requires an effective measure of the distance or similarity between objects described by heterogeneous features. Metric learning, which aims to find a task-oriented distance function, is an active topic in machine learning. However, most existing algorithms are inefficient on high-dimensional multi-modality tasks. In this work, we develop an effective and efficient metric learning algorithm for multi-modality data, Efficient Multi-modal Geometric Mean Metric Learning (EMGMML). The proposed algorithm learns a distinctive distance metric for each view by minimizing the distance between similar pairs while maximizing the distance between dissimilar pairs. To avoid overfitting, the optimization objective is regularized by the symmetrized LogDet divergence. EMGMML is efficient because each distance metric has a closed-form solution. Experimental results show that the proposed algorithm outperforms state-of-the-art metric learning methods in both accuracy and efficiency.
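The abstract does not give the EMGMML formulas, so the following is only a minimal sketch of the single-view geometric mean metric learning closed form that the method's name refers to, not the authors' multi-view algorithm. It assumes the standard objective tr(A S) + tr(A^{-1} D), where S and D are scatter matrices of similar and dissimilar pairs, regularized toward a prior metric A0 with weight lam; the minimizer is then the matrix geometric mean of (S + lam A0^{-1})^{-1} and (D + lam A0). The names `pair_scatter`, `spd_power`, `geometric_mean_metric`, `lam` and `A0` are illustrative assumptions, and how EMGMML weights or couples the views is not reproduced here.

```python
import numpy as np


def pair_scatter(X, y):
    """Build scatter matrices of similar (same-label) and dissimilar pairs.

    Assumption: every pair of samples is used; a real system would subsample.
    """
    d = X.shape[1]
    S = np.zeros((d, d))
    D = np.zeros((d, d))
    for i in range(len(X)):
        for j in range(i + 1, len(X)):
            diff = (X[i] - X[j])[:, None]
            if y[i] == y[j]:
                S += diff @ diff.T
            else:
                D += diff @ diff.T
    return S, D


def spd_power(M, p):
    """Raise a symmetric positive-definite matrix to a real power."""
    M = (M + M.T) / 2.0  # remove round-off asymmetry before eigh
    w, V = np.linalg.eigh(M)
    return (V * w**p) @ V.T


def geometric_mean_metric(S, D, lam=0.1, A0=None):
    """Closed-form metric A = S_reg^{-1} # D_reg (matrix geometric mean).

    Assumption: the regularizer adds lam * A0^{-1} to S and lam * A0 to D,
    as in single-view geometric mean metric learning.
    """
    d = S.shape[0]
    if A0 is None:
        A0 = np.eye(d)  # hypothetical default prior: Euclidean metric
    S_reg = S + lam * np.linalg.inv(A0)
    D_reg = D + lam * A0
    # A = S^{-1/2} (S^{1/2} D S^{1/2})^{1/2} S^{-1/2}
    S_half = spd_power(S_reg, 0.5)
    S_inv_half = spd_power(S_reg, -0.5)
    middle = spd_power(S_half @ D_reg @ S_half, 0.5)
    return S_inv_half @ middle @ S_inv_half
```

Under these assumptions the returned matrix satisfies the Riccati equation A S_reg A = D_reg, which is a convenient sanity check. For multi-modality data, one could compute such a matrix from each view's features separately, but that is only a naive baseline and not a description of EMGMML.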

Keywords

Efficiency; Geometric mean; Multi-modality; Metric learning