Article ID Journal Published Year Pages File Type
514952 Information Processing & Management 2016 16 Pages PDF
Abstract

•We propose a probabilistic model for matching clusters in different domains without correspondence information.•The proposed method can handle data with more than two domains, and the number of objects in each domain can be different.•We extend the proposed method for a semi-supervised setting.•We demonstrate that the proposed method achieve better matching performance than existing methods using synthetic and real-world data sets.

Object matching is an important task for finding the correspondence between objects in different domains, such as documents in different languages and users in different databases. In this paper, we propose probabilistic latent variable models that offer many-to-many matching without correspondence information or similarity measures between different domains. The proposed model assumes that there is an infinite number of latent vectors that are shared by all domains, and that each object is generated from one of the latent vectors and a domain-specific projection. By inferring the latent vector used for generating each object, objects in different domains are clustered according to the vectors that they share. Thus, we can realize matching between groups of objects in different domains in an unsupervised manner. We give learning procedures of the proposed model based on a stochastic EM algorithm. We also derive learning procedures in a semi-supervised setting, where correspondence information for some objects are given. The effectiveness of the proposed models is demonstrated by experiments on synthetic and real data sets.

Related Topics
Physical Sciences and Engineering Computer Science Computer Science Applications
Authors
, , ,