MTDE: Multi-typed data embedding in heterogeneous networks

Article ID	Journal	Published Year	Pages
6864704	Neurocomputing	2018	7

Abstract

Vectorized representations play an essential role in many data mining applications. Increasingly, applications are built on multi-typed information networks, such as social networks, which are called heterogeneous networks. However, most data in heterogeneous networks are far from Gaussian-distributed, so Gaussian models are inappropriate for modeling them. Moreover, most traditional embedding methods are designed for single-typed data and cannot be applied directly to data with network structure. In this paper, we propose an embedding method named Multi-typed Data Embedding (MTDE), which learns vectorized representations of data with non-Gaussian distributions. Using a probabilistic model based on Gibbs sampling, it learns a latent space for each data type and a multi-typed latent translational space. First, when embedding the objects in a network, it considers not only the relationships among data of the same type but also the network structure. Second, it provides a translational space in which data of different types can be compared directly, so MTDE can be used to compare different typed data in a wider range of data mining applications. Our experiments on DBLP show that MTDE learns high-quality embeddings. Moreover, other data mining tasks built on MTDE, e.g. clustering, achieve better performance than state-of-the-art methods.
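The abstract does not give MTDE's generative model or its sampling equations, so the sketch below is not MTDE. It swaps in a different, well-known non-Gaussian model, Poisson factorization with gamma priors fitted by Gibbs sampling with auxiliary multinomial counts, only to illustrate the general idea of embedding two object types (assumed here to be authors and venues in a DBLP-like count matrix) from count data without a Gaussian assumption. The function name, hyperparameters, and the author-venue framing are illustrative assumptions.

```python
import numpy as np


def poisson_factorization_gibbs(counts, k=2, n_iter=200, a=0.3, b=0.3, c=0.3, d=0.3, seed=0):
    """Hypothetical illustration, not MTDE: Gibbs sampler for Poisson factorization.

    Model (gamma priors use the shape/rate convention):
        theta[i, :] ~ Gamma(a, b)   latent vector of row object i (e.g. an author)
        beta[j, :]  ~ Gamma(c, d)   latent vector of column object j (e.g. a venue)
        counts[i, j] ~ Poisson(theta[i] @ beta[j])
    Returns the final (theta, beta) sample of the chain.
    """
    rng = np.random.default_rng(seed)
    n_rows, n_cols = counts.shape
    # numpy's gamma takes (shape, scale), so scale = 1 / rate.
    theta = rng.gamma(a, 1.0 / b, size=(n_rows, k))
    beta = rng.gamma(c, 1.0 / d, size=(n_cols, k))
    rows, cols = np.nonzero(counts)  # zero counts need no auxiliary variables
    for _ in range(n_iter):
        z_row = np.zeros((n_rows, k))  # per row: counts allocated to each factor
        z_col = np.zeros((n_cols, k))  # per column: counts allocated to each factor
        # 1) Split each observed count over the k latent factors.
        for i, j in zip(rows, cols):
            p = theta[i] * beta[j]
            z = rng.multinomial(counts[i, j], p / p.sum())
            z_row[i] += z
            z_col[j] += z
        # 2) Conjugate gamma update for the row factors.
        theta = rng.gamma(a + z_row, 1.0 / (b + beta.sum(axis=0)))
        # 3) Conjugate gamma update for the column factors, given the new theta.
        beta = rng.gamma(c + z_col, 1.0 / (d + theta.sum(axis=0)))
    return theta, beta
```

Because both object types end up in the same k-dimensional nonnegative space, `theta[i] @ beta[j]` gives a cross-type affinity score. This only loosely mirrors the abstract's goal of comparing different typed data; according to the abstract, MTDE instead learns a separate latent space per type plus a translational space.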

Keywords

Non-Gaussian distribution; Embedding; Heterogeneous networks; Gibbs sampling; Unsupervised learning