Article ID: 6864704
Journal: Neurocomputing
Published Year: 2018
Pages: 7
File Type: PDF
Abstract
Vectorized representations play an essential role in many data mining applications. A growing number of applications are built on multi-typed information networks, such as social networks, which are called heterogeneous networks. However, most data in heterogeneous networks are far from Gaussian-distributed, so Gaussian models are inappropriate choices for modeling them. Moreover, most traditional embedding methods are designed for single-typed data and cannot be directly applied to data with network structure. In this paper, we propose an embedding method, named Multi-typed Data Embedding (MTDE), that produces vectorized representations of data with non-Gaussian distributions. It obtains a latent space for each data type, together with a multi-typed latent translational space, by means of a probabilistic model based on the Gibbs sampling method. First, it embeds the objects in the network by considering not only the relationships among objects of the same type but also the network structure. Second, it provides a translational space that makes it possible to compare data of different types. Thus, MTDE can be used to compare data of different types in a wider range of data mining applications. Our experiments on DBLP show that MTDE learns high-quality embeddings. Moreover, other data mining tasks based on MTDE, e.g., clustering, achieve better performance than state-of-the-art methods.
Related Topics
Physical Sciences and Engineering > Computer Science > Artificial Intelligence