Article ID: 6856420
Journal: Information Sciences
Published Year: 2018
Pages: 52
File Type: PDF
Abstract
As heterogeneous information networks become ubiquitous and complex, many data mining tasks on them have been explored, including clustering, collaborative filtering and link prediction. Similarity computation is a fundamental task that underlies many data mining problems. Although a large number of similarity measures have been developed for heterogeneous networks, they usually depend on the network schema and lack a general way to integrate the various kinds of relationships between objects. In this paper, we propose a similarity measure, HeteRank, for computing similarities in heterogeneous information networks in a general manner. The relationships between objects of different types are represented by a general relationship matrix (GRM), which is built based on the scales of the different object types. Based on the GRM, HeteRank fully integrates the multi-type relationships into similarity computation by utilizing all the meetings between objects. The HeteRank equation is further transformed into a simple binomial expression that takes the restart probability into account. To compute HeteRank similarities efficiently, we divide the computation into two steps: the first computes intermediate values, and the second computes the similarities from those intermediate values. We then approximate the HeteRank equation by setting thresholds that skip low intermediate values and low similarity scores. A pruning algorithm is developed to avoid unnecessary visits, multiplications and additions that contribute little to the similarity computation. Extensive experiments on real datasets, comparing HeteRank with state-of-the-art similarity measures, demonstrate its effectiveness and efficiency.
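The abstract does not give the HeteRank equation or the exact construction of the GRM, so the following is only a generic sketch of the two-step, threshold-pruned idea it describes, not the authors' method. As a stand-in, it assumes that all typed edges are merged into one row-normalized transition map (in place of the GRM), that step 1 computes truncated random-walk-with-restart scores as the intermediate values, and that step 2 sums the products of those scores at shared "meeting" nodes. The functions `build_transition`, `intermediate_values` and `similarity`, and the parameters `c` (restart probability), `steps`, `eps` and `theta`, are hypothetical names chosen for illustration.

```python
from collections import defaultdict


def build_transition(edges):
    """Merge edges between objects of any type into one row-normalized
    transition map. Hypothetical stand-in for the paper's GRM; edges are
    treated as undirected and weighted uniformly."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return {u: {v: 1.0 / len(nbrs) for v in nbrs} for u, nbrs in adj.items()}


def intermediate_values(P, source, c=0.15, steps=10, eps=1e-4):
    """Step 1 (assumed form): truncated random walk with restart from `source`.

    Accumulates c * (1 - c)**k * Pr[walk is at w after k steps] for
    k = 0..steps, and prunes walk mass below `eps` after each step.
    """
    frontier = {source: 1.0}
    scores = defaultdict(float)
    for k in range(steps + 1):
        weight = c * (1 - c) ** k
        for w, p in frontier.items():
            scores[w] += weight * p
        nxt = defaultdict(float)
        for w, p in frontier.items():
            for x, t in P.get(w, {}).items():
                nxt[x] += p * t
        # Threshold on intermediate values: drop negligible walk mass.
        frontier = {x: p for x, p in nxt.items() if p >= eps}
        if not frontier:
            break
    return dict(scores)


def similarity(r_u, r_v, theta=1e-6):
    """Step 2 (assumed form): sum products of intermediate values over the
    nodes where the two walks meet; scores below `theta` are reported as 0."""
    if len(r_u) > len(r_v):
        r_u, r_v = r_v, r_u  # iterate over the smaller map
    s = sum(p * r_v[w] for w, p in r_u.items() if w in r_v)
    return s if s >= theta else 0.0


# Toy heterogeneous network: authors (a*), papers (p*), one venue (v1).
edges = [("a1", "p1"), ("a2", "p1"), ("a3", "p2"), ("p1", "v1"), ("p2", "v1")]
P = build_transition(edges)
r = {a: intermediate_values(P, a) for a in ("a1", "a2", "a3")}
print(similarity(r["a1"], r["a2"]), similarity(r["a1"], r["a3"]))
```

In this toy network, a1 and a2 co-author p1, so their walks meet far more often than those of a1 and a3, and the first printed score is the larger of the two. Computing each node's intermediate values once and reusing them for every pair is what makes a two-step split attractive, and the `eps` and `theta` cut-offs mirror, in a simplified way, the thresholds the abstract mentions.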