kNN processing with co-space distance in SoLoMo systems

Expert Systems with Applications, 2014, 16 pages

Abstract

• We propose a new distance function, co-space distance, to measure the similarity between users.
• We design a progressive kNN search algorithm for the co-space distance.
• A new caching strategy is adopted to reduce the overhead of kNN processing.
• We use MapReduce and a key-value store to support parallel processing of large social data.
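The highlights do not say how the physical and social components are combined. As a minimal sketch, assuming a linear weighting, the co-space distance of two users could be taken as alpha times their physical distance (normalised by a radius `d_max_km` and capped at 1) plus (1 - alpha) times their social distance. The weight `alpha`, the radius, the haversine physical distance and the Jaccard social distance over friend sets are all hypothetical choices for illustration, not the paper's definitions.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(min(1.0, math.sqrt(a)))  # clamp guards float drift


def social_distance(friends_a, friends_b):
    """Hypothetical social distance: Jaccard distance of friend sets, in [0, 1]."""
    union = friends_a | friends_b
    if not union:
        return 1.0  # no social evidence at all: treat as maximally distant
    return 1.0 - len(friends_a & friends_b) / len(union)


def co_space_distance(loc_a, loc_b, friends_a, friends_b, alpha=0.5, d_max_km=10.0):
    """Hypothetical co-space distance in [0, 1]; smaller means more similar.

    loc_a, loc_b are (lat, lon) tuples; alpha weights the physical term.
    """
    physical = min(haversine_km(*loc_a, *loc_b) / d_max_km, 1.0)
    social = social_distance(friends_a, friends_b)
    return alpha * physical + (1 - alpha) * social
```

With `alpha = 1` this reduces to the purely geographic ranking the abstract criticises; lowering `alpha` shifts weight toward ties that already exist in conventional social networks.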

With the increasing popularity of smartphones, SoLoMo (Social-Location-Mobile) systems are expected to grow quickly and become a popular mobile social networking platform. A main challenge in such systems lies in creating stable links between users. For each online user, current SoLoMo systems continuously return his/her k nearest neighbor (kNN) users based on their geo-locations. Such a recommendation approach is simple, but it fails to create sustainable friendships. It would be more effective to tap into the existing social relationships in conventional social networks, such as Facebook and Twitter, to provide “better” friend recommendations.

To measure the similarity between users, we propose a new metric, co-space distance, which measures the similarity of two users in a SoLoMo system by considering their distances in both the real world (physical distance) and the virtual world (social distance). We compute the social distances between users from their public information in conventional social networks, which can be done with a few MapReduce jobs. To facilitate efficient computation of the social distance, we build a distributed index on top of the key-value store, and we maintain the users’ geo-locations in an R-tree. For each query that looks for potential friends around a location, we return to each user his/her k nearest neighbors by co-space distance. We propose a progressive top-k processing strategy and an adaptive caching strategy to make query processing efficient. Experiments on the Gowalla dataset show the effectiveness and efficiency of our recommendation approach.
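A progressive top-k search of this kind can be sketched under a stated assumption: the co-space score is alpha times a normalised physical distance plus (1 - alpha) times a social distance in [0, 1]. Candidates are consumed in ascending physical distance, which here stands in for the incremental nearest-neighbour traversal an R-tree would provide. Because the social term is non-negative, alpha times the physical distance is a lower bound on the score of the current and every later candidate, so the scan stops once that bound reaches the current k-th best. The `cache` dictionary is a plain memo of social distances reused across queries, not the paper's adaptive caching strategy; the function names, planar coordinates, Jaccard metric and default weights are all hypothetical.

```python
import heapq
import math


def social_distance(u, v, friends, cache):
    """Jaccard distance between friend sets, memoised in `cache` (hypothetical metric)."""
    key = (u, v) if u <= v else (v, u)  # symmetric, so store one orientation only
    if key not in cache:
        a, b = friends[u], friends[v]
        union = a | b
        cache[key] = 1.0 - len(a & b) / len(union) if union else 1.0
    return cache[key]


def progressive_knn(query, locations, friends, k, cache, alpha=0.5, d_max=10.0):
    """Return (top-k list of (user, score) in ascending score, candidates scored)."""
    if k <= 0:
        return [], 0
    qx, qy = locations[query]

    def physical(u):
        x, y = locations[u]
        return min(math.hypot(x - qx, y - qy) / d_max, 1.0)

    # Ascending physical distance emulates an R-tree's incremental NN stream.
    stream = sorted((physical(u), u) for u in locations if u != query)
    best = []  # max-heap of the current top-k, stored as (-score, user)
    scanned = 0
    for phys, u in stream:
        # alpha * phys bounds this and all later scores, since social >= 0.
        if len(best) == k and alpha * phys >= -best[0][0]:
            break
        score = alpha * phys + (1 - alpha) * social_distance(query, u, friends, cache)
        scanned += 1
        if len(best) < k:
            heapq.heappush(best, (-score, u))
        elif score < -best[0][0]:
            heapq.heapreplace(best, (-score, u))
    ranked = sorted((-neg, u) for neg, u in best)
    return [(u, score) for score, u in ranked], scanned


if __name__ == "__main__":
    locations = {"q": (0, 0), "a": (1, 0), "b": (2, 0), "c": (9, 0), "d": (20, 0)}
    friends = {"q": {1, 2, 3}, "a": {9}, "b": {1, 2, 3}, "c": {1, 2, 3}, "d": {1, 2, 3}}
    top, scanned = progressive_knn("q", locations, friends, k=2, cache={})
    print(top, scanned)  # b (0.1) and c (0.45); d is never scored
```

Sorting every candidate up front still computes all physical distances, so the saving this sketch illustrates is in social-distance evaluations, which in the described system would be served from the index on the key-value store. Keeping `cache` alive between queries lets repeated pairs skip that lookup entirely.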