Article ID: 6856506
Journal: Information Sciences
Published Year: 2018
Pages: 28
File Type: PDF
Abstract
The display names that an individual uses across Online Social Networks (OSNs) typically contain substantial redundant information, because most users adopt one main name or similar names across OSNs to make them easier to remember or to build an online reputation. This redundancy is of great benefit to information fusion across OSNs. In this paper, we aim to measure the information redundancy between different display names of the same individual. Using the cross-site linking function of Foursquare, we first develop a distributed crawler to extract the display names that individuals use on Facebook, Twitter and Foursquare. We construct three display name datasets across the three OSNs and measure the information redundancy in three ways: length similarity, character similarity and letter distribution similarity. We also analyze how the redundant information evolves over time. Finally, we apply the measurement results to user identification across OSNs. We find that (1) more than 45% of users use the same display name across OSNs; (2) the display names of the same individual on different OSNs show high similarity; (3) the information redundancy of display names is time-independent; (4) the AUC values of user identification based only on display names exceed 0.9 on all three datasets.
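To make the three similarity views named in the abstract concrete, here is a minimal Python sketch. The abstract does not give the paper's exact formulas, so every definition below is an illustrative assumption: length similarity as a shorter-to-longer length ratio, character similarity as the matching-block ratio from the standard library's `difflib.SequenceMatcher`, and letter distribution similarity as the cosine similarity of letter-frequency vectors. The function names are hypothetical and not taken from the paper.

```python
from collections import Counter
from difflib import SequenceMatcher
from math import sqrt


def length_similarity(a: str, b: str) -> float:
    """Assumed definition: shorter length divided by longer length."""
    if not a and not b:
        return 1.0  # two empty names have identical length
    return min(len(a), len(b)) / max(len(a), len(b))


def character_similarity(a: str, b: str) -> float:
    """Assumed definition: case-insensitive difflib matching ratio in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def letter_distribution_similarity(a: str, b: str) -> float:
    """Assumed definition: cosine similarity of letter-frequency vectors."""
    ca = Counter(ch for ch in a.lower() if ch.isalpha())
    cb = Counter(ch for ch in b.lower() if ch.isalpha())
    dot = sum(ca[ch] * cb[ch] for ch in ca)
    norm = sqrt(sum(v * v for v in ca.values())) * sqrt(
        sum(v * v for v in cb.values())
    )
    # If either name contains no letters, treat the pair as dissimilar.
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    # Hypothetical display names of one person on two different OSNs.
    name_a, name_b = "John Smith", "john_smith84"
    print(length_similarity(name_a, name_b))
    print(character_similarity(name_a, name_b))
    print(letter_distribution_similarity(name_a, name_b))
```

Each function returns a score in [0, 1], so the three values could be combined into a feature vector for a user identification classifier of the kind the abstract evaluates with AUC. How the paper itself combines them is not stated here.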
Related Topics
Physical Sciences and Engineering > Computer Science > Artificial Intelligence