Article ID | Journal | Published Year | Pages |
---|---|---|---|
6856506 | Information Sciences | 2018 | 28 |
Abstract
The display names that an individual uses across Online Social Networks (OSNs) contain abundant redundant information, because most users tend to use one main name, or similar names, across OSNs to make them easier to remember or to build an online reputation. This redundancy is of great benefit to information fusion across OSNs. In this paper, we measure the information redundancy between different display names of the same individual. Using the cross-site linking function of Foursquare, we first develop a distributed crawler to extract the display names that individuals use on Facebook, Twitter and Foursquare. We construct three display name datasets across the three OSNs and measure the information redundancy in three ways: length similarity, character similarity and letter distribution similarity. We also analyze how the redundant information evolves over time. Finally, we apply the measurement results to user identification across OSNs. We find that (1) more than 45% of users use the same display name across OSNs; (2) the display names of the same individual on different OSNs show high similarity; (3) the information redundancy of display names is time-independent; (4) user identification based only on display names achieves AUC values above 0.9 on all three datasets.
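The abstract names three similarity measures but does not define them. The sketch below shows one plausible way to compute each for a pair of display names. The function names and the specific formulas (length ratio, `difflib` matching ratio, cosine similarity over letter counts) are illustrative assumptions, not the definitions used in the paper.

```python
import math
from collections import Counter
from difflib import SequenceMatcher


def length_similarity(a: str, b: str) -> float:
    """Assumed definition: ratio of the shorter length to the longer one."""
    if not a and not b:
        return 1.0
    return min(len(a), len(b)) / max(len(a), len(b))


def character_similarity(a: str, b: str) -> float:
    """Assumed definition: case-insensitive matching ratio from difflib."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def letter_distribution_similarity(a: str, b: str) -> float:
    """Assumed definition: cosine similarity of letter-frequency vectors."""
    ca = Counter(ch for ch in a.lower() if ch.isalpha())
    cb = Counter(ch for ch in b.lower() if ch.isalpha())
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # no letters in at least one name
    dot = sum(ca[ch] * cb[ch] for ch in ca)
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Hypothetical display names for one person on two sites.
    name_a, name_b = "John Smith", "johnsmith88"
    print(length_similarity(name_a, name_b))
    print(character_similarity(name_a, name_b))
    print(letter_distribution_similarity(name_a, name_b))
```

Scores like these could serve as features for a classifier that decides whether two accounts belong to the same person, which is the kind of user identification task the abstract evaluates with AUC.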
Related Topics
Physical Sciences and Engineering
Computer Science
Artificial Intelligence
Authors
Li Yongjun, Peng You, Zhang Zhen, Wu Mingjie, Xu Quanqing, Yin Hongzhi