Article ID | Journal ID | Publication year | English article | Full text |
---|---|---|---|---|
515563 | 867045 | 2013 | 16-page PDF | Free download |

Increasing knowledge of paedophile activity in P2P systems is a crucial societal concern, with important consequences for child protection, policy making, and internet regulation. Because of a lack of traces of P2P exchanges and of rigorous analysis methodology, however, current knowledge of this activity remains very limited. We consider here a widely used P2P system, eDonkey, and focus on two key statistics: the fraction of paedophile queries entered in the system and the fraction of users who entered such queries. We collect hundreds of millions of keyword-based queries; we design a paedophile query detection tool for which we establish false positive and false negative rates using assessment by experts; with this tool and these rates, we then estimate the fraction of paedophile queries in our data; finally, we design and apply methods for quantifying users who entered such queries. We conclude that approximately 0.25% of queries are paedophile, and that more than 0.2% of users enter such queries. These statistics are by far the most precise and reliable ever obtained in this domain.
► We collect two large sets of keyword-based queries on two eDonkey servers.
► We design a tool for automatic detection of paedophile queries.
► We evaluate our tool's success rate with 21 experts on online paedophile activity.
► We reliably estimate that the fraction of paedophile queries is 0.25%.
► We design several approaches to estimate the fraction of paedophile users (more than 0.2%).
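The abstract says the fraction of flagged queries is combined with the tool's false positive and false negative rates to estimate the true fraction, but it does not give the formula. The sketch below shows one standard way such a correction can be made (the Rogan-Gladen estimator); it is not necessarily the authors' method. The function name, the rate definitions (false positive rate as the share of non-matching queries that get flagged, false negative rate as the share of matching queries that are missed), and all numbers are assumptions chosen for illustration only.

```python
def corrected_fraction(flagged: int, total: int,
                       fp_rate: float, fn_rate: float) -> float:
    """Estimate the true fraction of positive items from a noisy detector.

    Assumed definitions (not taken from the paper):
      fp_rate = P(flagged | item is not positive)
      fn_rate = P(not flagged | item is positive)

    The observed flagged fraction q then satisfies
      q = p * (1 - fn_rate) + (1 - p) * fp_rate,
    which is solved for the true fraction p.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= flagged <= total:
        raise ValueError("flagged must lie between 0 and total")
    denom = 1.0 - fp_rate - fn_rate
    if denom <= 0:
        raise ValueError("detector is no better than chance; cannot correct")

    observed = flagged / total
    estimate = (observed - fp_rate) / denom
    # Sampling noise can push the raw estimate slightly outside [0, 1].
    return min(1.0, max(0.0, estimate))


if __name__ == "__main__":
    # Hypothetical toy numbers, unrelated to the paper's data.
    print(corrected_fraction(flagged=50, total=1000, fp_rate=0.02, fn_rate=0.10))
```

The correction matters most when the true fraction is small: even a low false positive rate can then account for a large share of the flagged items, which is why the error rates have to be measured independently, as the expert assessment does here.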
Journal: Information Processing & Management - Volume 49, Issue 1, January 2013, Pages 248–263