Free paper download: Twitter spam detection using data stream clustering

Article code	Journal code	Publication year	English article	Full text
394034	665716	2014	10-page PDF	Free download

English title of the ISI article

Twitter spammer detection using data stream clustering

Persian translation of the title

Twitter spam detection using data stream clustering

Keywords

Spam detection; Twitter; Data stream; Clustering

English abstract

• We propose treating spam detection on Twitter as an anomaly detection problem.
• 95 one-gram features are introduced to represent tweet content.
• Because tweets arrive as a continuous stream, stream mining techniques are used to detect spam.
• DenStream and StreamKM++ each achieved roughly 98% detection with under 10% false positives.
• Combining the two algorithms produced 100% detection with under 3% false positives.
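The abstract does not say which 95 one-grams were used. As an illustration only, the sketch below assumes they are the 95 printable ASCII characters (codes 32 to 126) and represents a user by the relative frequency of each character across that user's tweets. The vocabulary, the per-user aggregation, the normalization, and the function name `user_one_gram_features` are all hypothetical choices, not taken from the paper.

```python
from collections import Counter
from typing import Iterable, List

# HYPOTHETICAL vocabulary: the 95 printable ASCII characters, from ' ' (32) to '~' (126).
# The abstract only says "95 one-gram features"; this mapping is an assumption.
PRINTABLE_ASCII = [chr(code) for code in range(32, 127)]
_VOCAB = frozenset(PRINTABLE_ASCII)


def user_one_gram_features(tweets: Iterable[str]) -> List[float]:
    """Relative frequency of each printable ASCII character across one user's tweets.

    Characters outside the vocabulary (emoji, accented letters, newlines) are ignored.
    Returns a 95-element vector ordered like PRINTABLE_ASCII.
    """
    counts = Counter(ch for text in tweets for ch in text if ch in _VOCAB)
    total = sum(counts.values())
    if total == 0:
        # No in-vocabulary characters: return an all-zero feature vector.
        return [0.0] * len(PRINTABLE_ASCII)
    return [counts[ch] / total for ch in PRINTABLE_ASCII]
```

For example, `user_one_gram_features(["Hi!"])` returns a 95-element vector with 1/3 at the positions of `H`, `i`, and `!` and zeros elsewhere. Normalizing by the total count keeps prolific and quiet accounts on the same scale, which matters for distance-based clustering.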

The rapid growth of Twitter has triggered a dramatic increase in spam volume and sophistication. The abuse of certain Twitter components such as “hashtags”, “mentions”, and shortened URLs enables spammers to operate efficiently. As previous studies have shown, however, these same features may be a key factor in identifying new spam accounts. Our study makes three novel contributions. First, whereas previous studies approached spam detection as a classification problem, we treat it as an anomaly detection problem. Second, 95 one-gram features from tweet text were introduced alongside the user information analyzed in previous studies. Finally, to handle the streaming nature of tweets effectively, two stream clustering algorithms, StreamKM++ and DenStream, were modified to facilitate spam identification. Both algorithms clustered normal Twitter users and treated outliers as spammers. Each algorithm performed well individually: StreamKM++ achieved 99% recall with a 6.4% false positive rate, and DenStream achieved 99% recall with a 2.8% false positive rate. Used together, the two algorithms reached 100% recall with a 2.2% false positive rate; that is, the system identified every spammer in the test set while misclassifying only 2.2% of normal users as spammers.
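The core idea in the abstract, clustering normal users and flagging outliers as spammers, can be sketched without reproducing StreamKM++ or DenStream. The code below swaps in a much simpler technique, sequential (MacQueen-style) online k-means, fitted on a stream of normal users' feature vectors; a user whose distance to the nearest centroid exceeds a threshold is flagged as a spammer. It also computes the two metrics the abstract reports: recall, TP / (TP + FN), and false positive rate, FP / (FP + TN). The class name `OnlineKMeansDetector`, the `threshold` parameter, the seeding rule, and the toy data are hypothetical; this is a sketch, not the authors' method.

```python
import math
from typing import List, Sequence, Tuple


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class OnlineKMeansDetector:
    """HYPOTHETICAL stand-in for StreamKM++/DenStream (it is neither algorithm).

    Sequential k-means is fitted on a stream of normal users only; at scoring
    time, a point farther than `threshold` from every centroid is an outlier,
    i.e. a suspected spammer.
    """

    def __init__(self, k: int, threshold: float) -> None:
        self.k = k
        self.threshold = threshold  # hypothetical distance cutoff
        self.centroids: List[List[float]] = []
        self.counts: List[int] = []

    def update(self, x: Sequence[float]) -> None:
        """Absorb one normal user from the stream with a running-mean update."""
        if len(self.centroids) < self.k:
            # Seeding rule (assumed): the first k points become the initial centroids.
            self.centroids.append(list(x))
            self.counts.append(1)
            return
        j = min(range(self.k), key=lambda i: euclidean(x, self.centroids[i]))
        self.counts[j] += 1
        eta = 1.0 / self.counts[j]  # step 1/n keeps each centroid the mean of its points
        self.centroids[j] = [c + eta * (xi - c) for c, xi in zip(self.centroids[j], x)]

    def fit(self, stream: Sequence[Sequence[float]]) -> "OnlineKMeansDetector":
        """Consume the whole stream of normal users, one point at a time."""
        for x in stream:
            self.update(x)
        return self

    def is_outlier(self, x: Sequence[float]) -> bool:
        """True if x is farther than `threshold` from every centroid."""
        if not self.centroids:
            raise ValueError("fit the detector on normal users first")
        return min(euclidean(x, c) for c in self.centroids) > self.threshold


def recall_and_fpr(y_true: Sequence[bool], y_pred: Sequence[bool]) -> Tuple[float, float]:
    """Recall = TP / (TP + FN); false positive rate = FP / (FP + TN). True = spammer."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    tn = sum(1 for t, p in zip(y_true, y_pred) if not t and not p)
    recall = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return recall, fpr


if __name__ == "__main__":
    # Toy 2-D data (hypothetical): normal users form two groups near (0, 0) and (5, 5).
    normal_stream = [(0.0, 0.0), (5.0, 5.0), (0.1, 0.0), (5.0, 5.1), (0.0, 0.1), (4.9, 5.0)]
    det = OnlineKMeansDetector(k=2, threshold=1.0).fit(normal_stream)
    test_users = [(0.2, 0.2), (5.0, 5.0), (1.5, 0.0), (10.0, 0.0), (2.5, 2.5)]
    labels = [False, False, False, True, True]  # True = spammer
    preds = [det.is_outlier(u) for u in test_users]
    print(preds)                          # [False, False, True, True, True]
    print(recall_and_fpr(labels, preds))  # (1.0, 0.3333333333333333)
```

Only normal users are used to fit the model, mirroring the abstract's description of clustering normal users and treating outliers as spammers. Raising `threshold` flags fewer users, which lowers the false positive rate at the risk of missing spammers. The abstract does not describe how the two algorithms were combined, so the sketch does not attempt it.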

Publisher

Database: Elsevier - ScienceDirect
Journal: Information Sciences - Volume 260, 1 March 2014, Pages 64–73

Authors

Zachary Miller, Brian Dickinson, William Deitrick, Wei Hu, Alex Hai Wang
