Phishing detection and impersonated entity discovery using Conditional Random Field and Latent Dirichlet Allocation

Article code	Journal code	Publication year	English article
455953	695609	2013	17-page PDF

Keywords

Latent Dirichlet allocation; Boosting; Identity theft; Conditional random field; Named entity; Natural language processing; Machine learning

Related topics

Engineering and Basic Sciences, Computer Engineering, Computer Networks and Communications

Abstract

Phishing is an attempt to steal users' personal and financial information, such as passwords, social security numbers, and credit card numbers, via electronic communication such as e-mail and other messaging services. Attackers pretend to be from a legitimate organization and direct users to a fake website that resembles the legitimate one, which is then used to collect users' personal information. In this paper, we propose a novel methodology to detect phishing attacks and to discover the entity or organization that the attackers impersonate during phishing attacks. The proposed multi-stage methodology employs natural language processing and machine learning. The methodology first discovers (i) named entities, which include names of people, organizations, and locations, using Conditional Random Field (CRF), and (ii) hidden topics, using Latent Dirichlet Allocation (LDA), with both operating on phishing and non-phishing data. Using topics and named entities as features, the next stage classifies each message as phishing or non-phishing using AdaBoost. For messages classified as phishing, the final stage discovers the impersonated entity using CRF. Experimental results show that the phishing classifier detects phishing attacks with no misclassification when the proportion of phishing emails is less than 20%. The F-measure obtained was 100%. Our approach also discovers the impersonated entity from messages that are classified as phishing, with a discovery rate of 88.1%. The automatic discovery of the impersonated entity from phishing messages helps the legitimate organization to take down the offending phishing site. This protects its users from falling for phishing attacks, which in turn leads to satisfied customers. Automatic discovery of an impersonated entity also helps email service providers to collaborate with each other to exchange attack information and protect their customers.
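The classification stage described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the CRF and LDA stages have already run and reduced each message to two hypothetical binary features (whether an organization entity was found, and whether the dominant topic is an "account verification" topic). The toy data, the function names, and the choice of one-feature threshold stumps as weak learners are all assumptions made for illustration. The F-measure helper uses the standard definition, the harmonic mean of precision and recall.

```python
import math

def train_adaboost(X, y, rounds=3):
    """Train AdaBoost over one-feature threshold stumps; labels are +1 or -1."""
    n = len(X)
    weights = [1.0 / n] * n
    ensemble = []  # each entry: (alpha, feature index, threshold, polarity)
    for _ in range(rounds):
        best = None
        # Exhaustively pick the stump with the lowest weighted error.
        for f in range(len(X[0])):
            for thr in sorted({row[f] for row in X}):
                for pol in (1, -1):
                    preds = [pol if row[f] >= thr else -pol for row in X]
                    err = sum(w for w, p, t in zip(weights, preds, y) if p != t)
                    if best is None or err < best[0]:
                        best = (err, f, thr, pol, preds)
        raw_err, f, thr, pol, preds = best
        # Clamp the error so the logarithm below stays finite.
        err = min(max(raw_err, 1e-10), 1 - 1e-10)
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, f, thr, pol))
        if raw_err == 0:
            break  # a single stump already fits the training data
        # Up-weight misclassified messages, down-weight correct ones.
        weights = [w * math.exp(-alpha * t * p) for w, t, p in zip(weights, y, preds)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def predict(ensemble, row):
    """Return +1 (phishing) or -1 (non-phishing) by weighted stump vote."""
    score = sum(a * (pol if row[f] >= thr else -pol) for a, f, thr, pol in ensemble)
    return 1 if score >= 0 else -1

def f_measure(y_true, y_pred):
    """F-measure: harmonic mean of precision and recall for the +1 class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == -1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == -1)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical toy features per message (NOT data from the paper):
#   [organization entity found (0/1), dominant topic is "account verification" (0/1)]
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [-1, -1, -1, 1]  # only the message with both signals is labeled phishing

model = train_adaboost(X, y, rounds=3)
preds = [predict(model, row) for row in X]
print(preds, f_measure(y, preds))
```

No single stump separates this toy set, so the example shows why several weak learners are combined. Real features would be topic proportions and entity counts rather than two binary flags.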

Publisher

Database: Elsevier - ScienceDirect
Journal: Computers & Security - Volume 34, May 2013, Pages 123–139

Authors

Venkatesh Ramanathan, Harry Wechsler
