Article ID Journal Published Year Pages File Type
455953 Computers & Security 2013 17 Pages PDF
Abstract

Phishing is an attempt to steal users' personal and financial information such as passwords, social security and credit card numbers, via electronic communication such as e-mail and other messaging services. Attackers pretend to be from a legitimate organization and direct users to a fake website that resembles a legitimate website, which is then used to collect users' personal information. In this paper, we propose a novel methodology to detect phishing attacks and to discover the entity/organization that the attackers impersonate during phishing attacks. The proposed multi-stage methodology employs natural language processing and machine learning. The methodology first discovers (i) named entities, which includes names of people, organizations, and locations; and (ii) hidden topics, using (a) Conditional Random Field (CRF) and (b) Latent Dirichlet Allocation (LDA) operating on both phishing and non-phishing data. Utilizing topics and named entities as features, the next stage classifies each message as phishing or non-phishing using AdaBoost. For messages classified as phishing, the final stage discovers the impersonated entity using CRF. Experimental results show that the phishing classifier detects phishing attacks with no misclassification when the proportion of phishing emails is less than 20%. The F-measure obtained was 100%. Our approach also discovers the impersonated entity from messages that are classified as phishing, with a discovery rate of 88.1%. The automatic discovery of impersonated entity from phishing helps the legitimate organization to take down the offending phishing site. This protects their users from falling for phishing attacks, which in turn leads to satisfied customers. Automatic discovery of an impersonated entity also helps email service providers to collaborate with each other to exchange attack information and protect their customers.

Related Topics
Physical Sciences and Engineering Computer Science Computer Networks and Communications
Authors
, ,