PhishWHO: Phishing webpage detection via identity keywords extraction and target domain name finder

Article ID	Journal	Published Year	Pages	File Type
552369	Decision Support Systems	2016	10 Pages	PDF

Abstract

• Exploit URL patterns based on the proposed N-gram model to extract identity keywords
• Attain robustness in detecting phishing webpages written in any language
• Offer long-term effectiveness by leveraging a permanent phishing characteristic
• Achieve higher accuracy in finding the target identity by using compromise programming
• Suppress false positives by exploiting indirect identity relationships
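The abstract does not say how the weighted URL tokens are computed. To illustrate the general idea behind the first highlight, the following minimal sketch scores candidate keywords by the fraction of their character n-grams that occur in the page URL's tokens. It assumes character trigrams and a plain overlap ratio. The function names (`url_tokens`, `char_ngrams`, `ngram_url_weight`, `rank_identity_keywords`) and the example URL are hypothetical, and this is not PhishWHO's actual weighting scheme.

```python
import re
from urllib.parse import urlsplit


def url_tokens(url: str) -> list[str]:
    """Split the host and path of a URL into lowercase alphanumeric tokens."""
    parts = urlsplit(url.lower())
    return [t for t in re.split(r"[^a-z0-9]+", f"{parts.netloc} {parts.path}") if t]


def char_ngrams(word: str, n: int) -> set[str]:
    """Character n-grams of a word; a word shorter than n yields itself."""
    if len(word) < n:
        return {word}
    return {word[i:i + n] for i in range(len(word) - n + 1)}


def ngram_url_weight(word: str, tokens: list[str], n: int = 3) -> float:
    """Fraction of the word's n-grams that occur inside at least one URL token."""
    grams = char_ngrams(word.lower(), n)
    hits = sum(1 for g in grams if any(g in tok for tok in tokens))
    return hits / len(grams)


def rank_identity_keywords(words: list[str], url: str, n: int = 3,
                           top_k: int = 3) -> list[str]:
    """Return up to top_k candidate words with a nonzero URL weight, best first."""
    tokens = url_tokens(url)
    # Skip blank candidates: an empty word would trivially match every token.
    scores = {w.lower(): ngram_url_weight(w, tokens, n) for w in words if w.strip()}
    # Sort by descending weight, then alphabetically for deterministic ties.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [w for w, s in ranked[:top_k] if s > 0]


if __name__ == "__main__":
    # Hypothetical typosquatted URL: "paypa1" keeps 3 of the 4 trigrams of "paypal".
    url = "http://paypa1-verify.example.net/login"
    words = ["PayPal", "Help", "Contact"]
    print(ngram_url_weight("PayPal", url_tokens(url)))  # 0.75
    print(rank_identity_keywords(words, url))           # ['paypal']
```

Because matching works on n-grams rather than whole words, a brand name still earns partial weight when a phishing URL alters one character, as with `paypa1` here.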

This paper proposes a phishing detection technique based on the difference between the target identity suggested by a webpage's content and the page's actual identity. The proposed approach, called PhishWHO, consists of three phases. The first phase extracts identity keywords from the textual content of the website using a novel weighted URL-token system based on an N-gram model. The second phase uses a search engine to find the target domain name, which is selected from the candidates according to identity-relevant features. The final phase applies a proposed 3-tier identity matching system to determine whether the query webpage is legitimate. Overall, the experimental results suggest that the proposed system outperforms the conventional phishing detection methods it was compared against.
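The highlights credit compromise programming with selecting the target identity, but the abstract names neither the identity-relevant features nor their weights. Compromise programming is a multi-criteria method that ranks alternatives by their weighted L_p distance to an ideal point. For each criterion, the candidate's shortfall from the best observed value is normalized by the best-to-worst spread, multiplied by the criterion weight, and raised to the power p. These terms are then summed and the p-th root is taken. The minimal sketch below applies that technique to candidate domains returned by a search engine. The feature names (`hits`, `keyword_match`, `best_rank`), the weights, p = 2, and all domain values are hypothetical.

```python
def compromise_distances(candidates: dict[str, dict[str, float]],
                         criteria: dict[str, tuple[float, bool]],
                         p: float = 2.0) -> dict[str, float]:
    """Weighted L_p distance of each candidate from the ideal point.

    criteria maps a feature name to (weight, maximize). Each feature is
    normalized by the spread between the best and worst candidate values.
    """
    best, worst = {}, {}
    for name, (_, maximize) in criteria.items():
        values = [feats[name] for feats in candidates.values()]
        best[name] = max(values) if maximize else min(values)
        worst[name] = min(values) if maximize else max(values)

    distances = {}
    for domain, feats in candidates.items():
        total = 0.0
        for name, (weight, _) in criteria.items():
            span = best[name] - worst[name]
            # Normalized regret in [0, 1]; 0 when every candidate ties.
            regret = 0.0 if span == 0 else (best[name] - feats[name]) / span
            total += (weight * regret) ** p
        distances[domain] = total ** (1.0 / p)
    return distances


def select_target_domain(candidates: dict[str, dict[str, float]],
                         criteria: dict[str, tuple[float, bool]],
                         p: float = 2.0) -> str:
    """Pick the candidate closest to the ideal point (ties broken by name)."""
    distances = compromise_distances(candidates, criteria, p)
    return min(distances, key=lambda d: (distances[d], d))


if __name__ == "__main__":
    # Hypothetical feature values for domains returned by a search engine.
    candidates = {
        "paypal.com":           {"hits": 7, "keyword_match": 1.0, "best_rank": 1},
        "paypal-community.com": {"hits": 2, "keyword_match": 1.0, "best_rank": 3},
        "wikipedia.org":        {"hits": 1, "keyword_match": 0.0, "best_rank": 2},
    }
    # name -> (weight, maximize?); a lower search rank is better, so minimize it.
    criteria = {"hits": (0.4, True), "keyword_match": (0.4, True),
                "best_rank": (0.2, False)}
    print(select_target_domain(candidates, criteria))  # paypal.com
```

Because each criterion is normalized by the spread of the candidates' values, features on different scales (counts, ratios, ranks) can be combined. With p = 1 the shortfalls are fully compensatory, while larger values of p penalize a candidate's single worst shortfall more heavily.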

Keywords

n-Gram; Phishing detection; Search engine