Article ID | Journal | Published Year | Pages | File Type |
---|---|---|---|---|
6855611 | Expert Systems with Applications | 2016 | 16 Pages | |
Abstract
Over the last few decades, the advent of telecommunication systems has allowed a growing exchange of electronic messages around the world. Unfortunately, irrelevant and/or unsolicited content accounts for the majority of this volume of data, and deciding whether to keep or discard each message is a known challenge in machine learning. This paper proposes an anti-spam filtering approach based on linguistic techniques. The real effect of each system parameter is evaluated through factorial design analysis using two different classifiers: a Support Vector Machine (SVM) and a Naive Bayes (NB) classifier. This analysis is detailed and discussed, providing a step-by-step guide for developers and users of anti-spam filters. Based on different system metrics, multi-objective optimization is applied to obtain the optimal filter setup. Evaluation of the anti-spam filter under the optimal configuration showed that the SVM-based system achieved an accuracy above 98%, whereas the NB-based system reached 87%. The results also reveal that linguistic techniques are relevant for the NB classifier but do not improve the performance of the SVM-based system.
Related Topics
Physical Sciences and Engineering
Computer Science
Artificial Intelligence
Authors
Marcelo V.C. Aragão, Edielson Prevato Frigieri, Carlos A. Ynoguti, Anderson P. Paiva