Factorial design analysis applied to the performance of SMS anti-spam filtering systems

Article ID	Journal	Published Year	Pages	File Type
6855611	Expert Systems with Applications	2016	16 Pages	PDF

Abstract

Over the last few decades, the advent of telecommunication systems has allowed a growing exchange of electronic messages around the world. Unfortunately, irrelevant or unsolicited content accounts for the majority of this volume of data, and deciding whether to keep or discard each message is a well-known challenge in machine learning. This paper proposes an anti-spam filtering approach based on linguistic techniques. The real effect of each system parameter is evaluated through factorial design analysis using two different classifiers: first a Support Vector Machine (SVM) and then a Naive Bayesian (NB) classifier. This analysis is detailed and discussed, providing a step-by-step guide for developers and users of anti-spam filters. Based on different system metrics, multi-objective optimization is applied to obtain the optimal filter setup. Evaluation of the anti-spam filter under the optimal configuration showed that the SVM-based system achieved an accuracy above 98%, whereas the NB-based system reached 87%. The results also reveal that linguistic techniques are relevant for the NB classifier but do not improve the performance of the SVM-based system.
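
To make the idea of factorial design analysis concrete, the following minimal Python sketch estimates main and interaction effects in a two-level full factorial design. It is an illustration under stated assumptions, not the procedure used in the paper: the factor names (`stemming`, `stopword_removal`) are hypothetical, and the response comes from a synthetic formula rather than from any measured filter accuracy.

```python
from itertools import product

def full_factorial(factors):
    """Return all runs of a 2^k design as dicts mapping factor -> -1 or +1."""
    return [dict(zip(factors, levels))
            for levels in product((-1, +1), repeat=len(factors))]

def main_effect(runs, responses, factor):
    """Mean response at the high level minus mean response at the low level."""
    high = [y for run, y in zip(runs, responses) if run[factor] == +1]
    low = [y for run, y in zip(runs, responses) if run[factor] == -1]
    return sum(high) / len(high) - sum(low) / len(low)

def interaction_effect(runs, responses, f1, f2):
    """Two-factor interaction: contrast on the product of the coded levels."""
    same = [y for run, y in zip(runs, responses) if run[f1] * run[f2] == +1]
    diff = [y for run, y in zip(runs, responses) if run[f1] * run[f2] == -1]
    return sum(same) / len(same) - sum(diff) / len(diff)

# Hypothetical factors (assumption, not taken from the paper).
factors = ["stemming", "stopword_removal"]
runs = full_factorial(factors)

def synthetic_accuracy(run):
    """Synthetic response used only to exercise the code (not real data)."""
    a, b = run["stemming"], run["stopword_removal"]
    return 0.80 + 0.05 * a + 0.02 * b + 0.01 * a * b

responses = [synthetic_accuracy(run) for run in runs]

effects = {f: main_effect(runs, responses, f) for f in factors}
effects["stemming:stopword_removal"] = interaction_effect(
    runs, responses, "stemming", "stopword_removal")

for name, value in effects.items():
    print(f"{name}: {value:+.3f}")
```

With coded levels of -1 and +1, each estimated effect is twice the corresponding coefficient in the synthetic formula, which gives a quick sanity check on the contrast calculations. In a real study, the responses would be measured metrics (for example, accuracy) obtained by running the filter once per design point.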

Keywords

Short message service (SMS); Factorial design; Spam filtering; Support vector machine (SVM)