Article ID: 484370
Journal: Procedia Computer Science
Published Year: 2015
Pages: 8
File Type: PDF
Abstract

Arabic is the fifth most widely used language on the Internet. Every day, a huge volume of Arabic comments and reviews is generated concerning different aspects of daily life. Given the scarcity of systems for analyzing this data, we propose in this paper a complete approach to identify and classify authors' opinions. The study uses a dataset of 625 Arabic reviews and comments, collected from the TripAdvisor website and falling into five classes. We first select an appropriate stemming algorithm and use it to introduce a new mathematical approach to formulating opinions. Classification based on Support Vector Machines yields a scheme that often needs refinement, because some reviews remain unclassified. To classify these remaining reviews, we adopt a new similarity approach based on k-nearest neighbors and feature weighting. Finally, we compare our overall approach with two recent works in terms of accuracy. The results obtained met our expectations.
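
The abstract does not specify the similarity measure or the weighting scheme, so the following is only a minimal sketch of a similarity-based k-nearest-neighbors step of the kind described, not the authors' method. It assumes TF-IDF as the feature weighting and cosine similarity as the measure, skips stemming, and uses hypothetical English tokens and labels in place of the Arabic reviews; all function names are illustrative.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build TF-IDF weighted vectors (assumed weighting) from token lists."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    # Smoothed inverse document frequency; rarer terms get larger weights.
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = [{t: tf * idf[t] for t, tf in Counter(doc).items()} for doc in docs]
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def knn_classify(tokens, train_vectors, train_labels, idf, k=3):
    """Label a review by a similarity-weighted vote of its k nearest neighbors.

    Returns None when the review shares no known term with the training set.
    """
    query = {t: tf * idf[t] for t, tf in Counter(tokens).items() if t in idf}
    scored = sorted(
        ((cosine(query, v), y) for v, y in zip(train_vectors, train_labels)),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]
    votes = Counter()
    for sim, label in scored:
        if sim > 0.0:
            votes[label] += sim
    return votes.most_common(1)[0][0] if votes else None


# Hypothetical toy data standing in for already-labeled reviews.
train_docs = [
    ["great", "hotel", "clean"],
    ["clean", "room", "great", "staff"],
    ["dirty", "room", "bad"],
    ["bad", "service", "rude", "staff"],
]
train_labels = ["positive", "positive", "negative", "negative"]
train_vectors, idf = tfidf_vectors(train_docs)

prediction = knn_classify(["great", "clean", "staff"], train_vectors, train_labels, idf)
```

In a pipeline like the one outlined in the abstract, a step of this kind would be applied only to the reviews that the Support Vector Machine stage leaves unclassified, with the already-classified reviews serving as the neighbors.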

Related Topics
Physical Sciences and Engineering > Computer Science > Computer Science (General)