Article code: 515865
Journal code: 867124
Publication year: 2014
English article: 27 pages, PDF
English title of the ISI article
Crime profiling for the Arabic language using computational linguistic techniques
Persian translation of the title
پروفایل جرم برای زبان عربی با استفاده از تکنیک های زبان شناسی محاسباتی
Keywords
Arabic language, crime domain, pattern recognition, clustering, information extraction, corpus analysis
Related topics
Engineering and Basic Sciences, Computer Engineering, Computer Science Software
English abstract


• Text mining system for extraction of information related to crime from Arabic texts.
• Local grammar used to extract information and build dictionaries automatically.
• Visualisation of clustering enhances ability to analyse crime information in corpora.

Arabic is a widely spoken language, but few text mining tools have been developed to process Arabic text. This paper examines the crime domain in unstructured Arabic text using text mining techniques. The development and application of a Crime Profiling System (CPS) is presented. The system extracts meaningful information, in this case the type of crime, location and nationality, from Arabic language crime news reports. The system has two unique attributes: firstly, information extraction that depends on local grammar, and secondly, dictionaries that can be automatically generated. It is shown that the CPS improves the quality of the data through reduction, where only meaningful information is retained. Moreover, the Self Organising Map (SOM) approach is adopted to cluster the crime reports by crime type. This clustering is improved because only refined data, containing meaningful keywords extracted through the information extraction process, are fed into it, i.e. the data are cleansed by removing noise. The proposed system is validated through experiments using a corpus collated from different sources that was not used during system development. Precision, recall and F-measure are used to evaluate the performance of the proposed information extraction approach, and comparisons are conducted with other systems. To evaluate the clustering performance, three parameters are used: data size, loading time and quantization error.

Publisher
Database: Elsevier - ScienceDirect
Journal: Information Processing & Management - Volume 50, Issue 2, March 2014, Pages 315–341