Free article download: Toward an enhanced Arabic text classification using cosine similarity and Latent Semantic Indexing

Article code	Journal code	Publication year	English article	Full-text version
4960374	1364896	2017	7 pages, PDF	Free download

English title of the ISI article

Toward an enhanced Arabic text classification using cosine similarity and Latent Semantic Indexing

Persian translation of the title

به سوی طبقهبندی متن پیشرفته عربی با استفاده از شباهت کوزینس و نمایه سازی معنایی معکوس

Keywords

Arabic text; Classification; Supervised learning; Cosine similarity; Latent Semantic Indexing

Related topics

Engineering and basic sciences, Computer engineering, Computer science (general)

Abstract

Cosine similarity is one of the most widely used similarity measures in text classification. In this paper, we use it to investigate the performance of Arabic text classification. The vector space model (VSM) is generally used to represent textual information as numerical vectors. However, Latent Semantic Indexing (LSI) is a better textual representation technique because it preserves semantic relationships between words. Hence, we used singular value decomposition (SVD) to extract LSI-based textual features. In our experiments, we compared several well-known classification methods: Naïve Bayes, k-Nearest Neighbors, Neural Network, Random Forest, Support Vector Machine, and classification tree. We used a corpus of 4,000 documents covering ten topics (400 documents per topic). The corpus contains 2,127,197 words, of which about 139,168 are unique. The testing set contains 400 documents, 40 for each topic. As a weighting scheme, we used Term Frequency-Inverse Document Frequency (TF-IDF). The study reveals that classification methods using LSI features significantly outperform the TF-IDF-based methods. It also reveals that k-Nearest Neighbors (based on the cosine measure) and Support Vector Machine are the best-performing classifiers.
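
The pipeline the abstract describes (TF-IDF weighting, LSI features obtained through SVD, and k-Nearest Neighbors with a cosine measure) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the function names, the smoothed IDF formula, the number of latent dimensions, and the six-document toy corpus (English tokens standing in for Arabic text) are all assumptions made for the example.

```python
from collections import Counter

import numpy as np


def tfidf_matrix(docs, vocab=None, idf=None):
    """Build a documents-by-terms TF-IDF matrix from whitespace-tokenized text."""
    tokenized = [d.split() for d in docs]
    if vocab is None:
        vocab = sorted({t for toks in tokenized for t in toks})
    index = {t: j for j, t in enumerate(vocab)}
    tf = np.zeros((len(docs), len(vocab)))
    for i, toks in enumerate(tokenized):
        for term, count in Counter(toks).items():
            if term in index:  # terms unseen in training are ignored
                tf[i, index[term]] = count
    if idf is None:
        df = (tf > 0).sum(axis=0)
        # Smoothed IDF (an assumption; the paper's exact variant is not stated here).
        idf = np.log((1 + len(docs)) / (1 + df)) + 1.0
    return tf * idf, vocab, idf


def lsi_fit(X, k):
    """Return a terms-by-k projection from the truncated SVD of X."""
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    return vt[:k].T


def cosine_knn(train_vecs, train_labels, query_vecs, n_neighbors=3):
    """Label each query by majority vote among its most cosine-similar neighbors."""
    def unit(a):
        norms = np.linalg.norm(a, axis=1, keepdims=True)
        return a / np.maximum(norms, 1e-12)  # guard against zero vectors

    sims = unit(query_vecs) @ unit(train_vecs).T
    preds = []
    for row in sims:
        nearest = np.argsort(-row)[:n_neighbors]
        votes = Counter(train_labels[i] for i in nearest)
        preds.append(votes.most_common(1)[0][0])
    return preds


# Hypothetical toy corpus: two topics, three training documents each.
train_docs = [
    "team wins match goal",
    "player scores goal match",
    "coach team player league",
    "bank raises interest rate",
    "market stock bank price",
    "price inflation rate market",
]
train_labels = ["sports"] * 3 + ["economy"] * 3
test_docs = ["goal player league", "stock inflation bank"]

X_train, vocab, idf = tfidf_matrix(train_docs)
X_test, _, _ = tfidf_matrix(test_docs, vocab=vocab, idf=idf)

projection = lsi_fit(X_train, k=2)  # k=2 latent dimensions, chosen for the toy data
predictions = cosine_knn(X_train @ projection, train_labels, X_test @ projection)
print(predictions)
```

Test documents are folded into the latent space with the projection learned from the training matrix only, so no test information leaks into the SVD. On a real corpus the number of latent dimensions and the number of neighbors would need tuning; the values here only suit the toy data.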

Publisher

Database: Elsevier - ScienceDirect
Journal: Journal of King Saud University - Computer and Information Sciences - Volume 29, Issue 2, April 2017, Pages 189-195

Authors

Fawaz S. Al-Anzi, Dia AbuZeina
