Article code, Journal code, Publication year, English article, Full-text version
4960436, 1446499, 2017, 7-page PDF, Free download
English title of the ISI article
News Article Text Classification in Indonesian Language
Related topics
Engineering and Basic Sciences, Computer Engineering, Computer Science (General)
English abstract

This research aims to find an appropriate algorithm for automatically classifying news articles in the Indonesian language. The dataset was collected by crawling www.cnnindonesia.com. Each document first undergoes text preprocessing, consisting of lemmatization and stopword removal, to minimize noise in the document. Next, feature selection is applied to separate the important words in the document from the less important ones, and the result is passed to a classifier. We compare TF-IDF and SVD for feature selection, and Multinomial Naïve Bayes, Multivariate Bernoulli Naïve Bayes, and Support Vector Machine as classifiers. Based on the test results, the combination of TF-IDF and the Multinomial Naïve Bayes classifier gives the best result among the tested combinations, with a precision of 0.9841519 and a recall of 0.9840000. This result outperforms a previous similar study on classifying Indonesian-language news articles, which obtained an accuracy of 85%.
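
The best-performing combination named in the abstract (TF-IDF weighting followed by a Multinomial Naïve Bayes classifier) can be illustrated with a minimal sketch. This is not the authors' code: the toy corpus, the labels, the tiny stopword list, the smoothed IDF variant, and every function name below are assumptions made for illustration, and lemmatization is omitted entirely.

```python
import math
from collections import Counter, defaultdict

# Hypothetical, very small stopword list (illustration only).
STOPWORDS = {"yang", "di", "dan", "ke", "dari"}

def tokenize(text):
    """Lowercase, split on whitespace, drop stopwords (no lemmatization)."""
    return [t for t in text.lower().split() if t not in STOPWORDS]

def fit_idf(docs):
    """Smoothed inverse document frequency: ln((1 + N) / (1 + df)) + 1."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}

def tfidf(doc, idf):
    """Raw term count times IDF; terms unseen in training are ignored."""
    return {t: c * idf[t] for t, c in Counter(doc).items() if t in idf}

def train_nb(vectors, labels, vocab, alpha=1.0):
    """Multinomial Naive Bayes on TF-IDF weights with additive smoothing."""
    class_counts = Counter(labels)
    feature_sums = defaultdict(Counter)
    for vec, label in zip(vectors, labels):
        for term, weight in vec.items():
            feature_sums[label][term] += weight
    model = {}
    for label, count in class_counts.items():
        total = sum(feature_sums[label].values()) + alpha * len(vocab)
        log_prior = math.log(count / len(labels))
        log_like = {t: math.log((feature_sums[label][t] + alpha) / total)
                    for t in vocab}
        model[label] = (log_prior, log_like)
    return model

def predict(model, vec):
    """Return the label with the highest posterior log-score."""
    def score(label):
        log_prior, log_like = model[label]
        return log_prior + sum(w * log_like[t] for t, w in vec.items())
    return max(model, key=score)

# Invented toy corpus; "olahraga" = sports, "ekonomi" = economy.
TRAIN = [
    ("tim sepak bola menang di pertandingan final", "olahraga"),
    ("pemain bola mencetak gol dan menang", "olahraga"),
    ("harga saham naik di bursa", "ekonomi"),
    ("bank menaikkan suku bunga dan harga naik", "ekonomi"),
]

train_docs = [tokenize(text) for text, _ in TRAIN]
train_labels = [label for _, label in TRAIN]
IDF = fit_idf(train_docs)
MODEL = train_nb([tfidf(d, IDF) for d in train_docs], train_labels, set(IDF))

def classify(text):
    """Preprocess, weight, and classify one raw text string."""
    return predict(MODEL, tfidf(tokenize(text), IDF))

print(classify("pemain tim mencetak gol"))
print(classify("harga saham bank naik"))
```

In this sketch TF-IDF acts as a term-weighting step whose output feeds the classifier directly; a real pipeline would add lemmatization, a full Indonesian stopword list, and a held-out test set for measuring precision and recall.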

Publisher
Database: Elsevier - ScienceDirect
Journal: Procedia Computer Science - Volume 116, 2017, Pages 137-143