Article code: 4960422
Journal code: 1446499
Year of publication: 2017
English article: 7-page PDF
English title of the ISI article
Text Normalization Algorithm on Twitter in Complaint Category
Related topics
Engineering and Basic Sciences, Computer Engineering, Computer Science (General)
English abstract

Many people use microblogs to express complaints or criticism. However, posts are limited to about 160 characters and are written as unstructured sentences, which is the biggest obstacle to processing the information. These unstructured sentences make preprocessing difficult for text processing tools. Normalization is therefore needed to make the unstructured sentences more understandable to a machine. We propose a normalization method for the Indonesian language that adopts some normalization ideas from other researchers and adjusts them to the characteristics of unstructured Indonesian sentences. The experiment uses Indonesian-language Twitter data in the complaint category. The process is divided into three stages: cleaning, out-of-vocabulary (OOV) detection, and word replacement. A list of basic words and a slang dictionary are used in OOV detection, while a context dictionary is built to resolve ambiguity. The algorithm reaches an accuracy of about 90% in the complaint category.
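
The three stages named in the abstract (cleaning, OOV detection, word replacement) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the word lists `BASIC_WORDS`, `SLANG`, and `CONTEXT`, the function names, and the neighbour-overlap rule used for ambiguous slang are all assumptions made for illustration, since the abstract does not give these details.

```python
import re

# Hypothetical toy resources; the dictionaries used in the paper are not shown here.
BASIC_WORDS = {"saya", "tidak", "bisa", "yang", "internet", "lambat",
               "sekali", "sudah", "sampai", "sama", "rumah"}
# Slang dictionary: each slang token maps to one or more candidate standard words.
SLANG = {"gk": ["tidak"], "yg": ["yang"], "bgt": ["sekali"],
         "sy": ["saya"], "sdh": ["sudah"], "sm": ["sama", "sampai"]}
# Context dictionary (assumed shape): candidate word -> neighbouring words that support it.
CONTEXT = {"sampai": {"rumah", "besok"}, "sama": {"saya", "kamu"}}


def clean(tweet):
    """Stage 1: lowercase, strip URLs, mentions, hashtags and non-letters, then tokenize."""
    text = tweet.lower()
    text = re.sub(r"https?://\S+", " ", text)
    text = re.sub(r"[@#]\w+", " ", text)
    text = re.sub(r"[^a-z\s]", " ", text)
    return text.split()


def is_oov(token):
    """Stage 2: a token is out-of-vocabulary if it is not in the basic word list."""
    return token not in BASIC_WORDS


def replace(tokens, i):
    """Stage 3: replace an OOV token using the slang and context dictionaries."""
    candidates = SLANG.get(tokens[i])
    if not candidates:
        return tokens[i]          # unknown OOV word: leave unchanged
    if len(candidates) == 1:
        return candidates[0]      # unambiguous slang
    # Ambiguous slang: pick the candidate whose context words overlap most
    # with the immediate left and right neighbours (ties keep the first candidate).
    neighbours = set(tokens[max(0, i - 1):i] + tokens[i + 1:i + 2])
    return max(candidates, key=lambda c: len(CONTEXT.get(c, set()) & neighbours))


def normalize(tweet):
    tokens = clean(tweet)
    return " ".join(replace(tokens, i) if is_oov(t) else t
                    for i, t in enumerate(tokens))
```

With these toy dictionaries, `normalize("gk bisa sm rumah")` returns `"tidak bisa sampai rumah"`: the ambiguous token `sm` resolves to `sampai` because its neighbour `rumah` appears in the assumed context entry for that candidate.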

Publisher
Database: Elsevier - ScienceDirect
Journal: Procedia Computer Science - Volume 116, 2017, Pages 20-26