Article code: 484977
Journal code: 703302
Publication year: 2015
English article: 6-page PDF
Full-text version: Free download
English title of the ISI article
Normalization of Noisy Text Data
Related topics
Engineering and Basic Sciences, Computer Engineering, Computer Science (General)
English abstract

The impact of social media and SMS on our daily lives is increasing. These sources provide analysts with large amounts of text data for data mining and pattern discovery. However, this data is notoriously noisy because people use a lot of shorthand language, which reduces its utility for analysis. Hence, it is important to convert this noisy text into Standard English. In this paper, we target the not-in-vocabulary (NIV) words present in these sources and propose a method to identify and normalize them. Compiled databases and context are exploited to replace ill-formed words and select the best possible correction for each word. This method can also replace internet slang with plain English and, to some extent, correct spelling errors.
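
The abstract names three ingredients: detecting NIV words, looking them up in compiled databases, and using context to choose among candidate corrections. The sketch below is one possible reading of that pipeline, not the authors' implementation. The vocabulary, slang table, and bigram counts are tiny hypothetical stand-ins, and `difflib.get_close_matches` from the Python standard library is swapped in as an assumed way to generate spelling candidates.

```python
import difflib

# All data below is hypothetical and only illustrates the idea.
VOCAB = {"see", "you", "tomorrow", "great", "are", "better", "than", "then"}
SLANG = {"u": "you", "gr8": "great", "2moro": "tomorrow", "r": "are"}
# Hypothetical counts of (previous word, word) pairs standing in for context.
BIGRAMS = {("see", "you"): 5, ("better", "than"): 4}


def candidates(word):
    """Return possible corrections for a not-in-vocabulary word."""
    if word in SLANG:
        # Direct lookup in the (assumed) slang database.
        return [SLANG[word]]
    # Otherwise fall back to string similarity against the vocabulary.
    return difflib.get_close_matches(word, sorted(VOCAB), n=3, cutoff=0.8)


def normalize(text):
    """Replace NIV tokens with the candidate best supported by context."""
    out = []
    for token in text.lower().split():
        if token in VOCAB:
            out.append(token)  # in-vocabulary words pass through
            continue
        cands = candidates(token)
        if not cands:
            out.append(token)  # no candidate found: leave the token as is
            continue
        prev = out[-1] if out else None
        # Prefer the candidate seen most often after the previous word;
        # on a tie, max() keeps the first candidate in the list.
        out.append(max(cands, key=lambda c: BIGRAMS.get((prev, c), 0)))
    return " ".join(out)
```

With these toy tables, `normalize("see u 2moro")` returns `"see you tomorrow"`, and in `"better thn"` the bigram count is what selects "than" over the equally similar "then". A realistic system would need far larger dictionaries and a proper language model for the context step.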

Publisher
Database: Elsevier - ScienceDirect
Journal: Procedia Computer Science - Volume 45, 2015, Pages 127-132