Normalization of Noisy Text Data

Article code	Journal code	Publication year	English article	Full-text version
484977	703302	2015	6-page PDF	Free download

Related topics

Engineering and Basic Sciences, Computer Engineering, Computer Science (General)

Abstract

The impact of social media and SMS on our daily lives is increasing. These sources provide analysts with large amounts of text data for data mining and pattern discovery. However, this data is notoriously noisy: people use a lot of shorthand language, which reduces its usefulness for analysis. Hence, it is important to convert this noisy text into standard English. In this paper, we target the not-in-vocabulary (NIV) words present in these sources and propose a method to identify and normalize them. Compiled databases and context are exploited to replace ill-formed words and to select the best possible correction for each word. The method can also replace internet slang with standard English and, to some extent, correct spelling errors.
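
The pipeline outlined in the abstract (detect NIV words, look up candidates in compiled databases, use context to choose among them) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The vocabulary `VOCAB`, the slang table `SLANG`, and the bigram counts `BIGRAMS` are tiny hypothetical stand-ins for the compiled databases and context model, and spelling candidates come from the standard library's `difflib` rather than from whatever matching technique the paper uses.

```python
import difflib

# Hypothetical toy resources; a real system would load large compiled lexicons.
VOCAB = {"see", "you", "tomorrow", "great", "to", "the", "party",
         "are", "late", "later", "i", "will", "be"}
SLANG = {"u": "you", "2moro": "tomorrow", "gr8": "great", "c": "see", "r": "are"}
# Hypothetical bigram counts standing in for context statistics.
BIGRAMS = {("be", "late"): 5, ("be", "later"): 1, ("you", "later"): 4}


def candidates(token):
    """Return possible corrections for a token that is not in VOCAB."""
    if token in SLANG:
        # A direct slang lookup takes priority over fuzzy matching.
        return [SLANG[token]]
    # Otherwise fall back to spelling-similar vocabulary words, best match first.
    return difflib.get_close_matches(token, sorted(VOCAB), n=3, cutoff=0.7)


def normalize(text):
    """Replace NIV tokens with the candidate best supported by the previous word."""
    out = []
    for token in text.lower().split():
        if token in VOCAB:
            out.append(token)  # in-vocabulary words pass through unchanged
            continue
        cands = candidates(token)
        if not cands:
            out.append(token)  # no known correction: keep the token as-is
            continue
        prev = out[-1] if out else None
        # Prefer the candidate seen most often after the previous word;
        # ties fall back to the first (most similar) candidate.
        out.append(max(cands, key=lambda c: BIGRAMS.get((prev, c), 0)))
    return " ".join(out)


print(normalize("c u 2moro"))       # see you tomorrow
print(normalize("i will be latr"))  # i will be late
```

In the second call, `latr` is closest in spelling to `later`, but the toy bigram counts favor `late` after `be`, which shows how context can override pure string similarity.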

Publisher

Database: Elsevier - ScienceDirect
Journal: Procedia Computer Science - Volume 45, 2015, Pages 127-132
