Article ID | Journal ID | Year of publication | English article | Full-text version |
---|---|---|---|---|
484977 | 703302 | 2015 | 6-page PDF | Free download |

Social media and SMS play a growing role in our daily lives. These sources give analysts large amounts of text for data mining and pattern discovery. However, this text is notoriously noisy: people rely heavily on shorthand, which reduces its usefulness for analysis. It is therefore important to convert this noisy text into standard English. In this paper, we target the not-in-vocabulary (NIV) words present in these sources and propose a method to identify and normalize them. Compiled databases and context are exploited to replace the ill-formed words and to select the best possible correction for each one. The method can also convert internet slang into standard English and, to some extent, correct spelling errors.
Journal: Procedia Computer Science - Volume 45, 2015, Pages 127-132
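
The pipeline outlined in the abstract (detect NIV words, look up candidate replacements in compiled databases, then use context to choose among them) can be illustrated with a minimal Python sketch. Everything below is an assumption made for illustration and is not taken from the paper: the toy `VOCAB`, `SLANG`, and `BIGRAMS` tables, the function names, the bigram-count scoring, and the use of `difflib` string similarity as a stand-in for spelling correction.

```python
import difflib
import re

# Hypothetical toy resources; the paper's actual databases are not shown here.
VOCAB = {"i", "see", "you", "tomorrow", "great", "to", "too", "two",
         "want", "go", "have", "cars"}
SLANG = {"c": ["see"], "u": ["you"], "gr8": ["great"],
         "2moro": ["tomorrow"], "2": ["to", "too", "two"]}
# Assumed bigram counts standing in for a real context model.
BIGRAMS = {("want", "to"): 5, ("to", "go"): 4,
           ("have", "two"): 3, ("two", "cars"): 2}


def candidates(token):
    """Return possible corrections for an NIV token (may be empty)."""
    if token in SLANG:
        return SLANG[token]
    # Fallback: close spellings in the vocabulary (similarity >= 0.8).
    return difflib.get_close_matches(token, sorted(VOCAB), n=3, cutoff=0.8)


def context_score(prev_word, cand, next_word):
    """Score a candidate by how often it co-occurs with its neighbours."""
    return (BIGRAMS.get((prev_word, cand), 0)
            + BIGRAMS.get((cand, next_word), 0))


def normalize(text):
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    out = []
    for i, tok in enumerate(tokens):
        if tok in VOCAB:              # in-vocabulary words are kept as they are
            out.append(tok)
            continue
        cands = candidates(tok)
        if not cands:                 # no known correction: leave the token alone
            out.append(tok)
            continue
        prev_word = out[-1] if out else None
        # The next token is used raw; it may itself still be unnormalized.
        next_word = tokens[i + 1] if i + 1 < len(tokens) else None
        # max() keeps the first candidate when scores tie.
        out.append(max(cands,
                       key=lambda c: context_score(prev_word, c, next_word)))
    return " ".join(out)


print(normalize("c u 2moro"))
print(normalize("i want 2 go"))
print(normalize("i have 2 cars"))
```

In this sketch the ambiguous token `2` resolves to "to" or "two" depending on its neighbours, which is the role the abstract assigns to context. A realistic system would presumably replace the toy tables with large lexicons and a proper language model.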