Article code | Journal code | Publication year | English article | Full-text version
558299 | 874892 | 2014 | 22 pages, PDF | Free download
English title of the ISI article
Normalization of informal text
Persian translation of the title
عادی سازی متن غیررسمی
Keywords
Related topics
Engineering and Basic Sciences, Computer Engineering, Signal Processing
English abstract


• Normalization of abbreviations in noisy, informal text.
• Collection, filtering and annotation of Twitter status messages.
• Comparison of statistical and machine translation approaches.
• Effects of language model order on accuracy.
• Combination of methods to achieve best results.

This paper describes a noisy-channel approach for the normalization of informal text, such as that found in emails, chat rooms, and SMS messages. In particular, we introduce two character-level methods for the abbreviation modeling aspect of the noisy channel model: a statistical classifier using language-based features to decide whether a character is likely to be removed from a word, and a character-level machine translation model. A two-stage approach is used: in the first stage, the possible candidates are generated using the selected abbreviation model; in the second stage, the best candidate is chosen by decoding with a language model. Overall we find that this approach works well and is on par with current research in the field.
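The two-stage pipeline described in the abstract can be illustrated with a small sketch. This is not the paper's implementation: the vocabulary, the bigram counts, and the constant per-character deletion probability `P_DELETE` are all made-up assumptions, and the constant stands in for the paper's character-level classifier or machine translation model. The function names (`is_deletion_abbrev`, `candidates`, `log_bigram`, `normalize`) are hypothetical. Stage one proposes every vocabulary word from which the token could arise by deleting characters; stage two picks the best sequence with a Viterbi search over an add-one-smoothed bigram language model.

```python
import math

# Toy vocabulary and bigram counts; every value here is invented for illustration.
VOCAB = ["see", "you", "tomorrow", "tomato", "talk", "about", "eat", "a"]
BIGRAMS = {
    ("<s>", "see"): 10, ("see", "you"): 10, ("you", "tomorrow"): 20,
    ("<s>", "eat"): 10, ("eat", "a"): 10, ("a", "tomato"): 10,
    ("talk", "about"): 10,
}
# Total count of bigrams starting with each word, used for smoothing.
PREV_TOTALS = {}
for (prev, _), count in BIGRAMS.items():
    PREV_TOTALS[prev] = PREV_TOTALS.get(prev, 0) + count

P_DELETE = 0.4  # assumed constant chance that any single character is dropped


def is_deletion_abbrev(abbr, word):
    """True if abbr keeps word's first letter and is a subsequence of word."""
    if not abbr or not word or abbr[0] != word[0]:
        return False
    remaining = iter(word)
    return all(ch in remaining for ch in abbr)


def candidates(token):
    """Stage 1: list (word, log P(token | word)) for plausible source words."""
    found = []
    for word in VOCAB:
        if is_deletion_abbrev(token, word):
            kept, deleted = len(token), len(word) - len(token)
            log_p = kept * math.log(1 - P_DELETE) + deleted * math.log(P_DELETE)
            found.append((word, log_p))
    # Tokens with no candidate pass through unchanged.
    return found or [(token, 0.0)]


def log_bigram(prev, word):
    """Add-one-smoothed bigram log probability log P(word | prev)."""
    count = BIGRAMS.get((prev, word), 0)
    return math.log((count + 1) / (PREV_TOTALS.get(prev, 0) + len(VOCAB)))


def normalize(tokens):
    """Stage 2: Viterbi decoding over the candidate lattice."""
    best = {"<s>": (0.0, [])}  # last word -> (log score, best path so far)
    for token in tokens:
        new_best = {}
        for word, channel in candidates(token):
            score, path = max(
                ((s + log_bigram(prev, word) + channel, p)
                 for prev, (s, p) in best.items()),
                key=lambda item: item[0],
            )
            new_best[word] = (score, path + [word])
        best = new_best
    return max(best.values(), key=lambda item: item[0])[1]
```

With these toy numbers, `normalize(["see", "you", "tom"])` returns `["see", "you", "tomorrow"]`, while `normalize(["eat", "a", "tom"])` returns `["eat", "a", "tomato"]`: the abbreviation model alone prefers the shorter word, and the language model context changes the outcome in the first case.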

Publisher
Database: Elsevier - ScienceDirect
Journal: Computer Speech & Language - Volume 28, Issue 1, January 2014, Pages 256–277
Authors