Article code: 382661
Journal code: 660778
Publication year: 2013
English article: 14 pages
Full-text version: PDF
English title of the ISI article
Recognition of word collocation habits using frequency rank ratio and inter-term intimacy
Related subjects
Engineering and Basic Sciences; Computer Engineering; Artificial Intelligence
Abstract

This paper proposes an effective algorithm for extracting two features from text documents that are useful for analyzing word collocation habits: “Frequency Rank Ratio” (FRR) and “Intimacy”. FRR is derived from a word's ranking index when words are ordered by frequency. Intimacy, computed by a compact language model called the Influence Language Model (ILM), measures how close a word is to other words within the same sentence. Using these features, a visualization framework for word collocation analysis is developed. To evaluate the framework, two corpora covering diverse topics and genres were designed and collected from real-life data. Extensive simulations illustrate the feasibility and effectiveness of the visualization framework, and the results demonstrate that the proposed features and algorithm support reliable and efficient text analysis.
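The abstract does not give the exact FRR formula. As a minimal sketch, assuming FRR is a word's frequency rank (1 = most frequent) divided by the vocabulary size, ranks and ratios could be computed as below. This normalization and the helper names `frequency_ranks` and `frequency_rank_ratio` are hypothetical, chosen only for illustration; they are not the paper's definition.

```python
from collections import Counter


def frequency_ranks(tokens):
    """Rank words by descending frequency (rank 1 = most frequent).

    Ties are broken alphabetically so the ranking is deterministic.
    """
    counts = Counter(tokens)
    ordered = sorted(counts, key=lambda w: (-counts[w], w))
    return {word: rank for rank, word in enumerate(ordered, start=1)}


def frequency_rank_ratio(tokens):
    """Hypothetical FRR: a word's frequency rank divided by the vocabulary size.

    Assumption for illustration only; the paper's exact FRR formula is not
    stated in the abstract. Values lie in (0, 1]; smaller means more frequent.
    """
    ranks = frequency_ranks(tokens)
    vocab_size = len(ranks)
    return {word: rank / vocab_size for word, rank in ranks.items()}


if __name__ == "__main__":
    text = "the cat sat on the mat and the cat slept"
    frr = frequency_rank_ratio(text.split())
    # Print words from most to least frequent with their (hypothetical) FRR.
    for word, value in sorted(frr.items(), key=lambda kv: kv[1]):
        print(f"{word:>6}  {value:.3f}")
```

Under this assumption, in the sample sentence "the" (three occurrences) gets the smallest ratio, 1/7, while each word that occurs only once gets a larger ratio, up to 1.0 for the last-ranked word.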


► FRR, derived from the ranking index of words according to their frequency, is proposed.
► The Influence Language Model is introduced for calculating inter-term Intimacy.
► Intimacy captures inter-term-level features of texts.
► A visualization framework is developed for word collocation analysis of documents.
► FRR and Intimacy are able to represent useful word collocation characteristics.
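The Influence Language Model itself is not described here. To illustrate only the general idea of scoring how close words are within the same sentence, the sketch below substitutes a much simpler, hypothetical distance-decay score: every pair of distinct words in a sentence contributes 1 / distance**decay, summed over the corpus. The functions `sentence_intimacy` and `closest_partners` and the `decay` parameter are illustrative assumptions, not the paper's ILM.

```python
from collections import defaultdict
from itertools import combinations


def sentence_intimacy(sentences, decay=1.0):
    """Hypothetical inter-term intimacy score (NOT the paper's ILM).

    Each pair of distinct words in the same sentence adds 1 / distance**decay,
    where distance is the gap between their token positions. Pairs are stored
    in alphabetical order so the score is symmetric.
    """
    scores = defaultdict(float)
    for sentence in sentences:
        tokens = sentence.lower().split()
        # combinations() preserves order, so i < j for every pair.
        for (i, a), (j, b) in combinations(enumerate(tokens), 2):
            if a == b:
                continue  # ignore a word paired with a repeat of itself
            key = tuple(sorted((a, b)))
            scores[key] += 1.0 / (j - i) ** decay
    return dict(scores)


def closest_partners(word, scores, k=3):
    """Return up to k (partner, score) pairs for `word`, highest score first."""
    partners = [
        (b if a == word else a, s) for (a, b), s in scores.items() if word in (a, b)
    ]
    return sorted(partners, key=lambda p: (-p[1], p[0]))[:k]


if __name__ == "__main__":
    corpus = ["strong tea", "strong coffee tea"]
    scores = sentence_intimacy(corpus)
    print(closest_partners("strong", scores))
```

In this toy corpus "strong" sits next to "tea" once and two positions away once, so it scores higher with "tea" than with "coffee"; a larger `decay` would penalize distant pairs more sharply.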

Publisher
Database: Elsevier - ScienceDirect
Journal: Expert Systems with Applications - Volume 40, Issue 11, 1 September 2013, Pages 4301–4314