Article code: 1860941
Journal code: 1530566
Publication year: 2015
English article: 6 pages, PDF
Full-text version: free download
English title of the ISI article
Word ranking in a single document by Jensen–Shannon divergence
Related topics
Engineering and Basic Sciences; Physics and Astronomy; Physics and Astronomy (General)
English abstract


• We apply Jensen–Shannon divergence (JSD) to rank words in a single document.
• JSD quantifies the (dis)similarity between two probability distributions.
• The spatial distribution of relevant words differs between the original and shuffled versions of a text.
• This method can be applied to term ranking in natural and artificial languages.
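
For reference, the standard equal-weight form of JSD for two discrete distributions P and Q is given here. This is the textbook definition; the listing does not say which weighting or logarithm base the paper uses, so base 2 is an assumption.

```latex
\mathrm{JSD}(P \,\|\, Q) = H\!\left(\frac{P+Q}{2}\right) - \frac{1}{2}H(P) - \frac{1}{2}H(Q),
\qquad
H(P) = -\sum_i p_i \log_2 p_i
```

With base-2 logarithms the value lies between 0 (identical distributions) and 1 (distributions with disjoint support).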

Ranking the words in human-written texts according to their relevance to the text's context plays a crucial role in many text mining tasks. Highly relevant words concentrate in a few limited areas, while irrelevant ones have a nearly random spatial distribution throughout the text. In a randomly shuffled version of the text, by contrast, all word types are distributed at random. The difference between the spatial distribution of a word in the original text and in its shuffled version therefore appears to be a suitable criterion for ranking word relevance. In this procedure, the spatial distribution of each word type in the document is defined by the box-counting method. We then apply Jensen–Shannon divergence to measure the difference between the probability distributions of each word in the original text and in its shuffled version. This metric properly distinguishes relevant words from irrelevant ones without requiring any prior knowledge of the text structure.
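
The procedure described in the abstract can be sketched roughly as follows. This is an illustrative reading, not the authors' implementation: the equal-width boxes, the number of boxes (`n_boxes`), the minimum word count (`min_count`), and the averaging over several shuffles (`n_shuffles`) are all assumptions, as are the function names.

```python
import math
import random
from collections import Counter


def box_distribution(tokens, word, n_boxes):
    """Share of `word`'s occurrences in each of n_boxes equal-width boxes.

    Assumption: "box counting" is read here as splitting the token
    sequence into equal-width boxes and counting occurrences per box.
    """
    counts = [0] * n_boxes
    n = len(tokens)
    for i, tok in enumerate(tokens):
        if tok == word:
            counts[i * n_boxes // n] += 1
    total = sum(counts)
    if total == 0:
        raise ValueError(f"{word!r} does not occur in the text")
    return [c / total for c in counts]


def entropy(p):
    """Shannon entropy in bits; zero-probability terms contribute nothing."""
    return -sum(x * math.log2(x) for x in p if x > 0)


def jsd(p, q):
    """Equal-weight Jensen-Shannon divergence between two distributions."""
    m = [(a + b) / 2 for a, b in zip(p, q)]
    return entropy(m) - (entropy(p) + entropy(q)) / 2


def rank_words(tokens, n_boxes=10, min_count=5, n_shuffles=20, seed=0):
    """Rank words by mean JSD between original and shuffled box distributions."""
    rng = random.Random(seed)
    freq = Counter(tokens)
    # Assumption: very rare words are skipped because their box
    # distributions are too noisy to compare meaningfully.
    words = [w for w, c in freq.items() if c >= min_count]
    original = {w: box_distribution(tokens, w, n_boxes) for w in words}
    scores = {w: 0.0 for w in words}
    for _ in range(n_shuffles):
        shuffled = tokens[:]
        rng.shuffle(shuffled)
        for w in words:
            q = box_distribution(shuffled, w, n_boxes)
            scores[w] += jsd(original[w], q) / n_shuffles
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

Calling `rank_words(text.lower().split())` on a document returns `(word, score)` pairs, highest divergence first. Under this reading, a word clustered in one part of the text scores high, while a word spread evenly scores close to zero.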

Publisher
Database: Elsevier - ScienceDirect
Journal: Physics Letters A - Volume 379, Issues 28–29, 28 August 2015, Pages 1627–1632