Article code, Journal code, Publication year, English article, Full-text version
7375325, 1480067, 2018, 11-page PDF, Free download
English title of the ISI article
Robustness of sentence length measures in written texts
Persian translation of the title
استحکام طول مدت جمله در متون نوشته شده
Related topics
Engineering and Basic Sciences, Mathematics, Mathematical Physics
Abstract
Hidden structural patterns in written texts have been the subject of considerable research in recent decades. In particular, mapping a text into a time series of sentence lengths is a natural way to investigate text structure. Typically, sentence length has been quantified by using measures based on the number of words and the number of characters, but other variations are possible. To quantify the robustness of different sentence length measures, we analyzed a database containing about five hundred books in English. For each book, we extracted six distinct measures of sentence length, including the number of words and the number of characters (taking into account lemmatization and stop-word removal). We compared these six measures for each book by using (i) Pearson's coefficient to investigate linear correlations; (ii) the Kolmogorov-Smirnov test to compare distributions; and (iii) detrended fluctuation analysis (DFA) to quantify auto-correlations. We have found that all six measures exhibit very similar behavior, suggesting that sentence length is a robust measure related to text structure.
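The first two comparisons named in the abstract can be illustrated with a minimal sketch. It is not the authors' code. The sentence splitter, the two length measures (words and letters), and the standardization applied before the Kolmogorov-Smirnov comparison are all assumptions made for illustration; the abstract does not specify them. Lemmatization, stop-word removal, the significance test, and DFA are left out.

```python
import math
import re
from bisect import bisect_right


def split_sentences(text):
    # Naive splitter (an assumption): break after ., ! or ? followed by whitespace.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def length_in_words(sentence):
    # Sentence length as the number of word tokens.
    return len(re.findall(r"[A-Za-z']+", sentence))


def length_in_chars(sentence):
    # Sentence length as the number of alphabetic characters.
    return sum(ch.isalpha() for ch in sentence)


def pearson(x, y):
    # Pearson's linear correlation coefficient between two equal-length series.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def standardize(x):
    # Rescale to zero mean and unit variance so series measured in different
    # units can be compared (an illustrative choice, not taken from the paper).
    n = len(x)
    mean = sum(x) / n
    std = math.sqrt(sum((a - mean) ** 2 for a in x) / n)
    return [(a - mean) / std for a in x]


def ks_statistic(x, y):
    # Two-sample Kolmogorov-Smirnov statistic: the largest gap between the
    # empirical cumulative distribution functions of the two samples.
    xs, ys = sorted(x), sorted(y)
    gap = 0.0
    for point in sorted(set(xs + ys)):
        fx = bisect_right(xs, point) / len(xs)
        fy = bisect_right(ys, point) / len(ys)
        gap = max(gap, abs(fx - fy))
    return gap


sample = (
    "The cat sat. It was a very long and rather tedious afternoon for "
    "everyone involved! Why? Nobody could say for certain."
)
sentences = split_sentences(sample)
words = [length_in_words(s) for s in sentences]
chars = [length_in_chars(s) for s in sentences]

r = pearson(words, chars)
d = ks_statistic(standardize(words), standardize(chars))
print(words, chars, r, d)
```

A correlation close to 1 and a small Kolmogorov-Smirnov statistic would both point in the direction the abstract reports, namely that different length measures carry much the same information. A four-sentence sample is of course far too short to support any conclusion.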
Publisher
Database: Elsevier - ScienceDirect
Journal: Physica A: Statistical Mechanics and its Applications - Volume 506, 15 September 2018, Pages 749-754