Article code: 1891735
Journal code: 1043919
Publication year: 2012
English article: 9 pages, PDF
Full text: free download
English title of the ISI article
Measuring complexity with multifractals in texts. Translation effects
Related topics
Engineering and Basic Sciences > Physics and Astronomy > Statistical and Nonlinear Physics
English abstract

Should quality be almost a synonym of complexity? Measuring quality appears audacious, even very subjective. It is hereby proposed to use a multifractal approach in order to quantify quality through complexity measures. A one-dimensional system is examined. It is known that (all) written texts can be viewed as one-dimensional nonlinear maps. Thus, several written texts by the same author are considered, together with their translation into an unusual language, Esperanto, and, as a baseline, their corresponding shuffled versions. Different one-dimensional time series can be used, e.g. (i) one based on word lengths and (ii) another based on word frequencies; both are used for studying, comparing and discussing the map structure. It is shown that variety in style can be measured through the D(q) and f(α) curves characterizing multifractal objects. This allows one to observe, on the one hand, whether natural and artificial languages significantly influence the writing and the translation and, on the other hand, whether one author's texts differ technically from each other. In fact, the f(α) curves of the original texts are similar to each other, but the translated text shows marked differences. However, in each case the f(α) curves are far from being parabolic, in contrast to those of the shuffled texts. Moreover, the Esperanto text has more extreme values. Criteria are thereby suggested for estimating the quality of a text, as if it were only a time series. A model is introduced in order to substantiate the findings: it consists in considering a text as a random Cantor set resulting from a binomial cascade of long and short words with appropriate weights. In an appendix, a connection is given with an analysis of turbulence by statistics based on Tsallis generalized entropy. In a second appendix, another view of text (language) complexity is outlined within the copying mistake map concept.
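
As a rough illustration of the two mappings mentioned in the abstract, the following minimal sketch turns a text into a word-length series and a word-frequency series, plus a shuffled baseline. The function names, the tokenization rule, and the choice of encoding each word by its total count in the text are assumptions made for this example; the article may define the frequency series differently (for instance through frequency ranks).

```python
import random
import re
from collections import Counter


def tokenize(text):
    """Split text into lowercase words (hypothetical rule: runs of letters).

    The pattern keeps Unicode letters, so accented Esperanto letters
    survive; apostrophes and digits act as separators.
    """
    return re.findall(r"[^\W\d_]+", text.lower())


def word_length_series(words):
    """Series (i): each word is replaced by its number of letters."""
    return [len(w) for w in words]


def word_frequency_series(words):
    """Series (ii): each word is replaced by its count in the whole text.

    Assumption: raw counts are used; a rank-based encoding is another option.
    """
    counts = Counter(words)
    return [counts[w] for w in words]


def shuffled(series, seed=0):
    """Baseline: same values in random order, destroying correlations."""
    rng = random.Random(seed)
    out = list(series)
    rng.shuffle(out)
    return out


if __name__ == "__main__":
    words = tokenize("The quick fox saw the lazy fox")
    print(word_length_series(words))
    print(word_frequency_series(words))
    print(shuffled(word_length_series(words)))
```

Shuffling keeps the distribution of values but removes their ordering, which is why it can serve as a reference against which the structure of the original series is compared.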


► Two texts in English and one in Esperanto are transformed into 6 time series.
► D(q) and f(α) of such (and shuffled) time series are obtained.
► A model for text construction is presented based on a parametrized Cantor set.
► The model parameters can also be used when examining machine-translated texts.
► Suggested extensions to higher dimensions: in 2D image analysis and on hypertexts.
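
The D(q) and f(α) quantities named in the highlights can be illustrated on a textbook binomial cascade. The sketch below is a simplified, deterministic cascade on equal halves with a single weight p, not the random, parametrized Cantor set model of the article; all function names are hypothetical. D(q) is computed from the partition sum at the finest scale, and one point of the f(α) curve is obtained with the standard normalized-measure (Chhabra-Jensen style) estimate.

```python
import math


def binomial_cascade(p, levels):
    """Measures of the 2**levels dyadic boxes after repeated p / (1 - p) splits."""
    measures = [1.0]
    for _ in range(levels):
        measures = [m * w for m in measures for w in (p, 1.0 - p)]
    return measures


def generalized_dimension(measures, q):
    """D(q) estimated at the single scale eps = 1 / len(measures)."""
    log_eps = -math.log(len(measures))
    positive = [m for m in measures if m > 0]
    if abs(q - 1.0) < 1e-12:
        # q -> 1 limit: information dimension
        return sum(m * math.log(m) for m in positive) / log_eps
    z = sum(m ** q for m in positive)
    return math.log(z) / ((q - 1.0) * log_eps)


def spectrum_point(measures, q):
    """Return (alpha, f) for one q using normalized measures mu**q / sum(mu**q)."""
    log_eps = -math.log(len(measures))
    positive = [m for m in measures if m > 0]
    z = sum(m ** q for m in positive)
    normalized = [m ** q / z for m in positive]
    alpha = sum(n * math.log(m) for n, m in zip(normalized, positive)) / log_eps
    f = sum(n * math.log(n) for n in normalized) / log_eps
    return alpha, f


if __name__ == "__main__":
    mu = binomial_cascade(0.3, 10)  # p = 0.3 is an arbitrary illustrative weight
    for q in (-2, 0, 1, 2):
        alpha, f = spectrum_point(mu, q)
        print(q, generalized_dimension(mu, q), alpha, f)
```

For this simplified cascade the result can be checked against the closed form D(q) = -log2(p^q + (1-p)^q) / (q - 1); with p = 0.5 the measure is uniform and D(q) = 1 for every q, the monofractal case.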

Publisher
Database: Elsevier - ScienceDirect
Journal: Chaos, Solitons & Fractals - Volume 45, Issue 11, November 2012, Pages 1349–1357