Article code: 6861601
Journal code: 1439255
Publication year: 2018
English article: 38-page PDF
Full-text version: free download
English title of the ISI article
Learning the evolution of disciplines from scientific literature: A functional clustering approach to normalized keyword count trajectories
Persian translation of the title
آموزش تکامل رشته ها از ادبیات علمی: یک رویکرد خوشه بندی کاربردی به مسیرهای شمارش عناصر نرمال
Related topics
Engineering and Basic Sciences > Computer Engineering > Artificial Intelligence
English abstract
The growing availability of large diachronic corpora of scientific literature offers the opportunity of reading the temporal evolution of concepts, methods and applications, i.e., the history of disciplines involved in the strand under investigation. After a retrieval process of the most relevant keywords, bag-of-words approaches produce words × time-points contingency tables, i.e., the frequencies of each word in the set of texts grouped by time-points. Through the analysis of word counts over the observed period of time, the main purpose of the study is, after reconstructing the “life-cycle” of words, to cluster words that have similar life-cycles and, thus, to detect prototypical or exemplary temporal patterns. Unveiling such relevant and (through expert opinion) meaningful inner dynamics enables us to trace a historical narrative of the discipline of interest. However, different history readings are possible depending on the type of data normalization, which is needed to account for the fluctuating size of texts across time and the general problems of data sparsity and strong asymmetry. This study proposes a methodology consisting of (1) a stepwise information retrieval procedure for keyword selection and (2) a two-stage functional clustering approach for statistical learning. Moreover, a sample of possible normalizations of word frequencies is considered, showing that the different concept of curve similarity induced in clustering by the type of transformation heavily affects group composition and size. The corpus of titles of scientific papers published by the American Statistical Association journals in the time span 1888-2012 is examined for illustration.
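As a rough illustration of the words × time-points contingency table mentioned in the abstract, the following minimal sketch counts selected keywords per time-point and then divides each count by the total number of tokens observed at that time-point. The titles, the keyword list, the function names, and this particular normalization are all assumptions made for the example; the paper considers several normalizations, and nothing here should be read as the authors' actual procedure.

```python
from collections import Counter


def contingency_table(docs, keywords):
    """Build a words x time-points table of raw keyword counts.

    docs: iterable of (time_point, text) pairs (hypothetical input format).
    Returns (time_points, table, totals), where table[word] is a list of
    counts aligned with time_points and totals holds tokens per time-point.
    """
    docs = list(docs)
    time_points = sorted({t for t, _ in docs})
    index = {t: j for j, t in enumerate(time_points)}
    table = {w: [0] * len(time_points) for w in keywords}
    totals = [0] * len(time_points)
    for t, text in docs:
        tokens = text.lower().split()  # naive whitespace tokenization
        totals[index[t]] += len(tokens)
        counts = Counter(tokens)
        for w in keywords:
            table[w][index[t]] += counts[w]
    return time_points, table, totals


def normalize_by_column(table, totals):
    """One possible normalization (an assumption): count / tokens at that time-point."""
    return {
        w: [c / n if n else 0.0 for c, n in zip(row, totals)]
        for w, row in table.items()
    }


# Toy titles invented for illustration only.
docs = [
    (1990, "bayesian inference for regression"),
    (1990, "regression models"),
    (2000, "bayesian bootstrap methods"),
]
keywords = ["bayesian", "regression"]

time_points, table, totals = contingency_table(docs, keywords)
normalized = normalize_by_column(table, totals)
```

Each row of `normalized` is a keyword trajectory over time; trajectories of this kind would be the input to a clustering step, which is not sketched here.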
Publisher
Database: Elsevier - ScienceDirect
Journal: Knowledge-Based Systems - Volume 146, 15 April 2018, Pages 129-141