Article code | Journal code | Publication year | English article | Full-text version
523929 | 868528 | 2015 | 11-page PDF | Free download
English title of the ISI article
The effect of data pre-processing on understanding the evolution of collaboration networks
Persian translation of the title
اثر پیش پردازش داده ها در درک تکامل شبکه های همکاری
Keywords
Collaboration network, network evolution, name disambiguation
Related topics
Engineering and Basic Sciences, Computer Engineering, Computer Science Applications
English abstract


• Author names were disambiguated algorithmically, by all initials, and by the first initial of the given name.
• Algorithmic disambiguation approximated the ground truth better than the initial-based methods.
• Initial-based methods distorted the size, degree, distance, and clustering of the coauthor network.
• The distortion of network properties by initial-based methods grew more severe over time.
• Initial-based methods produced degree distributions that appeared to follow a power law.
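
The two initial-based methods named in the highlights can be illustrated with a minimal Python sketch. It is not the paper's implementation: the function names, the assumption that each name is written as "Given [Middle] Surname" with the last token as the surname, and the sample names are all hypothetical.

```python
# Illustrative sketch only: name format and sample names are assumptions,
# not taken from the paper's data.

def split_name(full_name):
    """Split 'Given [Middle] Surname' into (given-name tokens, surname)."""
    parts = full_name.replace(".", " ").split()
    if not parts:
        return [], ""
    return parts[:-1], parts[-1]

def first_initial_key(full_name):
    """Identity = surname + first initial of the given name."""
    given, surname = split_name(full_name)
    initial = given[0][0] if given else ""
    return f"{surname} {initial}".strip().lower()

def all_initials_key(full_name):
    """Identity = surname + initials of every given-name token."""
    given, surname = split_name(full_name)
    initials = "".join(token[0] for token in given)
    return f"{surname} {initials}".strip().lower()

def count_unique(names, key):
    """Number of distinct author identities under a given keying method."""
    return len({key(name) for name in names})

# Hypothetical name strings, assumed here to belong to four different people.
authors = ["Wei Wang", "Wen Wang", "Wei Min Wang", "Ana Silva"]

print(len(set(authors)))                         # distinct raw strings
print(count_unique(authors, all_initials_key))   # identities under all-initials
print(count_unique(authors, first_initial_key))  # identities under first-initial
```

In this toy input, both keying methods merge distinct name strings into one identity, which is the mechanism by which initial-based methods can undercount unique authors.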

This paper shows empirically how the choice of certain data pre-processing methods for disambiguating author names affects our understanding of the structure and evolution of co-publication networks. Thirty years of publication records from 125 Information Systems journals were obtained from DBLP. Author names in the data were pre-processed via algorithmic disambiguation. We applied the commonly used all-initials and first-initial based disambiguation methods to the data, generated over-time networks with a yearly resolution, and calculated standard network metrics on these graphs. Our results show that initial-based methods underestimate the number of unique authors, average distance, and clustering coefficient, while overestimating the number of edges, average degree, and ratios of the largest components. These self-reinforcing growth and shrinkage mechanisms amplify over time. This can lead to false findings about fundamental network characteristics such as topology and reasoning about underlying social processes. It can also cause erroneous predictions of trends in future network evolution and suggest unjustified policies, interventions and funding decisions. The findings from this study suggest that scholars need to be more attentive to data pre-processing when analyzing or reusing bibliometric data.
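
As a rough sketch of the workflow the abstract describes, the Python below builds one coauthor network per publication year and computes a few of the metrics mentioned (number of authors, number of edges, average degree, and the share of nodes in the largest component). Several things are assumptions made for illustration: the record layout `(year, [author names])`, the per-year (non-cumulative) snapshots, the simplified first-initial key, and the sample records. Average distance and clustering coefficient are omitted to keep the sketch short.

```python
from collections import defaultdict
from itertools import combinations

def first_initial_key(full_name):
    """Simplified first-initial identity; assumes the last token is the surname."""
    parts = full_name.replace(".", " ").split()
    if not parts:
        return ""
    given, surname = parts[:-1], parts[-1]
    initial = given[0][0] if given else ""
    return f"{surname} {initial}".strip().lower()

def build_network(papers, key):
    """Adjacency sets of a coauthor graph; `key` maps a name to an identity."""
    adj = defaultdict(set)
    for names in papers:
        ids = {key(name) for name in names}
        for author in ids:
            adj[author]  # register the node, including sole authors
        for a, b in combinations(sorted(ids), 2):
            adj[a].add(b)
            adj[b].add(a)
    return adj

def largest_component_ratio(adj):
    """Fraction of nodes in the largest connected component (iterative DFS)."""
    seen, best = set(), 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:
            node = stack.pop()
            size += 1
            for neighbor in adj[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        best = max(best, size)
    return best / len(adj) if adj else 0.0

def summarize(adj):
    """Basic size and connectivity metrics for one network snapshot."""
    n = len(adj)
    m = sum(len(neighbors) for neighbors in adj.values()) // 2
    return {
        "authors": n,
        "edges": m,
        "avg_degree": 2 * m / n if n else 0.0,
        "largest_component_ratio": largest_component_ratio(adj),
    }

def yearly_summaries(records, key):
    """One metrics dict per year, from (year, [author names]) records."""
    by_year = defaultdict(list)
    for year, names in records:
        by_year[year].append(names)
    return {y: summarize(build_network(by_year[y], key)) for y in sorted(by_year)}

# Hypothetical records; the two "Wang" authors are assumed to be different people.
records = [
    (2000, ["Wei Wang", "Ana Silva"]),
    (2000, ["Wen Wang", "Omar Haddad"]),
]

print(yearly_summaries(records, lambda name: name))   # names kept distinct
print(yearly_summaries(records, first_initial_key))   # first-initial merging
```

In this toy input, merging the two "Wang" authors lowers the author count and raises both the average degree and the largest-component share, the same direction of distortion the abstract reports for those three metrics.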

Publisher
Database: Elsevier - ScienceDirect
Journal: Journal of Informetrics - Volume 9, Issue 1, January 2015, Pages 226–236