The discretised lognormal and hooked power law distributions for complete citation data: Best options for modelling and regression

Article ID	Journal	Published Year	Pages	File Type
523359	Journal of Informetrics	2016	11 Pages	PDF

Abstract

• The hooked power law fits citation data from a single subject better than the discretised lognormal distribution in science.
• The discretised lognormal distribution fits citation data from a single subject better than the hooked power law outside science.
• After a transformation, normal distribution parameters are more stable than discrete distribution parameters for citation data.

Identifying the statistical distribution that best fits citation data is important to allow robust and powerful quantitative analyses. Whilst previous studies have suggested that both the hooked power law and discretised lognormal distributions fit better than the power law and negative binomial distributions, no comparisons so far have covered all articles within a discipline, including those that are uncited. Based on an analysis of 26 different Scopus subject areas in seven different years, this article reports comparisons of the discretised lognormal and the hooked power law with citation data, adding 1 to citation counts in order to include zeros. The hooked power law fits better in two thirds of the subject/year combinations tested for journal articles that are at least three years old, including most medical, life and natural sciences, and for virtually all subject areas for younger articles. Conversely, the discretised lognormal tends to fit best for arts, humanities, social science and engineering fields. The difference between the fits of the distributions is mostly small, however, and so either could reasonably be used for modelling citation data. For regression analyses the best option is to use ordinary least squares regression applied to the natural logarithm of citation counts plus one, especially for sets of younger articles, because of the increased precision of the parameters.
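The regression recommendation above can be illustrated with a minimal sketch: ordinary least squares applied to ln(1 + citations). The function name `fit_log_citation_ols`, the author-count predictor and the synthetic data are all assumptions made for illustration; none of them come from the article, and the article's own data and covariates may differ.

```python
import numpy as np


def fit_log_citation_ols(citations, predictors):
    """Fit OLS to ln(1 + citations); return coefficients and standard errors.

    Hypothetical helper for illustration. The first coefficient is the intercept.
    """
    # Transform the response: natural log of citation counts plus one,
    # so that uncited articles (zero citations) are kept in the analysis.
    y = np.log1p(np.asarray(citations, dtype=float))

    X = np.asarray(predictors, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    # Prepend a column of ones for the intercept.
    X = np.column_stack([np.ones(len(y)), X])

    # Least-squares solution of X @ coef ~= y.
    coef, _, _, _ = np.linalg.lstsq(X, y, rcond=None)

    # Classical OLS standard errors from the residual variance.
    resid = y - X @ coef
    dof = len(y) - X.shape[1]
    sigma2 = resid @ resid / dof
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return coef, np.sqrt(np.diag(cov))


# Synthetic example (assumed data, not from the article):
# citation counts loosely increasing with the number of authors.
rng = np.random.default_rng(0)
n_authors = rng.integers(1, 6, size=500)
latent = 0.5 + 0.3 * n_authors + rng.normal(0.0, 1.0, size=500)
citations = np.floor(np.exp(latent)).astype(int)

coef, se = fit_log_citation_ols(citations, n_authors)
print("intercept, slope:", coef)
print("standard errors:", se)
```

Because the response is on a log scale, a slope `b` corresponds roughly to multiplying 1 + citations by exp(b) for each unit increase in the predictor.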

Keywords

Hooked power law; Citation analysis; Citation distributions; Scientometrics