Article ID: 4968108
Journal: Journal of Informetrics
Published Year: 2017
Pages: 17
File Type: PDF
Abstract

• 46% of the articles receiving citances containing “discovery” words are scientific discoveries.
• Eight percent of the articles having 20 or more discovery citances resulted in a Nobel Prize.
• One-third of the articles on the list were co-cited as independent multiple discoveries.
• Discovery recognition can begin early on and peaks after two decades with delays for some articles.
• Machine learning can automate discovery identification with an accuracy and F1 value of 94%.

This paper describes a procedure for identifying discoveries in the biomedical sciences using citation context information, more precisely citing sentences, drawn from the PubMed Central database. The procedure focuses on the use of specific terms in the citing sentences and on the joint appearance of cited references. After a manual screening process to remove non-discoveries, a list of over 100 discoveries and their associated articles is compiled and characterized by subject matter and by type of discovery. The phenomenon of multiple discovery is shown to play an important role. The onset and timing of recognition of the articles are studied by comparing the number of citing sentences with and without discovery terms; the comparison shows both early onset and delays in recognition. A comparative analysis of the vocabularies of the discovery and non-discovery sentences reveals the types of words and concepts that scientists associate with discoveries. A machine learning application is used to extend the list efficiently. Implications of the findings for understanding the nature and justification of scientific discoveries are discussed.
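The term-matching step of the procedure can be illustrated with a minimal Python sketch. It is not the authors' implementation: the regular expression is a hypothetical stand-in for the paper's discovery term list, which the abstract does not enumerate, and the input format (pairs of cited article ID and citing sentence) is assumed. The default threshold of 20 echoes the "20 or more discovery citances" figure in the highlights, but its use as a cutoff here is also an assumption.

```python
import re
from collections import Counter

# Hypothetical term pattern: matches "discover", "discovery", "discovered", etc.
# The actual term list used in the paper is not given in the abstract.
DISCOVERY_PATTERN = re.compile(r"\bdiscover\w*", re.IGNORECASE)


def is_discovery_citance(sentence):
    """Return True if the citing sentence contains a discovery term."""
    return DISCOVERY_PATTERN.search(sentence) is not None


def count_discovery_citances(citances):
    """Count discovery citances per cited article.

    `citances` is an iterable of (cited_article_id, citing_sentence) pairs
    (an assumed input format).
    """
    counts = Counter()
    for cited_id, sentence in citances:
        if is_discovery_citance(sentence):
            counts[cited_id] += 1
    return counts


def candidate_articles(citances, min_count=20):
    """Return the sorted IDs of articles with at least `min_count` discovery citances."""
    counts = count_discovery_citances(citances)
    return sorted(cid for cid, n in counts.items() if n >= min_count)
```

Articles returned by `candidate_articles` would only be candidates; per the abstract, a manual screening step is still needed to remove non-discoveries.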
