Article ID | Journal | Published Year | Pages | File Type |
---|---|---|---|---|
515514 | Information Processing & Management | 2009 | 7 Pages |
Abstract
Power-law regularities have been discovered behind many complex natural and social phenomena. We discover that power-law regularities, especially Zipf’s and Heaps’ laws, also exist in large-scale software systems. We find that the distribution of lexical tokens in modern Java, C++ and C programs follows the Zipf–Mandelbrot law, and the growth of program vocabulary follows Heaps’ law. The results are obtained through empirical analysis of real-world software systems. We believe our discovery reveals the statistical regularities behind computer programming.
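As a rough illustration of the two measurements the abstract refers to, the sketch below computes a rank-frequency table (the quantity the Zipf–Mandelbrot law describes, commonly written f(r) ∝ 1/(r + b)^a) and a vocabulary growth curve (the quantity Heaps’ law describes, commonly written V(n) = K·n^β). The regex tokenizer and the names `tokenize`, `rank_frequency` and `vocabulary_growth` are assumptions made for this example; they are not the lexer or tooling used in the paper, and no fitting of the law parameters is attempted here.

```python
import re
from collections import Counter


def tokenize(source):
    # Hypothetical, simplified lexer (an assumption, not the paper's):
    # identifiers/keywords, integer literals, or single punctuation characters.
    return re.findall(r"[A-Za-z_]\w*|\d+|[^\s\w]", source)


def rank_frequency(tokens):
    # Sort distinct tokens by descending frequency and attach a 1-based rank.
    # Zipf-like behaviour would show frequency falling roughly as a power of rank.
    counts = Counter(tokens)
    return [
        (rank, token, freq)
        for rank, (token, freq) in enumerate(counts.most_common(), start=1)
    ]


def vocabulary_growth(tokens):
    # For each prefix of the token stream, record how many distinct tokens
    # have been seen so far. Heaps-like behaviour would show sublinear growth.
    seen = set()
    growth = []
    for token in tokens:
        seen.add(token)
        growth.append(len(seen))
    return growth


sample = "int a = 0; int b = a + 1; a = a + b;"
tokens = tokenize(sample)
table = rank_frequency(tokens)
growth = vocabulary_growth(tokens)
```

On a real code base, one would plot `table` and `growth` on log-log axes and check for approximately straight lines; a toy snippet like `sample` is far too small to show either regularity.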
Related Topics
Physical Sciences and Engineering
Computer Science
Computer Science Applications
Authors
Hongyu Zhang