Article ID: 433472
Journal: Science of Computer Programming
Published Year: 2011
Pages: 20
File Type: PDF
Abstract

In order to study software evolution, it is necessary to measure artefacts representative of project releases. If we consider the process of software evolution to be copying with subsequent modification, then, by analogy, placing emphasis on what remains the same between releases will lead to focusing on similarity between artefacts. At the same time, software artefacts, stored digitally as binary strings, are all information. This paper introduces a new method for measuring software evolution in terms of artefacts’ shared information content. A similarity value representing the quantity of information shared between artefact pairs is produced using a calculation based on Kolmogorov complexity. Similarity values for releases are then collated over the software’s evolution to form a map quantifying change through lack of similarity. The method has general applicability: it can disregard otherwise salient software features such as programming paradigm, language or application domain because it considers software artefacts purely in terms of the mathematically justified concept of information content. Three open-source projects are analysed to show the method’s utility. Preliminary experiments on udev and git verify the measurement of the projects’ evolutions. An experiment on ArgoUML validates the measured evolution against experimental data from other studies.
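Kolmogorov complexity is not computable, so any practical similarity calculation has to approximate it. A minimal sketch of one widely used approximation, the normalized compression distance (NCD), is given below. The abstract does not state which formula or compressor the paper uses, so the choice of NCD, the use of zlib, and the names `compressed_size`, `ncd` and `change_map` are all assumptions made for illustration.

```python
import zlib
from typing import List, Sequence


def compressed_size(data: bytes) -> int:
    """Approximate the information content of `data` by its zlib-compressed length."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance between two artefacts.

    Values near 0 suggest the artefacts share most of their information;
    values near 1 suggest they share very little. This is an assumed
    stand-in for the paper's Kolmogorov-complexity-based calculation.
    """
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def change_map(releases: Sequence[bytes]) -> List[List[float]]:
    """Pairwise distances between releases: entry [i][j] compares release i with release j."""
    return [[ncd(a, b) for b in releases] for a in releases]
```

Each element of `releases` would be the raw bytes of one release artefact (for example, a compiled binary or a concatenated source tree). Because zlib only looks back over a 32 KiB window, a compressor with a larger window is likely to give more meaningful values on artefacts of realistic size.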

Research highlights

► New method: measuring software evolution using shared information content.
► Quantification of project change through lack of similarity between artefacts.
► General applicability: programming paradigm, language, domain are disregarded.
► Verification: experimental analyses of projects udev and git.
► Validation: results for ArgoUML compared with those from two other studies.

Related Topics
Physical Sciences and Engineering > Computer Science > Computational Theory and Mathematics