Lucas Layman, Gunnar Kudrjavets, Nachiappan Nagappan
L. Layman, G. Kudrjavets, and N. Nagappan, “Iterative identification of fault-prone binaries using inprocess metrics,” in Proceedings of the Second ACM-IEEE international symposium on Empirical software engineering and measurement - ESEM ’08, (New York, New York, USA), p. 206, ACM Press, oct 2008
Publication year: 2008

Code churn, the amount of code change taking place within a software unit over time, has been correlated with fault-proneness in software systems. We investigate the use of code churn and static metrics collected at regular time intervals during the development cycle to predict faults in an iterative, in-process manner. We collected 159 churn and structure metrics from six, four-month snapshots of a 1 million LOC Microsoft product. The number of software faults fixed during each period is recorded per binary module. Using stepwise logistic regression, we create a prediction model to identify fault-prone binaries using three parameters: code churn (the number of new and changed blocks); class Fan In and class Fan Out (normalized by lines of code). The iteratively-built model is 80.0% accurate at predicting fault-prone and non-fault-prone binaries. These fault-prediction models have the advantage of allowing the engineers to observe how their fault-prediction profile evolves over time.