Software organizations are increasingly adopting the development practices associated with the Extreme Programming (XP) methodology, yet most reports on the efficacy of these practices are anecdotal. This paper provides a benchmark measurement framework that lets researchers and practitioners state concretely which XP practices an organization has chosen to adopt and/or modify, and the outcomes of those choices. The framework enables the meta-analysis needed to combine families of case studies. The results of running framework-based case studies in various contexts will eventually constitute a body of knowledge of systematic, empirical evaluations of XP and its practices. Additionally, this benchmark provides a baseline framework that can be adapted for industrial case studies of other technologies and processes. To provide a foundation for the use of the framework, we present the initial validation of our XP evaluation framework, based on a year-long study of an IBM team that adopted a subset of XP practices.