When comparing performance through benchmarking analyses, the benchmarking team is interested in obtaining a measure of firms’ relative efficiency. Such information can be used to develop X-factors in price cap regulation, rewarding (or penalizing) companies. Alternatively, if the benchmarking team is the regulator, it might publish the rankings or efficiency scores to inform the public, putting pressure on the managers of poorly performing utilities to improve their firms’ performance. In both cases, the accuracy and robustness of inefficiency estimates are very important because they may have significant financial or social impacts. In particular, if the estimated inefficiency scores or rankings are sensitive to the benchmarking method, a more detailed analysis is required to justify the adopted model. Tests for mutual consistency are becoming standard.
Nevertheless, in most cases there is no “ideal” model among the set of potential models. Sources of disagreement include model specification (cost vs. production function, and functional form), alternative specifications of inputs or outputs (e.g., network length vs. fixed assets as an input), assumptions about error terms, and alternative methodologies (e.g., DEA vs. SFA). Following the work of others, we suggest three levels of sensitivity tests.
To check the robustness of performance rankings, researchers have begun to compare results from different methodologies, either by using correlation matrices or by verifying whether different models identify the same set of utilities as the most efficient and least efficient firms. Clearly, if efficiency scores are to have any use for managerial incentives or as elements in regulatory mechanisms, stakeholders need to be confident that the scores reflect reality and are not just artifacts of model specification, sample selection, treatment of outliers, or other steps in the analytic process. Thus, benchmarking teams are performing sensitivity tests.
Three Levels of Sensitivity Tests:
- Level 1: Sensitivity tests of efficiency scores. A Pearson correlation matrix can be employed to check the correlation of efficiency scores between pairs of techniques. Furthermore, the Kruskal-Wallis nonparametric test can be used to test the null hypothesis that different techniques generate the same distribution of efficiency scores.
- Level 2: Sensitivity tests of efficiency rankings. If the efficiency scores are not consistent across the different methods, it is still possible that these approaches generate similar rankings of firms by their efficiency scores. A clear ranking can help the benchmarking team determine the X-factor to be used in setting prices for the firms in the sector. Thus, a nonparametric Spearman’s rank correlation matrix can be used to check the correlation of rankings between pairs of techniques.
- Level 3: Sensitivity tests of best and worst performers. If the efficiency levels and rankings are not consistent across methods, it is still possible that these approaches identify the same best and worst performers, which can be especially helpful for rewarding the best performers and penalizing the worst. The benchmarking team can compare the rankings yielded by the different techniques and summarize the overlap rate in identifying the best and worst performers.
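The three levels above can be sketched in a few lines of Python. This is only an illustrative sketch, not a prescribed implementation: the method labels, the six-firm scores, and the function names (`pearson`, `spearman`, `kruskal_wallis_h`, `overlap_rate`) are hypothetical and chosen for the example. The Kruskal-Wallis statistic is computed without a tie correction and would be compared against a chi-square critical value with (number of methods minus one) degrees of freedom, for example 5.991 at the 5% level for three methods.

```python
from itertools import combinations
from math import sqrt

# Hypothetical efficiency scores for six utilities under three methods.
scores = {
    "DEA":  [0.92, 0.75, 0.60, 0.88, 0.55, 0.70],
    "SFA":  [0.89, 0.78, 0.65, 0.85, 0.50, 0.72],
    "COLS": [0.80, 0.58, 0.62, 0.79, 0.41, 0.61],
}

def pearson(x, y):
    """Level 1: linear correlation between two vectors of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def average_ranks(values):
    """Rank 1 is the smallest value; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman(x, y):
    """Level 2: Pearson correlation applied to the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

def kruskal_wallis_h(groups):
    """Level 1: Kruskal-Wallis H statistic (no tie correction)."""
    pooled = [v for g in groups for v in g]
    ranks = average_ranks(pooled)
    n = len(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    return 12.0 / (n * (n + 1)) * total - 3 * (n + 1)

def overlap_rate(x, y, k, best=True):
    """Level 3: share of the k best (or worst) firms common to both methods."""
    def pick(v):
        ordered = sorted(range(len(v)), key=lambda i: v[i], reverse=best)
        return set(ordered[:k])
    return len(pick(x) & pick(y)) / k

for a, b in combinations(scores, 2):
    print(a, b,
          round(pearson(scores[a], scores[b]), 3),
          round(spearman(scores[a], scores[b]), 3),
          overlap_rate(scores[a], scores[b], k=2, best=True),
          overlap_rate(scores[a], scores[b], k=2, best=False))
print("H =", round(kruskal_wallis_h(list(scores.values())), 3))
```

In this made-up data the three methods agree on the two best performers, while DEA and COLS disagree on one of the two worst, which is the kind of pattern the Level 3 comparison is meant to surface.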
After the three levels of tests, the benchmarking team should have a good sense of the consistency of the different methods. If the results are close to each other, we can calculate the geometric mean of the efficiency scores for each firm to obtain a “comprehensive” efficiency measure. If there is substantial variation at all three levels, the findings would be considered inconclusive, requiring a more detailed analysis to explore problems with the adopted models.
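As a small illustration of the combined measure, the geometric mean across methods can be computed firm by firm. The function names and the two-method input below are hypothetical, and the sketch assumes every score is strictly positive, since the geometric mean is not defined through logarithms otherwise.

```python
from math import exp, log

def geometric_mean(values):
    """Geometric mean of strictly positive numbers, computed via logs."""
    if any(v <= 0 for v in values):
        raise ValueError("efficiency scores must be strictly positive")
    return exp(sum(log(v) for v in values) / len(values))

def composite_scores(scores_by_method):
    """One combined score per firm; each method lists firms in the same order."""
    per_firm = zip(*scores_by_method.values())
    return [geometric_mean(firm_scores) for firm_scores in per_firm]

# Hypothetical scores for two firms under two methods.
example = {"DEA": [0.25, 0.80], "SFA": [1.00, 0.80]}
print(composite_scores(example))
```

Compared with an arithmetic mean, the geometric mean gives more weight to a low score under any single method, so a firm rated poorly by one technique is not fully offset by a high rating under another.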
If the results pass the sensitivity tests, the benchmarking team can start to analyze scores and rankings and explore the potential determinants of inefficiency across firms and over time. To compare efficiency scores, the utilities can be divided into groups by factors such as region, population density, regulatory environment, ownership structure, and vintage. Second-stage regressions (OLS or Tobit) with the efficiency score as the dependent variable can also be used to test the partial effects of these external factors on firm efficiency. Firms should not be ranked as poor performers if they operate under conditions that differ from those of the other firms. As noted earlier, density, geographic topology, distance from raw water sources, and political constraints on prices (affecting the financial sustainability of operations) affect relative performance.
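A second-stage OLS regression of this kind can be sketched as follows. The regressors (a public-ownership dummy and customer density), the data, and the function name `ols` are hypothetical, and the sketch solves the normal equations directly for readability rather than numerical robustness. It covers only the OLS case; a Tobit specification, which accounts for scores being bounded at one, would need a dedicated estimator that is not shown here.

```python
def ols(X, y):
    """OLS coefficients via the normal equations (X'X) b = X'y.

    X holds one row of regressors per firm, without the intercept;
    a constant term is added here and returned as the first coefficient.
    """
    rows = [[1.0] + list(r) for r in X]
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    a = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("regressors are collinear")
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [u - factor * v for u, v in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]

# Hypothetical external factors: [public ownership dummy, customers per km].
factors = [[0, 50], [1, 50], [0, 100], [1, 100], [0, 150]]
# Hypothetical efficiency scores for the same five utilities.
efficiency = [0.98, 0.91, 1.00, 0.88, 0.95]

intercept, b_public, b_density = ols(factors, efficiency)
print(intercept, b_public, b_density)
```

Each slope coefficient is read as the partial effect of that factor on the efficiency score, holding the other factor fixed; judging its significance would additionally require standard errors, which this sketch does not compute.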