Statistical studies allow analysts to estimate key parameters of cost or production models. Econometric analyses require a large data set to ensure reliable results. Obtaining the number of observations needed to derive an efficient and unbiased estimate of cost (or production) structures can often prove to be a difficult task. Regression results are sensitive to model specification (for example, a linear vs. a non-linear functional form). In addition, for some models, the interpretation of the error term becomes important.

The early studies tended to use Ordinary Least Squares (OLS) to estimate cost functions for firms. Due to data limitations, most of these studies were cross-sectional. Besides relying on data from only a single year, researchers used data from England and Wales or from the United States. These academic studies often focused on the relative performance of privately vs. publicly owned water and sewerage utilities. In addition, they investigated the extent of scale economies and economies of joint production (providing both water and sewerage services). In some cases, they considered the impacts of residential vs. industrial/commercial customers.

As data from Brazil, Peru, and other emerging nations became available, additional country studies were published, often using more advanced econometric (parametric) or non-parametric data analysis techniques. Studies of utilities in France, Italy, and other nations began to appear in the academic literature. Techniques associated with Stochastic Frontier Analysis began to be applied to both production functions and cost functions. Panel data facilitated the incorporation of customer density, topology, and other variables.

The most commonly used parametric methods are ordinary least squares (OLS), corrected ordinary least squares (COLS), and Stochastic Frontier Analysis (SFA). OLS estimates an average function, whereas COLS and SFA estimate a frontier. The main difference between the two frontier methods is that COLS attributes all deviations from the frontier to inefficiency, while SFA attributes part of the deviations to inefficiency and part to random noise. The most widely used stochastic frontier models include the stochastic production frontier model, the stochastic cost frontier model, and the stochastic distance function model. Before selecting a specific model, analysts have to make an initial choice between the two most widely used functional forms: the Cobb-Douglas function and the translog function.
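For reference, the two functional forms can be written as follows for a production function with output $y_i$ and inputs $x_{ki}$ for firm $i$. The notation ($\beta$ coefficients, error term $\varepsilon_i$) is introduced here for illustration and is not taken from a specific study.

$$
\text{Cobb-Douglas:}\qquad \ln y_i = \beta_0 + \sum_{k} \beta_k \ln x_{ki} + \varepsilon_i
$$

$$
\text{Translog:}\qquad \ln y_i = \beta_0 + \sum_{k} \beta_k \ln x_{ki} + \tfrac{1}{2} \sum_{k} \sum_{l} \beta_{kl} \ln x_{ki} \ln x_{li} + \varepsilon_i, \qquad \beta_{kl} = \beta_{lk}
$$

The Cobb-Douglas form is the special case of the translog in which every second-order coefficient $\beta_{kl}$ is zero, so the restriction can be tested statistically. The translog is more flexible but requires many more parameters, which matters when the data set is small.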

#### Ordinary least squares (OLS) models

OLS techniques can be used for benchmarking that compares individual firm performance with what would be expected, based on an estimate of the average production or cost function for a sample of firms. Average benchmarking methods may be used to compare firms with relatively similar costs, or when there is too little data on comparable firms to apply frontier methods. In essence, the method estimates a regression functional form for costs or production using OLS. Linear regression analysis seeks to derive a relationship between firm performance (in terms of output or total cost) and market conditions and characteristics of the production processes. Statistical analysis can isolate the impacts of specific conditions or levels of output, so the roles of multiple independent variables can be determined. Data from the firms being compared can then be used to arrive at expected dimensions of firm performance, given the variables characterizing each firm.

The technique of regression analysis is defined by the following steps: 1) selecting both the cost (or output) measure and exogenous variables, 2) estimating a cost (or production) function for the industry, and 3) calculating the efficiency coefficient for each firm within the industry. Predicted versus actual output provides a measure of relative performance. The quality of these results can then be statistically evaluated to provide the policy-maker with a framework for evaluating firms. The linear vs. non-linear issue can be examined by including parameters that capture scale economies or diseconomies.
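The three steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure of any particular regulator: output is the only explanatory variable, the cost function is log-log (Cobb-Douglas-type), and the data are made up. The function names `fit_loglog_ols` and `ols_scores` are hypothetical.

```python
import math

def fit_loglog_ols(outputs, costs):
    """Step 2: fit ln(cost) = a + b * ln(output) by ordinary least squares."""
    xs = [math.log(q) for q in outputs]
    ys = [math.log(c) for c in costs]
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx          # cost elasticity with respect to output
    a = y_bar - b * x_bar  # intercept of the average cost function
    return a, b

def ols_scores(outputs, costs):
    """Step 3: ratio of predicted to actual cost for each firm.

    A score above 1 means the firm spends less than the average
    function predicts; a score below 1 means it spends more.
    """
    a, b = fit_loglog_ols(outputs, costs)
    predicted = [math.exp(a + b * math.log(q)) for q in outputs]
    return [p / c for p, c in zip(predicted, costs)]

# Step 1: choose the cost measure and the exogenous variable.
# Hypothetical data: volume delivered and total cost for five firms.
outputs = [10.0, 20.0, 40.0, 80.0, 160.0]
costs = [12.0, 19.0, 41.0, 66.0, 140.0]

a, b = fit_loglog_ols(outputs, costs)
scores = ols_scores(outputs, costs)
print(f"estimated cost elasticity b = {b:.3f}")
for q, s in zip(outputs, scores):
    print(f"output {q:6.1f}: predicted/actual cost = {s:.3f}")
```

In this log-log form, an estimated elasticity `b` below 1 indicates scale economies (cost rises less than proportionally with output), and a value above 1 indicates diseconomies. A real application would include further cost drivers and use a statistics package that reports standard errors.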

- Advantages: The statistical method reveals information about cost structures and distinguishes between different variables’ roles in affecting output. Coefficients can be interpreted in terms of cost drivers or how inputs contribute to output.
- Disadvantages: A large data set is necessary to obtain reliable results. The regression results are sensitive to functional form and to how the error term is interpreted, so conclusions can vary widely depending on how the regression is initially set up.
- Application: The UK water regulator OFWAT applies mean and average methods to the operating costs (OPEX) and capital expenditures (CAPEX) of water utilities when determining the price caps every five years. OFWAT has developed an efficiency analysis relying on mean and average methods that is a key part of its price determination process.

#### Corrected ordinary least squares (COLS) models

A slightly different approach from OLS shifts the estimated line towards the best-performing company; this is called Corrected Ordinary Least Squares (COLS). In a general sense, COLS is merely a shifted average function. Two steps are needed: one to get the expected value of the error term, and another to shift (or “center”) the equation.
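The shift can be sketched as follows for a cost function. This is an illustration under assumptions that go beyond the text: a single-output log-log cost function, made-up data, and the common variant in which the intercept is shifted by the most favourable (smallest) OLS residual so that the frontier passes through the best performer. The function name `cols_cost_efficiency` is hypothetical.

```python
import math

def cols_cost_efficiency(outputs, costs):
    """Fit ln(cost) = a + b * ln(output) by OLS, then shift the intercept
    down so that the line passes through the lowest-cost performer."""
    xs = [math.log(q) for q in outputs]
    ys = [math.log(c) for c in costs]
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
    a = y_bar - b * x_bar

    # Step 1: OLS residuals (actual minus predicted log cost).
    residuals = [y - (a + b * x) for x, y in zip(xs, ys)]

    # Step 2: shift the intercept by the smallest residual.
    shift = min(residuals)
    frontier_intercept = a + shift

    # All remaining distance to the frontier is treated as inefficiency.
    scores = [math.exp(-(e - shift)) for e in residuals]
    return frontier_intercept, b, scores

# Hypothetical data for five firms.
outputs = [10.0, 20.0, 40.0, 80.0, 160.0]
costs = [12.0, 19.0, 41.0, 66.0, 140.0]

intercept, slope, eff = cols_cost_efficiency(outputs, costs)
for q, s in zip(outputs, eff):
    print(f"output {q:6.1f}: COLS cost efficiency = {s:.3f}")
```

The best performer receives a score of exactly 1 and every other firm a score below 1. Because the whole frontier is anchored on one observation, a single erroneous data point can change every score.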

When using OLS or COLS, it is good practice to perform quantile analysis. Quantile analysis helps to overcome the possible effect of outliers on the estimated mean, allowing the analyst to detect performers in specific or extreme quantiles, such as the lower (25%) or the upper (75%) quantiles.
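A simple way to look at the quantiles is to group firms by where their regression residuals fall. The sketch below is a simplification and does not implement a full quantile regression; the residuals and firm labels are made up, and the function name `quartile_groups` is hypothetical.

```python
import statistics

def quartile_groups(residuals):
    """Return the firms at or below the 25% cut point and at or above
    the 75% cut point of the residual distribution."""
    q1, _median, q3 = statistics.quantiles(list(residuals.values()), n=4)
    lower = sorted(f for f, e in residuals.items() if e <= q1)
    upper = sorted(f for f, e in residuals.items() if e >= q3)
    return lower, upper

# Hypothetical residuals from a log cost regression
# (negative = cost below what the average function predicts).
residuals = {"A": -0.30, "B": -0.10, "C": -0.05, "D": 0.00,
             "E": 0.04, "F": 0.08, "G": 0.12, "H": 0.45}

lower, upper = quartile_groups(residuals)
print("lower quartile:", lower)
print("upper quartile:", upper)
```

Firms with extreme residuals, such as `A` and `H` in this made-up sample, are candidates for a data check before they are allowed to anchor a COLS frontier.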

- Advantages: The statistical method reveals information about cost structures and distinguishes between different variables’ roles in affecting output. The adjustment turns the OLS into a “frontier” approach.
- Disadvantages: As with OLS, a large data set is necessary to obtain reliable results. The regression results are sensitive to functional form and to how the error term is interpreted, so conclusions can vary widely depending on how the regression is initially set up. Furthermore, the performance scores are especially sensitive to outliers, since the “best” performer along any dimension serves as the anchor for the estimate.
- Application: Most studies that analyze frontier relationships utilize Stochastic Frontier Analysis (SFA). Some simplicity is then lost, but tests of the sources of different types of errors can be identified with SFA.

#### Stochastic Frontier Analysis (SFA) models

Stochastic Frontier Analysis attempts to estimate an efficient frontier that incorporates the possibility of measurement error or chance factors. To separate inefficiency from noise, strong assumptions are needed about the distribution of noise across the observed firms. Stochastic frontiers may be classified as production, cost, and input distance frontiers.
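The separation of noise and inefficiency is usually expressed through a composed error term. For a production frontier, one standard way of writing the model is shown below; the symbols are introduced here for illustration, and the half-normal distribution for $u_i$ is only one commonly used assumption, not one stated in the text.

$$
\ln y_i = \ln f(x_i; \beta) + v_i - u_i, \qquad v_i \sim N(0, \sigma_v^2), \qquad u_i \ge 0
$$

Here $v_i$ is two-sided random noise and $u_i$ is one-sided inefficiency, for example $u_i \sim N^{+}(0, \sigma_u^2)$. Technical efficiency is then $TE_i = \exp(-u_i)$, which lies between 0 and 1. COLS corresponds to the case without $v_i$, where the entire deviation from the frontier is treated as inefficiency.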

A production frontier reveals technical relationships between inputs and outputs of firms and represents an alternative when cost frontiers cannot be calculated due to lack of data. The estimated output is the maximum possible output for the given inputs of an individual firm. The difference between this estimated output and actual output is interpreted as the technical inefficiency of each individual firm. On a production frontier, variable returns to scale is the sensible option, and appropriate scale efficiency changes need to be included when calculating total factor productivity.

A cost frontier shows costs as a function of the level of output(s) and the prices of inputs. It is useful when trying to assess the wedge between the tariff and minimum costs. Conceptually, the minimum cost function defines a frontier showing the costs that are technically possible for various levels of inputs and control variables. A total cost frontier, rather than a variable or expenditure cost frontier, is preferable because it accounts for the substitutability of factor inputs. Separate models for CAPEX and OPEX do not allow for the allocation of expenditures between operating and capital expenditure. Cost efficiency contains the effects of both technical and allocative efficiency.
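In stochastic form, a cost frontier can be written as below. The notation is introduced here for illustration: $C_i$ is the observed total cost of firm $i$, $y_i$ its outputs, $w_i$ its input prices, $v_i$ two-sided random noise, and $u_i \ge 0$ cost inefficiency.

$$
\ln C_i = \ln C(y_i, w_i; \beta) + v_i + u_i, \qquad CE_i = \exp(-u_i)
$$

The inefficiency term enters with a positive sign because inefficiency raises cost above the frontier. In the standard decomposition, cost efficiency is the product of technical and allocative efficiency:

$$
CE_i = TE_i \times AE_i
$$

so a firm can be on the production frontier ($TE_i = 1$) and still be cost inefficient if it uses the wrong input mix at prevailing prices.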

Each approach (production or cost) may yield different results. The difference will be larger if large allocative distortions are present. In this case, the parameters of the cost frontier will be biased. An important factor to consider when choosing between a cost frontier and a production frontier is that regulated firms are usually required to provide the service at a preset tariff and must meet demand. In this sense, firms are not allowed to choose their own level of output, which makes output an exogenous variable. The regulated firm maximizes benefits by minimizing its costs of producing a given level of output. Cost is the choice variable for the firm, so a cost frontier approach is a more sensible choice.

Finally, an input distance frontier is the natural option for regulated industries where output quantity is exogenous and input quantities are endogenous, and when the technology involves multiple outputs or no data are available on input prices. This is the case for water and sewerage, which are different outputs of the same firm and are provided from shared inputs that jointly determine the production function.

A distance function may have either an input or an output orientation. An input orientation looks at how much the input vector may be proportionally contracted with the output vector held fixed. An output orientation looks at how much the output vector may be proportionally expanded with the input vector held fixed. Input distance functions can be estimated by either stochastic or DEA methods. The advantage of a distance frontier over a cost frontier is that the firm is not assumed to be minimizing costs. Its advantage over a production frontier is that it avoids the endogeneity problem.
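The two orientations correspond to the standard (Shephard) distance function definitions. The notation is introduced here for illustration: $x$ is the input vector, $y$ the output vector, $L(y)$ the set of input vectors that can produce $y$, and $P(x)$ the set of output vectors that can be produced from $x$.

$$
D_I(x, y) = \max\{\rho > 0 : x/\rho \in L(y)\}, \qquad D_O(x, y) = \min\{\theta > 0 : y/\theta \in P(x)\}
$$

For a feasible input-output combination, $D_I(x, y) \ge 1$ and $D_O(x, y) \le 1$, with equality when the firm is on the frontier. Input-oriented technical efficiency is $1 / D_I(x, y)$: a value of 0.8 means the firm could produce the same outputs with 80% of each input.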

- Advantages of Stochastic Frontiers: The approach accounts for data noise such as data errors and omitted variables. Standard statistical tests can be used to test hypotheses on model specification and on the significance of the variables included in the model. It is also more amenable to modeling the effects of other variables (e.g., environment, quality).
- Disadvantages of Stochastic Frontiers: A functional form and a production technology must be specified. Also, the separation of noise and inefficiency relies on strong assumptions about the distribution of the error term.
- Application: A number of studies use these techniques, for example to assess the relative efficiency of public and private water companies in East Asia and the Pacific.