Introduction
It is well known that production frontier and technical efficiency analyses on a productive unit assume that deviations of the observed product from its maximum (or potential) attainable output, located on the production frontier, are due exclusively to inefficiencies of the productive unit (see, e.g., Kumbhakar & Lovell, 2000; Coelli et al., 2005). For instance, if the assumed production function is a Cobb-Douglas technology y = x⊤β + v, where y and x are the logarithms of the observed output and the input vector respectively, then the production frontier x⊤β is deterministic, and v = y − x⊤β corresponds to the production inefficiency. The lack of randomness in the production frontier of this kind of model does not correspond to real economic life, where uncontrollable random production shocks occur commonly.
The stochastic frontier production model (Aigner, Lovell & Schmidt, 1977; Meeusen & van den Broeck, 1977) is specified as

yi = xi⊤β + vi − ui,  i = 1, ..., n,  (1)

where yi is the observed output and xi the k-dimensional vector of inputs for the ith firm; xi⊤β and vi represent the deterministic and noise components of the frontier respectively; xi⊤β + vi is the maximum output attainable by the firm, which constitutes the stochastic frontier; and ui is the non-negative random technical inefficiency component (i.e., the amount by which the firm fails to achieve its optimum). A symmetric distribution, such as the normal, is usually assumed for vi. It is also common to assume that vi and ui are independent, and that both errors are uncorrelated with xi. Typically, the production function relies on a Cobb-Douglas, translog, or other logarithmic production model log(yi) = xi⊤β + vi − ui, where the components of xi are logarithms of the inputs, their squares, and cross products.
Most stochastic frontier models proposed in the literature differ mainly in the probability distribution assumed for the inefficiency component u ≥ 0, so that the maximum likelihood estimation method can be applied. In this regard, Kumbhakar and Lovell (2000), Coelli et al. (2005), and Greene (2008) present an extensive review of such distributions. Some instances are the half-normal model u ~ N⁺(0, σu²), where N⁺ denotes the non-negative half-normal distribution (Aigner, Lovell & Schmidt, 1977); the exponential model u ~ Exp(λ), λ > 0 (Meeusen & van den Broeck, 1977; Aigner, Lovell & Schmidt, 1977); the gamma model u ~ Γ(λ, θ), λ > 0 and θ > 0 (Stevenson, 1980; Greene, 1980a; Greene, 1980b); and the truncated-normal model u ~ N⁺(µu, σu²) (Stevenson, 1980).
An issue with applications of stochastic frontier analysis emerges when inputs are highly correlated, giving rise to the multicollinearity problem and leading to a loss of precision in the estimates. This loss can also be caused by low input variability. In the presence of collinearity, it is known that: (i) separating the individual effects of each independent variable can be a difficult task; (ii) the precision loss shows up as large estimated variances of the estimates, so that parameters may turn out statistically insignificant; (iii) the estimated coefficients can have incorrect signs and implausible magnitudes; and (iv) there are instability problems, in the sense that small changes in the observations, or eliminating an apparently insignificant variable, can produce large changes in the estimates (see, e.g., Belsley, Kuh & Welsch, 1980; Fomby, Johnson & Hill, 1984; Groß, 2003). Therefore, multicollinearity is a data-driven issue rather than a statistical one (Belsley, Kuh & Welsch, 1980), which can have harmful implications for the estimation of technology coefficients due to their relation with the returns to scale generated by the production model.
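Point (ii) above can be illustrated numerically. The following toy sketch (our own synthetic data, not taken from the paper) shows how the diagonal of (X⊤X)⁻¹, which scales the variance of the OLS estimator, inflates sharply as two regressors become nearly collinear:

```python
import numpy as np

# Toy illustration: the variance of the OLS estimator of beta_1 is
# proportional to the (1,1) entry of (X'X)^{-1}; compare it for
# uncorrelated vs. nearly collinear regressors.
rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
noise = rng.normal(size=n)
factors = {}
for rho in (0.0, 0.99):
    # x2 is constructed with population correlation rho with x1
    x2 = rho * x1 + np.sqrt(1.0 - rho**2) * noise
    X = np.column_stack([x1, x2])
    factors[rho] = np.linalg.inv(X.T @ X)[0, 0]
print(factors)  # the factor for rho = 0.99 is far larger than for rho = 0
```

The variance inflation grows roughly like 1/(1 − ρ²), which is why estimates under severe collinearity can be statistically insignificant even when the true effect is large.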
Despite these drawbacks, a large body of literature on stochastic frontier analysis treats the multicollinearity problem as unimportant or uses a non-statistical solution. For example, Filippini et al. (2008) exclude the input whose correlation with the other inputs is quite high in order to prevent multicollinearity. Other studies sacrifice the advantages of flexible functional forms for the deterministic component, at the cost imposed by the statistically insignificant estimates that linear dependencies between inputs generate (Kumbhakar & Lovell, 2000; Puig & Junoy, 2001; Filippini, 2008). Finally, others argue that, when technical inefficiency estimation is the main aim, multicollinearity is not necessarily a serious problem and the interpretation of estimates is secondary (Puig & Junoy, 2001). To the best of our knowledge, no theoretical research has been reported that studies stochastic frontier analysis and multicollinearity jointly.
In this paper, we propose a principal-component-based solution for multicollinearity in a stochastic frontier model. Basically, we use a re-parameterization of the model in terms of all k principal components and then restrict the corresponding coefficient vector to the principal components associated with the r < k nonzero eigenvalues. Finally, the estimates of the original model are recovered. The solution permits joint estimation of the technical efficiency and the parameters through this better specified model. Moreover, a simulation experiment shows that the proposed estimator has smaller mean squared error than the traditional stochastic frontier estimator.
The rest of the paper is organized as follows. In Section I, the proposed solution is described; its performance is studied by a Monte Carlo simulation experiment in Section II; and in Section III an application with real data is carried out. Finally, some conclusions are given.
I. The principal component solution
For the case where there is only near-exact multicollinearity (i.e., when one or more nearly exact linear relations exist among the regressors), we consider the matrix representation of the stochastic frontier production model (1),

y = β0 1 + Xβ + v − u,  (2)
where y, v, u, and 1 are n-dimensional vectors of observed outputs, production and inefficiency random errors, and ones, respectively; X is the n × k design matrix of inputs; and β is the corresponding k-dimensional vector of coefficients. For clarity and notational simplicity, all inputs are assumed to be standardized in the sequel.
Now, based on the spectral decomposition of the k × k symmetric matrix X⊤X ,
X⊤X = P Λ P⊤ ,
where Λ = diag(λ1, λ2, ..., λk) is the diagonal matrix of eigenvalues (with λ1 ≥ λ2 ≥ ··· ≥ λk), and P = (p1, p2, ..., pk) is the corresponding orthogonal matrix of eigenvectors.
By the orthogonality of P (i.e., PP⊤ = P⊤P = I), the regression model (2) can be re-parameterized as

y = β0 1 + XPP⊤β + v − u = β0 1 + Zθ + v − u,  (3)

where Z = XP = (z1, z2, ..., zk) is the matrix of principal components zj = Xpj, with the property zj⊤zj = λj for all j, and θ = P⊤β.
From the theory of principal component analysis (PCA) (see, e.g., Jolliffe, 2002), it is well known that the principal components zj = Xpj are orthogonal, where the first principal component z1 has the maximal variance (i.e., the largest amount of information) of the original variables, the second principal component z2 has the next largest variance after the first, and so on. Note that if the jth characteristic root λj is approximately equal to zero, then zj⊤zj = λj ≈ 0, and hence zj ≈ 0.
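The eigendecomposition step above can be sketched numerically. This is a minimal illustration with synthetic data (variable names are ours): eigendecompose X⊤X of the centered inputs, form Z = XP, and verify the property zj⊤zj = λj.

```python
import numpy as np

# Sketch of the PCA construction: Z'Z = P'(X'X)P = Lambda, so each
# component's sum of squares equals its eigenvalue.
rng = np.random.default_rng(1)
n, k = 100, 3
X = rng.normal(size=(n, k))
X = X - X.mean(axis=0)            # center (inputs are assumed standardized)
lam, P = np.linalg.eigh(X.T @ X)  # eigh returns eigenvalues in ascending order
lam, P = lam[::-1], P[:, ::-1]    # reorder so that lambda_1 >= ... >= lambda_k
Z = X @ P                         # matrix of principal components
print(np.allclose((Z**2).sum(axis=0), lam))  # z_j' z_j = lambda_j holds
```

Because zj⊤zj = λj, a near-zero eigenvalue directly implies that the corresponding component is a near-zero vector, which is the fact exploited in the next step.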
Additionally, if all k principal components are used, the same parameter vector β is obtained, which is unreliable under collinearity among the exogenous variables, as was pointed out in the introduction. In other words, fairly small eigenvalues of the X⊤X matrix generate imprecision in the OLS estimator. Therefore, the strategy consists in preventing the estimate from moving in the directions pj associated with fairly small eigenvalues λj (see Fomby, Johnson & Hill, 1984; Groß, 2003).
Thus, to deploy the strategy, we restrict β to the subspace spanned by the columns p1, p2, ..., pr, where λ1 ≥ λ2 ≥ ··· ≥ λr > 0 are the r < k largest eigenvalues of X⊤X and λr+1 ≈ λr+2 ≈ ··· ≈ λk ≈ 0. This means that rank(X) ≈ r. Hence, in order to eliminate the imprecision, Massy (1965), Jolliffe (1982), Mason and Gunst (1985), and Hwang and Nettleton (2003) suggest using (i) the first principal components, which have the largest variance and are highly correlated with the output y, and (ii) those principal components of low variance but with high output correlation.
Therefore, model (3) can be re-expressed by subdividing the eigenvalues into the groups λ1 ≥ λ2 ≥ ··· ≥ λr > 0 and λr+1 ≈ λr+2 ≈ ··· ≈ λk ≈ 0, and defining the corresponding partition Z = (Z1, Z2) = (XP1, XP2), where Z1 is the n × r matrix of principal components associated with the nonzero eigenvalues and Z2 is the n × (k − r) matrix of the remaining principal components, associated with the eigenvalues approximately equal to zero. Then, assuming (to simplify the notation) that the first r principal components are the ones highly correlated with y, and using Z2 ≈ 0, the re-parameterized model (3) can be expressed as

y = β0 1 + Z1 θ1 + Z2 θ2 + v − u ≈ β0 1 + Z1 θ1 + v − u,  (4)

where θ = (θ1⊤, θ2⊤)⊤, with θ1 = P1⊤β and θ2 = P2⊤β. Since Z2 ≈ 0, dropping the term Z2 θ2 is equivalent to imposing the restriction θ2 = 0.

Finally, the least squares estimator of θ1 is

θ̂1 = (Z1⊤Z1)⁻¹ Z1⊤ y = Λ1⁻¹ Z1⊤ y,  with Λ1 = diag(λ1, ..., λr).

Thus, the principal component estimator of β in (2) is given by

β̂PC = P1 θ̂1.
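The least-squares step just described can be sketched as follows. This is a minimal numerical illustration on synthetic rank-deficient data (names and data are ours; in the stochastic frontier setting the paper estimates the frontier parameters jointly by maximum likelihood, and this sketch only shows the principal-component restriction and recovery of β):

```python
import numpy as np

# Keep the r components with nonzero eigenvalues, estimate theta_1 by
# least squares, then recover beta_hat = P_1 theta_1_hat.
rng = np.random.default_rng(2)
n, k, r = 200, 3, 2
B = rng.normal(size=(k, r))
X = rng.normal(size=(n, r)) @ B.T        # inputs with rank(X) = r < k
X = X - X.mean(axis=0)
beta = np.array([1.0, 0.8, 0.7])
y = X @ beta + rng.normal(scale=0.1, size=n)

lam, P = np.linalg.eigh(X.T @ X)
lam, P = lam[::-1], P[:, ::-1]           # descending eigenvalues
P1 = P[:, :r]                            # eigenvectors of the r largest lambda
Z1 = X @ P1                              # retained principal components
theta1_hat = np.linalg.solve(Z1.T @ Z1, Z1.T @ y)
beta_pc = P1 @ theta1_hat                # principal component estimator of beta
```

Note that Z1⊤Z1 = Λ1 is diagonal, so the solve reduces to elementwise division by the eigenvalues; the restriction keeps the fit in the well-conditioned directions only.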
II. Simulation study
To evaluate the performance of the proposed principal-component-based method, we carried out a Monte Carlo simulation experiment with 20,000 replications on the stochastic frontier model

yi = β0 + β1 x1i + β2 x2i + vi − ui,

with a half-normal/normal specification, where σu = 3, σv = 2.5, σ² = σu² + σv² = 15.25, γ = σu²/σ² ≈ 0.59, and (β0, β1, β2) = (1, 0.8, 0.7); and (x1, x2) ~ N(µ, Σ) with µ = (20, 25)⊤ and Σ = DRD, where D = diag(σx1, σx2) = diag(1, 2) and R is the correlation matrix with ρ = Corr(x1, x2) = 0.7, 0.8, 0.9. For the most severe multicollinearity problem, ρ = 0.9, we performed the simulations with n = 1000 to study the large-sample properties of the estimator. We used the frontier: Stochastic Frontier Analysis R package, version 1.1-0, by Coelli and Henningsen (2013).
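One draw of this data-generating process can be sketched as follows (our own code, not the paper's; the paper's replications additionally fit the frontier by maximum likelihood with the frontier R package, which this sketch omits):

```python
import numpy as np

# One simulation draw under the stated design: half-normal inefficiency,
# normal noise, and correlated inputs with corr(x1, x2) = 0.9.
rng = np.random.default_rng(3)
n = 1000
mu = np.array([20.0, 25.0])
D = np.diag([1.0, 2.0])                    # standard deviations of x1 and x2
rho = 0.9
R = np.array([[1.0, rho], [rho, 1.0]])
Sigma = D @ R @ D                          # covariance matrix Sigma = D R D
X = rng.multivariate_normal(mu, Sigma, size=n)
v = rng.normal(scale=2.5, size=n)          # symmetric noise, sigma_v = 2.5
u = np.abs(rng.normal(scale=3.0, size=n))  # half-normal inefficiency, sigma_u = 3
y = 1.0 + X @ np.array([0.8, 0.7]) + v - u # beta = (1, 0.8, 0.7)
```

Here γ = σu²/σ² = 9/15.25 ≈ 0.59, matching the design above.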
Tables 1-3 show the means, biases, and mean squared errors (MSE) of the estimators of β1 and β2 obtained with the principal-component-based method and with the usual stochastic frontier analysis, for the assumed values of ρ. The results indicate that, in general, the coefficient estimators obtained with the principal-component-based method are biased, and these biases do not decrease asymptotically. However, these estimators have smaller MSE than the ones obtained by the traditional method, even in large samples. The usual estimators are biased in finite samples, with greater biases than those of the proposed method, although they decrease asymptotically. The estimates of γ and σ² remain unaffected if the principal components are chosen correctly. Finally, keeping the number of principal components fixed, the biases increase as the linear relationship among the variables weakens.
III. Application
To see how the proposed solution behaves with real data, we use production data from the agricultural and livestock sector, with a sample of n = 23 livestock farms. The output variable is total income, and the inputs are labor, capital, and other inputs; all are measured in nominal Colombian (COL) pesos.
Then, a stochastic frontier production model was fitted assuming a Cobb-Douglas functional form with a normal-exponential specification. Estimations were carried out using the LIMDEP (LIMited DEPendent) econometric software, version 10. As can be seen in Table 4, the only statistically significant parameter is the one corresponding to the input log(Other inputs2). Moreover, although the variable log(Capital) is insignificant, its estimated coefficient has an unexpected sign, signaling possible multicollinearity.
To detect multicollinearity, we computed the scaled condition indexes. Table 5 shows that there are two harmful condition indexes (with values greater than 30), indicating two possible near-linear dependencies among the inputs. Therefore, given the multicollinearity problem, we applied the proposed principal-component-based solution. The proportion of variance explained by the first principal component was 88.6%, so we applied the solution using only this principal component. Table 6 displays the corresponding results. Based on them, the estimates of the principal-component-based stochastic frontier, obtained using equation (4), are reported in Table 7. The results show that all inputs are statistically significant, with signs in accordance with production theory.
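The diagnostic used here can be sketched numerically. In the Belsley, Kuh & Welsch approach, each column of the design matrix is scaled to unit length and the condition indexes are the ratios of the largest singular value to each singular value; values above 30 flag harmful near-dependencies. The data below are synthetic, for illustration only (the paper's Table 5 is computed from the farm data):

```python
import numpy as np

# Scaled condition indexes: unit-length column scaling, then
# cond_idx_j = s_max / s_j for the singular values s_j of the scaled matrix.
rng = np.random.default_rng(4)
n = 23
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.01, size=n)   # a near-exact linear dependence
X = np.column_stack([np.ones(n), x1, x2])  # design matrix with intercept
Xs = X / np.linalg.norm(X, axis=0)         # scale each column to unit length
s = np.linalg.svd(Xs, compute_uv=False)
cond_idx = s.max() / s
print(cond_idx)                            # the largest index exceeds 30 here
```

One large condition index per near-linear dependency appears, which is how Table 5's two harmful indexes are read as two near-dependencies among the inputs.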
Conclusions
Based on the simulation results, the input coefficient estimators obtained under the proposed principal-component-based solution are biased, and such biases do not decrease asymptotically. Nevertheless, these estimators have smaller MSE than the usual ones, even in large samples. In finite samples, the traditional estimators are also biased, and seem to have greater biases than the principal-component-based estimators, although their bias diminishes as the sample size increases. If the principal components are chosen correctly, the estimates of γ and σ² remain unaffected under the proposed method. Furthermore, keeping the number of principal components fixed, the biases of the proposed estimator increase as the linear relation between the covariates weakens. The choice of the number of principal components is therefore critical for the estimation of β, γ, and σ², as well as for the efficiency component. After applying the proposed method to real data from the agricultural and livestock sector to evaluate its technical inefficiency, our method seems to provide better estimation results for the coefficients, as well as for the returns to scale, compared with the traditional method.