
Lecturas de Economía

Print version ISSN 0120-2596

Lect. Econ. no. 86, Medellín, Jan./June 2017

https://doi.org/10.17533/udea.le.n86a01 

Article

A solution for multicollinearity in stochastic frontier production function models

(Spanish) A solution for multicollinearity in stochastic frontier production function models

(French) A solution to the problem of multicollinearity in stochastic frontier production function models

Elkin Castaño* 

Santiago Gallón** 

* Elkin Castaño: Associate Professor. Departamento de Economía, Facultad de Ciencias Económicas, Universidad de Antioquia, and Escuela de Estadística, Facultad de Ciencias, Universidad Nacional de Colombia, Medellín, Colombia. Postal address: Calle 67 No. 53-108, Oficina 13-116. E-mail: elkincv@gmail.com.

** Santiago Gallón: Assistant Professor. Departamento de Matemáticas y Estadística, Facultad de Ciencias Económicas, Universidad de Antioquia, Medellín, Colombia. Postal address: Calle 67 No. 53-108, Oficina 13-116. E-mail: santiago.gallon@udea.edu.co.


Abstract

This paper considers the problem of collinearity among inputs in a stochastic frontier production model, an issue that has received little attention in the econometric literature. To address this problem, a principal-component-based solution is proposed, which allows technical efficiency and the technology parameters of the model to be interpreted jointly. Applications of the method to simulated and real data show that it is easy to use and performs well.

Keywords: stochastic frontier analysis; technical efficiency; productivity; multicollinearity; principal component estimation.

Abstract (Spanish version)

This article considers the problem of collinearity among inputs in a stochastic frontier production model, a topic that has received little attention in the econometric literature. To address the problem, a solution based on principal components is proposed that allows technical efficiency and the technology parameters of the model to be interpreted jointly. The results of applying the method to simulated and real data show that it is easy to use and performs well.

Keywords: stochastic frontier analysis; technical efficiency; productivity; multicollinearity; principal component estimation.

Abstract (French version)

This article examines the problem of collinearity among inputs in a stochastic frontier production model, an issue that has received very little attention in the econometric literature. To solve this problem, we propose a solution based on the principal component method, which makes it possible to interpret both technical efficiency and the technology parameters. Using real and simulated data, the results of applying the method show that it is easy to use and also performs well.

Keywords: stochastic frontier analysis; technical efficiency; productivity; multicollinearity; principal component estimation.

Introduction

It is well known that deterministic production frontier and technical efficiency analyses of a productive unit assume that the deviations of the observed output from its maximum (or potential) attainable output, located on the production frontier, are due exclusively to inefficiencies of the productive unit (see, e.g., Kumbhakar & Lovell, 2000; Coelli, et al., 2005). For instance, if the assumed production function is a Cobb-Douglas technology $y = x^\top\beta + v$, where $y$ and $x$ are the logarithms of the observed output and of the input vector, respectively, then the production frontier $x^\top\beta$ is deterministic, and $v = y - x^\top\beta \le 0$ corresponds to the production inefficiency. This lack of randomness in the production frontier does not reflect real economic life, where uncontrollable random production shocks are common.

The stochastic frontier production model (Aigner, Lovell & Schmidt, 1977; Meeusen & van den Broeck, 1977) is specified as

$$y_i = \beta_0 + x_i^\top\beta + v_i - u_i, \qquad i = 1, \dots, n, \tag{1}$$

where $y_i$ is the (log) observed output and $x_i$ the $k$-dimensional vector of inputs for the $i$th firm; $\beta_0 + x_i^\top\beta$ and $v_i$ represent the deterministic and noise components of the frontier, respectively; $\beta_0 + x_i^\top\beta + v_i$ is the maximum output attainable by the firm, which constitutes the stochastic frontier; and $u_i \ge 0$ is the random technical inefficiency component (i.e., the amount by which the firm falls short of its optimum). A symmetric distribution, such as the normal, is usually assumed for $v_i$. It is also common to assume that $v_i$ and $u_i$ are independent and that both errors are uncorrelated with $x_i$. Typically, the production function is a Cobb-Douglas, translog, or other logarithmic production model, so that $y_i$ is the logarithm of output and the components of $x_i$ are the logarithms of the inputs (together with their squares and cross products in the translog case).

Most of the stochastic frontier models proposed in the literature differ mainly in the probability distribution assumed for the inefficiency component $u \ge 0$, which is needed to apply maximum likelihood estimation. Kumbhakar and Lovell (2000), Coelli, et al. (2005), and Greene (2008) review these distributions extensively. Some instances are the half-normal model $u \sim N^+(0, \sigma_u^2)$, where $N^+$ denotes a normal distribution truncated below at zero (Aigner, Lovell & Schmidt, 1977); the exponential model $u \sim \operatorname{Exp}(\lambda)$, $\lambda > 0$ (Meeusen & van den Broeck, 1977; Aigner, Lovell & Schmidt, 1977); the gamma model $u \sim \Gamma(\lambda, \theta)$, $\lambda > 0$, $\theta > 0$ (Stevenson, 1980; Greene, 1980a; Greene, 1980b); and the truncated normal model $u \sim N^+(\mu_u, \sigma_u^2)$ (Stevenson, 1980).
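To make these distributional assumptions concrete, the following Python sketch (an illustration, not part of the authors' method) draws inefficiency terms from the four models listed above and forms the composed error $v_i - u_i$. The helper `sample_inefficiency` is hypothetical; we assume that $\operatorname{Exp}(\lambda)$ is parameterized by its rate, that $\Gamma(\lambda, \theta)$ has shape $\lambda$ and rate $\theta$, and the truncated normal is drawn by simple rejection sampling.

```python
import numpy as np

def sample_inefficiency(dist, n, rng, **p):
    """Draw n non-negative inefficiency terms u_i (hypothetical helper)."""
    if dist == "half-normal":          # u ~ N+(0, sigma_u^2): |N(0, sigma_u^2)|
        return np.abs(rng.normal(0.0, p["sigma_u"], size=n))
    if dist == "exponential":          # u ~ Exp(lam), lam assumed to be a rate
        return rng.exponential(scale=1.0 / p["lam"], size=n)
    if dist == "gamma":                # u ~ Gamma(shape=lam, rate=theta), assumed
        return rng.gamma(shape=p["lam"], scale=1.0 / p["theta"], size=n)
    if dist == "truncated-normal":     # u ~ N+(mu, sigma_u^2) by rejection
        out = np.empty(0)
        while out.size < n:
            draw = rng.normal(p["mu"], p["sigma_u"], size=n)
            out = np.concatenate([out, draw[draw >= 0]])
        return out[:n]
    raise ValueError(f"unknown distribution: {dist}")

rng = np.random.default_rng(0)
n = 10_000
v = rng.normal(0.0, 2.5, size=n)                           # symmetric noise
u = sample_inefficiency("half-normal", n, rng, sigma_u=3.0)
eps = v - u                                                # composed error
# For the half-normal, E(u) = sigma_u * sqrt(2 / pi)
print(round(u.mean(), 2), round(3.0 * np.sqrt(2 / np.pi), 2))
```

Because $E(u_i) > 0$, the composed error has a negative mean, which is absorbed by the intercept of the frontier.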

A problem arises in applications of stochastic frontier analysis when inputs are highly correlated: multicollinearity leads to a loss of precision in the estimates. Low variability of the inputs produces a similar loss. In the presence of collinearity, it is known that: (i) separating the individual effects of each independent variable may be difficult; (ii) the loss of precision shows up as large estimated variances of the estimates, so the parameters may turn out to be statistically insignificant; (iii) the estimated coefficients can have incorrect signs and implausible magnitudes; and (iv) the estimates are unstable, in the sense that small changes in the observations, or the elimination of an apparently insignificant variable, can produce large changes in the estimates (see, e.g., Belsley, Kuh & Welsch, 1980; Fomby, Johnson & Hill, 1984; Groß, 2003). Multicollinearity is thus a problem of the data rather than a statistical one (Belsley, Kuh & Welsch, 1980), and it can have harmful implications for the estimation of the technology coefficients, because these coefficients determine the returns to scale of the production model.
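Point (ii) can be quantified with the variance inflation factor (VIF). As a minimal sketch, assuming two standardized inputs with correlation ρ, so that $X^\top X$ is proportional to their correlation matrix $R$, the variance of each slope estimator is proportional to the corresponding diagonal element of $R^{-1}$, which equals $1/(1-\rho^2)$. The correlations used are those of the simulation study in Section II.

```python
import numpy as np

# With standardized inputs, X'X = (n - 1) R, where R is the input correlation
# matrix, so Var(beta_hat) = sigma^2 (X'X)^{-1} is proportional to R^{-1}.
# The diagonal of R^{-1} is the variance inflation factor (VIF).
vifs = {}
for rho in (0.0, 0.7, 0.8, 0.9):
    R = np.array([[1.0, rho], [rho, 1.0]])
    vifs[rho] = np.linalg.inv(R)[0, 0]
    print(f"rho = {rho:.1f}: VIF = {vifs[rho]:.2f}")
# rho = 0.0: VIF = 1.00
# rho = 0.7: VIF = 1.96
# rho = 0.8: VIF = 2.78
# rho = 0.9: VIF = 5.26
```

With ρ = 0.9, each slope variance is about 5.3 times what it would be with uncorrelated inputs of the same variability and the same error variance.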

Despite these drawbacks, a large part of the literature on stochastic frontier analysis treats multicollinearity as unimportant or handles it with a non-statistical solution. For example, Filippini, et al. (2008) exclude the input whose correlation with the other inputs is very high in order to avoid multicollinearity. Other studies give up the advantages of flexible functional forms for the deterministic component because linear dependencies among the inputs make the parameter estimates unreliable and statistically insignificant (Kumbhakar & Lovell, 2000; Puig-Junoy, 2001; Filippini, et al., 2008). Finally, others argue that, when estimating technical inefficiency is the main aim, multicollinearity is not necessarily a serious problem and the interpretation of the estimates is secondary (Puig-Junoy, 2001). To the best of our knowledge, no theoretical research has studied stochastic frontier analysis and multicollinearity jointly.

In this paper, we propose a principal-component-based solution for multicollinearity in a stochastic frontier model. We re-parameterize the model in terms of all $k$ principal components of the inputs and restrict the coefficient vector to the principal components associated with the $r < k$ nonzero eigenvalues; the estimates of the original coefficients are then recovered. The solution permits a joint estimation of technical efficiency and of the technology parameters through this better-specified model. A simulation experiment shows that, although the proposed estimator is biased and its bias does not vanish asymptotically, it has a smaller mean squared error than the traditional stochastic frontier estimator.

The rest of the paper is organized as follows. Section I describes the solution, Section II studies its performance through a Monte Carlo simulation experiment, and Section III presents an application with real data. Finally, some conclusions are given.

I. The principal component solution

We consider only the case of near-exact multicollinearity, that is, when one or more nearly exact linear relations exist among the regressors. In matrix form, the stochastic frontier production model (1) is

$$y = \beta_0\mathbf{1} + X\beta + v - u, \tag{2}$$

where $y$, $v$, $u$, and $\mathbf{1}$ are $n$-dimensional vectors of observed outputs, production and inefficiency random errors, and ones, respectively; $X$ is the $n \times k$ design matrix of inputs; and $\beta$ is the corresponding $k$-dimensional vector of coefficients. For clarity and notational simplicity, all inputs are assumed to be standardized in the sequel.

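Now, consider the spectral decomposition of the $k \times k$ symmetric matrix $X^\top X$,

$$X^\top X = P\Lambda P^\top,$$

where $\Lambda = \operatorname{diag}(\lambda_1, \lambda_2, \dots, \lambda_k)$ is the diagonal matrix of eigenvalues, with $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_k \ge 0$, and $P = (p_1, p_2, \dots, p_k)$ is the corresponding orthogonal matrix of eigenvectors.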
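The decomposition can be computed with NumPy's symmetric eigensolver, as in the sketch below on hypothetical simulated inputs. `numpy.linalg.eigh` returns eigenvalues in ascending order, so they are reversed to match the ordering $\lambda_1 \ge \cdots \ge \lambda_k$ used here.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
# Two hypothetical, strongly correlated inputs (population correlation 0.95)
x1 = rng.normal(size=n)
x2 = 0.95 * x1 + np.sqrt(1 - 0.95**2) * rng.normal(size=n)
X = np.column_stack([x1, x2])
X = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # standardize the inputs

XtX = X.T @ X
eigvals, P = np.linalg.eigh(XtX)       # ascending eigenvalues, orthonormal eigenvectors
order = np.argsort(eigvals)[::-1]      # reorder so that lambda_1 >= ... >= lambda_k
lam, P = eigvals[order], P[:, order]

print(np.round(lam, 1))                              # one large and one small eigenvalue
print(np.allclose(P @ np.diag(lam) @ P.T, XtX))      # X'X = P Lambda P'
print(np.allclose(P.T @ P, np.eye(2)))               # P is orthogonal
```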

By the orthogonality of $P$ (i.e., $PP^\top = P^\top P = I$), the regression model (2) can be re-parameterized as

$$y = \beta_0\mathbf{1} + Z\theta + v - u, \tag{3}$$

where $Z = XP = (z_1, z_2, \dots, z_k)$ is the matrix of principal components $z_j = Xp_j$, which satisfy $z_j^\top z_j = \lambda_j$ for all $j$, and $\theta = P^\top\beta$.
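A quick numerical check of the re-parameterization, again on hypothetical near-collinear inputs and an arbitrary coefficient vector: the columns of $Z = XP$ are mutually orthogonal with squared lengths $\lambda_j$, and $X\beta = Z\theta$ with $\theta = P^\top\beta$.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500
x1 = rng.normal(size=n)
x2 = 0.99 * x1 + np.sqrt(1 - 0.99**2) * rng.normal(size=n)   # near-exact collinearity
X = np.column_stack([x1, x2])
X = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

eigvals, P = np.linalg.eigh(X.T @ X)
order = np.argsort(eigvals)[::-1]
lam, P = eigvals[order], P[:, order]

Z = X @ P                              # principal components z_j = X p_j
beta = np.array([0.8, 0.7])            # hypothetical coefficients
theta = P.T @ beta                     # theta = P' beta

print(np.allclose(Z.T @ Z, np.diag(lam)))     # z_j' z_l = lambda_j if j == l, else 0
print(np.allclose(X @ beta, Z @ theta))       # X beta = Z theta
print(np.linalg.norm(Z, axis=0) ** 2 / n)     # the last component is close to zero
```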

From the theory of principal component analysis (PCA; see, e.g., Jolliffe, 2002), it is well known that the principal components $z_j = Xp_j$ are mutually orthogonal: the first principal component $z_1$ has the maximal variance (i.e., carries the largest amount of information about the original variables), the second principal component $z_2$ has the maximal variance among the components orthogonal to $z_1$, and so on. Note that, since $z_j^\top z_j = \lambda_j$, if the $j$th characteristic root $\lambda_j$ is approximately equal to zero, then $z_j \approx 0$.

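Additionally, if all $k$ principal components are used, the estimate of $\beta$ recovered from (3) coincides with the OLS estimate, which is unreliable under collinearity among the exogenous variables, as pointed out in the introduction. In other words, fairly small eigenvalues of $X^\top X$ generate imprecision in the OLS estimator: since $\operatorname{Var}(\hat\beta_{OLS}) = \sigma_\varepsilon^2 (X^\top X)^{-1} = \sigma_\varepsilon^2 \sum_{j=1}^{k} \lambda_j^{-1} p_j p_j^\top$, where $\sigma_\varepsilon^2$ is the variance of the composed error $v_i - u_i$, each eigenvalue close to zero inflates the variance along its direction $p_j$. Therefore, the strategy consists in preventing the estimate from moving in the directions $p_j$ associated with fairly small $\lambda_j$ (see Fomby, Johnson & Hill, 1984; Groß, 2003).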

Thus, to deploy the strategy, we restrict $\beta$ to the subspace spanned by $p_1, p_2, \dots, p_r$, where $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_r > 0$ are the $r < k$ largest eigenvalues of $X^\top X$ and $\lambda_{r+1} \approx \lambda_{r+2} \approx \cdots \approx \lambda_k \approx 0$. This means that $X$ has effective (numerical) rank $r$. To eliminate the imprecision, Massy (1965), Jolliffe (1982), Mason and Gunst (1985), and Hwang and Nettleton (2003) suggest retaining (i) the first principal components, which have the largest variance, when they are highly correlated with the output $y$, and (ii) those principal components that have low variance but are highly correlated with the output.
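The selection rule above is stated qualitatively. The sketch below computes the two quantities it relies on: the share of input variance of each component, $\lambda_j / \sum_l \lambda_l$, and the absolute correlation of each component with $y$. The helper `component_diagnostics`, the simulated data, and the cut-offs (5% of variance, correlation 0.3) are hypothetical choices for illustration, not values from the paper.

```python
import numpy as np

def component_diagnostics(X, y):
    """Eigenvalues, eigenvectors, variance shares and |corr(z_j, y)| of the PCs of X."""
    eigvals, P = np.linalg.eigh(X.T @ X)
    order = np.argsort(eigvals)[::-1]
    lam, P = eigvals[order], P[:, order]
    Z = X @ P
    share = lam / lam.sum()            # proportion of input variance per component
    corr = np.array([abs(np.corrcoef(Z[:, j], y)[0, 1]) for j in range(Z.shape[1])])
    return lam, P, share, corr

rng = np.random.default_rng(3)
n = 300
x1 = rng.normal(size=n)
x2 = 0.95 * x1 + np.sqrt(1 - 0.95**2) * rng.normal(size=n)
X = np.column_stack([x1, x2])
X = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
# Hypothetical frontier data: normal noise minus half-normal inefficiency
y = 1.0 + X @ np.array([0.8, 0.7]) + rng.normal(0, 1.0, n) - np.abs(rng.normal(0, 1.0, n))

lam, P, share, corr = component_diagnostics(X, y)
# Hypothetical rule: keep a component if it explains at least 5% of the input
# variance (criterion i) or its absolute correlation with y is at least 0.3 (ii).
keep = (share >= 0.05) | (corr >= 0.3)
print(np.round(share, 3), np.round(corr, 3), keep)
```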

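Therefore, model (3) can be re-expressed by splitting the eigenvalues into the groups $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_r > 0$ and $\lambda_{r+1} \approx \lambda_{r+2} \approx \cdots \approx \lambda_k \approx 0$ and defining the corresponding partitions $P = (P_1, P_2)$ and $Z = (Z_1, Z_2) = (XP_1, XP_2)$, where $Z_1$ is the $n \times r$ matrix of principal components associated with the nonzero eigenvalues and $Z_2$ is the $n \times (k - r)$ matrix of the remaining principal components, associated with the eigenvalues approximately equal to zero. Then, assuming for notational simplicity that the first $r$ principal components are the ones highly correlated with $y$, and using $Z_2 \approx 0$, the re-parameterized model (3) can be expressed as

$$y = \beta_0\mathbf{1} + Z_1\theta_1 + Z_2\theta_2 + v - u \approx \beta_0\mathbf{1} + Z_1\theta_1 + v - u,$$

where $\theta = (\theta_1^\top, \theta_2^\top)^\top$, with $\theta_1 = P_1^\top\beta$ and $\theta_2 = P_2^\top\beta$. Since $Z_2 \approx 0$, dropping the term $Z_2\theta_2$ is equivalent to imposing the restriction $\theta_2 = P_2^\top\beta = 0$.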

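Finally, since the standardized inputs have zero mean (so $Z_1^\top\mathbf{1} = 0$) and $Z_1^\top Z_1 = P_1^\top X^\top X P_1 = \Lambda_1 = \operatorname{diag}(\lambda_1, \dots, \lambda_r)$, the least squares estimator of $\theta_1$ is $\hat\theta_1 = (Z_1^\top Z_1)^{-1} Z_1^\top y = \Lambda_1^{-1} Z_1^\top y$. Thus, the principal component estimator of $\beta$ in (2) is given by

$$\hat\beta_{PC} = P_1\hat\theta_1 = P_1\Lambda_1^{-1}P_1^\top X^\top y, \tag{4}$$

with covariance matrix

$$\operatorname{Var}(\hat\beta_{PC}) = \sigma_\varepsilon^2\, P_1\Lambda_1^{-1}P_1^\top,$$

where $\sigma_\varepsilon^2$ denotes the variance of the composed error $v_i - u_i$.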
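Equation (4) and its covariance matrix translate directly into code. This is a minimal least squares sketch that assumes standardized (zero-mean) inputs; it does not model the inefficiency distribution, which a maximum likelihood frontier fit of $y$ on $Z_1$ would additionally estimate. The helper name `pc_estimator` is hypothetical, and the residual variance uses $n - r - 1$ degrees of freedom, which is our choice. Using all $k$ components reproduces OLS, as noted above.

```python
import numpy as np

def pc_estimator(X, y, r):
    """Principal-component estimator of beta keeping the r largest components (eq. 4).

    Assumes the columns of X are standardized (zero mean), so the intercept
    does not affect the slope estimates.
    """
    eigvals, P = np.linalg.eigh(X.T @ X)
    order = np.argsort(eigvals)[::-1]
    lam, P = eigvals[order], P[:, order]
    P1, lam1 = P[:, :r], lam[:r]
    Z1 = X @ P1
    theta1 = (Z1.T @ y) / lam1                   # Lambda_1^{-1} Z_1' y
    beta_pc = P1 @ theta1                        # beta_PC = P_1 theta_1
    resid = y - y.mean() - X @ beta_pc           # intercept estimate is mean(y)
    sigma2 = resid @ resid / (len(y) - r - 1)    # estimate of Var(v_i - u_i)
    cov = sigma2 * (P1 / lam1) @ P1.T            # sigma^2 P_1 Lambda_1^{-1} P_1'
    return beta_pc, cov

rng = np.random.default_rng(4)
n = 200
x1 = rng.normal(size=n)
x2 = 0.9 * x1 + np.sqrt(1 - 0.9**2) * rng.normal(size=n)
X = np.column_stack([x1, x2])
X = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
y = 1.0 + X @ np.array([0.8, 0.7]) + rng.normal(0, 1.0, n) - np.abs(rng.normal(0, 1.0, n))

beta_pc, cov_pc = pc_estimator(X, y, r=1)
beta_all, _ = pc_estimator(X, y, r=2)            # all components
beta_ols = np.linalg.lstsq(np.column_stack([np.ones(n), X]), y, rcond=None)[0][1:]
print(np.allclose(beta_all, beta_ols))           # True
print(beta_pc, np.sqrt(np.diag(cov_pc)))         # estimates and standard errors
```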

II. Simulation study

To evaluate the performance of the proposed principal-component-based method, we carried out a Monte Carlo simulation experiment with 20,000 replications on the stochastic frontier model

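$$y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + v_i - u_i, \qquad i = 1, \dots, n, \tag{5}$$

with a half-normal/normal specification, where $\sigma_u = 3$, $\sigma_v = 2.5$, $\sigma^2 = \sigma_u^2 + \sigma_v^2 = 15.25$, $\gamma = \sigma_u^2/\sigma^2 = 0.59$, and $(\beta_0, \beta_1, \beta_2) = (1, 0.8, 0.7)$. The inputs are drawn as $(x_1, x_2) \sim N(\mu, \Sigma)$ with $\mu = (20, 25)$ and $\Sigma = DRD$, where $D = \operatorname{diag}(\sigma_{x_1}, \sigma_{x_2}) = \operatorname{diag}(1, 2)$, $R$ is the correlation matrix of the inputs, and $\rho = \operatorname{Corr}(x_1, x_2) = 0.7, 0.8, 0.9$. For the most severe multicollinearity case, $\rho = 0.9$, we performed the simulations with $n = 1000$ to study the large-sample properties of the estimator. We used the R package frontier (Stochastic Frontier Analysis), version 1.1-0, by Coelli and Henningsen (2013).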
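The parameter values above can be checked directly. Here $R$ is read as the $2 \times 2$ correlation matrix with off-diagonal element $\rho$ (our interpretation of the notation):

$$\sigma^2 = \sigma_u^2 + \sigma_v^2 = 3^2 + 2.5^2 = 9 + 6.25 = 15.25, \qquad \gamma = \frac{\sigma_u^2}{\sigma^2} = \frac{9}{15.25} \approx 0.590,$$

$$\Sigma = DRD = \begin{pmatrix} 1 & 0 \\ 0 & 2 \end{pmatrix}\begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}\begin{pmatrix} 1 & 0 \\ 0 & 2 \end{pmatrix} = \begin{pmatrix} 1 & 2\rho \\ 2\rho & 4 \end{pmatrix},$$

so that, for example, $\operatorname{Cov}(x_1, x_2) = 1.8$ when $\rho = 0.9$.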

Tables 1-3 show the means, biases, and mean squared errors (MSE) of the estimators of $\beta_1$ and $\beta_2$ obtained with the principal-component-based and the usual stochastic frontier methods for the assumed values of $\rho$. The results indicate that, in general, the coefficient estimators obtained with the principal-component-based method are biased, and their biases do not decrease asymptotically. However, they have a smaller MSE than the estimators obtained with the traditional method, even in large samples. The usual estimators are biased in finite samples, with larger biases than those of the proposed method, although their biases decrease asymptotically. The estimates of $\gamma$ and $\sigma^2$ remain unaffected if the principal components are chosen correctly. Finally, for a fixed number of principal components, the biases increase as the linear relationship among the variables weakens.

Table 1 ρ = 0.

Source: authors' elaboration.

Table 2 ρ = 0.

Source: authors' elaboration.

Table 3 ρ = 0.

Source: authors' elaboration.
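The means, biases, and MSEs in Tables 1-3 are Monte Carlo averages over replications. The following sketch shows how such summaries can be computed for the design of model (5) with ρ = 0.9, under simplifying assumptions: ordinary least squares stands in for the maximum likelihood frontier estimator (its slopes remain unbiased because $E(u_i)$ is absorbed by the intercept), the principal-component version keeps one component, and only 500 replications with n = 200 are used, so the numbers will not reproduce the tables. The helpers `simulate` and `slopes` are hypothetical.

```python
import numpy as np

def simulate(rho, n, rng):
    """One replication of model (5) with the half-normal/normal settings of Section II."""
    D = np.diag([1.0, 2.0])
    R = np.array([[1.0, rho], [rho, 1.0]])
    X = rng.multivariate_normal([20.0, 25.0], D @ R @ D, size=n)
    v = rng.normal(0.0, 2.5, size=n)
    u = np.abs(rng.normal(0.0, 3.0, size=n))      # half-normal, sigma_u = 3
    y = 1.0 + X @ np.array([0.8, 0.7]) + v - u
    return X, y

def slopes(X, y, r):
    """Least squares slopes using the r largest PCs of the standardized inputs,
    mapped back to the original input scale (r = 2 gives ordinary least squares)."""
    m, s = X.mean(axis=0), X.std(axis=0, ddof=1)
    Xs = (X - m) / s
    lam, P = np.linalg.eigh(Xs.T @ Xs)
    idx = np.argsort(lam)[::-1][:r]
    P1, lam1 = P[:, idx], lam[idx]
    beta_std = P1 @ ((Xs @ P1).T @ y / lam1)
    return beta_std / s                           # undo the scaling of each input

rng = np.random.default_rng(5)
beta_true = np.array([0.8, 0.7])
reps, n, rho = 500, 200, 0.9
est = {"LS": [], "PC": []}
for _ in range(reps):
    X, y = simulate(rho, n, rng)
    est["LS"].append(slopes(X, y, r=2))
    est["PC"].append(slopes(X, y, r=1))

summary = {}
for name, draws in est.items():
    draws = np.array(draws)
    bias = draws.mean(axis=0) - beta_true
    mse = ((draws - beta_true) ** 2).mean(axis=0)   # MSE = bias^2 + variance
    summary[name] = (bias, mse)
    print(f"{name}: mean {draws.mean(axis=0).round(3)}, bias {bias.round(3)}, MSE {mse.round(3)}")
```

In this illustrative design, the one-component estimator trades a bias that does not shrink with $n$ for a much smaller variance, which mirrors the pattern described for Tables 1-3.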

III. Application

To see how the proposed solution behaves with real data, we use production data from the agricultural and livestock sector for a sample of n = 23 livestock farms. The output variable is total income, and the inputs are labor, capital, and other inputs, all measured in nominal Colombian pesos.

Then, a stochastic frontier production model with a Cobb-Douglas functional form and a normal-exponential specification was fitted:

$$\log(\text{Income}_i) = \beta_0 + \beta_1\log(\text{Labor}_i) + \beta_2\log(\text{Capital}_i) + \beta_3\log(\text{Other inputs}_i) + v_i - u_i,$$

with $v_i \sim N(0, \sigma_v^2)$ and $u_i$ exponentially distributed. Estimation was carried out using the LIMDEP (LIMited DEPendent) econometric software, version 10. As can be seen in Table 4, the only statistically significant coefficient is the one for log(Other inputs). Moreover, the coefficient of log(Capital) is not only insignificant but also has the opposite sign to the one expected, a possible symptom of multicollinearity.

Table 4 Estimated Stochastic Frontier Production Function 

Source: authors' elaboration.

To detect multicollinearity, we computed the scaled condition indexes. Table 5 shows that two condition indexes are harmful (values greater than 30), indicating two possible near-linear dependencies among the inputs. Given this multicollinearity, we applied the proposed principal-component-based solution. Since the first principal component explains 88.6% of the variance, we applied the solution using only this component. Table 6 displays the corresponding results. Based on them, Table 7 reports the estimates of the principal-component-based stochastic frontier obtained with equation (4). All inputs are now statistically significant, with signs in accordance with production theory.

Table 5 Condition Indexes 

Condition Index
1.000
12.829
42.981
101.730

Source: authors' elaboration.
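Scaled condition indexes in the spirit of Belsley, Kuh & Welsch (1980) can be sketched as follows: each column of the design matrix, including the intercept, is scaled to unit length, and the indexes are the ratios of the largest singular value to each singular value. The sketch also computes the share of variance explained by the first principal component of the standardized inputs. The data are hypothetical log-inputs for 23 farms, not the authors' sample, and the helper names are illustrative.

```python
import numpy as np

def scaled_condition_indexes(X, intercept=True):
    """Condition indexes of the design after scaling every column to unit length."""
    X = np.asarray(X, dtype=float)
    A = np.column_stack([np.ones(len(X)), X]) if intercept else X
    A = A / np.linalg.norm(A, axis=0)           # unit-length columns
    mu = np.linalg.svd(A, compute_uv=False)     # singular values, in descending order
    return mu[0] / mu                           # first index is 1; values > 30 signal trouble

def first_pc_share(X):
    """Share of the total variance of the standardized inputs captured by the first PC."""
    Xs = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    lam = np.linalg.eigvalsh(Xs.T @ Xs)
    return lam.max() / lam.sum()

# Hypothetical log-inputs for 23 farms, all driven by a common "farm size" factor
rng = np.random.default_rng(6)
n = 23
size = rng.normal(15.0, 1.0, n)
X = np.column_stack([
    size + rng.normal(0, 0.2, n),    # log(Labor)
    size + rng.normal(0, 0.3, n),    # log(Capital)
    size + rng.normal(0, 0.25, n),   # log(Other inputs)
])
idx = scaled_condition_indexes(X)
share = first_pc_share(X)
print(np.round(idx, 3), round(share, 3))
```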

Table 6 Estimated Principal Component Model 

Source: authors' elaboration.

Table 7 Estimated Principal-Component-Based Stochastic Frontier Model 

Source: authors' elaboration.
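If, as in Section I, the frontier is estimated on standardized log-inputs, the coefficients must be rescaled before reading them as output elasticities; in a Cobb-Douglas frontier the returns to scale are then the sum of these elasticities. The sketch below assumes standardization by the sample mean and standard deviation; the helper `to_original_scale` and all numbers are hypothetical, not the estimates in Table 7.

```python
import numpy as np

def to_original_scale(beta_std, intercept_std, means, sds):
    """Map coefficients estimated on standardized log-inputs back to the log-input scale.

    If x_std = (x - mean) / sd, then b_std * x_std = (b_std / sd) * x - b_std * mean / sd.
    """
    beta = np.asarray(beta_std) / np.asarray(sds)
    intercept = intercept_std - beta @ np.asarray(means)
    return intercept, beta

# Hypothetical standardized estimates for (labor, capital, other inputs)
beta_std = np.array([0.30, 0.25, 0.40])
means = np.array([15.0, 16.0, 15.5])      # hypothetical means of the log-inputs
sds = np.array([0.8, 1.0, 0.9])           # hypothetical standard deviations
b0, elasticities = to_original_scale(beta_std, 17.0, means, sds)
returns_to_scale = elasticities.sum()     # Cobb-Douglas: sum of output elasticities
print(elasticities.round(3), round(returns_to_scale, 3))
```

A sum above one indicates increasing returns to scale, below one decreasing returns.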

Conclusions

The simulation results show that the input coefficient estimators obtained with the proposed principal-component-based solution are biased and that these biases do not decrease asymptotically. Nevertheless, they have a smaller MSE than the usual estimators, even in large samples. The usual estimators are biased in finite samples, apparently more so than the principal-component-based estimators, although their bias diminishes as the sample size increases. If the principal components are chosen correctly, the estimates of $\gamma$ and $\sigma^2$ obtained with the proposed method remain correct. Furthermore, for a fixed number of principal components, the biases of the proposed estimator increase as the linear relation between the covariates weakens. The choice of the number of principal components is therefore critical for the estimation of $\beta$, $\gamma$, and $\sigma^2$, as well as of the efficiency component. Finally, when applied to real data from the agricultural and livestock sector to evaluate technical inefficiency, our method appears to provide better estimates of the coefficients, and of the returns to scale, than the traditional method.

References

Aigner, Dennis; Lovell, Knox & Schmidt, Peter (1977). "Formulation and estimation of stochastic frontier production function models", Journal of Econometrics, Vol. 6, Issue 1, pp. 21-37.

Belsley, David; Kuh, Edwin & Welsch, Roy (1980). Regression Diagnostics: Identifying Influential Data and Sources of Collinearity. New York: John Wiley & Sons, Inc.

Coelli, Timothy & Henningsen, Arne (2013). frontier: Stochastic Frontier Analysis. R package version 1.1-0. Retrieved from: http://CRAN.R-Project.org/package=frontier (Accessed July 2014).

Coelli, Timothy; Rao, D. S. Prasada; O'Donnell, Christopher J. & Battese, George E. (2005). An Introduction to Efficiency and Productivity Analysis (2nd Ed.). New York: Springer.

Filippini, Massimo; Hrovatin, Nevenka & Zoric, Jelena (2008). "Cost efficiency of Slovenian water distribution utilities: an application of stochastic frontier methods", Journal of Productivity Analysis, Vol. 29, Issue 2, pp. 169-182.

Fomby, Thomas B.; Johnson, Stanley R. & Hill, R. Carter (1984). Advanced Econometric Methods. New York: Springer.

Greene, William (1980a). "Maximum likelihood estimation of econometric frontier functions", Journal of Econometrics, Vol. 13, Issue 1, pp. 27-56.

Greene, William (1980b). "On the estimation of a flexible frontier production model", Journal of Econometrics, Vol. 13, Issue 1, pp. 101-115.

Greene, William (2008). "The econometric approach to efficiency analysis". In: Fried, Harold; Lovell, Knox & Schmidt, Shelton (Eds.), The Measurement of Productive Efficiency and Productivity Growth (pp. 92-150). New York: Oxford University Press.

Groß, Jürgen (2003). Linear Regression. Lecture Notes in Statistics, Vol. 175. Springer.

Hwang, J. T. Gene & Nettleton, Dan (2003). "Principal components regression with data chosen components and related methods", Technometrics, Vol. 45, No. 1, pp. 70-79.

Jolliffe, Ian T. (1982). "A note on the use of principal components in regression", Journal of the Royal Statistical Society. Series C (Applied Statistics), Vol. 31, No. 3, pp. 300-303.

Jolliffe, Ian T. (2002). Principal Component Analysis (2nd Ed.). New York: Springer.

Kumbhakar, Subal C. & Lovell, C. A. Knox (2000). Stochastic Frontier Analysis. Cambridge: Cambridge University Press.

Mason, Robert & Gunst, Richard (1985). "Selecting principal components in regression", Statistics and Probability Letters, Vol. 3, Issue 6, pp. 299-301.

Massy, William F. (1965). "Principal components regression in exploratory statistical research", Journal of the American Statistical Association, Vol. 60, Issue 309, pp. 234-256.

Meeusen, Wim & van den Broeck, Julien (1977). "Efficiency estimation from Cobb-Douglas production functions with composed error", International Economic Review, Vol. 18, No. 2, pp. 435-444.

Puig-Junoy, Jaume (2001). "Technical inefficiency and public capital in U.S. states: A stochastic frontier approach", Journal of Regional Science, Vol. 41, Issue 1, pp. 75-96.

Stevenson, Rodney (1980). "Likelihood functions for generalized stochastic frontier estimation", Journal of Econometrics, Vol. 13, Issue 1, pp. 58-66.

Received: July 08, 2015; Accepted: May 03, 2016

This is an open-access article distributed under the terms of the Creative Commons Attribution License.