
Revista MVZ Córdoba

Print version ISSN 0122-0268

Rev.MVZ Cordoba vol.22 no.1 Córdoba Jan./Apr. 2017

https://doi.org/10.21897/rmvz.927 

Original Articles

Validation of models with proportional bias


Salvador Medina-Peralta1 

Luis Vargas-Villamil2  * 

Luis Colorado-Martínez1 

Jorge Navarro-Alberto3 

1 Universidad Autónoma de Yucatán, Facultad de Matemáticas, Apartado Postal 172, C.P. 97119, Mérida, Yucatán, México.

2 Colegio de Postgraduados, Campus Tabasco, Periférico Carlos A. Molina Km. 3.5, Apartado Postal 24, C.P. 86500, Cárdenas, Tabasco, México.

3 Universidad Autónoma de Yucatán, Facultad de Medicina Veterinaria y Zootecnia, Apartado Postal 4-116 Itzimná, C.P. 97100, Mérida, Yucatán, México.


ABSTRACT

Objective.

This paper presents extensions to Freese's statistical method for model validation when proportional bias (PB) is present in the predictions. The method is illustrated with data from a model that simulates grassland growth.

Materials and methods.

The extensions to validate models with PB were: the maximum anticipated error for the original proposal; hypothesis testing and the maximum anticipated error for the alternative proposal; and the confidence interval for a quantile of the error distribution.

Results.

The tested model had PB; once it was removed, and with a confidence level of 95%, the magnitude of the error did not surpass 1225.564 kg ha⁻¹. Therefore, the validated model can be used to predict grassland growth, although its structure would require adjustment given the presence of PB.

Conclusions.

The extensions presented to validate models with PB are applied without modification of the model structure. Once PB is corrected, the confidence interval for the quantile 1-α of the error distribution provides an upper bound for the magnitude of the prediction error, which can be used to evaluate the evolution of the model for system prediction.

Keywords: Validity; models; bias; errors; confidence intervals (Sources: CAB Thesaurus, MeSH).


INTRODUCTION

Validation of a system prediction model consists of comparing the predictions of the model with observed values of the actual system to determine, by some method, its predictive capacity. In this stage of the mathematical modeling process, the accuracy and precision of the model are assessed. Accuracy refers to the proximity of the predictions (z) to the observed values (y), for example, the proximity of their differences (d = y - z) to zero. Precision refers to the dispersion of the points (z, y). In the presence of accuracy, however, precision is measured by quantifying the dispersion of the points with respect to a reference, for example, the deterministic line y = z, or by evaluating the variance of the differences ($d_i$) around zero.

Different techniques have been proposed in the literature for validating models designed for prediction purposes; see Tedeschi (1): linear regression analysis, fitted error analysis, the concordance correlation coefficient, diverse measures of deviation, the mean square error of prediction, non-parametric analyses, and comparison of the distributions of observed and predicted data. Medina-Peralta et al (2) point out that some deviation measures used to validate models contradict graphic methods when the model predictions are biased, and they recommend the joint use of deviation measures and graphic methods for model validation.

Among the statistical inference techniques for validating models with no bias (NB), constant bias (CB), or proportional bias (PB) in their predictions, the procedure given by Freese (3) stands out. This method consists of determining whether the accuracy and precision of a prediction model or technique meet the requirements of the model developer or user. Medina et al (4) extended Freese's method to models with CB in their predictions: hypothesis tests and the maximum anticipated error for the alternative proposal, and the confidence interval for a quantile of the error distribution.

This paper presents extensions to the statistical method for model validation proposed by Freese (3), Rennie and Wiant (5), Reynolds (6), Barrales et al (7), and Medina et al (4) for the case in which the model has PB in its predictions: 1) the maximum anticipated error for the original proposal; 2) hypothesis testing and the maximum anticipated error for the alternative proposal; and 3) the confidence interval for a quantile of the error distribution. The method is illustrated with published data from Barrales et al (7) corresponding to a model that simulates grassland growth.

MATERIALS AND METHODS

Basic concepts. Assume that we have n pairs to compare, $(y_i, z_i)$, $i = 1, 2, \ldots, n$, where for the i-th pair $y_i$ is the observed value, $z_i$ the corresponding value predicted by the deterministic model to be validated, and $d_i = y_i - z_i$ the difference between the two values. In the development of the extensions of the method, the observed values are considered realizations of random variables $Y_i$, and the predicted values are deterministic; thus $D_i = Y_i - z_i$.

In Freese's (3) approach, applied to determine whether the required accuracy and precision are met for the values of e (the maximum admitted error for the deviations $|y_i - z_i| = |d_i|$) and α (where 1-α represents the required certainty) specified by the model developer or user, it is required that D be normally distributed with mean zero and variance $\sigma^2 \le e^2 / z_{1-\alpha/2}^2$, where $z_{1-\alpha/2}$ is the quantile $1-\alpha/2$ of the standard normal distribution, in order to accept the model and consider it sufficiently reliable for system prediction. Therefore, a model is exact, or has no bias (NB), when the differences ($d_i$) fit a normal distribution with mean zero.

Medina et al (4) indicate that CB is recognized by an average of the differences ($\bar{d}$) distant from zero, and by a graph of the points $(z_i, d_i = y_i - z_i)$ that forms a horizontal band centered at $\bar{d}$, systematically positive or negative (points above or below the line $d = 0$). When the samples are related, a t-test applied to determine whether the mean of the differences is significantly different from zero can prove the presence of CB in the model predictions.
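
As a minimal illustrative sketch (ours, not part of the original method description), assuming NumPy and SciPy are available, this t-test for CB could be carried out as follows; the function name is hypothetical:

```python
import numpy as np
from scipy import stats

def detect_constant_bias(y, z, alpha=0.05):
    """One-sample t-test of H0: mean(d) = 0, where d = y - z.

    Rejecting H0 (p < alpha) suggests constant bias (CB)
    in the model predictions.
    """
    d = np.asarray(y, dtype=float) - np.asarray(z, dtype=float)
    t_stat, p_value = stats.ttest_1samp(d, popmean=0.0)
    return {"d_bar": d.mean(), "t": t_stat, "p": p_value,
            "CB_detected": p_value < alpha}
```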

Determination of proportional bias. PB is recognized in the graph of the points $(z_i, d_i = y_i - z_i)$ whenever a positive or negative linear trend is found; the magnitude of the bias ($d_i = y_i - z_i$) increases or decreases in direct proportion to the predicted values ($z_i$). A simple linear regression analysis of the bias versus the predicted values ($d_i = \beta_0 + \beta_1 z_i + \varepsilon_i$) contributes to detecting PB more objectively (2,8).
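
In the same illustrative spirit (again a sketch under the same NumPy/SciPy assumptions, with our naming, not code from the paper), the regression-based check for PB might look like this:

```python
import numpy as np
from scipy import stats

def detect_proportional_bias(y, z, alpha=0.05):
    """Regress d = y - z on z; a slope significantly different
    from zero suggests proportional bias (PB)."""
    z = np.asarray(z, dtype=float)
    d = np.asarray(y, dtype=float) - z
    fit = stats.linregress(z, d)  # least squares fit: d = a + b*z
    return {"intercept": fit.intercept, "slope": fit.slope,
            "p_slope": fit.pvalue, "PB_detected": fit.pvalue < alpha}
```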

Validation of models with proportional bias. Given that we have PB, the points $(z_i, D_i)$, $i = 1, 2, \ldots, n$, are related by $D_i = \beta_0 + \beta_1 z_i + \varepsilon_i$ and $\varepsilon_i \sim N(0, \sigma^2)$, so that $D_i \sim N(\beta_0 + \beta_1 z_i, \sigma^2)$. That is, $\varepsilon_i = D_i - (\beta_0 + \beta_1 z_i)$, where $\varepsilon_i \sim N(0, \sigma^2)$, and thus the model will be accurate once the correction $\varepsilon_i = D_i - (\beta_0 + \beta_1 z_i)$ is carried out. In practice, a regression model $\hat{d}_i = a + b z_i$ is fitted and the errors ($\varepsilon_i$) are estimated as $\hat{\varepsilon}_i = d_i - (a + b z_i)$, where a and b are the least squares estimates of the intercept ($\beta_0$) and the slope ($\beta_1$), respectively. Thus, once PB is removed by the referred correction, we need only a statistical test for the required precision $P(|\varepsilon| \le e) \ge 1-\alpha$, which translates to $\sigma^2 \le e^2 / z_{1-\alpha/2}^2$.
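
A compact sketch of this correction step (same assumptions and naming conventions as the previous snippets):

```python
import numpy as np
from scipy import stats

def pb_corrected_errors(y, z):
    """Fit d = a + b*z by least squares and return the estimated
    errors eps_i = d_i - (a + b*z_i), i.e., the data with PB removed."""
    z = np.asarray(z, dtype=float)
    d = np.asarray(y, dtype=float) - z
    fit = stats.linregress(z, d)
    return d - (fit.intercept + fit.slope * z)
```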

If $\varepsilon_i \sim N(0, \sigma^2)$, then $\sum_{i=1}^{n} \hat{\varepsilon}_i^2 / \sigma^2 \sim \chi^2_{(n-2)}$. Moreover, for e and α satisfying $P(|\varepsilon| \le e) = 1-\alpha$, it follows that $e = \sigma z_{1-\alpha/2}$; thus $\sigma^2 = e^2 / z_{1-\alpha/2}^2$, and we denote this bound by $\sigma_0^2 = e^2 / z_{1-\alpha/2}^2$. Here, in general, $\chi^2_{(k),p}$ represents the quantile p of the chi-square distribution with k degrees of freedom ($\chi^2_{(k)}$); that is, $P(\chi^2_{(k)} \le \chi^2_{(k),p}) = p$.

The indicated correction results in an accurate model, so only the statistical test for precision is needed. Thus, the following step is to test the hypotheses of the original proposal (OP), $H_0: \sigma^2 \le \sigma_0^2$ vs $H_1: \sigma^2 > \sigma_0^2$ (3), or of the alternative proposal (AP), $H_0: \sigma^2 \ge \sigma_0^2$ vs $H_1: \sigma^2 < \sigma_0^2$ (8). The test statistic, with $\varepsilon_i \sim N(0, \sigma^2)$ and under a true null hypothesis ($\sigma^2 = \sigma_0^2 = e^2 / z_{1-\alpha/2}^2$), is:

$$V = \frac{z_{1-\alpha/2}^{2} \sum_{i=1}^{n} \hat{\varepsilon}_i^{2}}{e^{2}} \sim \chi^{2}_{(n-2)}$$

$H_0$ is rejected at significance level α' if $V_c > \chi^2_{(n-2),1-\alpha'}$ under OP, or if $V_c < \chi^2_{(n-2),\alpha'}$ under AP, where $V_c$ corresponds to the calculated value of the test statistic (V); note that $\sum_{i=1}^{n} \hat{\varepsilon}_i^2 = (n-2)(\mathrm{CME})_D$, with $(\mathrm{CME})_D$ the mean square error of the estimated model $\hat{d} = a + bz$. Therefore, if $H_0$ is not rejected under OP, or if $H_0$ is rejected under AP, then the model is considered acceptable for prediction under OP or AP, respectively.
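
The decision rules just described can be sketched as follows (a hedged illustration under the same assumptions; the df = n - 2 reflects the two regression parameters estimated in the PB correction):

```python
import numpy as np
from scipy import stats

def freese_pb_test(eps, e, alpha=0.05, alpha_prime=0.05):
    """Chi-square precision test once PB has been removed.

    V = z_{1-alpha/2}^2 * sum(eps_i^2) / e^2 ~ chi2(n-2) under
    sigma^2 = e^2 / z_{1-alpha/2}^2. OP: reject H0 for large V;
    AP: reject H0 for small V.
    """
    eps = np.asarray(eps, dtype=float)
    df = eps.size - 2
    tau = stats.norm.ppf(1.0 - alpha / 2.0)  # z_{1-alpha/2}
    v_c = tau**2 * np.sum(eps**2) / e**2
    return {"V": v_c, "df": df,
            "acceptable_OP": v_c <= stats.chi2.ppf(1.0 - alpha_prime, df),
            "acceptable_AP": v_c < stats.chi2.ppf(alpha_prime, df)}
```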

Another approach for evaluating precision once PB is corrected is to use confidence intervals (CI), similar to the procedure presented by Medina et al (4) for models with CB in their predictions. This approach is motivated by Freese's proposal: different users of the model may have different precision requirements, leading to different values of the maximum admitted error of the deviations (e).

From the rejection region for $H_0$ under OP, solving for e, the critical error is found (8):

$$e^{*} = z_{1-\alpha/2} \sqrt{\frac{(n-2)(\mathrm{CME})_D}{\chi^{2}_{(n-2),\,1-\alpha'}}}$$

Therefore, $H_0$ will be rejected if $e < e^{*}$, and it will not be rejected if $e \ge e^{*}$; that is, if the model user specifies a value of e such that $e \ge e^{*}$, then the model is considered acceptable for predicting the system under OP.

Using an analogous procedure for AP, the maximum anticipated error, or critical error, is obtained (8):

$$e^{**} = z_{1-\alpha/2} \sqrt{\frac{(n-2)(\mathrm{CME})_D}{\chi^{2}_{(n-2),\,\alpha'}}}$$

Thus, if the model user specifies a value of e such that $e > e^{**}$, then the model is considered acceptable for predicting the system under AP. As noted in a previous study (9), hypothesis tests and confidence intervals are related, and this relationship makes it possible to construct one from the other.
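
Both critical errors follow directly from the formulas above; a small sketch, under the same assumptions as the earlier snippets:

```python
import numpy as np
from scipy import stats

def critical_errors(eps, alpha=0.05, alpha_prime=0.05):
    """Critical errors e* (OP) and e** (AP):

    e*  = z_{1-alpha/2} * sqrt(SSE / chi2_{(n-2), 1-alpha'})
    e** = z_{1-alpha/2} * sqrt(SSE / chi2_{(n-2), alpha'})
    where SSE = sum(eps_i^2) = (n-2) * (CME)_D.
    """
    eps = np.asarray(eps, dtype=float)
    n, sse = eps.size, float(np.sum(eps**2))
    tau = stats.norm.ppf(1.0 - alpha / 2.0)
    e_star = tau * np.sqrt(sse / stats.chi2.ppf(1.0 - alpha_prime, n - 2))
    e_2star = tau * np.sqrt(sse / stats.chi2.ppf(alpha_prime, n - 2))
    return e_star, e_2star
```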

Other published works, by Reynolds (6) and Barrales et al (7), do not present the confidence interval (CI) for $\eta_{1-\alpha}$, the quantile 1-α of the distribution of $|\varepsilon|$ (in practice, of the absolute errors once PB is corrected). Equivalently, $\eta_{1-\alpha} = \sigma z_{1-\alpha/2}$ is the quantile 1-α of the distribution of $|\varepsilon|$ since, if $\varepsilon \sim N(0, \sigma^2)$, then $P(|\varepsilon| \le \sigma z_{1-\alpha/2}) = 1-\alpha$.

A (1-α')100% CI for $\eta_{1-\alpha}$ (8), based on the (1-α')100% CI for $\sigma^2$ (given that $\eta_{1-\alpha} = \sigma z_{1-\alpha/2}$ is an increasing monotonic function of $\sigma^2$), is given by

$$\left( z_{1-\alpha/2} \sqrt{\frac{(n-2)(\mathrm{CME})_D}{\chi^{2}_{(n-2),\,1-\alpha'/2}}},\;\; z_{1-\alpha/2} \sqrt{\frac{(n-2)(\mathrm{CME})_D}{\chi^{2}_{(n-2),\,\alpha'/2}}} \right)$$

where the lower and upper limits correspond to the critical errors $e^{*}$ and $e^{**}$, with the difference that α' is substituted by α'/2. The estimated CI means that we have 100(1-α')% confidence that the value below which 100(1-α)% of the absolute errors lie is located somewhere in the referred interval. This enables us to determine, with a certain probability, an upper bound for the magnitude of the prediction error (e), and to use it to evaluate the evolution of the model for system prediction.
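
This interval is the critical-error computation with α' halved; a sketch under the same assumptions:

```python
import numpy as np
from scipy import stats

def quantile_ci(eps, alpha=0.05, alpha_prime=0.05):
    """(1-alpha')100% CI for eta_{1-alpha} = sigma * z_{1-alpha/2},
    the 1-alpha quantile of |eps|, built from the chi-square CI
    for sigma^2 (note alpha' is halved relative to e* and e**)."""
    eps = np.asarray(eps, dtype=float)
    n, sse = eps.size, float(np.sum(eps**2))
    tau = stats.norm.ppf(1.0 - alpha / 2.0)
    lower = tau * np.sqrt(sse / stats.chi2.ppf(1.0 - alpha_prime / 2.0, n - 2))
    upper = tau * np.sqrt(sse / stats.chi2.ppf(alpha_prime / 2.0, n - 2))
    return lower, upper
```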

RESULTS

To illustrate the application of the methodology, the CI approach (4) was used with a published data set from Barrales et al (7), corresponding to a grassland growth simulation model. In this example, we applied the method for calculating the maximum anticipated errors, or critical errors, of the OP and AP when the model has PB. In addition, a 95% CI was calculated for $\eta_{1-\alpha}$, the 1-α quantile of the error distribution, once PB was corrected with $\hat{\varepsilon}_i = d_i - (a + b z_i)$.

The distribution of the points in Figure 1 indicates that the model has PB in its predictions; the magnitude of the bias ($d_i = y_i - z_i$) decreases in direct proportion to the predicted values ($z_i$), and the linear relationship was significant ($\hat{d} = 912.838 - 0.396z$; F = 13.665, p = 0.001). With α = α' = 0.05, the critical errors were $e^{*} = 697.311$ kg ha⁻¹ and $e^{**} = 1156.273$ kg ha⁻¹. Therefore, if the modeler or the model user specifies a value of e such that $e \ge 697.311$ kg ha⁻¹ (OP) or $e > 1156.273$ kg ha⁻¹ (AP), then the model is considered sufficiently reliable for predicting the system under the corresponding proposal. For example, if e = 1157 kg ha⁻¹, the model is considered acceptable for prediction under both proposals (OP: e > 697.311 kg ha⁻¹; AP: e > 1156.273 kg ha⁻¹).

The 1-α' = 95% CI for $\eta_{1-\alpha}$ is (669.689 kg ha⁻¹, 1225.564 kg ha⁻¹); thus, we are 95% confident that the value below which 1-α = 95% of the absolute errors lie is located somewhere in that interval. That is, there is a 95% probability that the magnitude of the prediction error does not surpass 1225.564 kg ha⁻¹.

With 95% confidence, the critical error under AP and the CI for $\eta_{1-\alpha}$ indicate that the minimum allowable prediction error is 1156.273 kg ha⁻¹ and that the magnitude of the prediction error should not surpass 1225.564 kg ha⁻¹. In this way, the validated model could be used to predict grassland growth; however, it would require an adjustment of its structure given the presence of PB in its predictions.

Figure 1 Relationship between bias (di) and simulated values (zi) for the grassland growth data. 

DISCUSSION

Identifying the type of bias may make it possible to improve the structure of the model and its evaluation, as well as the data and methods used throughout its construction and validation (4). McCarthy et al (10) have pointed out that testing a model helps to identify its weaknesses so that its predictive performance can be improved, since identifying and accepting the inaccuracies of a model is a step toward the evolution of a more accurate, more reliable model (1).

The extensions we present for validating deterministic models possessing PB in their predictions are applied with no modification of the model structure, because this type of bias is removed from the available data $(z_i, y_i)$ by means of the bias values ($d_i$), namely, by the correction $\hat{\varepsilon}_i = d_i - (a + b z_i)$.

The test statistic and critical error under AP require a larger maximum admissible error (e) than under OP in order to infer whether the model is acceptable for predicting the system. Medina et al (4) point out that the inconvenience of applying the OP is the ambiguity that occurs when the null hypothesis (H0) is not rejected, since all that can be inferred is that the data do not provide sufficient evidence to reject it; the affirmation stated in H0 is not thereby accepted. Moreover, since the research hypothesis is posed as the alternative hypothesis, we recommend using the AP when validating a system prediction model.

Validation by means of critical errors, or by the confidence-limits approach (the latter being equivalent to the hypothesis testing approach, as mentioned above), reduces to calculating the maximum anticipated error, or critical error, from which the model developer or user decides whether the model is acceptable for predicting the system. This is done by comparing the critical error with the required accuracy (e) under values of α and α' specified beforehand, which requires that the model developer or user understand the system well enough to establish, a priori, the maximum admissible error (e). Medina et al (4) recommend the CI approach for validating a model for several reasons: it provides the range of possible values of the parameter under consideration and is thus more informative than hypothesis testing, in the sense that it permits determining the a posteriori maximum admissible prediction error.

It should be pointed out that the critical error, or a posteriori maximum admissible error, can be used to compare several models of a single system, such that the best model is the one with the lowest critical error. To assess the improvement of a model in system prediction, it is necessary to observe whether the a posteriori maximum admissible error decreases at each improvement step.

The estimated CI for the quantile 1-α of the error distribution, once PB is corrected, allows us to determine, with a certain probability, an upper bound for the magnitude of the prediction error, and to use it to evaluate the evolving model whenever an improvement is required. This is similar to the case of validating a model with CB in its predictions (4). Thus, based on the critical error under AP and the aforementioned CI, we can establish the minimum permitted prediction error and the amount that the prediction error would not surpass.

In conclusion, the extensions to Freese's statistical method presented here to validate models in the presence of proportional bias are applied without modification of the model structure.

In validating a model, we recommend using the confidence interval approach under the alternative proposal.

The confidence interval for the 1-α quantile of the error distribution, once the proportional bias is corrected, allows determination of an upper bound for the magnitude of the prediction error.

Both the a posteriori maximum admissible error and the upper bound for the magnitude of the prediction error can be used to evaluate the evolution of a model in predicting the system in the face of modifications to the modeling process.

Acknowledgements

We thank the Subsecretaría de Educación Superior e Investigación Científica, Programa de Mejoramiento del Profesorado (PROMEP), Secretaría de Educación Pública of México, for funding this study.

REFERENCES

1. Tedeschi LO. Assessment of the adequacy of mathematical models. Agric Syst 2006; 89(2-3):225-247.

2. Medina-Peralta S, Vargas-Villamil L, Navarro-Alberto J, Canul-Pech C, Peraza-Romero S. Comparación de medidas de desviación para validar modelos sin sesgo, sesgo constante o proporcional. Univ Cienc 2010; 26(3):255-263.

3. Freese F. Testing accuracy. For Sci 1960; 6:139-145.

4. Medina PS, Vargas-Villamil L, Navarro AJ, Avendaño L, Colorado L, Arjona-Suarez E, Mendoza-Martínez G. Validación de modelos con sesgo constante: un enfoque aplicado. Rev MVZ Córdoba 2014; 19(2):4099-4108.

5. Rennie JC, Wiant HV. Modification of Freese's chi-square test of accuracy. USDI Bureau of Land Management, Denver, Colorado. Resource Inventory 1978; 14:1-3.

6. Reynolds MR. Estimating the error in model predictions. For Sci 1984; 30(2):454-469.

7. Barrales VL, Peña RI, Fernández RB. Model validation: an applied approach. Agric Tech 2004; 64:66-73.

8. Medina PS. Validación de modelos mecanísticos basada en la prueba ji-cuadrada de Freese, su modificación y extensión. [Master's thesis]. Montecillo, México: Colegio de Postgraduados; 2006.

9. Bickel PJ, Doksum KA. Mathematical Statistics: Basic Ideas and Selected Topics, Vol. I. 2nd ed. New Jersey: Pearson Prentice Hall; 2007.

10. McCarthy MA, Possingham HP, Day JR, Tyre AJ. Testing the accuracy of population viability analysis. Conserv Biol 2001; 15:1030-1038.

Received: May 2016; Accepted: November 2016

* Correspondence: luis@avanzavet.com

This is an open-access article distributed under the terms of the Creative Commons Attribution License.