Revista Colombiana de Estadística

Print version ISSN 0120-1751

Rev.Colomb.Estad. vol.44 no.2 Bogotá July/Dec. 2021  Epub Aug 27, 2021

https://doi.org/10.15446/rce.v44n2.85606 

Original articles of research

Influence Diagnostics for Correlated Binomial Regression Models: An Application to a Data Set on High-Cost Health Services Occurrence

Diagnósticos de influencia para modelos de regresión binomial correlacionada: una aplicación a un conjunto de datos sobre la ocurrencia de servicios de salud de alto costo

CARLOS DINIZ (1, a)

RUBIANE PIRES (1, b)

CAROLINA PARAÍBA (2, c)

PAULO FERREIRA (2, d)

1 Department of Statistics, Federal University of São Carlos, São Carlos, Brazil

2 Department of Statistics, Federal University of Bahia, Salvador, Brazil


Abstract

This paper takes a frequentist approach to the class of correlated binomial regression models (Pires & Diniz, 2012), providing a new way to analyze correlated binary response variables. Model parameters are estimated by direct maximization of the log-likelihood function. We also carry out a diagnostic analysis under the correlated binomial regression model, using residuals based on predictive values and deviance residuals (Cook & Weisberg, 1982) to check model assumptions, and a global influence measure based on case deletion (Cook, 1977) to detect influential observations. In addition, a sensitivity analysis identifies observations that could affect the inferential results, using local influence metrics (Cook, 1986) under case-weight, response, and covariate perturbation schemes. A simulation study assesses the frequentist properties of the parameter estimates and the performance of the diagnostic metrics under the correlated binomial regression model. A data set on high-cost claims made to a private health care provider in Brazil is analyzed to illustrate the proposed methodology.

Key words: Generalized binomial distribution; Health care provider; Influence; Overdispersion; Regression; Residuals
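
As a rough illustration of the estimation and case-deletion steps described in the abstract, the sketch below assumes the mixture form of the correlated binomial distribution, in which with probability rho all n trials share a single Bernoulli(p) outcome and otherwise follow an ordinary binomial. That form, the intercept-only model, the coarse grid search used in place of a numerical optimizer, the likelihood displacement as the deletion measure, the function names, and the toy data are all assumptions made for illustration; none of them is taken from the paper, whose regression structure and diagnostics may differ.

```python
from math import comb, log


def cb_pmf(y, n, p, rho):
    """P(Y = y) under the ASSUMED mixture form of the correlated binomial.

    Requires n >= 1, 0 < p < 1 and 0 <= rho < 1.
    """
    binom = comb(n, y) * p**y * (1 - p) ** (n - y)
    # With probability rho all n trials share one Bernoulli(p) outcome,
    # so the extra mass sits only on y = 0 and y = n.
    if y == n:
        clump = p
    elif y == 0:
        clump = 1 - p
    else:
        clump = 0.0
    return (1 - rho) * binom + rho * clump


def loglik(data, p, rho):
    """Log-likelihood of (y, n) pairs for an intercept-only model."""
    return sum(log(cb_pmf(y, n, p, rho)) for y, n in data)


def fit(data, steps=40):
    """Coarse grid search standing in for a proper numerical optimizer."""
    grid_p = [(i + 0.5) / steps for i in range(steps)]  # p in (0, 1)
    grid_rho = [i / steps for i in range(steps)]        # rho in [0, 1)
    return max(
        ((p, r) for p in grid_p for r in grid_rho),
        key=lambda theta: loglik(data, *theta),
    )


def likelihood_displacement(data):
    """LD_i = 2 * [l(theta_hat) - l(theta_hat without case i)].

    Both terms are evaluated on the full data set, so LD_i >= 0.
    """
    p_hat, rho_hat = fit(data)
    full = loglik(data, p_hat, rho_hat)
    displacements = []
    for i in range(len(data)):
        reduced = data[:i] + data[i + 1:]  # drop case i and refit
        p_i, rho_i = fit(reduced)
        displacements.append(2 * (full - loglik(data, p_i, rho_i)))
    return displacements


# Hypothetical toy data: (successes, trials) per cluster, not from the paper.
toy = [(2, 10), (3, 10), (1, 10), (4, 10), (2, 10), (3, 10), (10, 10), (2, 10)]
ld = likelihood_displacement(toy)
```

Cases with a large displacement relative to the rest are candidates for closer inspection. Under the assumed mixture form the mean is np and the variance is np(1 - p)[1 + rho(n - 1)], which is how a positive rho produces overdispersion relative to the binomial.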

Resumen

Este artículo considera una perspectiva frecuentista para tratar con la clase de modelos de regresión binomial correlacionada (Pires & Diniz, 2012), proporcionando así un nuevo enfoque para analizar variables de respuesta binaria correlacionadas. Los parámetros del modelo se estiman mediante la maximización directa de la función de log-verosimilitud. También consideramos un análisis de diagnóstico bajo la configuración del modelo de regresión binomial correlacionada, que se realiza considerando los residuos basados en valores predictivos y los residuos de desviación (Cook & Weisberg, 1982) para verificar los supuestos del modelo y la medida de influencia global basada en la eliminación de casos (Cook, 1977) para detectar observaciones influyentes. Además, se realiza un análisis de sensibilidad para detectar posibles observaciones influyentes que podrían afectar los resultados inferenciales. Esto se hace utilizando métricas de influencia local (Cook, 1986) con esquemas de perturbación de covariable, variable respuesta y ponderación de casos. Se realiza un estudio de simulación para evaluar las propiedades frecuentistas de los estimadores de parámetros del modelo y verificar el rendimiento de las métricas de diagnóstico consideradas bajo el modelo de regresión binomial correlacionada. Se analiza un conjunto de datos sobre un plan de salud de un operador brasileño para ilustrar la metodología propuesta.

Palabras clave: Distribución binomial generalizada; Plan de salud; Influencia; Sobredispersión; Regresión; Residuos

Full text available only in PDF format

References

Agresti, A. (2015), Foundations of Linear and Generalized Linear Models, Wiley Series in Probability and Statistics, first edn, Wiley, New Jersey.

Akaike, H. (1974), 'A new look at the statistical model identification', IEEE Transactions on Automatic Control 19(6), 716-723.

Altham, P. M. E. (1978), 'Two generalizations of the binomial distribution', Journal of the Royal Statistical Society. Series C 27(2), 162-167.

Cook, R. D. (1977), 'Detection of influential observations in linear regression', Technometrics 19(1), 15-18.

Cook, R. D. (1986), 'Assessment of local influence', Journal of the Royal Statistical Society. Series B (Methodological) 48(2), 133-169.

Cook, R. & Weisberg, S. (1982), Residuals and influence in regression, Monographs on statistics and applied probability, Chapman and Hall, London.

Diniz, C. A. R., Tutia, M. H. & Leite, J. G. (2010), 'Bayesian analysis of a correlated binomial model', Brazilian Journal of Probability and Statistics 24(1), 68-77.

Efron, B. (1986), 'Double exponential families and their use in generalized linear regression', Journal of the American Statistical Association 81(395), 709-721.

Fu, J. & Sproule, R. (1995), 'A generalization of the binomial distribution', Communications in Statistics - Theory and Methods 24(10), 2645-2658.

Lambert, D. (1992), 'Zero-inflated Poisson regression, with an application to defects in manufacturing', Technometrics 34(1), 1-14.

Lehmann, E. L. & Casella, G. (1998), Theory of point estimation, second edn, Springer, New York.

Luceño, A. (1995), 'A family of partially correlated Poisson models for overdispersion', Computational Statistics and Data Analysis 20(5), 511-520.

McCullagh, P. & Nelder, J. A. (1989), Generalized Linear Models, second edn, Chapman and Hall, London.

Nocedal, J. & Wright, S. J. (2006), Numerical Optimization, second edn, Springer-Verlag, New York.

Pires, R. M. & Diniz, C. A. R. (2012), 'Correlated binomial regression models', Computational Statistics and Data Analysis 56(8), 2513-2525.

Prentice, R. L. (1986), 'Binary regression using an extended beta-binomial distribution, with discussion of correlation induced by covariate measurement errors', Journal of the American Statistical Association 81(394), 321-327.

R Development Core Team (2007), R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0. http://www.R-project.org

Schwarz, G. (1978), 'Estimating the dimension of a model', Annals of Statistics 6(2), 461-464.

She, Y. & Owen, A. B. (2011), 'Outlier detection using nonconvex penalized regression', Journal of the American Statistical Association 106(494), 626-639.

Sherman, M. (2011), Spatial Statistics and Spatio-Temporal Data: Covariance Functions and Directional Properties, Wiley Series in Probability and Statistics, John Wiley and Sons.

Skellam, J. G. (1948), 'A probability distribution derived from the binomial distribution by regarding the probability of success as variable between the sets of trials', Journal of the Royal Statistical Society, Series B 10(2), 257-261.

Zhu, H., Lee, S.-Y., Wei, B.-C. & Zhou, J. (2001), 'Case-deletion measures for models with incomplete data', Biometrika 88(3), 727-737.

Received: March 2020; Accepted: March 2021

a Ph.D. E-mail: dcad@ufscar.br

b Ph.D. E-mail: rubianemariapires@gmail.com

c Ph.D. E-mail: carolina.paraiba@ufba.br

d Ph.D. E-mail: paulohenri@ufba.br

This is an open-access article distributed under the terms of the Creative Commons Attribution License.