
Revista Colombiana de Estadística

Print version ISSN 0120-1751

Rev.Colomb.Estad. vol.43 no.1 Bogotá Jan./June 2020  Epub June 05, 2020

https://doi.org/10.15446/rce.v43n1.77052 

ORIGINAL RESEARCH ARTICLES

Two Useful Discrete Distributions to Model Overdispersed Count Data

Dos distribuciones discretas útiles para modelar datos de recuento sobredispersos

Josmar Mazucheli1 

Wesley Bertoli2 

Ricardo Oliveira3 

1 Department of Statistics, State University of Maringá, Maringá, Brazil. PhD. E-mail: jmazucheli@gmail.com

2 Department of Statistics, Federal University of Technology - Paraná, Curitiba, Brazil. PhD. E-mail: wbsilva@utfpr.edu.br

3 Medical School of Ribeirão Preto, University of São Paulo, Ribeirão Preto, Brazil. PhD. E-mail: rpuziol.oliveira@gmail.com


Abstract

Methods for obtaining discrete analogs of continuous distributions have been widely studied in recent years. In general, the discretization process yields probability mass functions that can be competitive with the traditional model used in the analysis of count data, the Poisson distribution. The discretization procedure also avoids the use of a continuous distribution in the analysis of strictly discrete data. In this paper, we introduce two discrete analogs of the Shanker distribution, obtained by the method of the infinite series and by the method based on the survival function, as alternatives for modeling overdispersed datasets. Despite the difference between the discretization methods, the resulting distributions are interchangeable. However, the distribution generated by the method of the infinite series has simpler mathematical expressions for the shape, the generating functions, and the central moments. Maximum likelihood theory is used for estimation and asymptotic inference. A simulation study is carried out to evaluate some frequentist properties of the developed methodology. The usefulness of the proposed models is evaluated using real datasets from the literature.

Key words: Maximum likelihood estimation; Discrete distributions; Monte Carlo simulation; Overdispersion; Shanker distribution
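The two discretization methods named in the abstract can be illustrated with a minimal sketch. It assumes the continuous Shanker density f(x; θ) = θ²/(θ² + 1) · (θ + x) · e^(−θx) for x > 0 from Shanker (2015), and the usual textbook definitions of the two methods: the infinite-series analog normalizes f over the non-negative integers, and the survival-function analog takes S(x) − S(x + 1). The function names and the truncation point `upper` are hypothetical choices made for illustration; the closed forms in the paper itself (available only in the PDF) may be written differently.

```python
import math


def shanker_sf(x, theta):
    """Survival function S(x) = P(X > x) of the continuous Shanker distribution.

    Obtained by integrating the assumed density
    theta^2 / (theta^2 + 1) * (theta + t) * exp(-theta * t) over t > x.
    """
    return (theta**2 + 1.0 + theta * x) / (theta**2 + 1.0) * math.exp(-theta * x)


def pmf_infinite_series(x, theta):
    """Infinite-series analog: P(X = x) = f(x) / sum_{j >= 0} f(j), x = 0, 1, 2, ...

    The constant theta^2 / (theta^2 + 1) cancels, leaving the kernel
    (theta + x) * q^x with q = exp(-theta), whose sum has a closed form.
    """
    q = math.exp(-theta)
    # sum_{j >= 0} (theta + j) * q^j = theta / (1 - q) + q / (1 - q)^2
    norm = theta / (1.0 - q) + q / (1.0 - q) ** 2
    return (theta + x) * q**x / norm


def pmf_survival(x, theta):
    """Survival-function analog: P(X = x) = S(x) - S(x + 1), x = 0, 1, 2, ..."""
    return shanker_sf(x, theta) - shanker_sf(x + 1, theta)


def mean_var(pmf, theta, upper=500):
    """Mean and variance by direct summation, truncated at `upper` (an assumption
    that is adequate only when the tail mass beyond `upper` is negligible)."""
    probs = [pmf(x, theta) for x in range(upper)]
    mean = sum(x * p for x, p in enumerate(probs))
    second = sum(x * x * p for x, p in enumerate(probs))
    return mean, second - mean**2


if __name__ == "__main__":
    theta = 0.5
    for name, pmf in [("infinite series", pmf_infinite_series),
                      ("survival function", pmf_survival)]:
        m, v = mean_var(pmf, theta)
        print(f"{name}: mean={m:.4f}, variance={v:.4f}, ratio={v / m:.4f}")
```

A variance-to-mean ratio above 1 in the printed output indicates overdispersion relative to the Poisson distribution, whose ratio is exactly 1.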

Abstract (translated from Spanish)

Methods for obtaining discrete analogs of continuous distributions have been widely considered in recent years. In general, the discretization process provides probability mass functions that can be competitive with the traditional model used in the analysis of count data, the Poisson distribution. The discretization procedure also avoids the use of a continuous distribution in the analysis of strictly discrete data. In this article, we seek to introduce two discrete analogs of the Shanker distribution, using the method of the infinite series and the method based on the survival function, as alternatives for modeling overdispersed datasets. Despite the difference between the discretization methods, the resulting distributions are interchangeable. However, the distribution generated by the method of the infinite series has simpler mathematical expressions for the shape, the generating functions, and the central moments. Maximum likelihood theory is considered for estimation and asymptotic inference. A simulation study is carried out to evaluate some frequentist properties of the developed methodology. The usefulness of the proposed models is evaluated using real datasets from the literature.

Key words: Maximum likelihood estimation; Discrete distributions; Shanker distribution; Monte Carlo simulation; Overdispersion

Full text available only in PDF format.


Creative Commons License: This is an open-access article distributed under the terms of the Creative Commons Attribution License.