

Revista Colombiana de Estadística

Print version ISSN 0120-1751

Rev. Colomb. Estad. vol. 43 no. 1, Bogotá, Jan./June 2020. Epub Feb 05, 2020

https://doi.org/10.15446/rce.v43n1.78054 

ORIGINAL RESEARCH ARTICLES

Relationship Between Kendall's tau Correlation and Mutual Information

Mohammad Bolbolian Ghalibaf

Department of Statistics, Faculty of Mathematics and Computer Science, Hakim Sabzevari University, Sabzevar, Iran. PhD. E-mail: m.bolbolian@hsu.ac.ir, m.bolbolian@gmail.com


Abstract

Mutual information (MI) can be viewed as a measure of multivariate association in a random vector. However, MI is difficult to estimate, because estimating the joint probability density function (PDF) of non-Gaussian data is a hard problem. The copula function is an appropriate tool for estimating MI, since the joint PDF of random variables can be expressed as the product of the associated copula density function and the marginal PDFs. A brief review shows that the proposed copula-based estimate of MI is much more accurate than conventional methods such as joint-histogram and Parzen-window-based MI estimation. In this paper, using the copula-based method, we compute MI for several families of bivariate distribution functions and study the relationship between Kendall's tau correlation and the MI of bivariate distributions. Finally, we illustrate the efficiency of this approach on a real dataset.
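A minimal sketch of the copula-based computation, assuming the bivariate Gaussian copula. This family is chosen here only for illustration because its MI has a known closed form, -0.5 log(1 - ρ²); it is not necessarily one of the families or the estimator treated in the paper. Since the joint PDF factors as f(x, y) = c(F_X(x), F_Y(y)) f_X(x) f_Y(y), MI reduces to the copula integral MI = ∫∫ c(u, v) log c(u, v) du dv over [0, 1]². The helper names (gaussian_copula_log_density, copula_mutual_information, mi_from_kendall_tau) are hypothetical. The last helper uses the standard Gaussian-copula identity τ = (2/π) arcsin ρ to write MI directly as a function of Kendall's tau, MI = -log cos(πτ/2), which illustrates the kind of tau-MI relationship the abstract describes.

```python
import math


def gaussian_copula_log_density(x, y, rho):
    # Hypothetical helper: log c(u, v) of the bivariate Gaussian copula with
    # correlation rho, evaluated at the normal scores x = Phi^-1(u), y = Phi^-1(v).
    s = 1.0 - rho * rho
    return -0.5 * math.log(s) - (rho * rho * (x * x + y * y) - 2.0 * rho * x * y) / (2.0 * s)


def copula_mutual_information(rho, half_width=8.0, step=0.05):
    # Hypothetical helper: MI = double integral over [0, 1]^2 of c log c.
    # With u = Phi(x), v = Phi(y), the factor c(u, v) du dv equals the bivariate
    # normal density phi2(x, y) dx dy, so the integral becomes a smooth Gaussian
    # integral that a plain Riemann sum on [-half_width, half_width]^2 handles well.
    s = 1.0 - rho * rho
    norm = 1.0 / (2.0 * math.pi * math.sqrt(s))
    n = int(round(2.0 * half_width / step))
    grid = [-half_width + i * step for i in range(n + 1)]
    total = 0.0
    for x in grid:
        for y in grid:
            phi2 = norm * math.exp(-(x * x - 2.0 * rho * x * y + y * y) / (2.0 * s))
            total += phi2 * gaussian_copula_log_density(x, y, rho)
    return total * step * step  # in nats (natural logarithm)


def mi_from_kendall_tau(tau):
    # Gaussian copula only: tau = (2 / pi) * arcsin(rho), so rho = sin(pi * tau / 2)
    # and MI = -0.5 * log(1 - rho^2) simplifies to -log(cos(pi * tau / 2)).
    return -math.log(math.cos(math.pi * tau / 2.0))


if __name__ == "__main__":
    for tau in (0.1, 0.3, 0.5, 0.7):
        rho = math.sin(math.pi * tau / 2.0)
        numeric = copula_mutual_information(rho)
        closed = mi_from_kendall_tau(tau)
        print(f"tau={tau:.1f}  rho={rho:.4f}  MI numeric={numeric:.6f}  MI closed form={closed:.6f}")
```

The change of variables is a deliberate design choice: for ρ > 0 the Gaussian copula density is unbounded at the corners (0, 0) and (1, 1) of the unit square, so a uniform grid in (u, v) converges slowly, whereas the integrand in normal scores is smooth and decays quickly. Because MI is invariant under strictly increasing transformations of each margin, the resulting values depend only on the copula, not on the marginal PDFs.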

Key words: Copula function; Kendall's tau correlation; Mutual information

This is an open-access article distributed under the terms of the Creative Commons Attribution License.