A New Method for Detecting Significant p-values with Applications to Genetic Data

VÉLEZ, JORGE IVÁN; CORREA, JUAN CARLOS; ARCOS-BURGOS, MAURICIO

doi:10.15446/rce.v37n1.44358


Revista Colombiana de Estadística

Print version ISSN 0120-1751

Rev.Colomb.Estad. vol.37 no.1 Bogotá Jan./Jun. 2014

https://doi.org/10.15446/rce.v37n1.44358


A New Method for Detecting Significant p-values with Applications to Genetic Data

Spanish title: Un nuevo método para la detección de valores p significativos y su aplicación a datos genéticos

JORGE IVÁN VÉLEZ¹, JUAN CARLOS CORREA², MAURICIO ARCOS-BURGOS³

¹The Australian National University, Genomics and Predictive Medicine Group, Genome Biology Department, John Curtin School of Medical Research, Canberra, ACT, Australia. University of Antioquia, Group of Neurosciences, Medellín, Colombia. National University of Colombia, Research Group in Statistics, Medellín, Colombia. Ph.D. Scholar. Email: jorge.velez@anu.edu.au
²National University of Colombia, Research Group in Statistics, Medellín, Colombia. National University of Colombia, Department of Statistics, Medellín, Colombia. Associate professor. Email: jccorrea@unal.edu.co
³The Australian National University, Genomics and Predictive Medicine Group, Genome Biology Department, John Curtin School of Medical Research, Canberra, ACT, Australia. University of Antioquia, Group of Neurosciences, Medellín, Colombia. Associate professor. Email: mauricio.arcos-burgos@anu.edu.au

Abstract

This paper describes a new method for detecting significant p-values. The method, based on the distribution of the m-th order statistic of a U(0,1) distribution, is shown to be suitable for applications in which m → ∞ independent hypotheses are tested and, for a fixed type I error probability, the goal is to determine which of them are significant while controlling the false positives. Equivalences and comparisons between our method and other p-value-based methods are also established, and the distribution of the test statistic is depicted graphically for different values of m. Finally, our proposal is illustrated with two microarray data sets.
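The starting point of the abstract can be illustrated with a minimal sketch. It assumes only that the p-values of m independent true null hypotheses behave like i.i.d. U(0,1) draws, and it uses the textbook distributions of the largest and smallest uniform order statistics. It is not the authors' test statistic, which is defined in the full paper; the function names are hypothetical and chosen for illustration only.

```python
import random


def max_uniform_cdf(x: float, m: int) -> float:
    """CDF of the m-th (largest) order statistic of m i.i.d. U(0,1) draws.

    P(U_(m) <= x) = x**m for 0 <= x <= 1.
    """
    return x ** m


def min_uniform_quantile(alpha: float, m: int) -> float:
    """Cutoff c such that P(U_(1) <= c) = alpha for the smallest of m uniforms.

    Follows from P(U_(1) <= c) = 1 - (1 - c)**m.
    """
    return 1.0 - (1.0 - alpha) ** (1.0 / m)


def simulate_min_p_rejection_rate(m: int, alpha: float, n_rep: int, seed: int = 0) -> float:
    """Monte Carlo estimate of how often the smallest of m null p-values
    falls below the cutoff, i.e. the probability of at least one false positive."""
    rng = random.Random(seed)
    cutoff = min_uniform_quantile(alpha, m)
    hits = 0
    for _ in range(n_rep):
        smallest = min(rng.random() for _ in range(m))
        if smallest <= cutoff:
            hits += 1
    return hits / n_rep


if __name__ == "__main__":
    m, alpha = 50, 0.05
    print("cutoff for the smallest p-value:", min_uniform_quantile(alpha, m))
    print("simulated rate of at least one false positive:",
          simulate_min_p_rejection_rate(m, alpha, n_rep=10_000, seed=1))
```

Because 1 - U is also U(0,1), the largest order statistic of the values 1 - p carries the same information as the smallest p-value, which is why both distributions appear in the sketch. Under these assumptions the simulated rate should be close to the chosen type I error probability.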

Key words: Extreme value theory, p-value, Type I error probability, Multiple testing, Genetic data.

Resumen (translated from Spanish)

A new method for detecting significant p-values is described. This method, based on the m-th order statistic of the U(0,1) distribution, is suitable when m → ∞ independent hypothesis tests are performed and the goal is to determine which of them are significant, controlling the false positives, for a predetermined type I error probability. In addition, a comparison with some classical tests is carried out and the distribution of the test statistic is plotted for different values of m. Finally, the methodology is illustrated with two data sets from microarray studies.

Key words (translated from Spanish): extreme value theory, p-value, type I error probability, multiple comparisons, genetic data.

Full text available in PDF


[Received November 2012. Accepted January 2014]

This article can be cited in LaTeX using the following BibTeX reference:

@ARTICLE{RCEv37n1a05,
    AUTHOR  = {Vélez, Jorge Iván and Correa, Juan Carlos and Arcos-Burgos, Mauricio},
    TITLE   = {{A New Method for Detecting Significant p-values with Applications to Genetic Data}},
    JOURNAL = {Revista Colombiana de Estadística},
    YEAR    = {2014},
    VOLUME  = {37},
    NUMBER  = {1},
    PAGES   = {69--78}
}