Revista de investigación e innovación en ciencias de la salud

Online version ISSN 2665-2056

Rev. Investig. Innov. Cienc. Salud vol.8 no.1 Medellín Jan./Jun. 2026  Epub 17-Oct-2025

https://doi.org/10.46634/riics.477 

Research Article

Data-driven Subclassification of Treated Hypertensive Adults: Implications for Personalized Management

Clasificación basada en datos de adultos hipertensos tratados: implicaciones para un manejo personalizado

Efrén Murillo-Zamora 1,2,*
http://orcid.org/0000-0002-1118-498X

1 Unidad de Investigación en Epidemiología Clínica; Instituto Mexicano del Seguro Social; Villa de Álvarez; México.

2 Posgrado en Ciencias Médicas; Facultad de Medicina; Universidad de Colima; Colima, México.


Abstract

Introduction.

Hypertension control remains a public health challenge worldwide, with variability in management outcomes. This study aimed to identify blood pressure control phenotypes among Mexican adults with previously diagnosed hypertension.

Methods.

We analyzed individuals (n = 1,308) aged 20 years and older with physician-confirmed hypertension. Controlled hypertension was defined as systolic/diastolic blood pressure < 130/80 mmHg, assessed via triplicate measurements. We used Principal Component Analysis (PCA) to reduce dimensionality, and Gaussian Mixture Model (GMM) clustering of the PCA-transformed data to identify phenotypes.

Results.

GMM clustering identified 8 phenotypes, with cluster sizes ranging from 22 (1.7%) to 325 (24.8%) individuals. Cluster 3, composed mainly of older women (mean age 66.5 years) with long-term hypertension (> 10 years) and high socioeconomic status, was the largest cluster and showed better control. In contrast, Cluster 7 (n = 142) consisted mainly of patients with low socioeconomic status and uncontrolled blood pressure. PCA variable contributions highlighted adherence to physical activity (9.07%), dietary modifications (7.24%), sex (7.14%), and type 2 diabetes (6.64%) as the dominant factors in principal component 1 (PC1), whereas age accounted for 89.2% of PC2 variance.

Conclusions.

These results suggest heterogeneous hypertension control patterns influenced by demographic, clinical, and behavioral factors. Targeted interventions for high-risk phenotypes, particularly younger patients and those with poor adherence, could enhance blood pressure management strategies in Mexico. The integration of PCA and GMM offers a robust framework for phenotyping complex health conditions in population-based studies.

Keywords: Hypertension; blood pressure; cluster analysis; principal component analysis; Mexico

Introduction

Hypertension is a major modifiable risk factor for cardiovascular disease, yet achieving adequate blood pressure control in affected patients remains a significant public health challenge [1]. Global estimates indicate that approximately 42% of adults with hypertension are diagnosed and treated, but only 20% of them achieve controlled blood pressure levels [2]. This gap in hypertension management is more pronounced in low- and middle-income countries, including Mexico, where health system barriers and socioeconomic disparities create additional obstacles to optimal care [3].

The management of hypertension is complex due to its multifactorial nature, with control rates influenced by the interplay of biological, behavioral, and socioeconomic determinants [4,5]. Conventional approaches to hypertension classification and treatment, while useful for identifying high-risk clusters, may overlook important variations in treatment response and outcomes. Data-driven phenotyping approaches, which identify clinically and epidemiologically distinct clusters with shared characteristics, could provide critical insights for developing targeted interventions [6].

In this context, data-driven techniques are useful analytical tools for uncovering patterns in complex health data [7]. These methods have demonstrated promise in research on other chronic conditions [8-10]. For hypertension specifically, a data-driven approach implemented at Boston Medical Center improved blood pressure control compared to standard care [11]. To the authors' knowledge, such methods remain underutilized in hypertension research, particularly for Latin American populations. By extending beyond conventional risk factor analysis, these approaches can identify distinct hypertension control phenotypes that may benefit from tailored management strategies.

While multiple unsupervised learning techniques are available, Principal Component Analysis (PCA) and Gaussian Mixture Models (GMM) are useful, respectively, for capturing continuous latent structures [12] and for accommodating overlapping clusters through probabilistic boundaries [13], features particularly well suited to the complexity of hypertension control phenotypes. Compared to other methods, GMM offers greater flexibility in modeling subpopulations with differing variances and covariances [14].

This study aimed to identify and characterize distinct blood pressure control phenotypes among Mexican adults with diagnosed hypertension using PCA and GMM as data-driven techniques. These findings could provide novel insights into the heterogeneity of hypertension control in Mexico and inform more targeted approaches to blood pressure management in similar settings.

Methods

Study population and data sources

Data from the 2022 National Health and Nutrition Survey (ENSANUT, by its Spanish acronym) were analyzed. This nationally representative cross-sectional survey assesses health and nutritional status across Mexico [15].

The study sample included adults aged 20 years and older with physician-diagnosed hypertension, identified through affirmative responses to the survey item: "Has a doctor ever told you that you have high blood pressure?" (response options: Yes, Yes [during pregnancy only], and No). Women who reported a hypertension diagnosis exclusively during pregnancy were excluded, as were participants with missing data for any study variable.

A total of 11,913 adults were interviewed. Among them, 2,290 reported a prior medical diagnosis of hypertension. Of these, 29 were excluded due to a pregnancy-related diagnosis. A total of 1,933 participants indicated that they were currently receiving pharmacological treatment for hypertension, and 1,542 had sequential blood pressure measurements available. Additionally, 234 individuals were excluded due to missing information relevant to the analysis. No imputation procedures were performed; observations with missing values for any study variable were excluded from the analytical sample (listwise deletion), leaving 1,308 participants.
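The sample flow described above can be checked with simple arithmetic. The following minimal sketch uses only the counts reported in this paragraph; the variable names are illustrative and not taken from the survey files.

```python
# Counts as reported in the text; variable names are illustrative only.
interviewed = 11_913
diagnosed = 2_290
pregnancy_only = 29
treated = 1_933
with_bp_measurements = 1_542
missing_covariates = 234

# Diagnosed adults remaining after excluding pregnancy-only diagnoses.
eligible_diagnosed = diagnosed - pregnancy_only

# Treated participants with blood pressure readings and complete covariates.
analytic_sample = with_bp_measurements - missing_covariates

print(f"Diagnosed, excluding pregnancy-only: {eligible_diagnosed}")
print(f"Treated with sequential measurements: {with_bp_measurements} of {treated}")
print(f"Analytical sample: {analytic_sample}")
print(f"Share of interviewed adults analysed: {100 * analytic_sample / interviewed:.1f}%")
```

The resulting analytical sample matches the 1,308 participants reported in the Abstract and Results.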

Outcome

The primary outcome was controlled hypertension, defined as a binary measure (no/yes) among adults with a prior physician diagnosis of arterial hypertension. Blood pressure assessment followed a standardized protocol: after ≥ 5 minutes of rest, trained personnel obtained three sequential measurements using calibrated digital sphygmomanometers, with 2-3-minute intervals between readings. The arithmetic mean of the second and third measurements was calculated, classifying participants as having controlled hypertension if their average systolic pressure was < 130 mmHg and/or diastolic pressure was < 80 mmHg.
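As an illustration of the outcome definition, the sketch below averages the second and third readings and applies the 130/80 mmHg cut-offs. The function name and the `require_both` flag are assumptions of this sketch: the "and/or" wording leaves the combination rule open, so the flag makes both readings of it explicit, with the stricter rule (both averages below the cut-offs) as the default.

```python
from statistics import mean

def is_controlled(systolic, diastolic, require_both=True,
                  sbp_cutoff=130.0, dbp_cutoff=80.0):
    """Classify blood pressure control from three sequential readings.

    The first reading is discarded and the second and third are averaged,
    as described in the text. `require_both` is an assumption of this sketch:
    True demands both averages below the cut-offs, False accepts either one.
    """
    if len(systolic) != 3 or len(diastolic) != 3:
        raise ValueError("expected exactly three readings per component")
    mean_sbp = mean(systolic[1:])
    mean_dbp = mean(diastolic[1:])
    sbp_ok = mean_sbp < sbp_cutoff
    dbp_ok = mean_dbp < dbp_cutoff
    return (sbp_ok and dbp_ok) if require_both else (sbp_ok or dbp_ok)

# Hypothetical readings in mmHg, not survey data.
first = is_controlled([142, 128, 126], [88, 78, 76])    # averages 127/77
second = is_controlled([150, 134, 130], [85, 79, 77])   # averages 132/78
print(first, second)
```

Under the default rule the second hypothetical participant is classified as uncontrolled because the averaged systolic value is not below 130 mmHg; with `require_both=False` the same participant would count as controlled.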

Variable selection and preprocessing

Variables spanning key clinical and epidemiological domains were selected based on their reported association with the analysed outcome [16-18].

Demographic characteristics included sex, age, and socioeconomic status (categorized as low, middle, or high). Clinical variables comprised time since hypertension diagnosis (stratified as ≤ 5 years, 6-10 years, or > 10 years) and comorbid type 2 diabetes status. Behavioural measures assessed self-reported adherence to dietary modifications and physical activity levels. Healthcare access was characterized by the usual source of medical care, categorized across social security institutions, public sector providers, private services, and other sources.

Principal Component Analysis

PCA was performed to reduce the dimensionality of the dataset and identify the underlying patterns of variability. This analysis facilitated the identification of key variables contributing most to the variance in the data, thereby simplifying the subsequent clustering analysis. By transforming the data into a smaller number of uncorrelated components, PCA enhanced the interpretability of the GMM clustering.

Variable contributions

The contributions of the original variables to each principal component were extracted from the PCA results. For each variable, the mean contribution across all derived dimensions was computed. Variables were then ranked by their contribution to principal component 1 (PC1) to identify the most influential features in explaining data variance.
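A minimal Python sketch of how variable contributions can be derived is given below. The analyses in this study were run in R, and the exact formula is not stated here, so the sketch assumes the common definition in which a variable's contribution to a component is its squared unit-length loading expressed as a percentage. The toy data, the helper names, and the use of power iteration to obtain the first component are all illustrative assumptions.

```python
import math

def correlation_matrix(rows):
    """Pearson correlation matrix for a list of equal-length observations."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    sds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1))
           for j in range(p)]
    z = [[(r[j] - means[j]) / sds[j] for j in range(p)] for r in rows]
    return [[sum(zi[a] * zi[b] for zi in z) / (n - 1) for b in range(p)]
            for a in range(p)]

def first_component(matrix, iterations=500):
    """Leading unit-length eigenvector of a symmetric matrix (power iteration)."""
    p = len(matrix)
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iterations):
        w = [sum(matrix[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

def contributions(loadings):
    """Percentage contribution of each variable: 100 * squared unit loading."""
    total = sum(l * l for l in loadings)
    return [100.0 * l * l / total for l in loadings]

# Hypothetical observations: activity adherence, diet adherence, age.
rows = [[1, 0, 52], [0, 0, 61], [1, 1, 70], [0, 1, 66], [1, 1, 58], [0, 0, 73]]
names = ["physical activity", "dietary modifications", "age"]
shares = contributions(first_component(correlation_matrix(rows)))
for name, share in sorted(zip(names, shares), key=lambda t: -t[1]):
    print(f"{name}: {share:.1f}%")
```

With this definition the contributions of all input columns to a given component sum to 100%, which is a useful sanity check when ranking variables.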

Gaussian Mixture Model

To identify natural clusters within the data, we employed a GMM, a probabilistic clustering approach that assumes the data are generated from a mixture of several multivariate Gaussian distributions. The optimal number of clusters was determined by comparing models with different numbers of components using the Bayesian Information Criterion (BIC), which balances model fit against complexity. To ensure robust parameter estimation, we ran the Expectation-Maximization (EM) algorithm with multiple random initializations, reducing the risk of converging to local optima.
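The BIC comparison can be sketched as follows. This illustration assumes the sign convention documented for the mclust package, in which BIC is computed as twice the log-likelihood minus a complexity penalty, so that larger values indicate the preferred model. The candidate log-likelihoods and parameter counts are hypothetical and are not results from this study.

```python
import math

def bic_mclust_convention(log_likelihood, n_parameters, n_observations):
    """BIC as 2*logL - p*log(n); larger values indicate the preferred model."""
    return 2.0 * log_likelihood - n_parameters * math.log(n_observations)

# Hypothetical (covariance model, components) -> (log-likelihood, free parameters).
candidates = {
    ("EII", 2): (-1500.0, 12),
    ("VEV", 4): (-1100.0, 60),
    ("VVV", 8): (-1050.0, 167),
}
n_obs = 1308

scores = {key: bic_mclust_convention(ll, p, n_obs)
          for key, (ll, p) in candidates.items()}
best = max(scores, key=scores.get)

for key, value in scores.items():
    print(key, round(value, 1))
print("selected:", best)
```

In this toy comparison the most flexible candidate attains the highest log-likelihood but is not selected, because the penalty term grows with the number of free parameters.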

For each observation, cluster assignments were made by selecting the component with the highest posterior probability. We evaluated cluster quality by examining silhouette width and stability.
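The assignment rule described above (choose the component with the highest posterior probability) can be illustrated with a deliberately simplified one-dimensional mixture. The mixture weights, means, and standard deviations below are hypothetical; the study's model was multivariate and fitted in R.

```python
import math

def normal_pdf(x, mean, sd):
    """Density of a univariate normal distribution."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

def posterior_probabilities(x, weights, means, sds):
    """Posterior probability of each component given x (Bayes' rule)."""
    joint = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sds)]
    total = sum(joint)
    return [j / total for j in joint]

def map_cluster(x, weights, means, sds):
    """Index of the component with the highest posterior, plus the posteriors."""
    post = posterior_probabilities(x, weights, means, sds)
    return max(range(len(post)), key=post.__getitem__), post

# Hypothetical two-component mixture on a single principal component score.
weights, means, sds = [0.6, 0.4], [-1.0, 2.0], [1.0, 0.5]
for score in (0.0, 1.8):
    label, post = map_cluster(score, weights, means, sds)
    print(score, label, [round(p, 3) for p in post])
```

Keeping the full posterior vector, rather than only the label, preserves information about observations that sit near the boundary between two components.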

To characterize the identified clusters, we computed the mean values of numerical features and the most frequent categories for categorical variables within each cluster. This allowed us to create distinct profiles that highlighted the key differences between clusters.
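A minimal sketch of this profiling step is shown below: means for numerical features and the most frequent category for categorical ones, computed within each cluster. The records, field names, and function name are hypothetical and serve only to illustrate the summary.

```python
from collections import Counter, defaultdict
from statistics import mean

def profile_clusters(records, numeric_keys, categorical_keys, label_key="cluster"):
    """Summarise each cluster by size, numeric means, and modal categories."""
    grouped = defaultdict(list)
    for rec in records:
        grouped[rec[label_key]].append(rec)
    profiles = {}
    for label, members in sorted(grouped.items()):
        summary = {"n": len(members)}
        for key in numeric_keys:
            summary[key] = mean(m[key] for m in members)
        for key in categorical_keys:
            # most_common(1) returns the single most frequent category.
            summary[key] = Counter(m[key] for m in members).most_common(1)[0][0]
        profiles[label] = summary
    return profiles

# Hypothetical records, not survey data.
records = [
    {"cluster": 1, "age": 64, "sex": "female", "ses": "high"},
    {"cluster": 1, "age": 70, "sex": "female", "ses": "high"},
    {"cluster": 1, "age": 67, "sex": "male", "ses": "middle"},
    {"cluster": 2, "age": 55, "sex": "male", "ses": "low"},
    {"cluster": 2, "age": 59, "sex": "male", "ses": "low"},
]
profiles = profile_clusters(records, ["age"], ["sex", "ses"])
for label, summary in profiles.items():
    print(label, summary)
```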

All analyses were implemented in R using the mclust package (v5.4.7), which provides flexible modeling of covariance structures (e.g., spherical, diagonal, or full). Cluster visualizations were generated by overlaying the GMM results onto principal component plots, facilitating intuitive interpretation of the clustering in reduced-dimensional space.

Ethical considerations

Data used in this study were obtained from the ENSANUT and are publicly available for academic and non-commercial scientific purposes, in accordance with the policies established by the National Institute of Public Health (INSP) of Mexico.

Results

Sample characteristics

Data from 1,308 adult patients were analyzed. The mean age (± standard deviation) was 61.1 ± 13.0 years, with an interquartile range of 52 to 70 years. Most participants were female (69.8%, n = 913). A total of 912 patients had controlled blood pressure, corresponding to a prevalence of 69.7%. Other characteristics of the study sample are summarized in Table 1.

Table 1 Characteristics of study sample according to blood-pressure control, Mexico 2022 

Characteristic                    Overall        Uncontrolled   Controlled     P
                                  (n = 1,308)    (n = 396)      (n = 912)
Sex
  Female                          913 (69.8)     263 (66.4)     650 (71.3)     0.079
  Male                            395 (30.2)     133 (33.6)     262 (28.7)
Age, years                        61.1 ± 13.0    63.4 ± 12.3    60.1 ± 13.1    < 0.001
Socioeconomic status
  Low                             389 (29.7)     141 (35.6)     248 (27.2)     0.003
  Middle                          458 (35.1)     138 (34.9)     320 (35.1)
  High                            461 (35.2)     117 (29.5)     344 (37.7)
Time since hypertension diagnosis, years
  ≤ 5                             561 (42.9)     142 (35.9)     419 (46.0)     0.002
  6-10                            312 (23.9)     101 (25.5)     211 (23.1)
  > 10                            435 (33.3)     153 (38.6)     282 (30.9)
Comorbid type 2 diabetes mellitus, self-reported
  No                              852 (65.1)     248 (62.6)     604 (66.2)     0.209
  Yes                             456 (34.9)     148 (37.4)     308 (33.8)
Usual source of medical care
  Social security institutions    666 (50.9)     190 (48.0)     476 (52.2)     0.270
  Public sector                   257 (19.6)     87 (22.0)      170 (18.6)
  Private services                358 (27.4)     108 (27.2)     250 (27.4)
  Other                           27 (2.1)       11 (2.8)       16 (1.8)
Adherence to dietary modifications, self-reported
  No                              916 (70.0)     285 (72.0)     631 (69.2)     0.313
  Yes                             392 (30.0)     111 (28.0)     281 (30.8)
Adherence to physical activity levels, self-reported
  No                              1,088 (83.2)   331 (83.6)     757 (83.0)     0.796
  Yes                             220 (16.8)     65 (16.4)      155 (17.0)

Notes: 1) Total counts and relative frequencies are presented for categorical variables, except for age, which is summarized as the arithmetic mean and standard deviation; 2) p-values resulted from chi-squared tests or t-test (for age), as appropriate.
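As a worked example of the chi-squared comparisons in Table 1, the sketch below recomputes the test for sex from the tabulated counts. It assumes a Pearson chi-squared test without continuity correction (the variant used is not specified in the notes), and it obtains the p-value for one degree of freedom from the complementary error function. The function name is illustrative.

```python
import math

def chi_squared_2x2(a, b, c, d):
    """Pearson chi-squared statistic and p-value for a 2x2 table (1 df).

    No continuity correction is applied; this is an assumption of the sketch.
    """
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    observed = ((a, b), (c, d))
    statistic = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            statistic += (observed[i][j] - expected) ** 2 / expected
    # For 1 df, P(X > x) equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value

# Counts from Table 1: female (uncontrolled, controlled), male (uncontrolled, controlled).
statistic, p_value = chi_squared_2x2(263, 650, 133, 262)
print(f"chi-squared = {statistic:.2f}, p = {p_value:.3f}")
```

Under these assumptions the result is consistent with the p-value of 0.079 reported for sex.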

Dimensionality reduction

PCA reduced the dimensionality of the clinical dataset, with the first five components collectively explaining 73.6% of the total variance. The first component (PC1) accounted for 45.9% of the variance, followed by PC2 (11.9%), PC3 (5.8%), PC4 (5.3%), and PC5 (4.7%), indicating that PC1 captured the dominant patterns in the data (Figure 1).
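The cumulative figure quoted above follows directly from the per-component percentages. The short sketch below recomputes it and adds a hypothetical helper, `components_needed`, that returns how many leading components are required to reach a chosen share of variance.

```python
from itertools import accumulate

# Percentages of total variance as reported in the text.
explained = {"PC1": 45.9, "PC2": 11.9, "PC3": 5.8, "PC4": 5.3, "PC5": 4.7}
cumulative = dict(zip(explained, accumulate(explained.values())))

def components_needed(percentages, threshold):
    """Smallest number of leading components whose cumulative share reaches threshold."""
    running = 0.0
    for count, share in enumerate(percentages, start=1):
        running += share
        if running >= threshold:
            return count
    return None  # threshold not reached with the supplied components

for name, value in cumulative.items():
    print(f"{name}: {value:.1f}%")
print(components_needed(explained.values(), 70.0))
```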

Figure 1 Variance (%) explained by Principal Components Analysis, Mexico 2022 

Cluster identification

GMM applied to the PCA-transformed space identified eight distinct clusters (𝑘 = 8) among evaluated adults with previously diagnosed hypertension. Model selection via the BIC favoured an ellipsoidal, equal-shape covariance model (VEV) with BIC = -1,954.40 (Figure 2), demonstrating better fit compared to alternative cluster configurations (𝑘 = 1-8 tested). The integrated completed likelihood (ICL = -1,955.29) supported this solution. The VEV model outperformed simpler parameterizations, indicating that while patient clusters share similar geometric proportions (equal shape), they vary in size (unequal volume) and spatial orientation (varying orientation).

Abbreviations: EII, Spherical, equal volume; VII, Spherical, unequal volume; EEI, Diagonal, equal volume and shape; VEI, Diagonal, equal shape; EVI, Diagonal, equal volume; VVI, Diagonal, varying volume and shape; EEE, Ellipsoidal, equal volume/shape/orientation; EEV, Ellipsoidal, equal volume and shape; VEV, Ellipsoidal, equal shape; VVV, Ellipsoidal, varying volume/shape/orientation.

Figure 2 Bayesian Information Criterion (BIC) values for Gaussian Mixture Models with 1-8 clusters in hypertension phenotyping, Mexico 2022 

Cluster stability was high, with minimal off-diagonal overlap in the co-assignment matrix (Figure 3), supporting strong separation.

Note: The concentration of red exclusively along the diagonal suggests that observations within the same cluster are highly consistent (strong internal similarity).

Figure 3 Matrix of stability for the eight identified clusters, Mexico 2022 

Cluster sizes (Figure 4) were heterogeneous, ranging from 22 patients (Cluster 8) to 325 patients (Cluster 3), with intermediate clusters of 88 (Cluster 1), 24 (Cluster 2), 312 (Cluster 4), 154 (Cluster 5), 241 (Cluster 6), and 142 (Cluster 7) individuals.
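A quick consistency check of the reported cluster sizes is sketched below; it confirms that the eight clusters add up to the analytical sample and recomputes each cluster's share. The dictionary layout is illustrative.

```python
# Cluster sizes as reported in the text.
sizes = {1: 88, 2: 24, 3: 325, 4: 312, 5: 154, 6: 241, 7: 142, 8: 22}

total = sum(sizes.values())
shares = {cluster: round(100 * n / total, 1) for cluster, n in sizes.items()}

smallest = min(sizes, key=sizes.get)
largest = max(sizes, key=sizes.get)

print("total:", total)
print("smallest:", smallest, shares[smallest])
print("largest:", largest, shares[largest])
```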

Figure 4 Cluster visualization in Principal Component Space, Mexico 2022 

Cluster quality assessment

The silhouette analysis revealed meaningful variation in cluster quality across the identified clusters (Figure 5). Two clusters showed particularly strong separation: Cluster 2 (average silhouette width = 0.72) and Cluster 8 (average silhouette width = 0.58), indicating these represent well-defined, distinct clinical profiles. In contrast, Clusters 3, 4, and 6 showed negative or near-zero silhouette scores, suggesting overlap with neighboring clusters and less distinct phenotypic boundaries. The remaining clusters (1 and 7) exhibited marginal separation (silhouette widths 0.01-0.06), potentially representing transitional or heterogeneous patient clusters. These findings suggest that while the eight-cluster solution captures two well differentiated phenotypes, the larger clusters may encompass patients with more heterogeneous characteristics or may benefit from alternative stratification approaches.
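For readers unfamiliar with the metric, the sketch below computes silhouette widths from first principles: for each observation, a is the mean distance to the other members of its own cluster, b is the lowest mean distance to the members of any other cluster, and the width is (b - a) / max(a, b). The points and labels are hypothetical, Euclidean distance is assumed, and at least two clusters are required.

```python
import math

def silhouette_widths(points, labels):
    """Silhouette width (b - a) / max(a, b) for each observation.

    Assumes Euclidean distance and at least two distinct cluster labels.
    Observations in singleton clusters receive a width of 0 by convention.
    """
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    widths = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            widths.append(0.0)
            continue
        a = sum(math.dist(points[i], points[j]) for j in own) / len(own)
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items() if other != lab
        )
        widths.append((b - a) / max(a, b))
    return widths

# Hypothetical coordinates in a two-dimensional component space.
points = [(0, 0), (0, 1), (5, 5), (5, 6)]
well_separated = silhouette_widths(points, [0, 0, 1, 1])
mislabelled = silhouette_widths(points, [0, 1, 1, 0])
print([round(w, 2) for w in well_separated])
print([round(w, 2) for w in mislabelled])
```

Widths close to 1 indicate observations that sit well inside their cluster, values near 0 indicate observations on a boundary, and negative values indicate observations that are on average closer to another cluster, which is how the negative scores described above should be read.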

Abbreviations: T2DM, type 2 diabetes mellitus.

Notes: 1) Uncontrolled hypertension, T2DM diagnosis, adherence to physical activity, and adherence to dietary modifications are binary variables (0 = No, 1 = Yes); 2) Sex is a binary variable (0 = Female, 1 = Male); 3) For usual source of medical care, the code 1 denotes care provided by social‑security institutions; 4) Socioeconomic status was coded as 1 = Low, 2 = Middle, and 3 = High; and 5) Cluster sizes were as follows: 1 (𝑛 = 88), 2 (𝑛 = 24), 3 (𝑛 = 325), 4 (𝑛 = 312), 5 (𝑛 = 154), 6 (𝑛 = 241), 7 (𝑛 = 142), and 8 (𝑛 = 22).

Figure 5 Characteristics of identified clusters, Mexico 2022 

Cluster characterization

The largest cluster (Cluster 3) predominantly included older women, with a mean age of 66.5 years, who had lived with hypertension for more than a decade and generally belonged to a high socioeconomic stratum. Cluster 8 had the youngest participants (mean age = 59.1 years) and was characterized by a mid-level socioeconomic profile. Cluster 7 concentrated the greatest proportion of individuals with uncontrolled hypertension, suggesting that blood-pressure management challenges are concentrated in this smaller cluster.

Adherence to physical activity (9.1%), dietary modifications (7.2%), sex (7.1%), and T2DM (6.6%) jointly contributed 30.1% of the variance of PC1, which itself accounted for 45.9% of the total variance. Age contributed 89.2% of the variance of PC2 (11.9% of the total variance), highlighting its role as a largely independent axis of variation (Table 2).

Table 2 Percentage of variance contributed by each variable to the first two PCA dimensions, Mexico 2022. 

Variable                              Dimension 1    Dimension 2
Adherence to physical activity        9.07           0.27
Adherence to dietary modifications    7.24           0.16
Sex                                   7.14           0.09
T2DM diagnosis                        6.64           0.38
Time since hypertension diagnosis     2.82           2.28
Socioeconomic status                  2.66           0.08
Usual source of medical care          2.28           0.07
Age                                   0.01           89.2

Note. Abbreviations: PCA, Principal Components Analysis; T2DM, type 2 diabetes mellitus.

Discussion

These results suggest eight distinct hypertensive phenotypes with potential clinical implications. Two particularly well-defined clusters were observed (Cluster 2, n = 24; and Cluster 8, n = 22), representing patients with clear phenotypic patterns that may benefit from tailored management approaches. The robust separation of these clusters (silhouette widths 0.58-0.72) suggests they constitute clinically meaningful subgroups. However, caution is needed in generalizing these findings because both clusters are relatively small. Sample size influences both the stability of unsupervised classifications and the external validity of phenotype-derived clinical implications [19]. Smaller clusters may reflect true but rare subpopulations, or even artifacts of algorithmic sensitivity, and should ideally be replicated in larger cohorts or prospective designs before clinical translation is attempted.

Three key findings deserve discussion. The uncontrolled hypertension cluster (Cluster 7, 𝑛 = 142) appeared to represent a high-priority population where current management strategies may be insufficient. This cluster's strong association with physical inactivity (9.1% variance contribution) and poor dietary adherence (7.2%) is consistent with prior evidence identifying these behaviors as significant risk factors for cardiovascular and cerebrovascular diseases in young and middle-aged adults [20,21]. Moreover, while Yu and Chen [21] observed that targeted lifestyle interventions can improve both blood pressure control and cognitive function in hypertensive individuals, our findings extend this literature by identifying a distinct subgroup with compounded behavioural risks. This suggests that precision-targeted interventions, guided by unsupervised classification, may enhance the effectiveness of lifestyle strategies in populations where conventional approaches have limited impact.

The elderly female cluster (Cluster 3, 𝑛 = 325) showed characteristics of long-standing disease. Given their advanced mean age (66.5 years) and high socioeconomic status, more aggressive monitoring for end-organ damage may be warranted. Cluster 8 may represent an important opportunity for early intervention. With a mean age of 59.1 years, this population could benefit from intensive risk factor modification to prevent the cardiovascular complications seen in older clusters [22,23].

The weak separation of several larger clusters (3, 4, and 6) likely reflects both the biological complexity of hypertension and limitations in current clinical characterization. While our model incorporated standard clinical variables, the inclusion of biomarkers might improve differentiation of these clusters in future studies [24]. These may include markers of vascular inflammation (e.g., high-sensitivity C-reactive protein, IL-6), cardiac stress (e.g., NT-proBNP, copeptin), and metabolic dysfunction, which have demonstrated utility in cardiovascular risk stratification and may help differentiate clusters with distinct pathophysiological profiles, especially when integrated with behavioral and sociodemographic data [25-27].

These findings support a shift toward phenotype-specific approaches to hypertension management. For instance, patients in Cluster 2, characterized by a strong profile with a silhouette score of 0.72, are likely to respond well to standardized treatment protocols. In contrast, the persistent blood pressure control challenges observed in Cluster 7 may need more intensive follow-up and coordinated multidisciplinary care. Meanwhile, the younger age profile of individuals in Cluster 8 suggests that early intervention could be particularly beneficial for this phenotype.

In our study, although 234 individuals were excluded due to missing data, and others due to specific conditions such as pregnancy-related hypertension, no significant differences were observed between included and excluded participants with regard to the key sociodemographic and clinical variables evaluated. Therefore, the risk of selection bias may be low. No imputation procedures were applied, further ensuring the integrity of the observed patterns within the analyzed sample.

While the stratification of hypertensive phenotypes offers promising avenues for personalized care, its translation into actionable interventions within the Mexican public health system warrants critical reflection. The system’s segmented structure, divided among social security institutions, public sector providers, and private services, creates substantial variability in diagnostic capacity, treatment availability, and continuity of care [28]. These disparities are further compounded by geographic inequities, with rural and marginalized populations facing longer wait times, limited access to specialists, and under-resourced facilities [29].

Ethically, the deployment of stratification models must avoid reinforcing existing inequities. For example, phenotypes identified as high-risk may not benefit from tailored interventions if the infrastructure to support such care is absent or inconsistently distributed. Organizationally, the feasibility of implementing phenotype-guided strategies depends on the system’s ability to integrate data-driven tools into routine practice, ensure equitable access to diagnostics, and align treatment protocols across fragmented care pathways.

Future research should validate these phenotypes against hard cardiovascular outcomes and test whether phenotype-guided therapy improves blood pressure control rates compared to current approaches.

This study has several other methodological limitations. First, possible correlation among categorical predictors may violate the independence assumptions required for certain analyses. While PCA was employed to reduce dimensionality and capture shared variance, it does not fully account for structural dependencies across categorical variables. The presence of negative silhouette coefficients in some clusters likely reflects overlapping feature profiles among clusters rather than definitive misclassification. These may represent transitional phenotypes or latent gradients, rather than clearly bounded clusters. Although PCA helped address collinearity and improve signal extraction, the selected input variables may have lacked sufficient discriminative power to achieve tight separation between clusters.

Second, the operationalization of self-reported behavioural measures, particularly adherence to physical activity and dietary modifications, introduces susceptibility to recall bias and social desirability effects. While these variables offer insight into participants’ engagement in lifestyle interventions, potential misclassification may have influenced phenotype assignment.

Third, the broad inclusion criteria may introduce substantial clinical heterogeneity, particularly in terms of treatment intensity, therapeutic adherence, and disease progression. Although the decision to include all adults with physician-diagnosed hypertension was intended to reflect real-world patient diversity in Mexico, this inclusiveness may reduce the specificity of the identified phenotypes and constrain the external validity of findings when applied to more narrowly defined hypertensive subpopulations.

Fourth, the ENSANUT 2022 dataset lacked detailed information on antihypertensive treatment, which limited analytical precision. It was not possible to stratify participants by antihypertensive regimen intensity or to distinguish between monotherapy and polytherapy. Hypertension severity could not be categorized beyond a binary classification based on blood pressure control. These constraints reduced the granularity of phenotype characterization and may have limited the clinical interpretability of the identified clusters.

Fifth, while clusters characterized by poor blood pressure control were identified, the inability to evaluate treatment-related factors (such as medication type, dosing intensity, adherence, and duration) limits interpretation of the underlying drivers of this phenotype. As such, the observed classification may reflect uncontrolled hypertension without accounting for therapeutic variation.

Finally, while type 2 diabetes mellitus was included as a clinically relevant covariate given its epidemiologic burden in Mexico, the omission of other prevalent comorbidities, as well as the absence of polypharmacy indicators, may limit the clinical applicability of the phenotypic classifications.

Conclusions

The findings of this study show heterogeneous patterns in hypertension control, influenced by a combination of demographic, clinical, and behavioral factors. These variations underscore the importance of moving beyond a one-size-fits-all approach to hypertension management and instead adopting strategies tailored to specific patient profiles. By identifying and targeting high-risk phenotypes, healthcare providers and policymakers in Mexico can implement more effective, patient-centered strategies to reduce the burden of uncontrolled hypertension.

The practical implementation of classification systems warrants further exploration. Integrating clustering-derived phenotypes into existing care protocols, such as risk stratification tools used in primary care, could enhance early identification of patients requiring intensified follow-up or behavioural interventions. Embedding these models into electronic health records or decision-support systems may facilitate scalable, context-sensitive deployment, particularly in resource-constrained settings.

Acknowledgments

The author would like to thank the Health Research Coordination of the Mexican Social Security Institute for the support provided in conducting this research.

References

1. Fuchs FD, Whelton PK. High Blood Pressure and Cardiovascular Disease. Hypertension [Internet]. 2020;75(2):285-92. doi: https://doi.org/10.1161/HYPERTENSIONAHA.119.14240Links ]

2. World Health Organization (WHO) [Internet]. Geneva: WHO; c2025. Hypertension; 2023 Mar 16 [cited 2025 Apr 24]; [about 6 screens]. Available from: https://www.who.int/news-room/fact-sheets/detail/hypertensionLinks ]

3. Schutte AE, Jafar TH, Poulter NR, Damasceno A, Khan NA, Nilsson PM, et al. Addressing global disparities in blood pressure control: perspectives of the International Society of Hypertension. Cardiovasc Res [Internet]. 2023;119(2):381-409. doi: https://doi.org/10.1093/cvr/cvac130Links ]

4. Silberzan L, Bajos N, Kelly‐Irving M. Unveiling the gaps: Hypertension control beyond the cascade of care framework. J Clin Hypertens [Internet]. 2024;26(7):861-6. doi: https://doi.org/10.1111/jch.14849Links ]

5. Razon N, Hessler D, Bibbins-Domingo K, Gottlieb L. How hypertension guidelines address social determinants of health: a systematic scoping review. Med Care [Internet]. 2021;59(12):1122-9. doi: https://doi.org/10.1097/MLR.0000000000001649Links ]

6. Kaur S, Kim R, Javagal N, Calderon J, Rodriguez S, Murugan N, et al., Precision medicine with data-driven approaches: A framework for clinical translation. AIJMR [Internet]. 2024;2(5):1-44. doi: https://doi.org/10.62127/aijmr.2024.v02i05.1077Links ]

7. eBioMedicine. Machine learning in hypertension: from diagnosis to personalised medicine. EBioMedicine [Internet], 2023;92:104658. doi: https://doi.org/10.1016/j.ebiom.2023.104658Links ]

8. Lu Z, Dong B, Cai H, Tian T, Wang J, Fu L, et al. Identifying Data-Driven Clinical Subgroups for Cervical Cancer Prevention With Machine Learning: Population-Based, External, and Diagnostic Validation Study. JMIR Public Health Surveill [Internet]. 2025;11:e67840. doi: https://doi.org/10.2196/67840Links ]

9. Wang H, Shimizu C, Bainto E, Hamilton S, Jackson HR, Estrada-Rivadeneyra D, et al. Subgroups of children with Kawasaki disease: a data-driven cluster analysis. Lancet Child Adolesc Health [Internet]. 2023;7(10):697-707. doi: https://doi.org/10.1016/S2352-4642(23)00166-9

10. Gezsi A, Auwera S, Mäkinen H, Eszlari N, Hullam G, Nagy T, et al. Unique genetic and risk-factor profiles in clusters of major depressive disorder-related multimorbidity trajectories. Nat Commun [Internet]. 2024;15(1):7190. doi: https://doi.org/10.1038/s41467-024-51467-7

11. Hu Y, Huerta J, Cordella N, Mishuris RG, Paschalidis IC. Personalized hypertension treatment recommendations by a data-driven model. BMC Med Inform Decis Mak [Internet]. 2023;23(1):44. doi: https://doi.org/10.1186/s12911-023-02137-z

12. Alqahtani NA, Kalantan ZI. Gaussian mixture models based on principal components and applications. Mathematical Problems in Engineering [Internet]. 2020;2020(1):1202307. doi: https://doi.org/10.1155/2020/1202307

13. Sun H, Wang S. Measuring the component overlapping in the Gaussian mixture model. Data Min Knowl Disc [Internet]. 2011;23:479-502. doi: https://doi.org/10.1007/s10618-011-0212-3

14. Cavicchia C, Vichi M, Zaccaria G. Gaussian mixture model with an extended ultrametric covariance structure. Adv Data Anal Classif [Internet]. 2022;16:399-427. doi: https://doi.org/10.1007/s11634-021-00488-x

15. Lazcano Ponce E. Encuesta Nacional de Salud y Nutrición 2022: un proyecto representativo del Instituto Nacional de Salud Pública. Salud Publica Mex [Internet]. 2023;65:s5-s6. doi: https://doi.org/10.21149/15056

16. Zhou B, Perel P, Mensah G, Ezzati M. Global epidemiology, health burden and effective interventions for elevated blood pressure and hypertension. Nat Rev Cardiol [Internet]. 2021;18(11):785-802. doi: https://doi.org/10.1038/s41569-021-00559-8

17. Lindley KJ, Aggarwal NR, Briller JE, Davis MB, Douglass P, Epps KC, et al. Socioeconomic determinants of health and cardiovascular outcomes in women: JACC review topic of the week. JACC [Internet]. 2021;78(19):1919-29. doi: https://doi.org/10.1016/j.jacc.2021.09.011

18. Connelly PJ, Currie G, Delles C. Sex differences in the prevalence, outcomes and management of hypertension. Curr Hypertens Rep [Internet]. 2022;24(6):185-92. doi: https://doi.org/10.1007/s11906-022-01183-8

19. Liu T, Yu H, Blair RH. Stability estimation for unsupervised clustering: A review. Wiley Interdiscip Rev Comput Stat [Internet]. 2022;14(6):e1575. doi: https://doi.org/10.1002/wics.1575

20. Wu Y, Xiong Y, Wang P, Liu R, Jia X, Kong Y, et al. Risk factors of cardiovascular and cerebrovascular diseases in young and middle-aged adults: A meta-analysis. Medicine [Internet]. 2022;101(48):e32082. doi: https://doi.org/10.1097/MD.0000000000032082

21. Yu Y, Chen Q. The impact of targeted lifestyle interventions on brain function in young and middle-aged patients with hypertension: A retrospective cohort analysis. Pak J Med Sci [Internet]. 2024;40(11):2588-93. doi: https://doi.org/10.12669/pjms.40.11.10304

22. Arivuselvan H, Kumaravelu N, Priyatharicini A. Young Hypertension-Is It the Beginning of the End? Cardiometry. 2022;(24):231-4.

23. Azegami T, Uchida K, Tokumura M, Mori M. Blood pressure tracking from childhood to adulthood. Front Pediatr [Internet]. 2021;9:785356. doi: https://doi.org/10.3389/fped.2021.785356

24. Al-Tashi Q, Saad MB, Muneer A, Qureshi R, Mirjalili S, Sheshadri A, et al. Machine learning models for the identification of prognostic and predictive cancer biomarkers: a systematic review. Int J Mol Sci [Internet]. 2023;24(9):7781. doi: https://doi.org/10.3390/ijms24097781

25. Aldisi RS, Alsamman AM, Krawitz P, Maj C, Zayed H. Identification of novel proteomic biomarkers for hypertension: a targeted approach for precision medicine. Clin Proteom [Internet]. 2025;22(1):7. doi: https://doi.org/10.1186/s12014-024-09519-z

26. Piskorz D, Keller L, Citta L, Tissera G, Mata L, Bongarzoni L. Metabolic biomarkers and cardiovascular risk stratification in hypertension. Hipertensión y Riesgo Vascular [Internet]. 2024;41(3):162-9. doi: https://doi.org/10.1016/j.hipert.2024.06.003

27. Zhang XR, Zhong WF, Liu RY, Huang JL, Fu JX, Gao J, et al. Improved prediction and risk stratification of major adverse cardiovascular events using an explainable machine learning approach combining plasma biomarkers and traditional risk factors. Cardiovasc Diabetol [Internet]. 2025;24:153. doi: https://doi.org/10.1186/s12933-025-02711-x

28. Block MAG, Reyes Morales H, Cahuana Hurtado L, Balandrán A, Méndez E. Mexico. Health system review. Health Syst Transit [Internet]. 2020;22(2):1-222. Available from: https://eurohealthobservatory.who.int/publications/i/mexico-health-system-review-2020

29. Serván-Mori E, Garcia-Diaz R, Meneses-Navarro S, Gómez-Dantés O, Cerecero-García D, Castro A, et al. Inequities in the continuum of maternal care in Mexico: trends before and after COVID-19. Int J Equity Health [Internet]. 2025;24(1):178. doi: https://doi.org/10.1186/s12939-025-02470-x

Copyright: © 2025 María Cano University Foundation. The Revista de Investigación e Innovación en Ciencias de la Salud provides open access to all its content under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0) license.

Editor: Fraidy-Alonso Alzate-Pamplona, MSc. https://orcid.org/0000-0002-6342-3444

Declaration of interests: The author declares no conflicts of interest.

Funding: The author received no specific funding for this work.

Ethics statement: Data used in this study were obtained from the ENSANUT and are publicly available for academic and non-commercial scientific purposes, in accordance with the policies established by the National Institute of Public Health (INSP) of Mexico.

Data availability: This study used publicly available data from the Cuestionario de Salud de Adultos (20 years and older) and the Cuestionario de Antropometría y Tensión Arterial, which can be accessed at https://ensanut.insp.mx/encuestas/ensanutcontinua2022/descargas.php. The R script used for the analysis is available from the author upon reasonable request.

Author Contributions

Efrén Murillo-Zamora: conceptualization, data curation, formal analysis, funding acquisition, investigation, methodology, project administration, resources, software, supervision, validation, visualization, writing - original draft, writing - review & editing.

Generative AI declaration: The AI tool Grammarly (Grammarly, Inc., San Francisco, USA) was used to improve the readability of this manuscript and to provide grammatical corrections. It was applied to the Introduction, Methods, and Discussion sections. I confirm that the tool’s suggestions were critically reviewed to ensure the precision and accuracy of the manuscript content. Furthermore, I declare that the tool was not used to generate original content without human supervision or intervention. Finally, I explicitly state that the AI tool did not influence the interpretation of the reported results.

Cite this article: Murillo-Zamora E. Data-driven Subclassification of Treated Hypertensive Adults: Implications for Personalized Management. Revista de Investigación e Innovación en Ciencias de la Salud. 2026;8(1):1-15. e-v8n1a477. https://doi.org/10.46634/riics.477

Disclaimer: The content of this article is the sole responsibility of the author and does not necessarily reflect the official views of the author’s affiliated institutions or the Revista de Investigación e Innovación en Ciencias de la Salud.

Received: April 29, 2025; Revised: June 19, 2025; Accepted: September 08, 2025

*Correspondence: Efrén Murillo-Zamora. Email: efren.murilloza@imss.gob.mx

This is an open-access article distributed under the terms of the Creative Commons Attribution License.