SciELO - Scientific Electronic Library Online

 

Revista Facultad Nacional de Salud Pública

Print version ISSN 0120-386X; online version ISSN 2256-3334

Abstract

MEJIA, Jessner Alexander; OVIEDO-BENALCAZAR, Mario Andrés; ORDONEZ, José Armando and VALENCIA-MURILLO, José Fernando. Machine learning applied to the prediction of diabetes mellitus, using socioeconomic and environmental information from health system users. Rev. Fac. Nac. Salud Pública [online]. 2023, vol.41, n.2, e06. Epub 15-Nov-2023. ISSN 0120-386X. https://doi.org/10.17533/udea.rfnsp.e351168.

Objective:

To apply machine learning models to support the early diagnosis of diabetes mellitus using environmental, social, economic and health variables, without depending on the collection of clinical samples.

Methodology:

The study used data from 10,889 users affiliated with the subsidized health system in southwestern Colombia, all diagnosed with hypertension and grouped into users without (74.3%) and with (25.7%) diabetes mellitus. Supervised models were trained using k-nearest neighbors, decision trees, and random forests, as well as ensemble-based models, and were applied to the database before and after balancing the number of cases in each diagnostic group. Performance was evaluated by splitting the database into training and test data (70/30, respectively) and computing accuracy, sensitivity, specificity, and area under the curve.
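
The four evaluation metrics named above can be illustrated with a minimal sketch. It is not the authors' code: the function names (`confusion_counts`, `evaluate`, `auc`) are hypothetical, the label coding (1 = diabetes, 0 = no diabetes) is an assumption, and the area under the curve is computed with the standard pairwise-ranking definition rather than any particular library.

```python
from itertools import product


def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels, assuming 1 = diabetes."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def evaluate(y_true, y_pred):
    """Accuracy, sensitivity (true positive rate) and specificity (true negative rate)."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
    }


def auc(y_true, scores):
    """Area under the ROC curve: the share of (positive, negative) pairs in which
    the positive case gets the higher score, with ties counted as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p, n in product(pos, neg))
    return wins / (len(pos) * len(neg))
```

Sensitivity is the metric most relevant to screening here, because it measures the share of users with diabetes that a model actually flags.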

Results:

Sensitivity values increased significantly when using balanced data, rising from a maximum of 17.1% (unbalanced data) to as much as 57.4% (balanced data). The highest area under the curve (0.61) was obtained with the ensemble models, after balancing the number of cases in each group and encoding the categorical variables. The variables with the greatest weight were associated with hereditary aspects (24.65%) and ethnic group (5.59%), followed by visual difficulty, low water consumption, a diet low in fruits and vegetables, and the consumption of salt and sugar.
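
The effect of class imbalance on sensitivity can be seen with a small worked example. With 74.3% of users in the group without diabetes, a model that always predicts "no diabetes" reaches 74.3% accuracy while detecting no cases at all. The sketch below reproduces that baseline and then balances the classes by random oversampling. The abstract does not state which balancing method the authors used, so random oversampling is an assumption here, as are the function name `random_oversample` and the synthetic labels, which only mirror the reported proportions.

```python
import random


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of the smaller classes until every class
    has as many rows as the largest one (hypothetical helper, not the authors' method)."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = max(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for label, members in by_class.items():
        extra = [rng.choice(members) for _ in range(target - len(members))]
        out_rows.extend(members + extra)
        out_labels.extend([label] * target)
    return out_rows, out_labels


# Synthetic labels that only mirror the reported proportions (assumed 1 = diabetes).
labels = [1] * 257 + [0] * 743
rows = list(range(len(labels)))  # placeholder feature rows

# Baseline that always predicts "no diabetes".
baseline_pred = [0] * len(labels)
baseline_accuracy = sum(t == p for t, p in zip(labels, baseline_pred)) / len(labels)
baseline_sensitivity = (
    sum(1 for t, p in zip(labels, baseline_pred) if t == 1 and p == 1) / labels.count(1)
)

# After oversampling, both classes have the same number of rows.
balanced_rows, balanced_labels = random_oversample(rows, labels)
```

In practice, balancing is usually applied to the training portion only, so that the test data keep the original class proportions; whether the study did so is not stated in the abstract.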

Conclusions:

Although predictive models based on people's socioeconomic and environmental information are emerging as a tool for the early diagnosis of diabetes mellitus, their predictive capacity still needs to be improved.

Keywords: machine learning; diabetes mellitus; environmental factors; socioeconomic factors; predictive model.
