SciELO - Scientific Electronic Library Online

 

Revista Facultad Nacional de Salud Pública

Print version ISSN 0120-386X
On-line version ISSN 2256-3334

Abstract

MEJIA, Jessner Alexander; OVIEDO-BENALCAZAR, Mario Andrés; ORDONEZ, José Armando and VALENCIA-MURILLO, José Fernando. Machine learning applied to the prediction of diabetes mellitus, using socioeconomic and environmental information from health system users. Rev. Fac. Nac. Salud Pública [online]. 2023, vol.41, n.2, e06. Epub Nov 15, 2023. ISSN 0120-386X. https://doi.org/10.17533/udea.rfnsp.e351168.

Objective:

To apply models based on machine learning techniques to support the early diagnosis of diabetes mellitus, using environmental, social, economic, and health variables, without relying on clinical sample collection.

Methodology:

Data were used from 10,889 users affiliated with the subsidized health system in southwestern Colombia, all diagnosed with hypertension and grouped into users without (74.3%) and with (25.7%) diabetes mellitus. Supervised models were trained using k-nearest neighbors, decision trees, and random forests, as well as ensemble-based models, on the database both before and after balancing the number of cases in each diagnostic group. Performance was evaluated by splitting the database into training and test sets (70% and 30%, respectively), using accuracy, sensitivity, specificity, and area under the curve as metrics.
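The split-and-balance step described above can be sketched in a few lines of Python. This is a minimal illustration and not the authors' pipeline: the abstract does not say which balancing technique was used, so random undersampling of the majority class is an assumption here, as is balancing only the training portion. The function names `stratified_split` and `undersample` and the synthetic labels are hypothetical.

```python
import random

def stratified_split(labels, test_fraction=0.3, seed=0):
    """Return (train_idx, test_idx), keeping the class ratio in each part."""
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)

def undersample(indices, labels, seed=0):
    """Randomly drop rows until every class has the minority-class count.

    Assumption: the study's actual balancing method is not stated in the
    abstract; undersampling is used here only as one common option.
    """
    rng = random.Random(seed)
    by_class = {}
    for i in indices:
        by_class.setdefault(labels[i], []).append(i)
    n_min = min(len(rows) for rows in by_class.values())
    kept = []
    for cls in sorted(by_class):
        kept.extend(rng.sample(by_class[cls], n_min))
    return sorted(kept)

# Synthetic labels mimicking the reported class ratio (25.7% with diabetes).
labels = [1] * 257 + [0] * 743
train_idx, test_idx = stratified_split(labels)          # 70/30 split
balanced_train_idx = undersample(train_idx, labels)     # equal class counts
```

Keeping the test set at its original class ratio, as done here, means the evaluation metrics reflect the population the model would actually face; whether the study did the same is not stated.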

Results:

Sensitivity increased significantly with balanced data, from a maximum of 17.1% (unbalanced data) to as much as 57.4% (balanced data). The highest area under the curve (0.61) was obtained with the ensemble models, after balancing the number of cases in each group and encoding the categorical variables. The variables with the greatest weight were associated with hereditary aspects (24.65%) and ethnic group (5.59%), followed by visual difficulty, low water consumption, a diet low in fruits and vegetables, and the consumption of salt and sugar.
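To illustrate why sensitivity, rather than accuracy, is the figure to watch with a 74.3%/25.7% class split, the sketch below computes the same kinds of metrics on a synthetic example. The labels, the always-negative predictor, and all function names are illustrative assumptions, not data or code from the study. The area under the curve is computed with the pairwise (Mann-Whitney) definition, under which 0.5 corresponds to chance-level ranking.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels where 1 is the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn

def sensitivity(tp, fn):
    """Fraction of true positives that the model detects."""
    return tp / (tp + fn) if (tp + fn) else 0.0

def specificity(tn, fp):
    """Fraction of true negatives that the model correctly rejects."""
    return tn / (tn + fp) if (tn + fp) else 0.0

def accuracy(tp, fp, tn, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + fp + tn + fn)

def auc(y_true, scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Synthetic labels with the reported class ratio; NOT the study's data.
y_true = [1] * 257 + [0] * 743
# Hypothetical degenerate model that always predicts "no diabetes".
y_pred = [0] * len(y_true)

tp, fp, tn, fn = confusion_counts(y_true, y_pred)
acc = accuracy(tp, fp, tn, fn)
sens = sensitivity(tp, fn)
spec = specificity(tn, fp)
chance_auc = auc(y_true, [0.0] * len(y_true))  # constant scores carry no ranking information
```

On this synthetic example the always-negative predictor reaches 74.3% accuracy while detecting no cases at all, which is why a model trained on unbalanced data can look acceptable on accuracy and still have low sensitivity.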

Conclusions:

Although predictive models based on people's socioeconomic and environmental information are emerging as a tool for the early diagnosis of diabetes mellitus, their predictive capacity still needs to be improved.

Keywords: machine learning; diabetes mellitus; environmental factors; socioeconomic factors; predictive model.
