SciELO - Scientific Electronic Library Online

 

Revista Facultad Nacional de Salud Pública

Print version ISSN 0120-386X; online version ISSN 2256-3334

Abstract

MEJIA, Jessner Alexander; OVIEDO-BENALCAZAR, Mario Andrés; ORDONEZ, José Armando and VALENCIA-MURILLO, José Fernando. Machine learning applied to the prediction of diabetes mellitus, using socioeconomic and environmental information from health system users. Rev. Fac. Nac. Salud Pública [online]. 2023, vol.41, n.2, e06. Epub 15-Nov-2023. ISSN 0120-386X. https://doi.org/10.17533/udea.rfnsp.e351168.

Objective:

To apply machine learning models to support the early diagnosis of diabetes mellitus using environmental, social, economic, and health variables, without relying on clinical sample collection.

Methodology:

Data came from 10,889 users affiliated with the subsidized health system in southwestern Colombia, all diagnosed with hypertension and grouped into users without (74.3%) and with (25.7%) diabetes mellitus. Supervised models were trained using k-nearest neighbors, decision trees, and random forests, as well as ensemble-based models, on the database before and after balancing the number of cases in each diagnostic group. Performance was evaluated by splitting the database into training and test sets (70% and 30%, respectively) and computing accuracy, sensitivity, specificity, and area under the curve.
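The split-then-balance procedure described above can be sketched roughly as follows. This is only an illustration using the Python standard library: the abstract does not say which balancing technique or software was used, so the stratified split, the random oversampling of the minority group, and the helper names `stratified_split` and `oversample_minority` are all assumptions, and the labels are toy data that merely mimic the reported 74.3% / 25.7% ratio.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.3, seed=0):
    """Return (train_idx, test_idx), keeping class proportions in both parts."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train_idx, test_idx = [], []
    for idx in by_class.values():
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)


def oversample_minority(indices, labels, seed=0):
    """Randomly duplicate minority-class indices until all classes have equal size.

    Assumption: random oversampling stands in for whatever balancing
    method the study actually applied.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i in indices:
        by_class[labels[i]].append(i)
    target = max(len(idx) for idx in by_class.values())
    balanced = []
    for idx in by_class.values():
        balanced.extend(idx)
        balanced.extend(rng.choices(idx, k=target - len(idx)))
    return balanced


# Toy labels (0 = without diabetes, 1 = with diabetes), not the study's data.
labels = [0] * 743 + [1] * 257
train_idx, test_idx = stratified_split(labels)
balanced_train = oversample_minority(train_idx, labels)
```

In this sketch only the training indices are balanced and the test set keeps the original class ratio, which is a common design choice so that the reported metrics reflect the real prevalence; whether the study did the same is not stated in the abstract.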

Results:

Sensitivity increased significantly with balanced data, rising from a maximum of 17.1% (unbalanced data) to as high as 57.4% (balanced data). The highest area under the curve (0.61) was obtained with the ensemble models after balancing the number of cases in each group and encoding the categorical variables. The variables with the greatest weight were associated with hereditary aspects (24.65%) and ethnic group (5.59%), in addition to visual difficulty, low water consumption, a diet low in fruits and vegetables, and the consumption of salt and sugar.
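To see why sensitivity can stay low on unbalanced data even when accuracy looks acceptable, the evaluation metrics can be computed by hand. The sketch below is illustrative only: the function names are hypothetical, the labels are toy data mimicking the 74.3% / 25.7% class ratio, and the "model" is a trivial baseline that labels every user as non-diabetic, not any of the models trained in the study.

```python
def basic_metrics(y_true, y_pred):
    """Accuracy, sensitivity, and specificity for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
    }


def auc(y_true, scores):
    """Area under the ROC curve as the share of (positive, negative) pairs
    in which the positive case gets the higher score; ties count as half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy test set with the reported class ratio; not the study's data.
y_true = [0] * 743 + [1] * 257
# Baseline that predicts "no diabetes" for everyone.
y_pred = [0] * len(y_true)
scores = [0.0] * len(y_true)

baseline = basic_metrics(y_true, y_pred)
baseline_auc = auc(y_true, scores)
```

Here the baseline reaches an accuracy of 0.743 and a specificity of 1.0 while detecting no diabetic users at all (sensitivity 0.0, area under the curve 0.5), which is why sensitivity and area under the curve are more informative than accuracy for this kind of class imbalance.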

Conclusions:

Although predictive models based on people's socioeconomic and environmental information are emerging as a tool for the early diagnosis of diabetes mellitus, their predictive capacity still needs to be improved.

Keywords: machine learning; diabetes mellitus; environmental factors; socioeconomic factors; predictive model.
