SciELO - Scientific Electronic Library Online

 

Ingeniería y Desarrollo

Print version ISSN 0122-3461; on-line version ISSN 2145-9371

Abstract

RAMIREZ, Juan Sebastián and DUQUE-MENDEZ, Néstor. Evaluation of Unsupervised Machine Learning Algorithms with Climate Data. Ing. Desarro. [online]. 2022, vol.40, n.2, pp.131-165. Epub Apr 10, 2023. ISSN 0122-3461. https://doi.org/10.14482/inde.40.02.622.553.

When working with climate data, researchers often struggle to determine which clustering algorithm and which parameter settings perform best for a specific dataset. We evaluated the following unsupervised machine learning algorithms: K-means, K-medoids, and complete-linkage hierarchical clustering (Linkage-complete). They are applied to three datasets of climatological variables (temperature, rainfall, relative humidity, and solar radiation), one for each of three meteorological stations located at different heights above sea level in the department of Caldas, Colombia. For each station, five scenarios are defined for 2, 3, and 5 clusters for each of the two partitional algorithms, and five scenarios for the hierarchical algorithm. The scenarios apply different quantities and groupings of variables, using Euclidean distance. Cluster quality is evaluated with the Davies-Bouldin index. The scenarios also cover normalization with techniques such as range transformation and Z-transformation, as well as the number of iterations of the algorithm and dimensionality reduction with PCA. In addition, the computational cost is evaluated. This study can guide researchers in decisions about cluster analysis of meteorological data and help identify the most important algorithm and parameters to consider for the best performance under particular conditions and requirements.

Keywords: climate; clustering; machine learning; K-means; K-medoids.
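
The abstract names two normalization techniques (range transformation and Z-transformation) and one quality measure (the Davies-Bouldin index, computed with Euclidean distance). The following is a minimal sketch of those three pieces in plain Python, not the authors' implementation. The function names and the four sample readings are hypothetical and chosen only for illustration. The sketch assumes at least two clusters with distinct centroids and uses the population standard deviation for the Z-transformation.

```python
import math


def z_transform(rows):
    """Scale each column to zero mean and unit (population) standard deviation."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # A constant column has zero deviation; divide by 1.0 instead to avoid an error.
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def range_transform(rows):
    """Scale each column linearly to the interval [0, 1]."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    spans = [(max(c) - min(c)) or 1.0 for c in cols]
    return [[(v - lo) / sp for v, lo, sp in zip(row, lows, spans)] for row in rows]


def euclidean(a, b):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def davies_bouldin(rows, labels):
    """Davies-Bouldin index of a labeling; lower values mean compact, well-separated clusters."""
    clusters = {}
    for row, label in zip(rows, labels):
        clusters.setdefault(label, []).append(row)
    centroids = {k: [sum(c) / len(c) for c in zip(*pts)]
                 for k, pts in clusters.items()}
    # Scatter: mean distance from each member to its cluster centroid.
    scatter = {k: sum(euclidean(p, centroids[k]) for p in pts) / len(pts)
               for k, pts in clusters.items()}
    keys = list(clusters)
    # For every cluster, keep the worst (largest) similarity ratio to any other cluster.
    worst = [max((scatter[i] + scatter[j]) / euclidean(centroids[i], centroids[j])
                 for j in keys if j != i)
             for i in keys]
    return sum(worst) / len(keys)


# Hypothetical readings: two variables, four observations.
readings = [[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]]
print(davies_bouldin(readings, [0, 0, 1, 1]))  # groups split along the first variable
print(davies_bouldin(readings, [0, 1, 0, 1]))  # groups split along the second variable
```

On these raw readings the first labeling scores lower than the second, because the first variable has the larger spread. Applying `z_transform` or `range_transform` before scoring puts both variables on the same scale, which is why the choice of normalization can change which grouping looks best.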
