SciELO - Scientific Electronic Library Online

 

Revista Facultad de Ingeniería

Print version ISSN 0121-1129

Abstract

TORRES-DOMINGUEZ, Omar et al. Anomalies detection for big data. Rev. Fac. ing. [online]. 2019, vol.28, n.50, pp.62-76. ISSN 0121-1129.  https://doi.org/10.19053/01211129.v28.n50.2019.8793.

The development of the digital age has resulted in a considerable increase in data volumes. These large volumes of data have been called big data because they exceed the processing capacity of conventional database systems. Several sectors see opportunities and applications for anomaly detection in big data problems. Data mining techniques can be very useful for this type of analysis because they allow patterns and relationships to be extracted from large amounts of data. Processing and analyzing these data volumes requires tools capable of handling them, such as Apache Spark and Hadoop. These tools do not have specific algorithms for detecting anomalies. The general objective of this work is to develop a new neighborhood-based anomaly detection algorithm for big data problems. Based on a comparative study, the KNNW algorithm was selected for its results as the basis for a big data variant. The big data algorithm was implemented in Apache Spark, using the MapReduce parallel programming paradigm. Different experiments were then performed to analyze the behavior of the algorithm under different configurations. In these experiments, the execution times and the quality of the results of the sequential variant and the big data variant were compared. The big data variant obtained better results, with a significant difference. The resulting variant, KNNW-Big Data, can process large volumes of data.

Keywords : big data; data mining; detecting anomalies; MapReduce.
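
The abstract does not give the details of KNNW or of its MapReduce decomposition, so the following is only a minimal sketch under stated assumptions. It assumes the usual kNN-weight definition, in which the anomaly score of a point is the sum of the distances to its k nearest neighbors, and it imitates the map and reduce steps in plain Python rather than in Apache Spark. The function names (local_knn, merge_knn, knnw_scores) and the round-robin partitioning are hypothetical and are not taken from the article.

```python
import heapq
import math
from functools import reduce


def local_knn(points, partition, k):
    """Map step: for every point, the k smallest distances to the
    neighbors found in one partition (the point itself is skipped)."""
    result = {}
    for i, p in enumerate(points):
        dists = [math.dist(p, q) for j, q in partition if j != i]
        result[i] = heapq.nsmallest(k, dists)
    return result


def merge_knn(a, b, k):
    """Reduce step: per point, keep the k smallest candidate distances
    from two partial results."""
    return {i: heapq.nsmallest(k, a[i] + b[i]) for i in a}


def knnw_scores(points, k, num_partitions=2):
    """Assumed kNN-weight score: sum of distances to the k nearest neighbors."""
    indexed = list(enumerate(points))
    # Hypothetical round-robin split standing in for distributed partitions.
    partitions = [indexed[p::num_partitions] for p in range(num_partitions)]
    partials = [local_knn(points, part, k) for part in partitions]
    merged = reduce(lambda a, b: merge_knn(a, b, k), partials)
    return [sum(merged[i]) for i in range(len(points))]


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (10.0, 10.0)]
    scores = knnw_scores(data, k=2, num_partitions=3)
    # The isolated point (10, 10) receives the largest score.
    print(scores.index(max(scores)))
```

Because each partition contributes only its k best candidate distances per point, the merged result is the same as a sequential computation over the whole data set, whatever the number of partitions. In this sketch every partition is still compared against all points, so it illustrates the decomposition, not the performance behavior reported in the article.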
