Anomalies detection for big data

Torres-Domínguez, Omar; Sabater-Fernández, Samuel; Bravo-Ilisatigui, Lisandra; Martin-Rodríguez, Diana; García-Borroto, Milton

doi:10.19053/01211129.v28.n50.2019.8793

Revista Facultad de Ingeniería

Print version ISSN 0121-1129

Abstract

TORRES-DOMINGUEZ, Omar et al. Anomalies detection for big data. Rev. Fac. ing. [online]. 2019, vol.28, n.50, pp.62-76. ISSN 0121-1129. https://doi.org/10.19053/01211129.v28.n50.2019.8793.

The development of the digital age has resulted in a considerable increase in data volumes. These large volumes of data are called big data because they exceed the processing capacity of conventional database systems. Several sectors see opportunities and applications for anomaly detection in big data problems. Data mining techniques can be very useful for this type of analysis because they allow patterns and relationships to be extracted from large amounts of data. Processing and analyzing these data volumes requires tools capable of handling them, such as Apache Spark and Hadoop. These tools do not have specific algorithms for detecting anomalies. The general objective of this work is to develop a new neighborhood-based algorithm for anomaly detection in big data problems. Based on a comparative study, the KNNW algorithm was selected for its results as the basis for a big data variant. The big data algorithm was implemented in Apache Spark, using the MapReduce parallel programming paradigm. Different experiments were then performed to analyze the behavior of the algorithm under different configurations. In these experiments, the execution times and the quality of the results were compared between the sequential variant and the big data variant. The big data variant obtained better results, with a significant difference. The resulting variant, KNNW-Big Data, can process large volumes of data.

Keywords: big data; data mining; detecting anomalies; MapReduce.
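
To make the idea of a neighborhood-based score computed in MapReduce style more concrete, here is a minimal sketch in plain Python. It is not the authors' KNNW-Big Data implementation and does not use Apache Spark. It assumes that KNNW denotes the kNN-weight score (the sum of a point's distances to its k nearest neighbors), and the function names `map_partition`, `merge_candidates`, and `knnw_scores` are hypothetical. The map step finds candidate neighbor distances inside each data partition, and the reduce step merges the candidates and keeps the k smallest.

```python
import heapq
import math
from functools import reduce


def map_partition(queries, partition, k):
    """Map step (illustrative): for every query point, collect the k smallest
    distances to the points stored in a single partition."""
    out = {}
    for qi, q in queries:
        # Skip the query point itself so it is not its own neighbor.
        dists = [math.dist(q, p) for pi, p in partition if pi != qi]
        out[qi] = heapq.nsmallest(k, dists)
    return out


def merge_candidates(a, b, k):
    """Reduce step (illustrative): merge two partial results and keep only
    the k smallest candidate distances per query point."""
    return {qi: heapq.nsmallest(k, a[qi] + b[qi]) for qi in a}


def knnw_scores(points, k, n_partitions):
    """Assumed kNN-weight score: sum of distances to the k nearest neighbors.
    A larger score suggests a more anomalous point."""
    indexed = list(enumerate(points))
    # Round-robin split standing in for a distributed dataset's partitions.
    partitions = [indexed[i::n_partitions] for i in range(n_partitions)]
    partials = [map_partition(indexed, part, k) for part in partitions]
    merged = reduce(lambda a, b: merge_candidates(a, b, k), partials)
    return [sum(merged[i]) for i in range(len(points))]


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (10.0, 10.0)]
    scores = knnw_scores(pts, k=2, n_partitions=2)
    # The isolated point (10, 10) receives the largest score.
    print(max(range(len(pts)), key=lambda i: scores[i]))
```

Because the k nearest neighbors over the whole dataset are always among the per-partition candidates, the merged result matches what a sequential scan would compute, whatever the number of partitions. In this sketch every partition sees all query points, which a real distributed implementation would need to avoid or approximate.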
