SciELO - Scientific Electronic Library Online



Revista Científica

Print version ISSN 0124-2253; online version ISSN 2344-8350

Abstract

GARCIA-CHICANGANA, David-Santiago et al. Multi-Client Document Classification Service Based on Machine Learning Techniques and Elasticsearch. Rev. Cient. [online]. 2022, n.43, pp.64-79.  Epub 18-Feb-2022. ISSN 0124-2253.  https://doi.org/10.14483/23448350.18352.

This paper presents a document classification service that allows multi-client (multi-tenant) document management systems to provide greater confidence and credibility regarding the document types assigned to the documents that users upload. The research followed the phases of CRISP-DM and evaluated two document representation models (bags of words with cumulative n-grams, and BERT, which was recently proposed by Google) and five machine learning techniques (multilayer perceptron, random forests, k-nearest neighbors, decision trees, and naïve Bayes). The experiments used data from two organizations. The best results were obtained with multilayer perceptron, random forests, and k-nearest neighbors, which performed very similarly in overall accuracy and per-class recall. The results are not conclusive as to whether the service can be offered to multiple clients with a single model, since this also depends on each client's documents and document types. The service is therefore built on a microservices architecture that allows each organization to create its own model, monitor its performance in production, and update it when performance is inadequate.

Keywords: CRISP-DM; data analytics; document management system; k-nearest neighbors; multilayer perceptron; random forests; trigrams.
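
As a rough illustration of one combination named in the abstract (a bag of words with cumulative n-grams fed to a k-nearest neighbors classifier), the following minimal sketch uses only the Python standard library. It is not the authors' implementation. It assumes that "cumulative n-grams" means all word n-grams for n = 1 up to a maximum (3 here, in line with the "trigrams" keyword), that neighbors are ranked by cosine similarity, and that the tiny training set with its "invoice" and "contract" document types is purely hypothetical.

```python
import math
from collections import Counter


def cumulative_ngrams(text, max_n=3):
    """Return all word n-grams for n = 1..max_n (assumed meaning of 'cumulative')."""
    tokens = text.lower().split()
    grams = []
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


def bag_of_words(text, max_n=3):
    """Represent a document as n-gram counts."""
    return Counter(cumulative_ngrams(text, max_n))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items() if gram in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_predict(train, text, k=3, max_n=3):
    """Majority vote among the k training documents most similar to `text`."""
    query = bag_of_words(text, max_n)
    scored = sorted(
        ((cosine(query, vec), label) for vec, label in train),
        key=lambda pair: pair[0],
        reverse=True,
    )
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy corpus; the document types are illustrative only.
TRAIN_DOCS = [
    ("invoice number total amount due payment", "invoice"),
    ("total amount due on this invoice", "invoice"),
    ("payment due invoice total", "invoice"),
    ("this contract is between the parties", "contract"),
    ("the parties agree to the contract terms", "contract"),
    ("contract terms and obligations of the parties", "contract"),
]
TRAIN = [(bag_of_words(text), label) for text, label in TRAIN_DOCS]

if __name__ == "__main__":
    print(knn_predict(TRAIN, "amount due on the invoice"))
    print(knn_predict(TRAIN, "obligations of the parties under the contract"))
```

In a multi-tenant setting like the one the abstract describes, each organization would hold its own `TRAIN` set (its own model), which is what allows a client to retrain when its production performance degrades.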
