SciELO - Scientific Electronic Library Online

 

Ingeniería

Print version ISSN 0121-750X

Abstract

CELIS NUNEZ, Juan et al. Acoustic and Language Modeling for Speech Recognition of a Spanish Dialect from the Cucuta Colombian Region. ing. [online]. 2017, vol.22, n.3, pp.362-376. ISSN 0121-750X.  https://doi.org/10.14483/23448393.11616.

Context:

Automatic speech recognition requires language and acoustic models developed for each existing dialect. The purpose of this research is to train an acoustic model, a statistical language model, and a grammar language model for the Spanish language, specifically for the dialect of the city of San Jose de Cucuta, Colombia, for use in a command control system. Existing models for the Spanish language have difficulty with Cucuta's dialect, whether in the fundamental frequency and spectral content, the accent, pronunciation, and tone, or simply in the language model.

Method:

In this project, we used a Raspberry Pi B+ embedded system running Raspbian, a Linux distribution, and two open-source software packages: the CMU-Cambridge Statistical Language Modeling Toolkit from the University of Cambridge and CMU Sphinx from Carnegie Mellon University. These packages are based on Hidden Markov Models for the calculation of voice parameters. In addition, we used 1913 audio recordings of speakers from San Jose de Cucuta and the Norte de Santander department to train and test the automatic speech recognition system.
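As a rough illustration of what a statistical language model for a small command vocabulary captures, the sketch below estimates bigram probabilities with add-one smoothing. The command phrases, function names, and smoothing choice are assumptions made for this example; they are not taken from the paper's corpus, and the sketch does not reproduce the toolkit's own estimation methods.

```python
from collections import Counter

# Hypothetical command phrases; not taken from the paper's corpus.
CORPUS = [
    "encender luz",
    "apagar luz",
    "abrir puerta",
    "cerrar puerta",
    "encender ventilador",
]

def train_bigram(sentences):
    """Count unigrams and bigrams, padding each sentence with <s> and </s>."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams

def bigram_prob(prev, word, unigrams, bigrams):
    """Estimate P(word | prev) with add-one (Laplace) smoothing."""
    vocab_size = len(unigrams)
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)

unigrams, bigrams = train_bigram(CORPUS)
p_seen = bigram_prob("encender", "luz", unigrams, bigrams)       # pair occurs in CORPUS
p_unseen = bigram_prob("encender", "puerta", unigrams, bigrams)  # pair never occurs
```

A word pair that occurs in the training phrases receives a higher probability than one that does not, which is how such a model steers the recognizer toward plausible command sequences.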

Results:

We obtained a language model consisting of two files: the statistical language model (.lm) and the JSGF grammar model (.jsgf). For the acoustic component, two models were trained; the improved version achieved a 100% accuracy rate in the training results and an 83% accuracy rate in the audio tests for command recognition. Finally, we prepared a manual for creating acoustic and language models with CMU Sphinx.
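To make the grammar file concrete, here is a minimal sketch that writes a JSGF grammar with a single public rule listing each command as an alternative. The grammar name, rule name, and command phrases are hypothetical and are not the command set used in the paper.

```python
def build_jsgf(grammar_name, rule_name, commands):
    """Return JSGF text with one public rule listing the commands as alternatives."""
    alternatives = " | ".join(commands)
    return (
        "#JSGF V1.0;\n"
        f"grammar {grammar_name};\n"
        f"public <{rule_name}> = {alternatives};\n"
    )

# Hypothetical command phrases; not the paper's command set.
COMMANDS = ["encender luz", "apagar luz", "abrir puerta", "cerrar puerta"]

jsgf_text = build_jsgf("comandos", "comando", COMMANDS)
print(jsgf_text)
```

Unlike the statistical model, a grammar of this kind accepts only the listed phrases, which suits a fixed command vocabulary.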

Conclusions:

The number of participants in the training of the language and acoustic models has a significant influence on the quality of the recognizer's voice processing. Using a large dictionary for training and a short dictionary containing only the command words for deployment is important for a better response from the automatic speech recognition system. Given the accuracy rate above 80% in the voice recognition tests, the proposed models are suitable for applications that assist people with visual or motor impairments.

Keywords: speech recognition; acoustic models; language models; CMU Sphinx; Raspberry Pi. Language: Spanish.
