<?xml version="1.0" encoding="ISO-8859-1"?><article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<front>
<journal-meta>
<journal-id>0121-750X</journal-id>
<journal-title><![CDATA[Ingeniería]]></journal-title>
<abbrev-journal-title><![CDATA[ing.]]></abbrev-journal-title>
<issn>0121-750X</issn>
<publisher>
<publisher-name><![CDATA[Universidad Distrital Francisco José de Caldas]]></publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id>S0121-750X2017000300362</article-id>
<article-id pub-id-type="doi">10.14483/23448393.11616</article-id>
<title-group>
<article-title xml:lang="es"><![CDATA[Modelo Acústico y de Lenguaje del Idioma Español para el dialecto Cucuteño, Orientado al Reconocimiento Automático del Habla]]></article-title>
<article-title xml:lang="en"><![CDATA[Acoustic and Language Modeling for Speech Recognition of a Spanish Dialect from the Cucuta Colombian Region]]></article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname><![CDATA[Celis Núñez]]></surname>
<given-names><![CDATA[Juan]]></given-names>
</name>
<xref ref-type="aff" rid="Aff"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname><![CDATA[Llanos Castro]]></surname>
<given-names><![CDATA[Rodrigo]]></given-names>
</name>
<xref ref-type="aff" rid="Aff"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname><![CDATA[Medina Delgado]]></surname>
<given-names><![CDATA[Byron]]></given-names>
</name>
<xref ref-type="aff" rid="Aff"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname><![CDATA[Sepúlveda Mora]]></surname>
<given-names><![CDATA[Sergio]]></given-names>
</name>
<xref ref-type="aff" rid="Aff"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname><![CDATA[Castro Casadiego]]></surname>
<given-names><![CDATA[Sergio]]></given-names>
</name>
<xref ref-type="aff" rid="Aff"/>
</contrib>
</contrib-group>
<aff id="Af1">
<institution><![CDATA[Universidad Francisco de Paula Santander, Departamento de Electricidad y Electrónica]]></institution>
<addr-line><![CDATA[Bogota ]]></addr-line>
<country>Colombia</country>
</aff>
<pub-date pub-type="pub">
<day>00</day>
<month>12</month>
<year>2017</year>
</pub-date>
<pub-date pub-type="epub">
<day>00</day>
<month>12</month>
<year>2017</year>
</pub-date>
<volume>22</volume>
<numero>3</numero>
<fpage>362</fpage>
<lpage>376</lpage>
<copyright-statement/>
<copyright-year/>
<self-uri xlink:href="http://www.scielo.org.co/scielo.php?script=sci_arttext&amp;pid=S0121-750X2017000300362&amp;lng=en&amp;nrm=iso"></self-uri><self-uri xlink:href="http://www.scielo.org.co/scielo.php?script=sci_abstract&amp;pid=S0121-750X2017000300362&amp;lng=en&amp;nrm=iso"></self-uri><self-uri xlink:href="http://www.scielo.org.co/scielo.php?script=sci_pdf&amp;pid=S0121-750X2017000300362&amp;lng=en&amp;nrm=iso"></self-uri><abstract abstract-type="short" xml:lang="es"><p><![CDATA[Resumen Contexto: El reconocimiento automático del habla requiere el desarrollo de modelos de lenguaje y modelos acústicos para los diferentes dialectos que existen. El objeto de esta investigación es el entrenamiento de un modelo acústico, un modelo de lenguaje estadístico y un modelo de lenguaje gramatical para el idioma español, específicamente para el dialecto de la ciudad de San José de Cúcuta, Colombia, que pueda ser utilizado en un sistema de control por comandos. Lo anterior está motivado por las deficiencias que presentan los modelos existentes para el idioma español en el reconocimiento de la frecuencia fundamental y el contenido espectral, el acento, la pronunciación, el tono o simplemente el modelo de lenguaje de la variante dialéctica de esta región. Método: Este proyecto utiliza el sistema embebido Raspberry Pi B+ con el sistema operativo Raspbian, que es una distribución de Linux, y los programas de código abierto CMU-Cambridge Statistical Language Modeling Toolkit de la Universidad de Cambridge y CMU Sphinx de la Universidad Carnegie Mellon, los cuales se basan en los modelos ocultos de Markov para el cálculo de los parámetros de voz. Además, se utilizaron 1913 audios grabados por locutores de la ciudad de San José de Cúcuta y el departamento de Norte de Santander para el entrenamiento y las pruebas del sistema de reconocimiento automático del habla. Resultados: Se obtuvo un modelo de lenguaje que consiste en dos archivos, uno de modelo de lenguaje estadístico (.lm) y uno de modelo gramatical (.jsgf). En relación con la parte acústica se entrenaron dos modelos, uno de ellos con una versión mejorada que obtuvo una tasa de acierto en el reconocimiento de comandos del 100 % en los datos de entrenamiento y del 83 % en las pruebas de audio. Por último, se elaboró un manual para la creación de los modelos acústicos y de lenguaje con el software CMU Sphinx. Conclusiones: El número de participantes en el proceso de entrenamiento de los modelos acústicos y de lenguaje influye significativamente en la calidad del procesamiento de voz del reconocedor. A fin de obtener una mejor respuesta del sistema de reconocimiento automático del habla, es importante usar un diccionario largo para la etapa de entrenamiento y un diccionario corto con las palabras de comando para la implementación del sistema. Teniendo en cuenta que en las pruebas de reconocimiento se obtuvo una tasa de éxito mayor al 80 %, es posible usar los modelos creados en el desarrollo de un sistema de reconocimiento automático del habla para una aplicación orientada a la asistencia de personas con discapacidad visual o incapacidad de movimiento.]]></p></abstract>
<abstract abstract-type="short" xml:lang="en"><p><![CDATA[Abstract Context: Automatic speech recognition requires the development of language and acoustic models for the different existing dialects. The purpose of this research is to train an acoustic model, a statistical language model, and a grammar language model for the Spanish language, specifically for the dialect of the city of San José de Cúcuta, Colombia, that can be used in a command control system. Existing models for the Spanish language have deficiencies in recognizing the fundamental frequency and spectral content, the accent, the pronunciation, the tone, or simply the language model of Cúcuta's dialect. Method: In this project, we used a Raspberry Pi B+ embedded system running the Raspbian operating system, which is a Linux distribution, and two open-source software packages, namely the CMU-Cambridge Statistical Language Modeling Toolkit from the University of Cambridge and CMU Sphinx from Carnegie Mellon University; both are based on Hidden Markov Models for the calculation of voice parameters. In addition, we used 1913 audio recordings of speakers from San José de Cúcuta and the Norte de Santander department for training and testing the automatic speech recognition system. Results: We obtained a language model that consists of two files: a statistical language model (.lm) and a JSGF grammar model (.jsgf). Regarding the acoustic component, two models were trained; one of them, an improved version, achieved a 100 % accuracy rate in command recognition on the training data and an 83 % accuracy rate in the audio tests. Finally, we prepared a manual for creating acoustic and language models with the CMU Sphinx software. Conclusions: The number of participants in the training process of the language and acoustic models has a significant influence on the quality of the recognizer's voice processing. Using a large dictionary for the training process and a short dictionary containing the command words for the implementation is important to obtain a better response from the automatic speech recognition system. Considering the accuracy rate above 80 % in the voice recognition tests, the proposed models can be used to develop an automatic speech recognition system for applications that assist people with visual or motor impairments.]]></p></abstract>
<kwd-group>
<kwd lng="en"><![CDATA[Speech Recognition]]></kwd>
<kwd lng="en"><![CDATA[acoustic models]]></kwd>
<kwd lng="en"><![CDATA[language models]]></kwd>
<kwd lng="en"><![CDATA[CMU Sphinx]]></kwd>
<kwd lng="en"><![CDATA[Raspberry Pi]]></kwd>
<kwd lng="en"><![CDATA[Language: Spanish]]></kwd>
<kwd lng="es"><![CDATA[Reconocimiento del habla]]></kwd>
<kwd lng="es"><![CDATA[Modelos acústicos]]></kwd>
<kwd lng="es"><![CDATA[Modelos de lenguaje]]></kwd>
<kwd lng="es"><![CDATA[CMU Sphinx]]></kwd>
<kwd lng="es"><![CDATA[Raspberry Pi]]></kwd>
<kwd lng="es"><![CDATA[Idioma: Español]]></kwd>
</kwd-group>
</article-meta>
</front><back>
<ref-list>
<ref id="B1">
<label>[1]</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Moumtadi]]></surname>
<given-names><![CDATA[F.]]></given-names>
</name>
<name>
<surname><![CDATA[Granados-Lovera]]></surname>
<given-names><![CDATA[F.]]></given-names>
</name>
<name>
<surname><![CDATA[Delgado-Hernández]]></surname>
<given-names><![CDATA[J.]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[Activación de funciones en edificios inteligentes utilizando comandos de voz desde dispositivos móviles]]></article-title>
<source><![CDATA[Ingeniería. Investigación y Tecnología]]></source>
<year>2014</year>
<page-range>175-86</page-range></nlm-citation>
</ref>
<ref id="B2">
<label>[2]</label><nlm-citation citation-type="confpro">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Alcubierre]]></surname>
<given-names><![CDATA[J.M.]]></given-names>
</name>
<name>
<surname><![CDATA[Minguez]]></surname>
<given-names><![CDATA[J.]]></given-names>
</name>
<name>
<surname><![CDATA[Montesano]]></surname>
<given-names><![CDATA[L.]]></given-names>
</name>
<name>
<surname><![CDATA[Montano]]></surname>
<given-names><![CDATA[L.]]></given-names>
</name>
<name>
<surname><![CDATA[Saz]]></surname>
<given-names><![CDATA[O.]]></given-names>
</name>
<name>
<surname><![CDATA[Lleida]]></surname>
<given-names><![CDATA[E.]]></given-names>
</name>
</person-group>
<source><![CDATA[Silla de Ruedas Inteligente Controlada por Voz]]></source>
<year>2005</year>
<conf-name><![CDATA[Primer Congreso Internacional de Domótica, Robótica y Teleasistencia para todos]]></conf-name>
<conf-loc> </conf-loc>
</nlm-citation>
</ref>
<ref id="B3">
<label>[3]</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[El Amrani]]></surname>
<given-names><![CDATA[M. Y.]]></given-names>
</name>
<name>
<surname><![CDATA[Rahman]]></surname>
<given-names><![CDATA[M.M. H.]]></given-names>
</name>
<name>
<surname><![CDATA[Wahiddin]]></surname>
<given-names><![CDATA[M. R.]]></given-names>
</name>
<name>
<surname><![CDATA[Shah]]></surname>
<given-names><![CDATA[A.]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[Building CMU Sphinx Language Model for The Holy Quran using Simplified Arabic Phonemes]]></article-title>
<source><![CDATA[Egyptian Informatics Journal]]></source>
<year>2016</year>
<volume>17</volume>
<numero>3</numero>
<issue>3</issue>
<page-range>305-14</page-range></nlm-citation>
</ref>
<ref id="B4">
<label>[4]</label><nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Saqer]]></surname>
<given-names><![CDATA[M.]]></given-names>
</name>
</person-group>
<source><![CDATA[Voice speech recognition using hidden Markov model Sphinx-4 for Arabic]]></source>
<year>2012</year>
<publisher-name><![CDATA[University of Houston-Clear Lake]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B5">
<label>[5]</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Uebler]]></surname>
<given-names><![CDATA[U.]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[Multilingual speech recognition in seven languages]]></article-title>
<source><![CDATA[Speech Communication]]></source>
<year>2001</year>
<volume>35</volume>
<numero>1-2</numero>
<issue>1-2</issue>
<page-range>53-69</page-range></nlm-citation>
</ref>
<ref id="B6">
<label>[6]</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Kohler]]></surname>
<given-names><![CDATA[J.]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[Multilingual phone models for vocabulary-independent speech recognition tasks]]></article-title>
<source><![CDATA[Speech Communication]]></source>
<year>2001</year>
<volume>35</volume>
<numero>1-2</numero>
<issue>1-2</issue>
<page-range>21-30</page-range></nlm-citation>
</ref>
<ref id="B7">
<label>[7]</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Kepuska]]></surname>
<given-names><![CDATA[V. Z.]]></given-names>
</name>
<name>
<surname><![CDATA[Rojanasthien]]></surname>
<given-names><![CDATA[P.]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[Speech Corpus Generation from DVDs of Movies and TV Series]]></article-title>
<source><![CDATA[Journal of International Technology and Information Management]]></source>
<year>2011</year>
<volume>20</volume>
<numero>1-2</numero>
<issue>1-2</issue>
<page-range>49-82</page-range></nlm-citation>
</ref>
<ref id="B8">
<label>[8]</label><nlm-citation citation-type="book">
<source><![CDATA[CMU Sphinx Project]]></source>
<year></year>
<publisher-name><![CDATA[Carnegie Mellon University]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B9">
<label>[9]</label><nlm-citation citation-type="confpro">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Wang]]></surname>
<given-names><![CDATA[Y.]]></given-names>
</name>
<name>
<surname><![CDATA[Zhang]]></surname>
<given-names><![CDATA[X.]]></given-names>
</name>
</person-group>
<source><![CDATA[Realization of Mandarin continuous digits speech recognition system using Sphinx]]></source>
<year>2010</year>
<conf-name><![CDATA[International Symposium on Computer Communication Control and Automation (3CA)]]></conf-name>
<conf-date>2010</conf-date>
<conf-loc> </conf-loc>
</nlm-citation>
</ref>
<ref id="B10">
<label>[10]</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Ceballos]]></surname>
<given-names><![CDATA[A.]]></given-names>
</name>
<name>
<surname><![CDATA[Serna-Morales]]></surname>
<given-names><![CDATA[A. F.]]></given-names>
</name>
<name>
<surname><![CDATA[Prieto]]></surname>
<given-names><![CDATA[F.]]></given-names>
</name>
<name>
<surname><![CDATA[Gomez]]></surname>
<given-names><![CDATA[J. B.]]></given-names>
</name>
<name>
<surname><![CDATA[Redarce]]></surname>
<given-names><![CDATA[T.]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[Sistema audiovisual para reconocimiento de comandos]]></article-title>
<source><![CDATA[Ingeniare: Revista Chilena de Ingeniería]]></source>
<year>2011</year>
<volume>19</volume>
<numero>2</numero>
<issue>2</issue>
<page-range>278-91</page-range></nlm-citation>
</ref>
<ref id="B11">
<label>[11]</label><nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Ceballos]]></surname>
<given-names><![CDATA[A.]]></given-names>
</name>
</person-group>
<source><![CDATA[]]></source>
<year>2009</year>
<publisher-loc><![CDATA[Colombia ]]></publisher-loc>
<publisher-name><![CDATA[Universidad Nacional de Colombia, Sede Manizales]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B12">
<label>[12]</label><nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Calvo Arias]]></surname>
<given-names><![CDATA[R.]]></given-names>
</name>
</person-group>
<source><![CDATA[Reconocimiento de voz]]></source>
<year>2002</year>
<publisher-name><![CDATA[Instituto Tecnológico de Costa Rica, Escuela de Ingeniería Electrónica]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B13">
<label>[13]</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Gamma]]></surname>
<given-names><![CDATA[E.]]></given-names>
</name>
<name>
<surname><![CDATA[Amaya Hurtado]]></surname>
<given-names><![CDATA[D.]]></given-names>
</name>
<name>
<surname><![CDATA[Sandoval]]></surname>
<given-names><![CDATA[O.]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[Revisión de las tecnologías y aplicaciones del habla sub-vocal]]></article-title>
<source><![CDATA[Ingeniería]]></source>
<year></year>
<volume>20</volume>
<numero>2</numero>
<issue>2</issue>
<page-range>277-88</page-range></nlm-citation>
</ref>
<ref id="B14">
<label>[14]</label><nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Oberle]]></surname>
<given-names><![CDATA[S.]]></given-names>
</name>
</person-group>
<source><![CDATA[Detection and estimation of acoustical signals using hidden Markov model]]></source>
<year>1999</year>
<publisher-loc><![CDATA[Zürich, Switzerland]]></publisher-loc>
<publisher-name><![CDATA[Eidgenoessische Technische Hochschule]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B15">
<label>[15]</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Varela]]></surname>
<given-names><![CDATA[A.]]></given-names>
</name>
<name>
<surname><![CDATA[Cuayáhuitl]]></surname>
<given-names><![CDATA[H.]]></given-names>
</name>
<name>
<surname><![CDATA[Nolazco-Flores]]></surname>
<given-names><![CDATA[J. A.]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[Creating a Mexican Spanish version of the CMU Sphinx-III speech recognition system]]></article-title>
<source><![CDATA[Progress in Pattern Recognition, Speech and Image Analysis]]></source>
<year>2003</year>
<page-range>251-8</page-range><publisher-name><![CDATA[Springer]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B16">
<label>[16]</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Mingov]]></surname>
<given-names><![CDATA[R.]]></given-names>
</name>
<name>
<surname><![CDATA[Zdravevski]]></surname>
<given-names><![CDATA[E.]]></given-names>
</name>
<name>
<surname><![CDATA[Lameski]]></surname>
<given-names><![CDATA[P.]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[Application of Russian Language Phonemics to Generate Macedonian Speech Recognition Model Using Sphinx]]></article-title>
<source><![CDATA[ICT Innovations 2016]]></source>
<year>2016</year>
</nlm-citation>
</ref>
<ref id="B17">
<label>[17]</label><nlm-citation citation-type="confpro">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Lamere]]></surname>
<given-names><![CDATA[P.]]></given-names>
</name>
<name>
<surname><![CDATA[Kwok]]></surname>
<given-names><![CDATA[P.]]></given-names>
</name>
<name>
<surname><![CDATA[Gouvêa]]></surname>
<given-names><![CDATA[E. B.]]></given-names>
</name>
<name>
<surname><![CDATA[Singh]]></surname>
<given-names><![CDATA[R.]]></given-names>
</name>
<name>
<surname><![CDATA[Walker]]></surname>
<given-names><![CDATA[W.]]></given-names>
</name>
<name>
<surname><![CDATA[Wolf]]></surname>
<given-names><![CDATA[P.]]></given-names>
</name>
</person-group>
<source><![CDATA[The CMU sphinx-4 speech recognition system]]></source>
<year>2003</year>
<conf-name><![CDATA[Conference on Acoustics, Speech and Signal Processing]]></conf-name>
<conf-loc>Hong Kong</conf-loc>
</nlm-citation>
</ref>
<ref id="B18">
<label>[18]</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Raab]]></surname>
<given-names><![CDATA[M.]]></given-names>
</name>
<name>
<surname><![CDATA[Gruhn]]></surname>
<given-names><![CDATA[R.]]></given-names>
</name>
<name>
<surname><![CDATA[Noeth]]></surname>
<given-names><![CDATA[E.]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[A scalable architecture for multilingual speech recognition on embedded devices]]></article-title>
<source><![CDATA[Speech Communication]]></source>
<year>2011</year>
<volume>53</volume>
<numero>1</numero>
<issue>1</issue>
<page-range>62-74</page-range></nlm-citation>
</ref>
<ref id="B19">
<label>[19]</label><nlm-citation citation-type="confpro">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Villaseñor]]></surname>
<given-names><![CDATA[L.]]></given-names>
</name>
<name>
<surname><![CDATA[Montes]]></surname>
<given-names><![CDATA[M.]]></given-names>
</name>
<name>
<surname><![CDATA[Perez]]></surname>
<given-names><![CDATA[M.]]></given-names>
</name>
<name>
<surname><![CDATA[Vaufreydaz]]></surname>
<given-names><![CDATA[D.]]></given-names>
</name>
</person-group>
<source><![CDATA[Comparación léxica de corpus para generación de modelos de lenguaje]]></source>
<year>2002</year>
<conf-name><![CDATA[Workshop on Multilingual Information Access and Natural Language]]></conf-name>
<conf-loc> </conf-loc>
</nlm-citation>
</ref>
</ref-list>
</back>
</article>
