<?xml version="1.0" encoding="ISO-8859-1"?><article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<front>
<journal-meta>
<journal-id>0012-7353</journal-id>
<journal-title><![CDATA[DYNA]]></journal-title>
<abbrev-journal-title><![CDATA[Dyna rev.fac.nac.minas]]></abbrev-journal-title>
<issn>0012-7353</issn>
<publisher>
<publisher-name><![CDATA[Universidad Nacional de Colombia]]></publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id>S0012-73532012000300023</article-id>
<title-group>
<article-title xml:lang="en"><![CDATA[DESIGN AND DEVELOPMENT OF A SPEECH SYNTHESIS SOFTWARE FOR COLOMBIAN SPANISH APPLIED TO COMMUNICATION THROUGH MOBILE DEVICES]]></article-title>
<article-title xml:lang="es"><![CDATA[DISEÑO Y DESARROLLO DE UN SOFTWARE DE SÍNTESIS DE VOZ PARA EL ESPAÑOL DE COLOMBIA APLICADO A LA COMUNICACIÓN A TRAVÉS DE DISPOSITIVOS MÓVILES]]></article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname><![CDATA[RUEDA CH.]]></surname>
<given-names><![CDATA[HOOVER F.]]></given-names>
</name>
<xref ref-type="aff" rid="A01"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname><![CDATA[CORREA P.]]></surname>
<given-names><![CDATA[CLAUDIA V.]]></given-names>
</name>
<xref ref-type="aff" rid="A02"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname><![CDATA[ARGUELLO FUENTES]]></surname>
<given-names><![CDATA[HENRY]]></given-names>
</name>
<xref ref-type="aff" rid="A03"/>
</contrib>
</contrib-group>
<aff id="A01">
<institution><![CDATA[Universidad Industrial de Santander]]></institution>
<addr-line><![CDATA[Bucaramanga ]]></addr-line>
<country>Colombia</country>
</aff>
<aff id="A02">
<institution><![CDATA[Universidad Industrial de Santander]]></institution>
<addr-line><![CDATA[Bucaramanga ]]></addr-line>
<country>Colombia</country>
</aff>
<aff id="A03">
<institution><![CDATA[Universidad Industrial de Santander]]></institution>
<addr-line><![CDATA[Bucaramanga ]]></addr-line>
<country>Colombia</country>
</aff>
<pub-date pub-type="pub">
<day>00</day>
<month>06</month>
<year>2012</year>
</pub-date>
<pub-date pub-type="epub">
<day>00</day>
<month>06</month>
<year>2012</year>
</pub-date>
<volume>79</volume>
<numero>173</numero>
<fpage>71</fpage>
<lpage>80</lpage>
<copyright-statement/>
<copyright-year/>
<self-uri xlink:href="http://www.scielo.org.co/scielo.php?script=sci_arttext&amp;pid=S0012-73532012000300023&amp;lng=en&amp;nrm=iso"></self-uri><self-uri xlink:href="http://www.scielo.org.co/scielo.php?script=sci_abstract&amp;pid=S0012-73532012000300023&amp;lng=en&amp;nrm=iso"></self-uri><self-uri xlink:href="http://www.scielo.org.co/scielo.php?script=sci_pdf&amp;pid=S0012-73532012000300023&amp;lng=en&amp;nrm=iso"></self-uri><abstract abstract-type="short" xml:lang="en"><p><![CDATA[In several scenarios of everyday life, there is a need to communicate orally with other people. However, various technological solutions such as mobile phones cannot be used in places such as meetings, classrooms, or conference rooms without disrupting the activities of people around the speaker. This research develops a tool that enables people to establish a conversation in a public place without disrupting the surrounding environment. To this end, a speech synthesizer is implemented on a personal computer connected to a cell phone, which allows one to establish a mobile call without using the human voice. The speech synthesizer uses the diphone concatenation technique and is developed specifically for the Spanish from Colombia. A mathematical description of the synthesizer shows the decomposition of the synthesizer into various mutually independent processes. Several user-acceptance and quality tests of the obtained signal were performed to evaluate the performance of the tool. The results show a high signal to noise ratio of generated signals and a high intelligibility of the tool.]]></p></abstract>
<abstract abstract-type="short" xml:lang="es"><p><![CDATA[En diversos escenarios de la vida cotidiana existe la necesidad de comunicarse oralmente con otras personas. Sin embargo, diversas soluciones tecnológicas como la telefonía móvil no pueden ser utilizadas en lugares como reuniones, salones de clase, conferencias, entre otras, sin interrumpir las actividades de las personas alrededor del hablante. Este trabajo de investigación desarrolla una herramienta que permite entablar una conversación de voz en un recinto público sin interrumpir las actividades del medio circundante. Para ello se implementa un sintetizador de voz en una computadora personal comunicada de forma alámbrica con un teléfono móvil, lo cual permite establecer una llamada sin utilizar la voz humana. El sintetizador de voz utiliza la técnica de concatenación de difonemas y es desarrollado específicamente para el idioma español de Colombia. Una descripción matemática del sintetizador muestra su descomposición en diversos procesos independientes entre sí. Se realizaron diversas pruebas de aceptación de usuarios y de calidad de la señal obtenida para evaluar el desempeño de la herramienta. Los resultados muestran una alta relación señal a ruido de las señales generadas y una alta inteligibilidad de la herramienta.]]></p></abstract>
<kwd-group>
<kwd lng="en"><![CDATA[speech synthesis]]></kwd>
<kwd lng="en"><![CDATA[voice corpus]]></kwd>
<kwd lng="en"><![CDATA[diphone concatenation]]></kwd>
<kwd lng="en"><![CDATA[Spanish from Colombia]]></kwd>
<kwd lng="en"><![CDATA[mobile devices]]></kwd>
<kwd lng="en"><![CDATA[algorithms]]></kwd>
<kwd lng="es"><![CDATA[síntesis de voz]]></kwd>
<kwd lng="es"><![CDATA[corpus de voz]]></kwd>
<kwd lng="es"><![CDATA[concatenación de difonemas]]></kwd>
<kwd lng="es"><![CDATA[español de Colombia]]></kwd>
<kwd lng="es"><![CDATA[telefonía móvil]]></kwd>
<kwd lng="es"><![CDATA[algoritmos]]></kwd>
</kwd-group>
</article-meta>
</front><body><![CDATA[ <p align="center"><font size="4" face="Verdana, Arial, Helvetica, sans-serif"><b>DESIGN AND DEVELOPMENT OF A SPEECH SYNTHESIS SOFTWARE FOR COLOMBIAN SPANISH APPLIED TO COMMUNICATION THROUGH MOBILE DEVICES</b></font></p>     <p align="center"><i><font size="3"><b><font face="Verdana, Arial, Helvetica, sans-serif">DISE&Ntilde;O Y DESARROLLO DE UN SOFTWARE DE S&Iacute;NTESIS DE VOZ PARA EL ESPA&Ntilde;OL DE COLOMBIA APLICADO A LA COMUNICACI&Oacute;N A TRAV&Eacute;S DE DISPOSITIVOS M&Oacute;VILES</font></b></font></i></p>     <p align="center">&nbsp;</p>     <p align="center"><font size="2" face="Verdana, Arial, Helvetica, sans-serif"><b>HOOVER F. RUEDA CH.</b>    <br>   <i>B.Sc, Master (c), Universidad Industrial de Santander, Bucaramanga, Colombia, <a href="mailto:hoover.rueda@correo.uis.edu.co">hoover.rueda@correo.uis.edu.co</a></i></font></p>     <p align="center"><font size="2" face="Verdana, Arial, Helvetica, sans-serif"><b>CLAUDIA V. CORREA P.</b>    <br>   <i>B.Sc, Master (c), Universidad Industrial de Santander, Bucaramanga, Colombia, <a href="mailto:claudia.correa@correo.uis.edu.co">claudia.correa@correo.uis.edu.co</a></i></font></p>     <p align="center"><font size="2" face="Verdana, Arial, Helvetica, sans-serif"><b>HENRY ARGUELLO FUENTES</b>    <br>   <i>B.Sc, Ph.D. (c), Universidad Industrial de Santander, Bucaramanga, Colombia, <a href="mailto:henarfu@uis.edu.co">henarfu@uis.edu.co</a></i></font></p>     <p align="center">&nbsp;</p>     ]]></body>
<body><![CDATA[<p align="center"><font size="2" face="Verdana, Arial, Helvetica, sans-serif"><b>Received for review: July 16<sup>th</sup>, 2011, accepted: February 06<sup>th</sup>, 2012, final version: April 23<sup>rd</sup>, 2012</b></font></p>     <p align="center">&nbsp;</p> <hr>     <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif"><b>ABSTRACT:</b> In several scenarios of everyday life, there is a need to communicate orally with other people. However, various technological solutions such as mobile phones cannot be used in places such as meetings, classrooms, or conference rooms without disrupting the activities of people around the speaker. This research develops a tool that enables people to establish a conversation in a public place without disrupting the surrounding environment. To this end, a speech synthesizer is implemented on a personal computer connected to a cell phone, which allows one to establish a mobile call without using the human voice. The speech synthesizer uses the diphone concatenation technique and is developed specifically for the Spanish from Colombia. A mathematical description of the synthesizer shows the decomposition of the synthesizer into various mutually independent processes. Several user-acceptance and quality tests of the obtained signal were performed to evaluate the performance of the tool. The results show a high signal to noise ratio of generated signals and a high intelligibility of the tool. </font></p>     <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif"><b>KEYWORDS: </b>speech synthesis, voice corpus, diphone concatenation, Spanish from Colombia, mobile devices, algorithms</font></p>     <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif"><b>RESUMEN:</b> En diversos escenarios de la vida cotidiana existe la necesidad de comunicarse oralmente con otras personas. 
Sin embargo, diversas soluciones tecnol&oacute;gicas como la telefon&iacute;a m&oacute;vil no pueden ser utilizadas en lugares como reuniones, salones de clase, conferencias, entre otras, sin interrumpir las actividades de las personas alrededor del hablante. Este trabajo de investigaci&oacute;n desarrolla una herramienta que permite entablar una conversaci&oacute;n de voz en un recinto p&uacute;blico sin interrumpir las actividades del medio circundante. Para ello se implementa un sintetizador de voz en una computadora personal comunicada de forma al&aacute;mbrica con un tel&eacute;fono m&oacute;vil, lo cual permite establecer una llamada sin utilizar la voz humana. El sintetizador de voz utiliza la t&eacute;cnica de concatenaci&oacute;n de difonemas y es desarrollado espec&iacute;ficamente para el idioma espa&ntilde;ol de Colombia. Una descripci&oacute;n matem&aacute;tica del sintetizador muestra su descomposici&oacute;n en diversos procesos independientes entre s&iacute;. Se realizaron diversas pruebas de aceptaci&oacute;n de usuarios y de calidad de la se&ntilde;al obtenida para evaluar el desempe&ntilde;o de la herramienta. Los resultados muestran una alta relaci&oacute;n se&ntilde;al a ruido de las se&ntilde;ales generadas y una alta inteligibilidad de la herramienta.</font></p>     <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif"><b>PALABRAS CLAVE:</b> s&iacute;ntesis de voz, corpus de voz, concatenaci&oacute;n de difonemas, espa&ntilde;ol de Colombia, telefon&iacute;a m&oacute;vil, algoritmos</font></p> <hr>     <p>&nbsp;</p>     <p><font size="3" face="Verdana, Arial, Helvetica, sans-serif"><b>1. INTRODUCTION</b></font></p>     <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">Speech synthesis is the generation of artificial voice from a written text [1,2]. Electronics and software generate acoustic signals to simulate the human voice [3&ndash;6]. 
Each language has its own phonetic rules that determine the correct pronunciation of words. In Spanish in particular, pronunciation closely follows spelling, but some structures of the language require special processing [7,8].</font></p>     <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">These structures include e-mail addresses, dates, abbreviations, and phone numbers. Handling them is one of the biggest challenges in text-to-speech conversion, which is why a speech synthesis system is organized into several stages. First, a pre-processing stage analyzes the structures present in the text. Then, the text is divided into entries for the synthesizer. This division is done by algorithms that apply the rules of the language and identify the separation of words (blank spaces, punctuation marks, written accents, etc.).</font></p>     ]]></body>
<body><![CDATA[<p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">There are different parameters for measuring the quality of speech synthesis applications: the naturalness and intelligibility of the voice, the complexity of the process, and the domain for which it was developed [9]. Different techniques for speech synthesis have been developed, each one offering benefits in terms of naturalness or intelligibility compared to others. Some of them, such as synthesis by concatenation, use pre-recorded voice units stored in a database called a voice corpus [10&ndash;16]; other techniques are based on acoustic mathematical models that generate the artificial voice by varying parameters such as noise levels, frequency, and the movements of the vocal apparatus [17&ndash;21].</font></p>     <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">This paper presents the design of a software tool that allows one to make mobile-phone calls by using speech synthesis, specifically diphone concatenation, to generate an artificial voice on a computer from an input text and to reproduce it on a mobile device. The software is intended for people who cannot answer their mobile phones because the situation prevents them from speaking aloud. <a href="#fig01">Figure 1</a> presents a general diagram of the components in the proposed software.</font></p>     <p align="center"><font size="2" face="Verdana, Arial, Helvetica, sans-serif"><b><a name="fig01"></a><img src="/img/revistas/dyna/v79n173/a23fig01.gif">    <br>   Figure 1.</b> General design and components of the proposed speech synthesis software. Speakers 1 and 2 are in different geographical places. 
Because Speaker 2 cannot speak aloud (he is in a classroom), he communicates with Speaker 1 by using the speech synthesis software.</font></p>     <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">Each of the following sections presents one component of the speech synthesis software tool. Section 2 describes the speech synthesizer and the mathematical formulation of each of its processors. Section 3 presents the voice corpus (voice database). Section 4 presents the transmission device used to send the voice to the mobile device. Finally, the tests performed [26], the results obtained, and the conclusions are presented.</font></p>     <p>&nbsp;</p>     <p><font size="3" face="Verdana, Arial, Helvetica, sans-serif"><b>2. SPEECH SYNTHESIZER</b></font></p>     <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">The principal component of the developed software is an unlimited-domain speech synthesizer that uses the diphone concatenation technique. The synthesizer produces a synthetic voice by processing an input text. In other words, it finds the sound representation of a given text. The synthesizer consists of six processors, each performing a specific task in the synthesis process. <a href="#fig02">Figure 2</a> shows the architecture of the synthesizer. Note that the processors are executed sequentially. Thus, the output of each processor is the input of the next one.</font></p>     <p align="center"><font size="2" face="Verdana, Arial, Helvetica, sans-serif"><b><a name="fig02"></a><img src="/img/revistas/dyna/v79n173/a23fig02.gif">    <br>   Figure 2.</b> Architecture of the proposed speech synthesizer. A phrase is the input of the system. 
The tokenizer divides it into tokens using the blank spaces between words; each token is normalized to be represented in words; then, each word is divided into phonemes, which are then grouped by the phoneme joiner to form the diphones; finally, diphone mapping is performed in the voice corpus to extract the audio files, concatenate them, and obtain the synthetic voice.</font></p>     ]]></body>
<body><![CDATA[<p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">The mathematical description of the speech synthesizer and of each of its processors is presented below. Let <img src="/img/revistas/dyna/v79n173/a23eq48713.jpeg" /> be the alphabet that contains the Colombian Spanish letters and numbers, the Greek symbols, punctuation marks, mathematical operators, and other special symbols. Define <img src="/img/revistas/dyna/v79n173/a23eq48721.jpeg" /> as the set of all the words of finite length formed with elements of <img src="/img/revistas/dyna/v79n173/a23eq48729.jpeg" />. Let <img src="/img/revistas/dyna/v79n173/a23eq48742.jpeg" /> be the set of punctuation marks named separators, which are given by <img src="/img/revistas/dyna/v79n173/a23eq48749.jpeg" />, where <img src="/img/revistas/dyna/v79n173/a23eq48756.jpeg" />.<img src="/img/revistas/dyna/v79n173/a23eq48763.jpeg" /></font></p>     <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif"><b>2.1. Tokenizer    <br>   </b></font><font size="2" face="Verdana, Arial, Helvetica, sans-serif">First, the input phrase is processed by the tokenizer, which divides the phrase into tokens. The blank space <img src="/img/revistas/dyna/v79n173/a23eq48779.jpeg" /> is the element that indicates the end of one token and the beginning of the next one. Define the phrase <img src="/img/revistas/dyna/v79n173/a23eq48788.jpeg" /> as a sequence of symbols <img src="/img/revistas/dyna/v79n173/a23eq48797.jpeg" /> where <img src="/img/revistas/dyna/v79n173/a23eq48804.jpeg" /> and <img src="/img/revistas/dyna/v79n173/a23eq48811.jpeg" />. A token <img src="/img/revistas/dyna/v79n173/a23eq48819.jpeg" />, such that <img src="/img/revistas/dyna/v79n173/a23eq48827.jpeg" />, is defined as a sequence of symbols <img src="/img/revistas/dyna/v79n173/a23eq48835.jpeg" />, where <img src="/img/revistas/dyna/v79n173/a23eq48842.jpeg" /> and <img src="/img/revistas/dyna/v79n173/a23eq48849.jpeg" />. 
Then <img src="/img/revistas/dyna/v79n173/a23eq48860.jpeg" /> is the set of <img src="/img/revistas/dyna/v79n173/a23eq48870.jpeg" /> separated by <img src="/img/revistas/dyna/v79n173/a23eq48880.jpeg" />. The function <img src="/img/revistas/dyna/v79n173/a23eq48889.jpeg" />, called Tokenizer, is defined by the equations</font></p>     <p><img src="/img/revistas/dyna/v79n173/a23eq0103.gif"></p>     <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">where <img src="/img/revistas/dyna/v79n173/a23eq48920.jpeg" /> is the input phrase, <img src="/img/revistas/dyna/v79n173/a23eq48927.jpeg" /> is the set of tokens in <img src="/img/revistas/dyna/v79n173/a23eq48934.jpeg" />, <img src="/img/revistas/dyna/v79n173/a23eq48941.jpeg" />, and <img src="/img/revistas/dyna/v79n173/a23eq48952.jpeg" /> is a function that finds the blank spaces in <img src="/img/revistas/dyna/v79n173/a23eq48962.jpeg" />.</font></p>     <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif"><b>2.2. Normalizer    <br>   </b></font><font size="2" face="Verdana, Arial, Helvetica, sans-serif">The developed synthesizer is classified as unlimited domain [9]. It must therefore identify different types of special constructions of the language, such as numbers, dates, times, phone numbers, e-mail addresses, and web pages. These constructions are pronounced differently from the way they are written. The normalizer therefore identifies the type of construction that corresponds to the input text and defines how it is to be pronounced.</font></p>     <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">Let <img src="/img/revistas/dyna/v79n173/a23eq49000.jpeg" /> be the alphabet with the letters of Colombian Spanish, such that <img src="/img/revistas/dyna/v79n173/a23eq49008.jpeg" />. 
Let <img src="/img/revistas/dyna/v79n173/a23eq49015.jpeg" /> be the set of words of finite length formed with elements of <img src="/img/revistas/dyna/v79n173/a23eq49022.jpeg" />. Define <img src="/img/revistas/dyna/v79n173/a23eq49031.jpeg" /> as the set of words in Colombian Spanish. The Normalizer is represented by a function <img src="/img/revistas/dyna/v79n173/a23eq49040.jpeg" /> given by the equations</font></p>     <p><img src="/img/revistas/dyna/v79n173/a23eq0405.gif"></p>     <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">where <img src="/img/revistas/dyna/v79n173/a23eq49061.jpeg" /> is the set of tokens in <img src="/img/revistas/dyna/v79n173/a23eq49072.jpeg" />, <img src="/img/revistas/dyna/v79n173/a23eq49082.jpeg" /> is the set of words that correspond to the tokens in <img src="/img/revistas/dyna/v79n173/a23eq49093.jpeg" /> , and <img src="/img/revistas/dyna/v79n173/a23eq49101.jpeg" /> represents the number of elements in a vector. The normalizer uses a set of 16 pre-defined formats to perform the classification of the special constructions of the language. The formats are represented using regular expressions.</font></p>     ]]></body>
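<body><![CDATA[<p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">The tokenizer and normalizer of Sections 2.1 and 2.2 can be illustrated with a minimal sketch. It is not the authors' implementation: the paper does not list its 16 formats, so the two regular expressions, the function names (<i>tokenize</i>, <i>normalize</i>, <i>spell_digits</i>, <i>expand_email</i>), and the digit-by-digit reading of numbers are assumptions made only for illustration.</font></p>
<pre>
```python
import re

# Spanish digit names; a full normalizer would also read whole numbers.
DIGITS = ["cero", "uno", "dos", "tres", "cuatro",
          "cinco", "seis", "siete", "ocho", "nueve"]

# Two illustrative formats (assumed); the paper uses 16 that it does not list.
FORMATS = [
    ("email", re.compile(r"[\w.]+@[\w.]+")),
    ("digits", re.compile(r"\d+")),
]


def tokenize(phrase):
    """Split a phrase into tokens at blank spaces."""
    return phrase.split()


def spell_digits(token):
    """Read a digit string one digit at a time (phone-number style)."""
    return [DIGITS[int(ch)] for ch in token]


def expand_email(token):
    """Replace '@' and '.' with spoken words and keep the other parts."""
    words = []
    for part in re.split(r"([@.])", token):
        if part == "@":
            words.append("arroba")
        elif part == ".":
            words.append("punto")
        elif part:
            words.append(part)
    return words


def normalize(tokens):
    """Map each token to one or more plain words."""
    words = []
    for token in tokens:
        for name, pattern in FORMATS:
            if pattern.fullmatch(token):
                handler = expand_email if name == "email" else spell_digits
                words.extend(handler(token))
                break
        else:
            # No special format matched: treat the token as an ordinary word.
            words.append(token.lower())
    return words
```
</pre>
<p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">In this sketch, <i>normalize(tokenize("Llamar al 315"))</i> produces one word per digit, so the number of output words can exceed the number of tokens.</font></p>]]></body>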
<body><![CDATA[<p><font size="2" face="Verdana, Arial, Helvetica, sans-serif"><b>2.3. Word Splitter    <br>   </b></font><font size="2" face="Verdana, Arial, Helvetica, sans-serif">The normalized words in phrase <img src="/img/revistas/dyna/v79n173/a23eq49123.jpeg" /> are used as the input of the word splitter. This processor divides each word into its corresponding phonemes. Let <img src="/img/revistas/dyna/v79n173/a23eq49130.jpeg" /> be the set of written representations of the phonemes in Colombian Spanish, which are presented in <a href="#tab01">Table 1</a>.</font></p>     <p align="center"><font size="2" face="Verdana, Arial, Helvetica, sans-serif"><b><a name="tab01"></a>Table 1.</b> Phonemes of Colombian Spanish and letters that produce them. 28 phonemes are presented, including vowels with and without an accent, and assuming that the pairs &quot;b, v&quot;, and &quot;y, ll&quot; have the same pronunciation</font>    <br>   <img src="/img/revistas/dyna/v79n173/a23tab01.gif"></p>     <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">Let <img src="/img/revistas/dyna/v79n173/a23eq49161.jpeg" /> be a function called Word Splitter defined by the equation,</font></p>     <p><img src="/img/revistas/dyna/v79n173/a23eq06.gif"></p>     <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">where <img src="/img/revistas/dyna/v79n173/a23eq49177.jpeg" /> is the set of phonemes that correspond to each word. The function <img src="/img/revistas/dyna/v79n173/a23eq49184.jpeg" /> uses a word processing algorithm based on the location of each letter in the word and the neighboring letters. It assigns the phonemes that correspond to each letter based on those criteria. <a href="#tab02">Table 2</a> presents a portion of the conditions for assigning a phoneme to a letter.</font></p>     <p align="center"><font size="2" face="Verdana, Arial, Helvetica, sans-serif"><b><a name="tab02"></a>Table 2. 
</b>Portion of the table of conditions for assigning phonemes to letters. Assigning a phoneme to a letter depends on its location in the word and its neighboring letters.</font>    <br>   <img src="/img/revistas/dyna/v79n173/a23tab02.gif"></p>     <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">The output of the word splitter is represented by the equation</font></p>     ]]></body>
<body><![CDATA[<p><img src="/img/revistas/dyna/v79n173/a23eq07.gif"></p>     <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">where <img src="/img/revistas/dyna/v79n173/a23eq49234.jpeg" /> represents the 'pause' phoneme (blank space between words) and <img src="/img/revistas/dyna/v79n173/a23eq49244.jpeg" /> are the phonemes of the respective word <img src="/img/revistas/dyna/v79n173/a23eq49252.jpeg" />.</font></p>     <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif"><b>2.4. Phoneme Joiner    <br>   </b></font><font size="2" face="Verdana, Arial, Helvetica, sans-serif">This processor takes the list of phonemes in a phrase and obtains its representation in diphones. Let <img src="/img/revistas/dyna/v79n173/a23eq49265.jpeg" /> be the set of all possible combinations of elements in <img src="/img/revistas/dyna/v79n173/a23eq49272.jpeg" /> (diphones, triphones, etc.). Define <img src="/img/revistas/dyna/v79n173/a23eq49280.jpeg" /> as the set of diphones in Colombian Spanish (a portion is presented in <a href="#tab03">Table 3</a>). Also, define the function <img src="/img/revistas/dyna/v79n173/a23eq49289.jpeg" /> as the Phoneme Joiner which is given by</font></p>     <p><img src="/img/revistas/dyna/v79n173/a23eq08.gif"></p>     <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">where <img src="/img/revistas/dyna/v79n173/a23eq49303.jpeg" /> represents the set of diphones that correspond to each word of <img src="/img/revistas/dyna/v79n173/a23eq49310.jpeg" />. 
The function <img src="/img/revistas/dyna/v79n173/a23eq49321.jpeg" /> uses an algorithm for concatenating two consecutive phonemes, which can be written as</font></p>     <p><img src="/img/revistas/dyna/v79n173/a23eq09.gif"></p>     <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">where <img src="/img/revistas/dyna/v79n173/a23eq49349.jpeg" /> corresponds to the phoneme in location <img src="/img/revistas/dyna/v79n173/a23eq49356.jpeg" /> of the list of phonemes <img src="/img/revistas/dyna/v79n173/a23eq49363.jpeg" />.</font></p>     <p align="center"><font size="2" face="Verdana, Arial, Helvetica, sans-serif"><b><a name="tab03"></a>Table 3. </b>Portion of the diphones matrix. The rows and columns correspond to the identified phonemes, including the phoneme &quot;pau&quot;, which represents the blank space and is denoted by &quot;_&quot;. When a diphone does not exist, a dash is presented. The total number of diphones is 590.</font>    <br>   <img src="/img/revistas/dyna/v79n173/a23tab03.gif"></p>     ]]></body>
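<body><![CDATA[<p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">The word splitter and phoneme joiner of Sections 2.3 and 2.4 can be sketched as follows. This is a simplified illustration under stated assumptions: the splitter applies only the two merges named in Table 1 (b/v and y/ll) and not the full set of conditions of Table 2; the phoneme labels, the hyphenated diphone names, and the pause placed before the first word and after the last one are hypothetical choices.</font></p>
<pre>
```python
PAUSE = "pau"  # the 'pause' phoneme that stands for a blank space


def split_word(word):
    """Toy letter-to-phoneme pass that applies only the b/v and y/ll merges."""
    phonemes = []
    i = 0
    while i != len(word):
        if word[i:i + 2] == "ll":
            phonemes.append("y")  # 'll' is pronounced like 'y'
            i += 2
        else:
            phonemes.append("b" if word[i] == "v" else word[i])
            i += 1
    return phonemes


def split_phrase(words):
    """Phoneme list of a phrase, with a pause around and between words."""
    phonemes = [PAUSE]
    for word in words:
        phonemes.extend(split_word(word))
        phonemes.append(PAUSE)
    return phonemes


def join_phonemes(phonemes):
    """Pair each phoneme with its successor to form the diphone list."""
    return [a + "-" + b for a, b in zip(phonemes, phonemes[1:])]
```
</pre>
<p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">A list of n phonemes always yields n - 1 diphones, because each diphone spans two consecutive phonemes.</font></p>]]></body>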
<body><![CDATA[<p><font size="2" face="Verdana, Arial, Helvetica, sans-serif"><b>2.5. Finder    <br>   </b></font><font size="2" face="Verdana, Arial, Helvetica, sans-serif">Based on the list of diphones that represent phrase <img src="/img/revistas/dyna/v79n173/a23eq49387.jpeg" />, the finder connects the synthesizer with the voice corpus (database) and links each diphone with its sound. Define <img src="/img/revistas/dyna/v79n173/a23eq49396.jpeg" /> as the voice corpus of diphones containing the set of sound representations of the elements in <img src="/img/revistas/dyna/v79n173/a23eq49403.jpeg" />. Also, define function <img src="/img/revistas/dyna/v79n173/a23eq49410.jpeg" /> as</font></p>     <p><img src="/img/revistas/dyna/v79n173/a23eq10.gif"></p>     <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">where <img src="/img/revistas/dyna/v79n173/a23eq49430.jpeg" /> is the set of sound representations that correspond to the search in <img src="/img/revistas/dyna/v79n173/a23eq49440.jpeg" /> of the set of diphones <img src="/img/revistas/dyna/v79n173/a23eq49450.jpeg" />. In this way, the representation of the input phrase in terms of the audio files matched to the diphones is obtained. </font></p>     <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif"><b>2.6. Concatenator    <br>   </b></font><font size="2" face="Verdana, Arial, Helvetica, sans-serif">The concatenator is the last processor of the synthesizer. It generates the output audio signal. This task is performed using the list of audio files obtained in the Finder. 
Define the function called Concatenator which is represented by</font></p>     <p><img src="/img/revistas/dyna/v79n173/a23eq11.gif"></p>     <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">where <img src="/img/revistas/dyna/v79n173/a23eq49482.jpeg" /> represents the audio signal obtained at the end of the synthesis process and <img src="/img/revistas/dyna/v79n173/a23eq49490.jpeg" /> represents the diphone (sound unit) in location <img src="/img/revistas/dyna/v79n173/a23eq49500.jpeg" /> of the set of sound representations <img src="/img/revistas/dyna/v79n173/a23eq49507.jpeg" />. In this way, it is possible to reproduce a signal that contains all the input text represented in sound units.</font></p>     <p>&nbsp;</p>     <p><font size="3" face="Verdana, Arial, Helvetica, sans-serif"><b>3. VOICE CORPUS</b></font></p>     ]]></body>
<body><![CDATA[<p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">Unit concatenation speech synthesis requires a database from which the audio units are extracted to form the synthetic voice. The database is called a &quot;corpus&quot; and includes labeled phonetic units [22&ndash;25]. The data stored in the corpus are audio files previously recorded by a human speaker, and they depend on the units selected for the synthesizer. In this case, the corpus has 590 audio files with the diphones of Colombian Spanish.</font></p>     <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">A matrix was developed for the identification of the diphones. The rows and columns correspond to the identified phonemes, including the phoneme &quot;pau&quot; (which represents the blank space). The matrix has 29 rows and 29 columns (a total of 841 candidate phoneme pairs). Since not all the combinations of phonemes correspond to a real diphone in the Spanish language (for instance: &ntilde;-&ntilde;), the final number of diphones identified in this work is 590. <a href="#tab03">Table 3</a> presents a portion of the matrix developed for the identification of the diphones (a dash indicates that the diphone does not exist).</font></p>     <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">All possible combinations of phonemes in Colombian Spanish were tested to determine whether each diphone is valid. Then the diphones were recorded to obtain the voice corpus. The block diagram of the process followed to develop the voice corpus, after identifying the diphones, is presented in <a href="#fig03">Fig. 3</a>.</font></p>     <p align="center"><font size="2" face="Verdana, Arial, Helvetica, sans-serif"><b><a name="fig03"></a><img src="/img/revistas/dyna/v79n173/a23fig03.gif">    <br>   Figure 3.</b> Stages in the development of the voice corpus. First, phrases containing each of the diphones were recorded. 
Then, the beginning and the end of the diphones in the audio files were labeled. Finally, the diphones were extracted in individual files and stored in the voice corpus.</font></p>     <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">The previous stages were performed sequentially. First, phrases containing the diphones at least once were recorded. Then, the phrases were labeled to extract and store each diphone in the voice corpus.</font></p>     <p>&nbsp;</p>     <p><font size="3" face="Verdana, Arial, Helvetica, sans-serif"><b>4. TRANSMISSION DEVICE</b></font></p>     <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">As was shown in <a href="#fig01">Fig. 1</a>, the developed application runs on a computer (laptop or desktop). The audio signal is generated there by the synthesizer and needs to be transmitted to the mobile phone during a call. For that reason, a transmission device is required to send the synthetic voice from the computer to the mobile phone. In other words, the synthesizer will speak for the user during the call. That means that the transmission device has to be connected to the microphone of the mobile phone and, at the same time, must allow the person to use the earpiece. </font></p>     <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">Since there is no such commercial device, a hardware piece with the above features had to be designed. The principal element of the device designed is the headset that comes with almost every mobile phone model. This cable can access both the microphone and the earpiece of the phone. However, this characteristic does not allow transmission between the mobile phone and the computer. For that reason, this is the principal element to modify. </font></p>     ]]></body>
<body><![CDATA[<p><font size="2" face="Verdana, Arial, Helvetica, sans-serif"><a href="#fig04">Fig. 4a</a> presents a common headset cable. It has an earphone connected to the earpiece of the mobile phone, a microphone connected to the microphone input of the phone, and two cables through which the signals are transmitted. The proposed transmission device is obtained by connecting the microphone cable of the headset to the audio output of the computer, so that the synthetic voice is sent to the microphone input of the mobile phone. The resulting cable is shown in <a href="#fig04">Fig. 4b</a>.</font></p>     <p align="center"><font size="2" face="Verdana, Arial, Helvetica, sans-serif"><b><a name="fig04"></a><img src="/img/revistas/dyna/v79n173/a23fig04.gif">    <br>   Figure 4.</b> a) Original headset b) Modified headset</font></p>     <p>&nbsp;</p>     <p><font size="3"><b><font face="Verdana, Arial, Helvetica, sans-serif">5. TESTS AND RESULTS</font></b></font></p>     <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">In our previous work [27], the performance of the software tool was evaluated mainly in terms of response time and execution on other operating systems. In this paper, the quality of the voice is evaluated using performance measures such as peak signal-to-noise ratio (PSNR), percentage of correctly pronounced words, and intelligibility.</font></p>     <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif"><b>5.1. Evaluation of the output of the synthesizer    <br>   </b></font><font size="2" face="Verdana, Arial, Helvetica, sans-serif">A comparison was made between the output of the synthesizer and the same phrases recorded by a person. The objective was to obtain a quantitative measure of the quality of the synthetic voice. The chosen measure is the PSNR, because it provides a sense of the behavior of the synthetic voice compared to the natural voice. 
The calculation was performed using the equation</font></p>     <p><img src="/img/revistas/dyna/v79n173/a23eq12.gif"></p>     <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">where O represents the original signal, S the output signal of the synthesizer (synthetic), max is a function that calculates the maximum value of the signal, and n is the number of values.</font></p>     ]]></body>
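<body><![CDATA[<p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">The PSNR comparison described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation used in this work. It assumes the standard definition PSNR = 10 log10(max(O)^2 / MSE), where MSE is the mean of the squared differences between O and S over the n values, and it assumes that both signals have already been aligned and trimmed to the same number of values. The function name psnr_db, the use of the maximum magnitude as the peak, and the sample values are hypothetical choices made for illustration.</font></p>

```python
import math


def psnr_db(original, synthetic):
    """Peak signal-to-noise ratio in dB between two equal-length signals.

    Assumed (standard) definition: 10 * log10(peak**2 / MSE), where MSE is
    the mean squared difference between O (original) and S (synthetic).
    """
    if len(original) != len(synthetic):
        raise ValueError("signals must have the same number of values")
    if not original:
        raise ValueError("signals must not be empty")

    n = len(original)
    # Mean squared error between the original and the synthetic signal.
    mse = sum((o - s) ** 2 for o, s in zip(original, synthetic)) / n
    if mse == 0:
        return math.inf  # identical signals: no error to measure

    # Assumption: the peak is the maximum magnitude of the original signal.
    peak = max(abs(o) for o in original)
    if peak == 0:
        raise ValueError("original signal is all zeros; PSNR is undefined")
    return 10 * math.log10(peak ** 2 / mse)


# Hypothetical sample values, for illustration only (not data from the tests).
natural = [0.0, 0.5, 1.0, 0.5, 0.0, -0.5, -1.0, -0.5]
synthetic = [0.0, 0.49, 0.99, 0.51, 0.01, -0.5, -1.0, -0.49]
print(round(psnr_db(natural, synthetic), 2))
```

<p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">A higher value indicates a smaller mean squared difference relative to the peak of the original signal. Because a natural recording and its synthetic counterpart may differ in number of samples, an alignment or trimming step (not shown here) would be needed before calling such a function.</font></p>     ]]></body>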
<body><![CDATA[<p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">A total of 70 phrases were used for the test, including all of the special language constructions mentioned in Section 1.2. <a href="#fig05">Figure 5</a> presents the average PSNR for each phrase. Overall, the average PSNR for the phrases was 56.68 dB. The results show a high correlation between the synthetic signal and the phrases pronounced by the human voice.</font></p>     <p align="center"><font size="2" face="Verdana, Arial, Helvetica, sans-serif"><b><a name="fig05"></a><img src="/img/revistas/dyna/v79n173/a23fig05.gif">    <br>   Figure 5. </b>Results of the comparison between the original voice and the output of the synthesizer. The PSNR values for the 70 phrases lie between 50 and 62 dB, which shows a high correlation between the signals.</font></p>     <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">A frequency analysis was also performed. A comparison between the spectra of the original and the synthetic voices is shown in <a href="#fig06">Fig. 6</a>. It can be seen that the spectra of both signals are similar. Since the number of samples and the amplitude of the synthetic signal are higher than those of the original signal, there are some differences between their spectra. The processing of the output signal of the synthesizer does not include filtering or a modulator to normalize the amplitudes.</font></p>     <p align="center"><font size="2" face="Verdana, Arial, Helvetica, sans-serif"><b><a name="fig06"></a><img src="/img/revistas/dyna/v79n173/a23fig06.gif">    <br>   Figure 6.</b> Frequency comparison between a) Natural voice, b) Synthetic voice</font></p>     <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif"><b>5.2. User Testing    <br>   </b></font><font size="2" face="Verdana, Arial, Helvetica, sans-serif">A test was designed for users to evaluate the performance of the software. 
The features included in the evaluation were the intelligibility of the voice, the correctness of the synthesizer output for the given input, and the transmission device.</font></p>     <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif"><b>5.2.1. Intelligibility    <br>   </b>Users listened to ten phrases, with a total of 181 words, pronounced by the synthesizer. The phrases used in the test included some of the pre-defined formats (dates, abbreviations, etc.). The results of the test are divided into four categories according to the number of words correctly identified by the users. This is shown in <a href="#tab04">Table 4</a>. </font></p>     ]]></body>
<body><![CDATA[<p align="center"><font size="2" face="Verdana, Arial, Helvetica, sans-serif"><b><a name="tab04"></a>Table 4.</b> Results of the intelligibility test with users. Of the 181 words pronounced, only one person identified between 163 and 167 words (&gt;90%), two people identified between 172 and 176 words (&gt;95.1%) and 17 identified between 176 and 181 (&gt;97.51%).</font>    <br>   <img src="/img/revistas/dyna/v79n173/a23tab04.gif"></p>     <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">Only one person falls in the first category. The minimum percentage of correctly identified words was 90.6% of the test words. Most of the users (95%) identified more than 95% of the test words. <a href="#tab04">Table 4</a> presents the results for all the categories. <a href="#fig07">Figure 7</a> shows the statistics of the results.</font></p>     <p align="center"><font size="2" face="Verdana, Arial, Helvetica, sans-serif"><b><a name="fig07"></a><img src="/img/revistas/dyna/v79n173/a23fig07.gif">    <br>   Figure 7.</b> Results of intelligibility test with users (percentages). 95% of the users identified more than 95% of the words pronounced by the synthesizer.</font></p>     <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">Based on the previous results, a percentage of intelligibility for the synthesizer was calculated. 
Taking into account that, for a total of 20 users, 3620 words were pronounced by the speech synthesizer (181 words per person), the percentage of intelligibility is calculated by the equation</font></p>     <p><img src="/img/revistas/dyna/v79n173/a23eq13.gif"></p>     <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">where <img src="/img/revistas/dyna/v79n173/a23eq49635.jpeg" /> is the intelligibility percentage (between 0 and 100%), <img src="/img/revistas/dyna/v79n173/a23eq49643.jpeg" /> is the number of identified words and <img src="/img/revistas/dyna/v79n173/a23eq49652.jpeg" /> is the total number of pronounced words. Substituting the total number of words and the number of correctly identified words into Eq. (13) gives an intelligibility percentage of 98%, which indicates that the users can identify a high percentage of the words pronounced by the speech synthesizer.</font></p>     <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif"><b>5.2.2. Synthesizer Output    <br>   </b></font><font size="2" face="Verdana, Arial, Helvetica, sans-serif">The synthesizer was tested by the users with a set of phrases randomly proposed by them. A total of 4153 words were used for the test. The users concluded that 36 of them were incorrectly pronounced by the synthesizer. The percentage of correctly pronounced words is calculated as</font></p>     ]]></body>
<body><![CDATA[<p><img src="/img/revistas/dyna/v79n173/a23eq14.gif"></p>     <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">where <img src="/img/revistas/dyna/v79n173/a23eq49673.jpeg" /> represents the percentage of words pronounced correctly and <img src="/img/revistas/dyna/v79n173/a23eq49680.jpeg" /> the percentage of words pronounced incorrectly. The results show that 99% of the words were pronounced correctly. </font></p>     <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">These tests are useful for identifying failures in the processors of the synthesizer so that they can be improved in the future. The high percentage of correct words shows that the synthesizer can pronounce most of the words in Colombian Spanish when the pre-defined formats are used.</font></p>     <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif"><b>5.2.3. Transmission Device    <br>   </b></font><font size="2" face="Verdana, Arial, Helvetica, sans-serif">Finally, users tested the software and the transmission device by answering a call on a mobile phone. All of the users (100%) said that they heard and understood the voice through the designed transmission device. This means that there was no perceptible loss in the signal during transmission and that the voice was intelligible.</font></p>     <p>&nbsp;</p>     <p><font size="3" face="Verdana, Arial, Helvetica, sans-serif"><b> 6. CONCLUSIONS</b></font></p>     <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">This work integrates two technologies: speech synthesis and mobile phones. The implementation of this type of software allows users to answer phone calls on their mobile devices despite some limitations in the use of their voice. 
</font></p>     <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">Studying speech synthesis techniques and the phonetics of the language allowed the development of a first prototype of an unlimited-domain speech synthesizer for Colombian Spanish, based on the diphone concatenation technique and including a voice corpus. Nonetheless, the tests performed show that future work should focus on improving the quality of the synthetic voice, specifically in terms of naturalness. More comprehensive studies on prosody and linguistics could lead to a more natural voice by adding new diphones.</font></p>     <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">The user tests of the transmission device showed positive results. However, future work should investigate wireless technologies, such as Bluetooth, for the transmission of the synthetic voice to the mobile phone. </font></p>     ]]></body>
<body><![CDATA[<p>&nbsp;</p>     <p><font size="3" face="Verdana, Arial, Helvetica, sans-serif"><b>ACKNOWLEDGMENTS</b></font></p>     <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">This work was sponsored by Vicerrector&iacute;a de Investigaci&oacute;n y Extensi&oacute;n of the Universidad Industrial de Santander, under the research project #5537.</font></p>     <p>&nbsp;</p>     <p><font face="Verdana, Arial, Helvetica, sans-serif"><b><font size="3">REFERENCES</font></b></font></p>     <!-- ref --><p><font size="2" face="Verdana, Arial, Helvetica, sans-serif"><b>[1]</b> Flores, L.; Vargas, A.; Olivier, A.; Kirschning, I.; and Cervantes, O. &quot;S&iacute;ntesis en Espa&ntilde;ol Mexicano con el M&eacute;todo de Selecci&oacute;n de Unidades de Longitud Variable&quot;, ENC, pp. 601-610, Sept. 2001.<!-- end-ref -->
<!-- ref --><br>   <b>[2]</b> O'Shaughnessy, D. &quot;Modern Methods of Speech Synthesis&quot;, IEEE Circuits and Systems Magazine, Vol. 7, No. 3, pp. 6–23, 3rd Quarter 2007.<!-- end-ref -->
<!-- ref --><br>   <b>[3]</b> Laboratorio de fon&eacute;tica ULA. &quot;Tutorial de fon&eacute;tica: S&iacute;ntesis de habla&quot;, Universidad de los Andes, M&eacute;rida, Venezuela, 2005.<!-- end-ref -->
<!-- ref --><br>   <b>[4]</b> Black, A.; Lenzo, K. &quot;Flite: a small fast run-time synthesis engine&quot;, Proceedings of the 4th ISCA Workshop on Speech Synthesis, 2001.<!-- end-ref -->
<!-- ref --><br>   <b>[5]</b> Black, A.; Taylor, P. &quot;The Festival Speech Synthesis System: system documentation&quot;, Technical Report HCRC/TR-83, Human Communications Research Centre, University of Edinburgh, Scotland, UK, January 1997.<!-- end-ref -->
<!-- ref --><br>   <b>[6]</b> Pitrelli, J.; Bakis, R.; Eide, E.; Fernandez, R.; Hamza, W.; Picheny, M. &quot;The IBM expressive text-to-speech synthesis system for American English&quot;, TSAP, Vol. 14, No. 4, pp. 1301–1312, Jul. 2006.<!-- end-ref -->
<!-- ref --><br>   <b>[7]</b> &quot;Otros estudios sobre el espa&ntilde;ol de Colombia&quot;, Publicaciones del Instituto Caro y Cuervo, Santaf&eacute; de Bogot&aacute;, 2000, pp. 31.<!-- end-ref -->
<!-- ref --><br>   <b>[8]</b> Mora, S.; Lozano, M.; Ram&iacute;rez, R.; Espejo, M. B.; Duarte, G. E. &quot;Caracterizaci&oacute;n l&eacute;xica de los dialectos del espa&ntilde;ol de Colombia seg&uacute;n el &quot;ALEC&quot;&quot;, Publicaciones del Instituto Caro y Cuervo, Bogot&aacute;, 2004, 325 p&aacute;gs.<!-- end-ref -->
<!-- ref --><br>   <b>[9]</b> Llisterri, J. &quot;La s&iacute;ntesis de habla&quot;, I Jornadas de Tecnolog&iacute;a del Habla, Departamento de Lengua Inglesa, Universidad de Sevilla – Departamento de Electr&oacute;nica y Tecnolog&iacute;a de Computadores, Universidad de Granada, Sevilla, Noviembre 7, 2000.<!-- end-ref -->
<!-- ref --><br>   <b>[10]</b> Rodr&iacute;guez, M.; Mora, E. &quot;S&iacute;ntesis de voz en el dialecto venezolano por medio de la concatenaci&oacute;n de difonos&quot;, Revista Ciencia e Ingenier&iacute;a, Vol. 27, No. 1, 2006, pp. 17-24.<!-- end-ref -->
<!-- ref --><br>   <b>[11]</b> Rodr&iacute;guez, M.; Mora, E. &quot;Conversor texto a voz en el dialecto venezolano por medio de la concatenaci&oacute;n de difonos&quot;, Revista Ciencia e Ingenier&iacute;a, Vol. 27, No. 2, 2006, pp. 79-87.<!-- end-ref -->
<!-- ref --><br>   <b>[12]</b> Iriondo, I.; Mart&iacute;, J.; Oliver, J.; Guaus, R.; Moure, H. &quot;Hacia una s&iacute;ntesis concatenativa de alta calidad para aplicaciones de conversi&oacute;n texto-habla&quot;, Procesamiento del Lenguaje Natural, Sociedad Espa&ntilde;ola para el procesamiento del Lenguaje Natural, No. 25, Sept. 1999, pp. 109-113.<!-- end-ref -->
<!-- ref --><br>   <b>[13]</b> Rodr&iacute;guez Banga, E.; Campillo D&iacute;az, F. &quot;Sistema de conversi&oacute;n texto-voz en lengua gallega basado en la selecci&oacute;n combinada de unidades ac&uacute;sticas y pros&oacute;dicas&quot;, Procesamiento del Lenguaje Natural, Sociedad Espa&ntilde;ola para el procesamiento del Lenguaje Natural, No. 29, 2002, pp. 153-158.<!-- end-ref -->
<!-- ref --><br>   <b>[14]</b> Hunt, A. and Black, A. &quot;Unit selection in a concatenative speech synthesis system using a large speech database&quot;, Proceedings of ICASSP 96, Vol. 1, pp. 373-376, Atlanta, Georgia, 1996.<!-- end-ref -->
<!-- ref --><br>   <b>[15]</b> Lewis, E. and Tatham, M. &quot;Word and Syllable Concatenation in Text-To-Speech Synthesis&quot;, Sixth European Conference on Speech Communications and Technology, pp. 615-618, ESCA, September 1999.<!-- end-ref -->
<!-- ref --><br>   <b>[16]</b> Zapata, C. and Carmona, N. &quot;El experimento Mago de Oz y sus aplicaciones: una mirada retrospectiva&quot;, Revista Dyna, No. 151, pp. 125-135, 2007.<!-- end-ref -->
<!-- ref --><br>   <b>[17]</b> Guzm&aacute;n Arreola, M. A. &quot;Sintetizador de voz para la ense&ntilde;anza de la lectura a ni&ntilde;os mexicanos&quot;, Tesis Licenciatura Ingenier&iacute;a en Sistemas Computacionales, Departamento de Ingenier&iacute;a en Sistemas Computacionales, Escuela de Ingenier&iacute;a, Universidad de las Am&eacute;ricas Puebla, 2004.<!-- end-ref -->
<!-- ref --><br>   <b>[18]</b> Barra-Chicote, R.; Yamagishi, J.; Montero, J. M.; King, S.; Lufti, S.; Macias-Guarasa, J. &quot;Generaci&oacute;n de una voz sint&eacute;tica en castellano basada en HSMM para la evaluaci&oacute;n Albayz&iacute;n 2008: Conversi&oacute;n Texto a voz&quot;, V Jornadas en Tecnolog&iacute;a del Habla, Noviembre 2008, Bilbao, Espa&ntilde;a, pp. 115-118.<!-- end-ref -->
<!-- ref --><br>   <b>[19]</b> Tokuda, K.; Masuko, T.; Miyazaki, N. and Kobayashi, T. &quot;Hidden Markov Models Based on Multi-Space Probability Distribution for Pitch Pattern Modeling&quot;, IEEE International Conference on Acoustics, Speech and Signal Processing, Vol. 1, Phoenix, Arizona, USA, March 15-19, 1999.<!-- end-ref -->
<!-- ref --><br>   <b>[20]</b> Yoshimura, T. &quot;Simultaneous modeling of phonetic and prosodic parameters, and characteristic conversion for HMM-based Text-To-Speech systems&quot;, PhD dissertation, Nagoya Institute of Technology, 2002.<!-- end-ref -->
<!-- ref --><br>   <b>[21]</b> Tokuda, K.; Zen, H.; Black, A. W. &quot;An HMM-based speech synthesis system applied to English&quot;, Proceedings IEEE 2002 Workshop on Speech Synthesis, Santa Monica, USA, Sept. 2002.<!-- end-ref -->
<!-- ref --><br>   <b>[22]</b> Palacio Baus, K. S.; Auquilla Peralta, J. V. &quot;Dise&ntilde;o e Implementaci&oacute;n de un Sistema de S&iacute;ntesis de Voz&quot;, Tesis Ingenier&iacute;a Electr&oacute;nica, Facultad de Ingenier&iacute;as, Carrera de Ingenier&iacute;a Electr&oacute;nica, Universidad Polit&eacute;cnica Salesiana, Cuenca, Ecuador, 2007.<!-- end-ref -->
<!-- ref --><br>   <b>[23]</b> Campbell, N. &quot;Developments in Corpus-Based Speech Synthesis: Approaching Natural Conversational Speech&quot;, IEICE Trans. Inf. &amp; Syst., Vol. E88–D, No. 3, March 2005.<!-- end-ref -->
<!-- ref --><br>   <b>[24]</b> Llisterri, J.; Machuca, M. J.; De la Mota, C.; Riera, M.; R&iacute;os, A. &quot;Corpus orales para el desarrollo de las tecnolog&iacute;as del habla en espa&ntilde;ol&quot;, Oralia, An&aacute;lisis del discurso oral, Vol. 8, pp. 289-325, 2005.<!-- end-ref -->
<!-- ref --><br>   <b>[25]</b> Mora, E. &quot;Discapacidad y comunicaci&oacute;n: Una experiencia de fon&eacute;tica aplicada&quot;, EFE, ISSN 1575-5533, XVII, 2008, pp. 317-329.<!-- end-ref -->
<!-- ref --><br>   <b>[26]</b> Aguilar, L.; Fern&aacute;ndez, J. M.; Garrido, J. M.; Llisterri, J.; Macarr&oacute;n, A.; Monz&oacute;n, L.; Rodr&iacute;guez, M. A. &quot;Dise&ntilde;o de pruebas para la evaluaci&oacute;n de habla sintetizada en espa&ntilde;ol y su aplicaci&oacute;n a un sistema de conversi&oacute;n de texto a habla&quot;, en Actas del X Congreso de la Sociedad Espa&ntilde;ola para el Procesamiento del Lenguaje Natural, C&oacute;rdova, 20-22 de Julio de 1994.<!-- end-ref -->
<!-- ref --><br>   <b>[27]</b> Correa, P.; Rueda, H.; Arguello, H. &quot;S&iacute;ntesis de voz por concatenaci&oacute;n de difonemas para el espa&ntilde;ol de Colombia&quot;, Revista Iberoamericana en Sistemas, Cibern&eacute;tica e Inform&aacute;tica, Vol. 7, No. 1, pp. 19-24, 2010.</font><!-- end-ref --></p> ]]></body><back>
<ref-list>
<ref id="B1">
<label>1</label><nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Flores]]></surname>
<given-names><![CDATA[L.]]></given-names>
</name>
<name>
<surname><![CDATA[Vargas]]></surname>
<given-names><![CDATA[A.]]></given-names>
</name>
<name>
<surname><![CDATA[Olivier]]></surname>
<given-names><![CDATA[A.]]></given-names>
</name>
<name>
<surname><![CDATA[Kirschning]]></surname>
<given-names><![CDATA[I.]]></given-names>
</name>
<name>
<surname><![CDATA[Cervantes]]></surname>
<given-names><![CDATA[O.]]></given-names>
</name>
</person-group>
<source><![CDATA[Síntesis en Español Mexicano con el Método de Selección de Unidades de Longitud Variable]]></source>
<year>2001</year>
<month>Sept.</month>
<page-range>601-610</page-range><publisher-name><![CDATA[ENC]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B2">
<label>2</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[O'Shaughnessy]]></surname>
<given-names><![CDATA[D.]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[Modern Methods of Speech Synthesis]]></article-title>
<source><![CDATA[IEEE Circuits and Systems Magazine]]></source>
<year>2007</year>
<volume>7</volume>
<numero>3</numero>
<issue>3</issue>
<page-range>6-23</page-range></nlm-citation>
</ref>
<ref id="B3">
<label>3</label><nlm-citation citation-type="book">
<collab>Laboratorio de fonética ULA</collab>
<source><![CDATA[Tutorial de fonética: Síntesis de habla]]></source>
<year>2005</year>
<publisher-loc><![CDATA[Mérida ]]></publisher-loc>
<publisher-name><![CDATA[Universidad de los Andes]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B4">
<label>4</label><nlm-citation citation-type="confpro">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Black]]></surname>
<given-names><![CDATA[A.]]></given-names>
</name>
<name>
<surname><![CDATA[Lenzo]]></surname>
<given-names><![CDATA[K.]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[Flite: a small fast run-time synthesis engine]]></article-title>
<source><![CDATA[]]></source>
<year>2001</year>
<conf-name><![CDATA[Proceedings of the 4th ISCA Workshop on Speech Synthesis]]></conf-name>
<conf-date>2001</conf-date>
<conf-loc> </conf-loc>
</nlm-citation>
</ref>
<ref id="B5">
<label>5</label><nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Black]]></surname>
<given-names><![CDATA[A.]]></given-names>
</name>
<name>
<surname><![CDATA[Taylor]]></surname>
<given-names><![CDATA[P.]]></given-names>
</name>
</person-group>
<source><![CDATA[The Festival Speech Synthesis System: system documentation, Technical Report HCRC/TR-83, Human Communications Research Centre]]></source>
<year>1997</year>
<month>January</month>
<publisher-loc><![CDATA[Scotland ]]></publisher-loc>
<publisher-name><![CDATA[University of Edinburgh]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B6">
<label>6</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Pitrelli]]></surname>
<given-names><![CDATA[J.]]></given-names>
</name>
<name>
<surname><![CDATA[Bakis]]></surname>
<given-names><![CDATA[R.]]></given-names>
</name>
<name>
<surname><![CDATA[Eide]]></surname>
<given-names><![CDATA[E.]]></given-names>
</name>
<name>
<surname><![CDATA[Fernandez]]></surname>
<given-names><![CDATA[R.]]></given-names>
</name>
<name>
<surname><![CDATA[Hamza]]></surname>
<given-names><![CDATA[W.]]></given-names>
</name>
<name>
<surname><![CDATA[Picheny]]></surname>
<given-names><![CDATA[M.]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[The IBM expressive text-to-speech synthesis system for American English]]></article-title>
<source><![CDATA[TSAP]]></source>
<year>2006</year>
<month>Jul.</month>
<volume>14</volume>
<numero>4</numero>
<issue>4</issue>
<page-range>1301-1312</page-range></nlm-citation>
</ref>
<ref id="B7">
<label>7</label><nlm-citation citation-type="book">
<source><![CDATA[Otros estudios sobre el español de Colombia]]></source>
<year>2000</year>
<publisher-loc><![CDATA[Santafé de Bogotá ]]></publisher-loc>
<publisher-name><![CDATA[Publicaciones del Instituto Caro y Cuervo]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B8">
<label>8</label><nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Mora]]></surname>
<given-names><![CDATA[S.]]></given-names>
</name>
<name>
<surname><![CDATA[Lozano]]></surname>
<given-names><![CDATA[M.]]></given-names>
</name>
<name>
<surname><![CDATA[Ramírez]]></surname>
<given-names><![CDATA[R.]]></given-names>
</name>
<name>
<surname><![CDATA[Espejo]]></surname>
<given-names><![CDATA[M. B.]]></given-names>
</name>
<name>
<surname><![CDATA[Duarte]]></surname>
<given-names><![CDATA[G. E.]]></given-names>
</name>
</person-group>
<source><![CDATA[Caracterización léxica de los dialectos del español de Colombia según el "ALEC"]]></source>
<year>2004</year>
<publisher-loc><![CDATA[Bogotá ]]></publisher-loc>
<publisher-name><![CDATA[Publicaciones del Instituto Caro y Cuervo]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B9">
<label>9</label><nlm-citation citation-type="confpro">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Llisterri]]></surname>
<given-names><![CDATA[J.]]></given-names>
</name>
</person-group>
<article-title xml:lang="es"><![CDATA[La síntesis de habla]]></article-title>
<source><![CDATA[]]></source>
<year></year>
<month>Noviembre</month>
<conf-name><![CDATA[I Jornadas de Tecnología del Habla, Departamento de Lengua Inglesa]]></conf-name>
<conf-loc> </conf-loc>
<publisher-loc><![CDATA[Sevilla ]]></publisher-loc>
<publisher-name><![CDATA[Universidad de Granada]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B10">
<label>10</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Rodríguez]]></surname>
<given-names><![CDATA[M.]]></given-names>
</name>
<name>
<surname><![CDATA[Mora]]></surname>
<given-names><![CDATA[E.]]></given-names>
</name>
</person-group>
<article-title xml:lang="es"><![CDATA[Síntesis de voz en el dialecto venezolano por medio de la concatenación de difonos]]></article-title>
<source><![CDATA[Revista Ciencia e Ingeniería]]></source>
<year>2006</year>
<volume>27</volume>
<numero>1</numero>
<issue>1</issue>
<page-range>17-24</page-range></nlm-citation>
</ref>
<ref id="B11">
<label>11</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Rodríguez]]></surname>
<given-names><![CDATA[M.]]></given-names>
</name>
<name>
<surname><![CDATA[Mora]]></surname>
<given-names><![CDATA[E.]]></given-names>
</name>
</person-group>
<article-title xml:lang="es"><![CDATA[Conversor texto a voz en el dialecto venezolano por medio de la concatenación de difonos]]></article-title>
<source><![CDATA[Revista Ciencia e Ingeniería]]></source>
<year>2006</year>
<volume>27</volume>
<numero>2</numero>
<issue>2</issue>
<page-range>79-87</page-range></nlm-citation>
</ref>
<ref id="B12">
<label>12</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Iriondo]]></surname>
<given-names><![CDATA[I.]]></given-names>
</name>
<name>
<surname><![CDATA[Martí]]></surname>
<given-names><![CDATA[J.]]></given-names>
</name>
<name>
<surname><![CDATA[Oliver]]></surname>
<given-names><![CDATA[J.]]></given-names>
</name>
<name>
<surname><![CDATA[Guaus]]></surname>
<given-names><![CDATA[R.]]></given-names>
</name>
<name>
<surname><![CDATA[Moure]]></surname>
<given-names><![CDATA[H.]]></given-names>
</name>
</person-group>
<article-title xml:lang="es"><![CDATA[Hacia una síntesis concatenativa de alta calidad para aplicaciones de conversión texto-habla]]></article-title>
<source><![CDATA[Procesamiento del Lenguaje Natural]]></source>
<year>1999</year>
<month>Sept.</month>
<numero>25</numero>
<issue>25</issue>
<page-range>109-113</page-range><publisher-name><![CDATA[Sociedad Española para el procesamiento del Lenguaje Natural]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B13">
<label>13</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Rodríguez Banga]]></surname>
<given-names><![CDATA[E.]]></given-names>
</name>
<name>
<surname><![CDATA[Campillo Díaz]]></surname>
<given-names><![CDATA[F.]]></given-names>
</name>
</person-group>
<article-title xml:lang="es"><![CDATA[Sistema de conversión texto-voz en lengua gallega basado en la selección combinada de unidades acústicas y prosódicas]]></article-title>
<source><![CDATA[Procesamiento del Lenguaje Natural]]></source>
<year>2002</year>
<numero>29</numero>
<issue>29</issue>
<page-range>153-158</page-range><publisher-name><![CDATA[Sociedad Española para el procesamiento del Lenguaje Natural]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B14">
<label>14</label><nlm-citation citation-type="confpro">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Hunt]]></surname>
<given-names><![CDATA[A.]]></given-names>
</name>
<name>
<surname><![CDATA[Black]]></surname>
<given-names><![CDATA[A.]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[Unit selection in a concatenative speech synthesis system using a large speech database]]></article-title>
<source><![CDATA[]]></source>
<year>1996</year>
<volume>1</volume>
<conf-name><![CDATA[ICASSP 96]]></conf-name>
<conf-loc> </conf-loc>
<page-range>373-376</page-range><publisher-loc><![CDATA[Atlanta ]]></publisher-loc>
</nlm-citation>
</ref>
<ref id="B15">
<label>15</label><nlm-citation citation-type="confpro">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Lewis]]></surname>
<given-names><![CDATA[E.]]></given-names>
</name>
<name>
<surname><![CDATA[Tatham]]></surname>
<given-names><![CDATA[M.]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[Word and Syllable Concatenation in Text-To-Speech Synthesis]]></article-title>
<source><![CDATA[]]></source>
<year>1999</year>
<month>September</month>
<conf-name><![CDATA[Sixth European Conference on Speech Communication and Technology]]></conf-name>
<conf-loc> </conf-loc>
<page-range>615-618</page-range><publisher-name><![CDATA[ESCA]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B16">
<label>16</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Zapata]]></surname>
<given-names><![CDATA[C.]]></given-names>
</name>
<name>
<surname><![CDATA[Carmona]]></surname>
<given-names><![CDATA[N.]]></given-names>
</name>
</person-group>
<article-title xml:lang="es"><![CDATA[El experimento Mago de Oz y sus aplicaciones: una mirada retrospectiva]]></article-title>
<source><![CDATA[Revista Dyna]]></source>
<year>2007</year>
<numero>151</numero>
<issue>151</issue>
<page-range>125-135</page-range></nlm-citation>
</ref>
<ref id="B17">
<label>17</label><nlm-citation citation-type="">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Guzmán Arreola]]></surname>
<given-names><![CDATA[M. A.]]></given-names>
</name>
</person-group>
<source><![CDATA[Sintetizador de voz para la enseñanza de la lectura a niños mexicanos]]></source>
<year></year>
</nlm-citation>
</ref>
<ref id="B18">
<label>18</label><nlm-citation citation-type="confpro">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Barra-Chicote]]></surname>
<given-names><![CDATA[R.]]></given-names>
</name>
<name>
<surname><![CDATA[Yamagishi]]></surname>
<given-names><![CDATA[J.]]></given-names>
</name>
<name>
<surname><![CDATA[Montero]]></surname>
<given-names><![CDATA[J. M.]]></given-names>
</name>
<name>
<surname><![CDATA[King]]></surname>
<given-names><![CDATA[S.]]></given-names>
</name>
<name>
<surname><![CDATA[Lutfi]]></surname>
<given-names><![CDATA[S.]]></given-names>
</name>
<name>
<surname><![CDATA[Macias-Guarasa]]></surname>
<given-names><![CDATA[J.]]></given-names>
</name>
</person-group>
<article-title xml:lang="es"><![CDATA[Generación de una voz sintética en castellano basada en HSMM para la evaluación Albayzín 2008: Conversión Texto a voz]]></article-title>
<source><![CDATA[]]></source>
<year></year>
<conf-name><![CDATA[V Jornadas en Tecnología del Habla]]></conf-name>
<conf-date>Noviembre 2008</conf-date>
<conf-loc>Bilbao </conf-loc>
<page-range>115-118</page-range></nlm-citation>
</ref>
<ref id="B19">
<label>19</label><nlm-citation citation-type="confpro">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Tokuda]]></surname>
<given-names><![CDATA[K.]]></given-names>
</name>
<name>
<surname><![CDATA[Masuko]]></surname>
<given-names><![CDATA[T.]]></given-names>
</name>
<name>
<surname><![CDATA[Miyazaki]]></surname>
<given-names><![CDATA[N.]]></given-names>
</name>
<name>
<surname><![CDATA[Kobayashi]]></surname>
<given-names><![CDATA[T.]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[Hidden Markov Models Based on Multi-Space Probability Distribution for Pitch Pattern Modeling]]></article-title>
<source><![CDATA[]]></source>
<year>1999</year>
<conf-name><![CDATA[1 IEEE International Conference on Acoustics, Speech and Signal Processing]]></conf-name>
<conf-loc>Phoenix, Arizona</conf-loc>
<page-range>15-19</page-range></nlm-citation>
</ref>
<ref id="B20">
<label>20</label><nlm-citation citation-type="">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Yoshimura]]></surname>
<given-names><![CDATA[T.]]></given-names>
</name>
</person-group>
<source><![CDATA[Simultaneous modeling of phonetic and prosodic parameters, and characteristic conversion for HMM-based Text-To-Speech systems]]></source>
<year></year>
</nlm-citation>
</ref>
<ref id="B21">
<label>21</label><nlm-citation citation-type="confpro">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Tokuda]]></surname>
<given-names><![CDATA[K.]]></given-names>
</name>
<name>
<surname><![CDATA[Zen]]></surname>
<given-names><![CDATA[H.]]></given-names>
</name>
<name>
<surname><![CDATA[Black]]></surname>
<given-names><![CDATA[A. W.]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[An HMM-based speech synthesis system applied to English]]></article-title>
<source><![CDATA[]]></source>
<year></year>
<conf-name><![CDATA[IEEE 2002 Workshop on Speech Synthesis]]></conf-name>
<conf-date>Sept. 2002</conf-date>
<conf-loc>Santa Monica </conf-loc>
</nlm-citation>
</ref>
<ref id="B22">
<label>22</label><nlm-citation citation-type="">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Palacio Baus]]></surname>
<given-names><![CDATA[K. S.]]></given-names>
</name>
<name>
<surname><![CDATA[Auquilla Peralta]]></surname>
<given-names><![CDATA[J. V.]]></given-names>
</name>
</person-group>
<article-title xml:lang="es"><![CDATA[Diseño e Implementación de un Sistema de Síntesis de Voz]]></article-title>
<source><![CDATA[]]></source>
<year></year>
</nlm-citation>
</ref>
<ref id="B23">
<label>23</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Campbell]]></surname>
<given-names><![CDATA[N.]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[Developments in Corpus-Based Speech Synthesis: Approaching Natural Conversational Speech]]></article-title>
<source><![CDATA[IEICE Trans. Inf. & Syst.]]></source>
<year>2005</year>
<month>March</month>
<volume>E88</volume>
<numero>3</numero>
<issue>3</issue>
</nlm-citation>
</ref>
<ref id="B24">
<label>24</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Llisterri]]></surname>
<given-names><![CDATA[J.]]></given-names>
</name>
<name>
<surname><![CDATA[Machuca]]></surname>
<given-names><![CDATA[M. J.]]></given-names>
</name>
<name>
<surname><![CDATA[De la Mota]]></surname>
<given-names><![CDATA[C.]]></given-names>
</name>
<name>
<surname><![CDATA[Riera]]></surname>
<given-names><![CDATA[M.]]></given-names>
</name>
<name>
<surname><![CDATA[Ríos]]></surname>
<given-names><![CDATA[A.]]></given-names>
</name>
</person-group>
<article-title xml:lang="es"><![CDATA[Corpus orales para el desarrollo de las tecnologías del habla en español]]></article-title>
<source><![CDATA[Oralia, Análisis del discurso oral]]></source>
<year>2005</year>
<volume>8</volume>
<page-range>289-325</page-range></nlm-citation>
</ref>
<ref id="B25">
<label>25</label><nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Mora]]></surname>
<given-names><![CDATA[E.]]></given-names>
</name>
</person-group>
<article-title xml:lang="es"><![CDATA[Discapacidad y comunicación: Una experiencia de fonética aplicada]]></article-title>
<source><![CDATA[]]></source>
<year>2008</year>
<numero>XVII</numero>
<issue>XVII</issue>
<page-range>317-329</page-range><publisher-name><![CDATA[EFE]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B26">
<label>26</label><nlm-citation citation-type="confpro">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Aguilar]]></surname>
<given-names><![CDATA[L.]]></given-names>
</name>
<name>
<surname><![CDATA[Fernández]]></surname>
<given-names><![CDATA[J. M.]]></given-names>
</name>
<name>
<surname><![CDATA[Garrido]]></surname>
<given-names><![CDATA[J. M.]]></given-names>
</name>
<name>
<surname><![CDATA[Llisterri]]></surname>
<given-names><![CDATA[J.]]></given-names>
</name>
<name>
<surname><![CDATA[Macarrón]]></surname>
<given-names><![CDATA[A.]]></given-names>
</name>
<name>
<surname><![CDATA[Monzón]]></surname>
<given-names><![CDATA[L.]]></given-names>
</name>
<name>
<surname><![CDATA[Rodríguez]]></surname>
<given-names><![CDATA[M. A.]]></given-names>
</name>
</person-group>
<article-title xml:lang="es"><![CDATA[Diseño de pruebas para la evaluación de habla sintetizada en español y su aplicación a un sistema de conversión de texto a habla]]></article-title>
<source><![CDATA[]]></source>
<year></year>
<conf-name><![CDATA[Actas del X Congreso de la Sociedad Española para el Procesamiento del Lenguaje Natural]]></conf-name>
<conf-date>20-22 de Julio de 1994</conf-date>
<conf-loc>Córdoba </conf-loc>
</nlm-citation>
</ref>
<ref id="B27">
<label>27</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Correa]]></surname>
<given-names><![CDATA[P.]]></given-names>
</name>
<name>
<surname><![CDATA[Rueda]]></surname>
<given-names><![CDATA[H.]]></given-names>
</name>
<name>
<surname><![CDATA[Arguello]]></surname>
<given-names><![CDATA[H.]]></given-names>
</name>
</person-group>
<article-title xml:lang="es"><![CDATA[Síntesis de voz por concatenación de difonemas para el español de Colombia]]></article-title>
<source><![CDATA[Revista Iberoamericana en Sistemas, Cibernética e Informática]]></source>
<year>2010</year>
<volume>7</volume>
<numero>1</numero>
<issue>1</issue>
<page-range>19-24</page-range></nlm-citation>
</ref>
</ref-list>
</back>
</article>
