<?xml version="1.0" encoding="ISO-8859-1"?><article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<front>
<journal-meta>
<journal-id>0120-6230</journal-id>
<journal-title><![CDATA[Revista Facultad de Ingeniería Universidad de Antioquia]]></journal-title>
<abbrev-journal-title><![CDATA[Rev.fac.ing.univ. Antioquia]]></abbrev-journal-title>
<issn>0120-6230</issn>
<publisher>
<publisher-name><![CDATA[Facultad de Ingeniería, Universidad de Antioquia]]></publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id>S0120-62302012000200006</article-id>
<title-group>
<article-title xml:lang="en"><![CDATA[FPGAs Implementation of fast algorithms oriented to mp3 audio decompression]]></article-title>
<article-title xml:lang="es"><![CDATA[Implementación en FPGAs de algoritmos rápidos para descompresión de audio en formato MP3]]></article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname><![CDATA[Benavides]]></surname>
<given-names><![CDATA[Antonio]]></given-names>
</name>
<xref ref-type="aff" rid="A01"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname><![CDATA[Rentería]]></surname>
<given-names><![CDATA[Geovanni]]></given-names>
</name>
<xref ref-type="aff" rid="A01"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname><![CDATA[Bernal]]></surname>
<given-names><![CDATA[Álvaro]]></given-names>
</name>
<xref ref-type="aff" rid="A01"/>
</contrib>
</contrib-group>
<aff id="A01">
<institution><![CDATA[Universidad del Valle, Microelectronic and Digital Architectures Group]]></institution>
<addr-line><![CDATA[Cali ]]></addr-line>
<country>Colombia</country>
</aff>
<pub-date pub-type="pub">
<day>00</day>
<month>06</month>
<year>2012</year>
</pub-date>
<pub-date pub-type="epub">
<day>00</day>
<month>06</month>
<year>2012</year>
</pub-date>
<numero>63</numero>
<fpage>55</fpage>
<lpage>68</lpage>
<copyright-statement/>
<copyright-year/>
<self-uri xlink:href="http://www.scielo.org.co/scielo.php?script=sci_arttext&amp;pid=S0120-62302012000200006&amp;lng=en&amp;nrm=iso"></self-uri><self-uri xlink:href="http://www.scielo.org.co/scielo.php?script=sci_abstract&amp;pid=S0120-62302012000200006&amp;lng=en&amp;nrm=iso"></self-uri><self-uri xlink:href="http://www.scielo.org.co/scielo.php?script=sci_pdf&amp;pid=S0120-62302012000200006&amp;lng=en&amp;nrm=iso"></self-uri><abstract abstract-type="short" xml:lang="en"><p><![CDATA[Audio decompression algorithms require high performance and therefore demand powerful processors; however, such processors are not always efficient for portable-device applications. This paper explores some algorithms whose hardware implementation improves the performance of customized processors applied to audio decompression tasks. Some experimental and comparative results are presented.]]></p></abstract>
<abstract abstract-type="short" xml:lang="es"><p><![CDATA[La ejecución de los algoritmos de descompresión de audio exige procesadores potentes con alto nivel de desempeño, sin embargo, dichos algoritmos no son apropiados para aplicaciones óptimas en dispositivos móviles. En este trabajo se lleva a cabo una exploración de algunos algoritmos cuya implementación en hardware permite mejorar el desempeño de los procesadores usados en dispositivos móviles que ejecutan tareas de descompresión de audio. Se presentan algunos resultados experimentales y análisis comparativos.]]></p></abstract>
<kwd-group>
<kwd lng="en"><![CDATA[MP3]]></kwd>
<kwd lng="en"><![CDATA[floating point representation]]></kwd>
<kwd lng="en"><![CDATA[VHDL]]></kwd>
<kwd lng="en"><![CDATA[IMDCT]]></kwd>
<kwd lng="es"><![CDATA[MP3]]></kwd>
<kwd lng="es"><![CDATA[representación en punto flotante]]></kwd>
<kwd lng="es"><![CDATA[VHDL]]></kwd>
<kwd lng="es"><![CDATA[IMDCT]]></kwd>
</kwd-group>
</article-meta>
</front><body><![CDATA[ <font face="Verdana, Arial, Helvetica, sans-serif" size="2">     <p align="right"><b>ORIGINAL ARTICLE</b></p>     <p align="right">&nbsp;</p>     <p align="center"><font size="4"> <b>FPGAs Implementation of fast algorithms oriented to mp3 audio decompression</b></font></p>     <p align="center">&nbsp;</p>     <p align="center"><font size="3"> <b>Implementaci&oacute;n en FPGAs de algoritmos r&aacute;pidos para descompresi&oacute;n de audio en formato MP3</b></font></p>     <p align="center">&nbsp;</p>     <p align="center">&nbsp;</p>     <p> <i><b>Antonio Benavides, Geovanni Renter&iacute;a, &Aacute;lvaro Bernal<sup>*</sup></b></i></p>       <p>Microelectronic and Digital Architectures Group. Universidad del Valle. A. A. 25360. Cali, Colombia.</p>      ]]></body>
<body><![CDATA[<p><sup>*</sup>Corresponding author: telephone: + 57 + 2 + 330 34 36 ext. 113, fax: + 57 + 2 + 339 21 40 ext. 112, e-mail: <a href="mailto:alvaro.bernal@correounivalle.edu.co">alvaro.bernal@correounivalle.edu.co</a> (A. Bernal)</p>     <p>&nbsp;</p>     <p align="center">(Received September 14, 2011. Accepted May 18, 2012)</p>     <p align="center">&nbsp;</p>     <p align="center">&nbsp;</p> <hr noshade size="1">      <p><font size="3"><b>Abstract</b></font></p>       <p>Audio decompression algorithms require high performance and therefore demand powerful processors; however, such processors are not always efficient for portable-device applications. This paper explores some algorithms whose hardware implementation improves the performance of customized processors applied to audio decompression tasks. Some experimental and comparative results are presented.</p>       <p><i>Keywords:</i> MP3, floating point representation, VHDL, IMDCT</p>  <hr noshade size="1">      <p><font size="3"><b>Resumen</b></font></p>     <p>La ejecuci&oacute;n de los algoritmos de descompresi&oacute;n de audio exige procesadores potentes con alto nivel de desempe&ntilde;o, sin embargo, dichos algoritmos no son apropiados para aplicaciones &oacute;ptimas en dispositivos m&oacute;viles. En este trabajo se lleva a cabo una exploraci&oacute;n de algunos algoritmos cuya implementaci&oacute;n en hardware permite mejorar el desempe&ntilde;o de los procesadores usados en dispositivos m&oacute;viles que ejecutan tareas de descompresi&oacute;n de audio. Se presentan algunos resultados experimentales y an&aacute;lisis comparativos.</p>      ]]></body>
<body><![CDATA[<p><i>Palabras clave: </i>MP3, representaci&oacute;n en punto flotante, VHDL, IMDCT</p>  <hr noshade size="1">      <p>&nbsp;</p>     <p>&nbsp;</p>     <p><font size="3"><b>Introduction</b></font></p>
<p>Owing to its compression efficiency, the ISO MPEG standard is one of the most widely used audio compression techniques. The third layer of MPEG audio, commonly known as MP3, is used extensively in both digital audio broadcasting and multimedia applications; consequently, the MP3 codec is one of the most advanced MPEG standards for digital audio compression. Nevertheless, its relatively high computational complexity makes it difficult to implement on microprocessors with limited resources. Given the possibilities offered by microelectronics, it is important to explore not only software solutions but also hardware implementations, in order to improve performance and achieve a greater impact in the market. The algorithms used in an MP3 decoder demand considerable computing power and therefore high-performance processors. In fact, many commercial solutions use digital signal processors, but these are often not optimal because several external components are required. Such solutions are poorly suited to portable systems, which require both compactness and low power consumption. An alternative is to use soft-core processors, which can be implemented on FPGAs. These processors have modest computing requirements, which allows an efficient hardware implementation. This article briefly reviews the theory of the MP3 standard and studies some fast algorithms that can be implemented in hardware. Finally, a case study and its hardware implementation on a VIRTEX2P board are described.</p>
<p><b><i>MP3 standard</i></b></p>
<p>The MP3 compression algorithm exploits the limitations of the human ear, which can hear frequencies between 20 Hz and 20 kHz and is most sensitive between 2 and 4 kHz. The algorithm removes the inaudible frequencies while preserving the essence of the sound. The coding level and the degree of compression can be selected when MP3 is used: the greater the compression, the lower the quality. A good balance between compression and quality is obtained at 128 kbit/s, 44.1 kHz stereo, which is the default setting in encoders and in the songs available on the network.</p>
<p><i>MP3 Decoder</i></p>
<p>A decoder essentially applies inverse transforms to reconstruct the audio signal for playback. All frames are processed in essentially the same way. <a href="#Figura1">Figure 1</a> shows the block diagram of an MP3 decoder &#91;1&#93;. The decoder generates the sequence of samples of the original sound from the MP3 bit stream. The coding scheme is based on small packets, or frames, each of which corresponds to a segment of sound a few milliseconds long. First, the synchronization block searches the stream for the synchronization word, which marks the beginning of a valid MPEG frame.</p>
<p align="center"><img src="img/revistas/rfiua/n63/n63a06i01.gif" ><a name="Figura1"></a></p>        ]]></body>
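<body><![CDATA[<p>The synchronization search can be illustrated with a short Python sketch. It is not part of the original design: the function name find_frame is hypothetical, only MPEG-1 Layer III headers are handled, and the bit-rate and sampling-rate tables are the values commonly documented for that layer. The frame length it derives from the bit rate indicates where the next synchronization word is expected.</p>

```python
# Hypothetical sketch: locate an MPEG-1 Layer III frame header in a byte string.
# Bit-rate (kbit/s) and sampling-rate (Hz) tables for MPEG-1 Layer III;
# None marks free-format or reserved index values.
BITRATES_KBPS = [None, 32, 40, 48, 56, 64, 80, 96, 112,
                 128, 160, 192, 224, 256, 320, None]
SAMPLE_RATES_HZ = [44100, 48000, 32000, None]


def find_frame(data, start=0):
    """Return (offset, bitrate_kbps, sample_rate_hz, frame_length) or None."""
    for pos in range(start, len(data) - 3):
        b0, b1, b2 = data[pos], data[pos + 1], data[pos + 2]
        # Synchronization word (11 set bits), then MPEG-1 (11) and Layer III (01);
        # the lowest bit of b1 is the protection bit and is ignored here.
        if b0 != 0xFF or (b1 & 0xFE) != 0xFA:
            continue
        bitrate = BITRATES_KBPS[b2 >> 4]
        sample_rate = SAMPLE_RATES_HZ[(b2 >> 2) & 0x3]
        if bitrate is None or sample_rate is None:
            continue  # not a usable header in this sketch, keep searching
        padding = (b2 >> 1) & 0x1
        # Frame length in bytes, including the 4-byte header.
        frame_length = 144 * bitrate * 1000 // sample_rate + padding
        return pos, bitrate, sample_rate, frame_length
    return None


print(find_frame(bytes([0x00, 0x12, 0xFF, 0xFB, 0x90, 0x00])))
```

<p>For the header bytes FF FB 90 00 the sketch reports 128 kbit/s at 44.1 kHz and a frame length of 417 bytes.</p>
]]></body>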
<body><![CDATA[<p>Each frame contains both the compressed audio data and the information about how to decode them. The synchronization word is part of the header, which also stores the layer number, the sampling rate and the channel configuration. The header also contains the bit rate, from which the frame length can be determined; it therefore indicates where the next synchronization word will appear. Besides the transformed data, the bit stream carries side information used to reconstruct the audio samples. This first block extracts that side information, which includes the scale factors and the selection of the lookup tables used by the Huffman decoder. For the scale factors, the frequency spectrum is divided into a series of bands, each affected by its own scale factor. These bands are defined by the sampling frequency and correspond approximately to the critical bands of the human ear. Each band has a scale factor that controls the gain when the samples are dequantized.</p>
<p>The 576 Huffman-coded samples are read and decoded using Huffman lookup tables selected by the side information. The encoder can use several Huffman tables for different regions of the samples. The Huffman tables cover different ranges of values. The sample values lie in the range &#91;-8207 : 8207&#93;, but the Huffman tables only represent pairs of values in the range &#91;0:15&#93;. Some tables use 15 as an escape code. When the Huffman decoder finds the escape code, it reads an additional value whose number of bits depends on the table, and this value is added to 15. That number of bits is known as linbits and lies between 1 and 13. In the dequantizer, the samples of the bit stream are dequantized and scaled to appropriate values using the scale factors and the granule gain. The sample values are raised to the power 4/3 during requantization. In the reordering block, the samples in blocks that use short time windows (short blocks) are reordered so that they can be processed by the subsequent stages.</p>
<p>The alias reduction unit acts on blocks that use long time windows, in order to compensate for the frequency overlap of the subband filter. Each subband is then transformed back to the time domain. For long blocks, a 36-point inverse modified discrete cosine transform (IMDCT) calculates 36 output samples. For short blocks, the outputs of three 12-point IMDCTs are combined to form 36 output samples. Then 18 samples are added to the values stored from the previous granule; the sums are the new output values. The next block corrects the frequency inversion introduced by the subband filter by multiplying every second sample by -1. Finally, the 32 subbands are combined into time-domain samples that cover the whole frequency spectrum. One sample is taken from each subband and transformed using a discrete cosine transform (DCT). The result is written to the bottom position of a FIFO. The PCM samples are then calculated through a windowing operation over the FIFO.</p>
<p><b><i>Study of a fast algorithm for the inverse modified discrete cosine transform (IMDCT)</i></b></p>
<p>The following implementation, based on a variation of the fast algorithm published by S. W. Lee &#91;2&#93;, is oriented to the IMDCT. The transform is described by equation (1).</p>
<p><img src="img/revistas/rfiua/n63/n63a06e01.gif"></p>
<p>MP3 audio decompression involves a 36-point IMDCT and a 12-point IMDCT. For the 36-point case, N = 36, we have:</p>
<p><img src="img/revistas/rfiua/n63/n63a06e02.gif"></p>
<p>Because of the complexity of this equation, its evaluation requires substantial processing, including a large number of multiplications, so a fast algorithm is needed. An alternative approach, based on permutations and simple matrix operations, is described in &#91;3&#93;. In that algorithm, X(n) is defined by (3).</p>
<p><img src="img/revistas/rfiua/n63/n63a06e03.gif"></p>      ]]></body>
<body><![CDATA[<p>The transformed vector after processing by the IMDCT block is:</p>      <p><img src="img/revistas/rfiua/n63/n63a06e04.gif"></p>      <p>The original and inverse-transformed vectors are related by (5).</p>      <p><img src="img/revistas/rfiua/n63/n63a06e05.gif"></p>      <p>T denotes the transpose of a vector or matrix. M is defined by equation (6).</p>      <p><img src="img/revistas/rfiua/n63/n63a06e06.gif"></p>      <p>The results show that the vector y(k) satisfies equations (7) and (8).</p>      <p><img src="img/revistas/rfiua/n63/n63a06e07.gif"></p>      <p>This property is important because it reduces the number of operations: once the terms on the right-hand side are known, those on the left-hand side can be calculated. In the model, W is defined as the vector that contains only part of the vector Y:</p>      <p><img src="img/revistas/rfiua/n63/n63a06e09.gif"></p>      ]]></body>
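<body><![CDATA[<p>The cost of the direct transform and the symmetry that the fast algorithm exploits can be checked numerically. The sketch below is only illustrative: it assumes the IMDCT in the form given in ISO/IEC 11172-3 (18 inputs, 36 outputs), whose indexing may differ from equation (1), and the function name imdct_direct and the test input are hypothetical.</p>

```python
import math


def imdct_direct(X):
    """Direct IMDCT in the ISO/IEC 11172-3 form: n/2 inputs, n outputs."""
    half = len(X)
    n = 2 * half
    return [
        sum(X[k] * math.cos(math.pi / (2 * n) * (2 * i + 1 + half) * (2 * k + 1))
            for k in range(half))
        for i in range(n)
    ]


X = [float(k + 1) for k in range(18)]  # arbitrary test input (assumption)
y = imdct_direct(X)                    # 36 outputs, 36 * 18 = 648 multiplications

# The first 18 outputs are antisymmetric and the last 18 are symmetric,
# so only 18 of the 36 values are independent.
first_half_ok = all(abs(y[17 - i] + y[i]) < 1e-9 for i in range(9))
second_half_ok = all(abs(y[53 - i] - y[i]) < 1e-9 for i in range(18, 27))
print(first_half_ok, second_half_ok)
```

<p>Under this assumed form, y(17 - i) = -y(i) for i = 0, ..., 8 and y(53 - i) = y(i) for i = 18, ..., 26, which is why half of the output vector is sufficient to rebuild the rest.</p>
]]></body>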
<body><![CDATA[<p>While the vector Y has 36 terms, the vector W has only 18. Equation (10) relates the output vector Y to the vector W.</p>      <p><img src="img/revistas/rfiua/n63/n63a06e10.gif"></p>      <p>where P is an 18 x 36 matrix given by</p>      <p><img src="img/revistas/rfiua/n63/n63a06e11.gif"></p>      <p>I<sub>9x9</sub> is the 9 x 9 identity matrix and J is the 9 x 9 diagonal matrix defined by (12).</p>      <p><img src="img/revistas/rfiua/n63/n63a06e12.gif"></p>      <p>Additionally, the 18-point DCT type IV, denoted <em>C<sup>IV</sup></em><sub>18</sub> (DCT-IV), is described by (1) with N = 18. The DCT-IV (<em>C</em><em><sup>IV</sup></em><sub>18</sub>) is a modification of the SDCT-II (<em>C</em><em><sup>II</sup></em><sub>18</sub>), as shown in equation (13).</p>      <p><img src="img/revistas/rfiua/n63/n63a06e13.gif"></p>      <p>In that equation, D' is the diagonal matrix with elements:</p>      <p><img src="img/revistas/rfiua/n63/n63a06e14.gif"></p>      ]]></body>
<body><![CDATA[<p>and L1 is the 18 x 18 triangular matrix given by</p>      <p><img src="img/revistas/rfiua/n63/n63a06e15.gif"></p>      <p><a href="#Figura2">Figure 2.a</a> shows the block diagram of the 36-point IMDCT. <a href="#Figura2">Figures 2.b</a> and <a href="#Figura2">2.c</a> show the modifications made to an 18-point DCT in order to calculate a 36-point IMDCT. The reduction in the number of operations is significant.</p>      <p align="center"><img src="img/revistas/rfiua/n63/n63a06i02.gif" ><a name="Figura2"></a></p>      <p>As <a href="#Figura2">figure 2.c</a> shows, Y can be obtained from the vector W1. The SP block implementation is thus given by equation (16).</p>      <p><img src="img/revistas/rfiua/n63/n63a06e16.gif"></p>      <p>Y is the matrix including various components of the vector W1.</p>      <p><img src="img/revistas/rfiua/n63/n63a06e19.gif"></p>      <p>The DCT-IV can be used to compute the IMDCT by combining the symmetry and inversion properties of the DCT-IV with algorithms proposed in &#91;3, 4&#93; for computing a standard DCT (SDCT-II). The block diagram that implements this transform is shown in <a href="#Figura3">figure 3</a>.</p>      <p align="center"><img src="img/revistas/rfiua/n63/n63a06i03.gif" ><a name="Figura3"></a></p>      ]]></body>
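<body><![CDATA[<p>The relation between the IMDCT and the DCT-IV can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' VHDL design: the IMDCT is taken in its ISO/IEC 11172-3 form, the DCT-IV is unnormalized and evaluated directly, and the function names dct4, imdct_via_dct4 and imdct_direct are hypothetical.</p>

```python
import math


def dct4(x):
    """Unnormalized DCT-IV evaluated directly (N * N multiplications)."""
    n = len(x)
    return [sum(x[k] * math.cos(math.pi / n * (i + 0.5) * (k + 0.5))
                for k in range(n))
            for i in range(n)]


def imdct_via_dct4(X):
    """Build the 2N IMDCT outputs from one N-point DCT-IV (N even)."""
    half = len(X)
    q = half // 2
    c = dct4(X)
    y = []
    for i in range(2 * half):
        if i < q:
            y.append(c[i + q])           # copied from the upper DCT-IV terms
        elif i < 3 * q:
            y.append(-c[3 * q - 1 - i])  # reversed order, sign inverted
        else:
            y.append(-c[i - 3 * q])      # lower DCT-IV terms, sign inverted
    return y


def imdct_direct(X):
    """Reference IMDCT in the ISO/IEC 11172-3 form, used only for checking."""
    half = len(X)
    n = 2 * half
    return [sum(X[k] * math.cos(math.pi / (2 * n) * (2 * i + 1 + half) * (2 * k + 1))
                for k in range(half))
            for i in range(n)]


X = [float(k + 1) for k in range(18)]  # arbitrary test input (assumption)
max_diff = max(abs(a - b) for a, b in zip(imdct_via_dct4(X), imdct_direct(X)))
print(max_diff < 1e-9)
```

<p>Even with a directly evaluated DCT-IV, the 36 outputs need 18 x 18 = 324 multiplications instead of 648. The further savings described in the text come from computing the DCT-IV through the SDCT-II, which this sketch does not reproduce.</p>
]]></body>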
<body><![CDATA[<p>In general, to reduce the execution time of an N-point SDCT-II, two N/2-point SDCT-IIs are used. <a href="#Figura4">Figure 4</a> shows the flow diagram of the proposed IMDCT algorithm.</p>      <p align="center"><img src="img/revistas/rfiua/n63/n63a06i04.gif" ><a name="Figura4"></a></p>      <p>A 12-point IMDCT was developed following the same procedure. In this case:</p>      <p><img src="img/revistas/rfiua/n63/n63a06e22.gif"></p>      <p>and the vector y(k) is defined by equations (23) and (24).</p>      <p><img src="img/revistas/rfiua/n63/n63a06e23.gif"></p>      <p>n=0,1,2</p>      <p>&nbsp;</p>     <p><font size="3"><b>Hardware implementation for the IMDCT fast algorithm</b></font></p>      <p><i>The 36-point IMDCT</i></p>      ]]></body>
<body><![CDATA[<p>The algorithms studied above reduce the number of floating-point operations and are therefore called fast algorithms. They were simulated in MATLAB and compared with the other IMDCT expressions given in the ISO 11172-3 standard, with satisfactory results. <a href="#Figura5">Figure 5</a> shows the block diagram of the hardware implementation of the 36-point IMDCT. All blocks were written in VHDL and synthesized for a VIRTEX2P board. The VHDL code was consistent with the code used in MATLAB.</p>
<p align="center"><img src="img/revistas/rfiua/n63/n63a06i05.gif" ><a name="Figura5"></a></p>
<p><i>Floating-point addition and subtraction block</i></p>
<p>This block calculates the sum or difference of two single-precision floating-point numbers. It has two 32-bit input vectors for the operands, a 32-bit output and a signal that selects the operation to be performed. The block uses three sub-blocks: a 24-bit adder/subtractor that adds or subtracts the mantissas; a pre-normalization block that determines the output exponent, the larger mantissa, the smaller mantissa and the output sign; and a post-normalization block that converts the result to the IEEE-754 format.</p>
<p><i>Floating-point multiplication block</i></p>
<p>This block computes the product of two floating-point numbers and does not require a clock signal.</p>
<p><i>9-point SDCT block</i></p>
<p>This block requires both the floating-point multiplication and the addition/subtraction subsystems. Its structure is sequential and it calculates the 9-point SDCT twice. The design is based on a finite state machine. Each floating-point operation takes one clock cycle, and the final result is obtained in 46 states, or clock cycles. To reduce the number of states, addition/subtraction and multiplication operations were executed in parallel.</p>
<p><i>18-point SDCT-II block</i></p>
<p>This block requires the two floating-point operation subsystems and a 9-point SDCT block. The code has two arrays of eighteen 32-bit vectors representing the input ''X'' and the output ''Y''. The block includes a multiplexer that assigns the floating-point blocks either to the 9-point SDCT or to internal calculations. A single addition/subtraction block and a single multiplication block were used in order to minimize hardware. The structure is sequential and calculates the 9-point SDCT twice, once for the even and once for the odd terms.</p>       ]]></body>
<body><![CDATA[<p><i>IMDCT and DCT-IV blocks</i></p>
<p>This block first calculates the 18 results of the DCT-IV and then delivers the IMDCT results serially. The DCT-IV requires both the SDCT-II block and the floating-point subsystems, which a multiplexer assigns to the internal 18-point SDCT-II calculations. The code includes four arrays of eighteen 32-bit vectors, used to store temporary data and the input, which can be stored in a memory beforehand. The output is a 32-bit vector that delivers the 36 results one by one.</p>
<p><i>12-point IMDCT</i></p>
<p>This block was developed using a procedure similar to that of the 36-point IMDCT, with some modifications required by the mathematical conditions.</p>
<p>&nbsp; </p>
<p><font size="3"><b>Low accuracy IMDCT synthesis and implementation</b></font></p>
<p>Once the 36-point and 12-point IMDCTs had been designed, the floating-point word length was reduced from 32 to 23 bits in order to minimize the required hardware. This type of architecture is called a limited-accuracy implementation &#91;5, 6&#93;. In MP3 calculations, low-precision floating point does not produce a significant change in sound quality and can reduce power consumption and chip area. The minimum floating-point word length is 23 bits, distributed as a 16-bit mantissa, a 6-bit exponent and a sign bit &#91;7&#93;. The low-accuracy (23-bit) IMDCT design, which introduces only a small error, was programmed on the VIRTEX2P board. Synthesis of the low-precision 36-point IMDCT code gave a utilization ratio of 24% of the XC2VP30-6FF896 device on the VIRTEX2P board. Experimental results are shown in <a href="#Figura6">figure 6</a>.</p>
<p align="center"><img src="img/revistas/rfiua/n63/n63a06i06.gif" ><a name="Figura6"></a></p>
<p>Synthesis of the low-precision 12-point IMDCT code occupied 9% of the XC2VP30-6FF896 device on the VIRTEX2P board. Experimental results are shown in <a href="#Figura7">figure 7</a>.</p>
<p align="center"><img src="img/revistas/rfiua/n63/n63a06i07.gif" ><a name="Figura7"></a></p>          ]]></body>
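<body><![CDATA[<p>The effect of the shorter word length on a single value can be estimated with a small Python sketch. Only the field widths (16-bit mantissa, 6-bit exponent, sign bit) come from the text; the hidden leading one, the exponent bias of 31, round-to-nearest and flushing of very small values to zero are assumptions made here by analogy with IEEE-754, and the function name quantize is hypothetical.</p>

```python
import math

FRACTION_BITS = 16                   # mantissa width stated in the text
EXPONENT_BITS = 6                    # exponent width stated in the text
BIAS = 2 ** (EXPONENT_BITS - 1) - 1  # 31, assumed by analogy with IEEE-754


def quantize(x):
    """Round x to the nearest value representable in the assumed 23-bit format."""
    if x == 0.0:
        return 0.0
    m, e = math.frexp(abs(x))             # abs(x) = m * 2**e with 0.5 <= m < 1
    scale = 1 << FRACTION_BITS
    significand = round(2.0 * m * scale)  # integer form of 1.f with 16 fraction bits
    exponent = e - 1
    if significand == 2 * scale:          # rounding carried into the next power of two
        significand, exponent = scale, exponent + 1
    if exponent > BIAS:
        raise OverflowError("value exceeds the assumed exponent range")
    if exponent < 1 - BIAS:
        return 0.0                        # flush very small values to zero (assumption)
    return math.copysign(significand * 2.0 ** exponent / scale, x)


for value in (0.1, -3.14159, 123.456):
    q = quantize(value)
    print(value, q, abs(q - value) / abs(value))
```

<p>Under these assumptions the relative rounding error of one stored value is at most 2<sup>-17</sup>, roughly 7.6e-6; the error of a complete transform also depends on how the individual rounding errors accumulate.</p>
]]></body>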
<body><![CDATA[<p>The data from the low-precision 12-point and 36-point IMDCT implementations on the VIRTEX2P board were captured with the ChipScope 7.1 real-time logic analyzer. The results are described below.</p>           <p><i>Low-precision 36-point IMDCT</i> </p>         <p>Input: X = &#91;1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1&#93;.</p>         <p>The output data are shown in <a href="#Figura8">figure 8</a> and are equivalent to those calculated with the formula of the ISO 11172 standard &#91;8&#93;.</p>      <p align="center"><img src="img/revistas/rfiua/n63/n63a06i08.gif" ><a name="Figura8"></a></p>	 	       <p><a href="#Tabla1">Table 1</a> shows the hexadecimal representation of the outputs, their conversion to decimal, the theoretical values obtained from Excel spreadsheets and the percentage error of the data obtained with this implementation. The maximum error obtained is 8.441%, a highly satisfactory result.</p>          <p align="center"><img src="img/revistas/rfiua/n63/n63a06t01.gif" ><a name="Tabla1"></a></p>          <p><b><i>Low-precision 12-point IMDCT</i></b></p>         <p>The experimental results are shown in <a href="#Figura9">figure 9</a> and are similar to the simulation results obtained for a low-precision 12-point IMDCT. Input: X = &#91;5 4 23 5 74 96&#93;.</p>        <p align="center"><img src="img/revistas/rfiua/n63/n63a06i09.gif" ><a name="Figura9"></a></p>        ]]></body>
<body><![CDATA[<p>&nbsp;</p>      <p><font size="3"><b>Study of a fast algorithm</b> </font></p>
<p><i>Subband block implementation:</i> subband synthesis is the final step of the decoder. This module produces 32 PCM samples at a time from the inputs supplied by the filter bank. Once 32 subband samples have been captured, a matrixing operation is performed, which applies a 64-point modified DCT to the block of 32 samples. See <a href="#Figura10">figure 10</a>.</p>
<p align="center"><img src="img/revistas/rfiua/n63/n63a06i10.gif" ><a name="Figura10"></a></p>
<p>The 64-point modified DCT can easily be reduced to a 32-point DCT, which requires 32 x 32 multiplications. The verification was done using spreadsheets and allowed 50 per cent of redundancy to be eliminated. The first and last 16 coefficients in the array are identical but with inverted sign. The same occurs with the following 32 coefficients.</p>
<p><i>A 32-point DCT:</i> this was implemented using the fast algorithm proposed by C. W. Kok &#91;4&#93;, in which an N-point DCT is divided into two N/2-point DCTs when N is a power of 2. The operation yields an even part and an odd part, denoted C(i) and D(i) respectively.</p>
<p><a href="#Figura11">Figure 11</a> shows the division of the 32-point DCT into an even and an odd part, each computed with a 16-point DCT. Each of these modules is in turn divided into an even and an odd part using 8-point DCTs. Note that the odd part must be multiplied by a scalar factor before it is subdivided into even and odd parts. As <a href="#Figura11">figure 11</a> shows, the 8-point DCT is itself divided into an even and an odd part using 4-point DCTs.</p>
<p align="center"><img src="img/revistas/rfiua/n63/n63a06i11.gif" ><a name="Figura11"></a></p>        <p>&nbsp;</p>      <p><font size="3"><b>Hardware design of the 32-point discrete cosine transform fast algorithm</b> </font></p>        ]]></body>
<body><![CDATA[<p><a href="#Figura12">Figure 12</a> shows the block diagram of the hardware design of the 32-point DCT. The blocks were described in VHDL, simulated and synthesized for a VIRTEX2P board.</p>
<p align="center"><img src="img/revistas/rfiua/n63/n63a06i12.gif" ><a name="Figura12"></a></p>
<p><i>Floating point sum/subtraction block:</i> this block computes the sum or difference of two 32-bit single-precision floating-point numbers. The module has two 32-bit input operands, a 32-bit output, a clock input and a signal that selects the operation to perform. The block includes three sub-blocks: a 24-bit adder that adds or subtracts the mantissas; a pre-normalization block that determines the output exponent, the larger and smaller mantissas, the output sign and the operation to be executed according to the signs of the operands; and a post-normalization block that converts the result to the IEEE-754 format.</p>
<p><i>Floating point multiplication block:</i> this block computes the single-precision floating-point product of two 32-bit numbers. The module has two 32-bit input operands, a 32-bit output and a clock input. It includes a 24-bit multiplier that multiplies the mantissas, an adder that computes the sum of the input exponents together with the resulting sign, and a normalization block that determines whether the result has overflowed the maximum value allowed by the IEEE-754 floating-point format.</p>
<p><i>32 Point DCT Block:</i> this block computes the 16-point DCT for both the even and the odd parts. Its implementation requires the results obtained in C(i) and D(i). Those results are used to implement an 8-point DCT for the even and odd parts of the 16-point DCT, which in turn uses a 4-point DCT for its even and odd parts. These transforms require both the addition/subtraction and the multiplication floating-point blocks, which execute the operations sequentially under the control of a finite state machine. The 16-point DCT implementation required 16 constants for the odd part, according to the D(i) function model. Something similar was done for the odd parts of the 8-point and 4-point DCT schemes. The VHDL description was optimized to use a total of 28 32-bit registers and to perform additions or subtractions in parallel with multiplications.</p>
<p><i>4 Point DCT Block:</i> this block calculates the 2-point DCT for the even and odd parts. It uses the addition/subtraction and multiplication floating-point blocks, 4 constants for the cosine functions and a total of four 32-bit registers.</p>
<p>Finally, synthesis of the 32-point DCT occupied 34 per cent of the XC2VP30-6FF896 device on the VIRTEX2P board. Experimental results are shown in <a href="#Figura13">figure 13</a>:</p>
<p align="center"><img src="img/revistas/rfiua/n63/n63a06i13.gif" ><a name="Figura13"></a></p>
<p>The results were validated using the vector shown in <a href="#Tabla2">table 2</a>.</p>
<p align="center"><img src="img/revistas/rfiua/n63/n63a06t02.gif" ><a name="Tabla2"></a></p>      ]]></body>
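<body><![CDATA[<p>The even/odd splitting used by the 32-point DCT block can be sketched in software and checked against a direct evaluation. The Python code below uses a generic even/odd decimation of the unnormalized DCT-II; it is an illustration under that assumption and is not claimed to match the factorization of &#91;4&#93; or the VHDL design term by term. The function names dct2_direct and dct2_fast are hypothetical, and the input is the 32-sample test vector that the paper lists for this block.</p>

```python
import math


def dct2_direct(x):
    """Unnormalized DCT-II evaluated directly (N * N multiplications)."""
    n = len(x)
    return [sum(x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                for i in range(n))
            for k in range(n)]


def dct2_fast(x):
    """Recursive DCT-II for N a power of two, split into even and odd parts."""
    n = len(x)
    if n == 1:
        return [x[0]]
    half = n // 2
    # Even part: sums of mirrored samples feed an N/2-point DCT.
    even_in = [x[i] + x[n - 1 - i] for i in range(half)]
    # Odd part: differences are scaled by a constant before the N/2-point DCT.
    odd_in = [(x[i] - x[n - 1 - i]) / (2.0 * math.cos(math.pi * (2 * i + 1) / (2 * n)))
              for i in range(half)]
    c = dct2_fast(even_in)
    d = dct2_fast(odd_in) + [0.0]  # appended zero closes the final odd sum
    out = [0.0] * n
    for k in range(half):
        out[2 * k] = c[k]
        out[2 * k + 1] = d[k] + d[k + 1]
    return out


x = [1, 4, 3, 4] + list(range(5, 33))  # test vector as printed in the paper
fast, direct = dct2_fast(x), dct2_direct(x)
print(max(abs(a - b) for a, b in zip(fast, direct)) < 1e-8)
```

<p>For N = 32 the odd part of the first stage needs 16 scaling constants, which is consistent with the 16 constants reported for the odd part of the 16-point DCT stage.</p>
]]></body>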
<body><![CDATA[<p>Simulation results for the 32-point DCT were compared with Excel results. The validation is shown in <a href="#Tabla3">table 3</a>.</p>
<p align="center"><img src="img/revistas/rfiua/n63/n63a06t03.gif" ><a name="Tabla3"></a></p>
<p>The results show a good approximation, since the maximum error obtained for the set of input values was 0.00688%. <a href="#Figura14">Figure 14</a> shows the simulation data obtained for the input values listed below using the ISE 8.1 software tool.</p>
<p align="center"><img src="img/revistas/rfiua/n63/n63a06i14.gif" ><a name="Figura14"></a></p>
<p>Input: X = &#91;1 4 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32&#93;.</p>
<p>Based on the 32-point DCT block, a finite state machine was described in order to synthesize the subband block. It starts by capturing a set of 32 samples, which are then transformed using the 32-point DCT. The results yield the 64-point DCT. These 64 samples are stored in an internal dual-port RAM (512 x 32 bits + 4 parity bits). Two RAM blocks were used to shift the 1024-sample vector. Furthermore, 512 selected samples were windowed and stored in a dual-port RAM, and the 32 output samples were obtained by adding the components that correspond to each sample.</p>
<p>&nbsp;</p>
<p><font size="3"><b>Conclusions</b> </font></p>
<p>Implementing the fast IMDCT algorithm with several modifications yielded a hardware system with better performance than conventional processing methods. The IMDCT block has 18 inputs and 36 outputs and is useful as an MP3 decoding tool, allowing future projects based on soft-core designs to use embedded processors with smaller configurations.</p>
<p>Synthesizing the subband block in hardware gives more efficient performance, since fast algorithms improve the processing time. Reducing the floating-point representation from 32 bits to 23 bits considerably reduced the hardware without sacrificing sound quality. The experimental results obtained with the VIRTEX2P board show low errors when compared with the data obtained from the MP3 standard &#91;8&#93;.</p>      ]]></body>
<body><![CDATA[<p>&nbsp;</p>     <p><font size="3"><b>References</b> </font></p>        <!-- ref --><p>1. Z. Lai, Z. Liu, M. Li, Q. Yuan. <i>MP3 Player, CSEE 4840 SPRING 2010 PROJECT DESIGN</i>. <a href="http://www.cs.columbia.edu/~sedwards/classes/2010/4840/designs/KH.pdf" target="_blank">www.cs.columbia.edu/~sedwards/classes/2010/4840/designs/KH.pdf</a>. Accessed March 10, 2010.<!-- end-ref --> </p>       <!-- ref --><p>2. S. Lee. ''Improved algorithm for efficient computation of the forward and backward MDCT in MPEG audio coder''. <i>Circuits and Systems II: Analog and Digital Signal Processing, IEEE Transactions on</i>. Vol. 48. 2001. pp. 990-994.<!-- end-ref --> </p>       <!-- ref --><p>3. M. Cheng, Y. Hsu. ''Fast IMDCT and MDCT algorithms: a matrix approach''. <i>Signal Processing, IEEE Transactions on</i>. Vol. 51. 2003. pp. 221-229.<!-- end-ref --> </p>       <!-- ref --><p>4. C. Kok. ''Fast algorithms for computing Discrete Cosine Transform''. <i>Signal Processing, IEEE Transactions on</i>. Vol. 45. 1997. pp. 757-760.<!-- end-ref --> </p>       ]]></body>
<body><![CDATA[<!-- ref --><p>5. B. Lee. ''A new algorithm to compute the discrete cosine transform''. <i>IEEE Transactions on Acoustics, Speech, and Signal Processing</i>. Vol. 32. 1984. pp. 1243-1245.<!-- end-ref --> </p>       <!-- ref --><p>6. <i>Codification MP3</i>. Available at: <a href="http://members.fortunecity.com/alex1944/mp3coding/maindata.html" target="_blank">http://members.fortunecity.com/alex1944/mp3coding/maindata.html</a>. Accessed September 10, 2009.<!-- end-ref --> </p>       <!-- ref --><p>7. J. Eilert, A. Ehliar, D. Liu. <i>Using Low Precision Floating Point Numbers to Reduce Memory Cost for MP3 Decoding</i>. Dept. of Electrical Engineering, Linkoping University. Linkoping (Sweden). 2004. pp. 119-122.<!-- end-ref --> </p>       <!-- ref --><p>8. ISO/IEC. <i>Information Technology &#8211; Coding of Moving Pictures and Associated Audio for Digital Storage Media at up to About 1.5 Mbit/s, Part 3: Audio</i>. Geneva (Switzerland). 2004. pp. 1-147.<!-- end-ref --> </p>     <!-- ref --><p>9. K. Konstantinides. ''Fast subband filtering in MPEG audio coding''. <i>IEEE Signal Processing Letters</i>. Vol. 1. 1994. pp. 26-28.<!-- end-ref --></p>  </font>    ]]></body>
<body><![CDATA[ ]]></body><back>
<ref-list>
<ref id="B1">
<label>1</label><nlm-citation citation-type="">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Lai]]></surname>
<given-names><![CDATA[Z.]]></given-names>
</name>
<name>
<surname><![CDATA[Liu]]></surname>
<given-names><![CDATA[Z.]]></given-names>
</name>
<name>
<surname><![CDATA[Li]]></surname>
<given-names><![CDATA[M.]]></given-names>
</name>
<name>
<surname><![CDATA[Yuan]]></surname>
<given-names><![CDATA[Q.]]></given-names>
</name>
</person-group>
<source><![CDATA[MP3 Player, CSEE 4840 SPRING 2010 PROJECT DESIGN]]></source>
<year></year>
</nlm-citation>
</ref>
<ref id="B2">
<label>2</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Lee]]></surname>
<given-names><![CDATA[S.]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[Improved algorithm for efficient computation of the forward and backward MDCT in MPEG audio coder]]></article-title>
<source><![CDATA[Circuits and Systems II: Analog and Digital Signal Processing, IEEE Transactions on]]></source>
<year>2001</year>
<volume>48</volume>
<page-range>990-994</page-range></nlm-citation>
</ref>
<ref id="B3">
<label>3</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Cheng]]></surname>
<given-names><![CDATA[M.]]></given-names>
</name>
<name>
<surname><![CDATA[Hsu]]></surname>
<given-names><![CDATA[Y.]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[Fast IMDCT and MDCT algorithms: a matrix approach]]></article-title>
<source><![CDATA[Signal Processing, IEEE Transactions on]]></source>
<year>2003</year>
<volume>51</volume>
<page-range>221-229</page-range></nlm-citation>
</ref>
<ref id="B4">
<label>4</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Kok]]></surname>
<given-names><![CDATA[C.]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[Fast algorithms for computing Discrete Cosine Transform]]></article-title>
<source><![CDATA[Signal Processing, IEEE Transactions on]]></source>
<year>1997</year>
<volume>45</volume>
<page-range>757-760</page-range></nlm-citation>
</ref>
<ref id="B5">
<label>5</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Lee]]></surname>
<given-names><![CDATA[B.]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[A new algorithm to compute the discrete cosine transform]]></article-title>
<source><![CDATA[IEEE Transactions on Acoustics, Speech, and Signal Processing]]></source>
<year>1984</year>
<volume>32</volume>
<page-range>1243-1245</page-range></nlm-citation>
</ref>
<ref id="B6">
<label>6</label><nlm-citation citation-type="">
<source><![CDATA[Codification MP3]]></source>
<year></year>
</nlm-citation>
</ref>
<ref id="B7">
<label>7</label><nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Eilert]]></surname>
<given-names><![CDATA[J.]]></given-names>
</name>
<name>
<surname><![CDATA[Ehliar]]></surname>
<given-names><![CDATA[A.]]></given-names>
</name>
<name>
<surname><![CDATA[Liu]]></surname>
<given-names><![CDATA[D.]]></given-names>
</name>
</person-group>
<source><![CDATA[Using Low Precision Floating Point Numbers to Reduce Memory Cost for MP3 Decoding]]></source>
<year>2004</year>
<page-range>119-122</page-range><publisher-loc><![CDATA[Linkoping ]]></publisher-loc>
<publisher-name><![CDATA[Dept. of Electrical Engineering, Linkoping University]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B8">
<label>8</label><nlm-citation citation-type="">
<source><![CDATA[ISO/IEC. Information Technology - Coding of Moving Pictures and Associated Audio for Digital Storage Media at up to About 1.5 Mbit/s, Part 3: Audio]]></source>
<year>2004</year>
<page-range>1-147</page-range><publisher-loc><![CDATA[Geneva ]]></publisher-loc>
</nlm-citation>
</ref>
<ref id="B9">
<label>9</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Konstantinides]]></surname>
<given-names><![CDATA[K.]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[Fast subband filtering in MPEG audio coding]]></article-title>
<source><![CDATA[IEEE Signal Processing Letters]]></source>
<year>1994</year>
<volume>1</volume>
<page-range>26-28</page-range></nlm-citation>
</ref>
</ref-list>
</back>
</article>
