<?xml version="1.0" encoding="ISO-8859-1"?><article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<front>
<journal-meta>
<journal-id>0121-750X</journal-id>
<journal-title><![CDATA[Ingeniería]]></journal-title>
<abbrev-journal-title><![CDATA[ing.]]></abbrev-journal-title>
<issn>0121-750X</issn>
<publisher>
<publisher-name><![CDATA[Universidad Distrital Francisco José de Caldas]]></publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id>S0121-750X2021000100062</article-id>
<article-id pub-id-type="doi">10.14483/23448393.15634</article-id>
<title-group>
<article-title xml:lang="es"><![CDATA[Análisis de desempeño de capas de CNN para arquitecturas heterogéneas basadas en FPGAs usando HLS]]></article-title>
<article-title xml:lang="en"><![CDATA[Performance Analysis of CNN Layers for Heterogeneous FPGAs-based Architectures Using HLS]]></article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname><![CDATA[Guerra-Londono]]></surname>
<given-names><![CDATA[Mateo]]></given-names>
</name>
<xref ref-type="aff" rid="Af1"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname><![CDATA[Castano-Londono]]></surname>
<given-names><![CDATA[Luis]]></given-names>
</name>
<xref ref-type="aff" rid="Af2"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname><![CDATA[Alzate-Anzola]]></surname>
<given-names><![CDATA[Cristian]]></given-names>
</name>
<xref ref-type="aff" rid="Af3"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname><![CDATA[Marquez-Viloria]]></surname>
<given-names><![CDATA[David]]></given-names>
</name>
<xref ref-type="aff" rid="Af4"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname><![CDATA[Velasquez-Velez]]></surname>
<given-names><![CDATA[Ricardo]]></given-names>
</name>
<xref ref-type="aff" rid="Af5"/>
</contrib>
</contrib-group>
<aff id="Af1">
<institution><![CDATA[Instituto Tecnológico Metropolitano, Facultad de Ingenierías]]></institution>
<addr-line><![CDATA[Medellín ]]></addr-line>
<country>Colombia</country>
</aff>
<aff id="Af2">
<institution><![CDATA[Instituto Tecnológico Metropolitano, Facultad de Ingenierías]]></institution>
<addr-line><![CDATA[Medellín ]]></addr-line>
<country>Colombia</country>
</aff>
<aff id="Af3">
<institution><![CDATA[Instituto Tecnológico Metropolitano, Facultad de Ingenierías]]></institution>
<addr-line><![CDATA[Medellín ]]></addr-line>
<country>Colombia</country>
</aff>
<aff id="Af4">
<institution><![CDATA[Instituto Tecnológico Metropolitano, Facultad de Ingenierías]]></institution>
<addr-line><![CDATA[Medellín ]]></addr-line>
<country>Colombia</country>
</aff>
<aff id="Af5">
<institution><![CDATA[Universidad de Antioquia (UdeA), Facultad de Ingeniería]]></institution>
<addr-line><![CDATA[Medellín ]]></addr-line>
<country>Colombia</country>
</aff>
<pub-date pub-type="pub">
<day>00</day>
<month>04</month>
<year>2021</year>
</pub-date>
<pub-date pub-type="epub">
<day>00</day>
<month>04</month>
<year>2021</year>
</pub-date>
<volume>26</volume>
<numero>1</numero>
<fpage>62</fpage>
<lpage>76</lpage>
<copyright-statement/>
<copyright-year/>
<self-uri xlink:href="http://www.scielo.org.co/scielo.php?script=sci_arttext&amp;pid=S0121-750X2021000100062&amp;lng=en&amp;nrm=iso"></self-uri><self-uri xlink:href="http://www.scielo.org.co/scielo.php?script=sci_abstract&amp;pid=S0121-750X2021000100062&amp;lng=en&amp;nrm=iso"></self-uri><self-uri xlink:href="http://www.scielo.org.co/scielo.php?script=sci_pdf&amp;pid=S0121-750X2021000100062&amp;lng=en&amp;nrm=iso"></self-uri><abstract abstract-type="short" xml:lang="es"><p><![CDATA[Resumen  Contexto:  Las redes neuronales convolucionales (CNNs) son actualmente utilizadas en una amplia gama de aplicaciones de inteligencia artificial. En muchos casos, dichas aplicaciones requieren la ejecución de las redes en tiempo real en dispositivos integrados. Por esto, el interés en que estas aplicaciones puedan alcanzar un buen desempeño con bajo consumo de potencia. Las CNNs realizan operaciones entre los datos de entrada y los pesos de la red, con la particularidad de que no existe dependencia entre la mayoría de las operaciones. Por tal motivo, el paralelismo inherente de los FPGAs puede ser usado para realizar múltiples operaciones en paralelo, manteniendo el buen desempeño por vatio que caracteriza a estos dispositivos. Este artículo se enfoca en la evaluación del algoritmo de convolución para una capa convolucional de redes neuronales explorando directivas de paralelización usando Vivado HLS, y su objetivo es evaluar el desempeño del algoritmo utilizando directivas de optimización.  Método:  La metodología consiste en una exploración del espacio de diseño de la implementación de una capa de una red neuronal convolucional usando Vivado HLS. La verificación del funcionamiento del FPGA fue realizada comparando los datos de salida con el mismo algoritmo de convolución implementado en MATLAB. Una capa de la versión comercial Xilinx DNNDK fue usada como referencia para las medidas de desempeño de las diferentes implementaciones obtenidas en la exploración del espacio de diseño. 
En este trabajo se utilizan múltiples variaciones de directivas de optimización, tales como pipeline, array partition, y unroll.  Resultados:  Este trabajo presenta los resultados de una implementación de referencia (sin directivas de optimización) del algoritmo de convolución con respecto a la latencia del algoritmo y los recursos de hardware utilizados por la FPGA. Los resultados se comparan con implementaciones del algoritmo, incluyendo diferentes combinaciones de dos directivas de optimización (pipeline y array partition).  Conclusiones:  Este trabajo explora el espacio de diseño de un algoritmo de convolución para una capa de red neuronal convolucional sobre FPGAs. La exploración incluye el efecto causado por la transferencia de los datos entre la memoria DDR y la memoria on-chip del FPGA. Además, dicho efecto es causado por las directivas de optimización en Vivado HLS sobre los diferentes ciclos del algoritmo.]]></p></abstract>
<abstract abstract-type="short" xml:lang="en"><p><![CDATA[Abstract  Context:  Convolutional neural networks (CNNs) are currently used in a wide range of artificial intelligence applications. In many cases, these applications require the networks to run in real time on embedded devices, hence the interest in achieving good performance with low power consumption. CNNs perform operations between the input data and the network weights, and most of these operations do not depend on one another. Thus, the inherent parallelism of Field Programmable Gate Arrays (FPGAs) can be used to perform multiple operations in parallel while maintaining the good performance per watt that characterizes these devices. This paper evaluates the performance of the convolution algorithm for a convolutional layer of a neural network under different optimization (parallelization) directives in Vivado HLS.  Method:  The methodology consists of a design space exploration of the implementation of a convolutional neural network layer using Vivado HLS. The functionality of the FPGA implementation was verified by comparing its output data with those of the same convolution algorithm implemented in MATLAB. A layer of the commercial Xilinx DNNDK was used as a reference for the performance measurements of the different implementations obtained during the design space exploration. In this work, multiple variations of optimization directives are used, such as pipeline, array partition, and unroll.  Results:  This paper presents the latency and the FPGA hardware resource usage of a reference implementation of the convolution algorithm (without optimization directives). These results are compared with those of implementations that include different combinations of two optimization directives (pipeline and array partition).  
Conclusions:  This work explores the design space of a convolution algorithm for a convolutional neural network layer on FPGAs. The exploration includes the effect of data transfer between DDR memory and the on-chip memory of the FPGA, as well as the effect of the Vivado HLS optimization directives on the different loops of the algorithm.]]></p></abstract>
<kwd-group>
<kwd lng="es"><![CDATA[Convolución]]></kwd>
<kwd lng="es"><![CDATA[directivas de optimización]]></kwd>
<kwd lng="es"><![CDATA[FPGA]]></kwd>
<kwd lng="es"><![CDATA[red neuronal convolucional]]></kwd>
<kwd lng="es"><![CDATA[síntesis de alto nivel]]></kwd>
<kwd lng="en"><![CDATA[Convolution]]></kwd>
<kwd lng="en"><![CDATA[convolutional neural network]]></kwd>
<kwd lng="en"><![CDATA[FPGA]]></kwd>
<kwd lng="en"><![CDATA[high-level synthesis]]></kwd>
<kwd lng="en"><![CDATA[optimization directives]]></kwd>
</kwd-group>
</article-meta>
</front><back>
<ref-list>
<ref id="B1">
<label>[1]</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Aledo]]></surname>
<given-names><![CDATA[D.]]></given-names>
</name>
<name>
<surname><![CDATA[Carrion]]></surname>
<given-names><![CDATA[B.]]></given-names>
</name>
<name>
<surname><![CDATA[Moreno]]></surname>
<given-names><![CDATA[F.]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[&#8220;VHDL vs. SysteMC: Design of highly parameterizable artificial neural networks&#8221;]]></article-title>
<source><![CDATA[IEICE Transactions on Information and Systems]]></source>
<year>2019</year>
<numero>3</numero>
<issue>3</issue>
<page-range>512-21</page-range></nlm-citation>
</ref>
<ref id="B2">
<label>[2]</label><nlm-citation citation-type="">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Aydonat]]></surname>
<given-names><![CDATA[U.]]></given-names>
</name>
<name>
<surname><![CDATA[O'Connell]]></surname>
<given-names><![CDATA[S.]]></given-names>
</name>
<name>
<surname><![CDATA[Capalija]]></surname>
<given-names><![CDATA[D.]]></given-names>
</name>
<name>
<surname><![CDATA[Ling]]></surname>
<given-names><![CDATA[A. C.]]></given-names>
</name>
<name>
<surname><![CDATA[Chiu]]></surname>
<given-names><![CDATA[G. R.]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[&#8220;An OpenCLTM deep learning accelerator on Arria 10&#8221;]]></article-title>
<source><![CDATA[FPGA 2017 - Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays]]></source>
<year>2017</year>
<page-range>55-64</page-range></nlm-citation>
</ref>
<ref id="B3">
<label>[3]</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Bai]]></surname>
<given-names><![CDATA[L.]]></given-names>
</name>
<name>
<surname><![CDATA[Zhao]]></surname>
<given-names><![CDATA[Y.]]></given-names>
</name>
<name>
<surname><![CDATA[Huang]]></surname>
<given-names><![CDATA[X.]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[&#8220;A CNN Accelerator on FPGA Using Depthwise Separable Convolution&#8221;]]></article-title>
<source><![CDATA[IEEE Transactions on Circuits and Systems II: Express Briefs]]></source>
<year>2018</year>
<volume>65</volume>
<numero>10</numero>
<issue>10</issue>
<page-range>1415-9</page-range></nlm-citation>
</ref>
<ref id="B4">
<label>[4]</label><nlm-citation citation-type="confpro">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Chakradhar]]></surname>
<given-names><![CDATA[S.]]></given-names>
</name>
<name>
<surname><![CDATA[Sankaradas]]></surname>
<given-names><![CDATA[M.]]></given-names>
</name>
<name>
<surname><![CDATA[Jakkula]]></surname>
<given-names><![CDATA[V.]]></given-names>
</name>
<name>
<surname><![CDATA[Cadambi]]></surname>
<given-names><![CDATA[S.]]></given-names>
</name>
</person-group>
<source><![CDATA[A dynamically configurable coprocessor for convolutional neural networks]]></source>
<year>2010</year>
<conf-name><![CDATA[ Proceedings - International Symposium on Computer Architecture]]></conf-name>
<conf-loc> </conf-loc>
<page-range>247-57</page-range></nlm-citation>
</ref>
<ref id="B5">
<label>[5]</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Ding]]></surname>
<given-names><![CDATA[W.]]></given-names>
</name>
<name>
<surname><![CDATA[Huang]]></surname>
<given-names><![CDATA[Z.]]></given-names>
</name>
<name>
<surname><![CDATA[Huang]]></surname>
<given-names><![CDATA[Z.]]></given-names>
</name>
<name>
<surname><![CDATA[Tian]]></surname>
<given-names><![CDATA[L.]]></given-names>
</name>
<name>
<surname><![CDATA[Wang]]></surname>
<given-names><![CDATA[H.]]></given-names>
</name>
<name>
<surname><![CDATA[Feng]]></surname>
<given-names><![CDATA[S.]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[&#8220;Designing efficient accelerator of depthwise separable convolutional neural network on fpga&#8221;]]></article-title>
<source><![CDATA[Journal of Systems Architecture]]></source>
<year>2019</year>
<volume>97</volume>
<page-range>278-86</page-range></nlm-citation>
</ref>
<ref id="B6">
<label>[6]</label><nlm-citation citation-type="confpro">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Hamdan]]></surname>
<given-names><![CDATA[M. K.]]></given-names>
</name>
<name>
<surname><![CDATA[Rover]]></surname>
<given-names><![CDATA[D. T.]]></given-names>
</name>
</person-group>
<source><![CDATA[VHDL generator for a high performance convolutional neural network FPGA-based accelerator]]></source>
<year>2017</year>
<conf-name><![CDATA[ 2017 International Conference on Reconfigurable Computing and FPGAs, ReConFig 2017]]></conf-name>
<conf-loc> </conf-loc>
<publisher-loc><![CDATA[Cancun ]]></publisher-loc>
</nlm-citation>
</ref>
<ref id="B7">
<label>[7]</label><nlm-citation citation-type="confpro">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Kim]]></surname>
<given-names><![CDATA[J. H.]]></given-names>
</name>
<name>
<surname><![CDATA[Grady]]></surname>
<given-names><![CDATA[B.]]></given-names>
</name>
<name>
<surname><![CDATA[Lian]]></surname>
<given-names><![CDATA[R.]]></given-names>
</name>
<name>
<surname><![CDATA[Brothers]]></surname>
<given-names><![CDATA[J.]]></given-names>
</name>
<name>
<surname><![CDATA[Anderson]]></surname>
<given-names><![CDATA[J. H.]]></given-names>
</name>
</person-group>
<source><![CDATA[FPGA-based CNN inference accelerator synthesized from multi-threaded C software]]></source>
<year>2017</year>
<conf-name><![CDATA[ International System on Chip Conference, Munich]]></conf-name>
<conf-loc> </conf-loc>
</nlm-citation>
</ref>
<ref id="B8">
<label>[8]</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[LeCun]]></surname>
<given-names><![CDATA[Y.]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[&#8220;Gradient-based learning applied to document recognition&#8221;]]></article-title>
<source><![CDATA[Proceedings of the IEEE]]></source>
<year>1998</year>
<volume>86</volume>
<numero>11</numero>
<issue>11</issue>
<page-range>2278-324</page-range></nlm-citation>
</ref>
<ref id="B9">
<label>[9]</label><nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Li]]></surname>
<given-names><![CDATA[H.]]></given-names>
</name>
<name>
<surname><![CDATA[Fan]]></surname>
<given-names><![CDATA[X.]]></given-names>
</name>
<name>
<surname><![CDATA[Jiao]]></surname>
<given-names><![CDATA[L.]]></given-names>
</name>
<name>
<surname><![CDATA[Cao]]></surname>
<given-names><![CDATA[W.]]></given-names>
</name>
<name>
<surname><![CDATA[Zhou]]></surname>
<given-names><![CDATA[X.]]></given-names>
</name>
<name>
<surname><![CDATA[Wang]]></surname>
<given-names><![CDATA[L.]]></given-names>
</name>
</person-group>
<source><![CDATA[A high performance FPGA-based accelerator for large-scale convolutional neural networks, FPL 2016 - 26th International Conference on Field-Programmable Logic and Applications]]></source>
<year>2016</year>
<publisher-name><![CDATA[Lausanne]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B10">
<label>[10]</label><nlm-citation citation-type="confpro">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Nurvitadhi]]></surname>
<given-names><![CDATA[E.]]></given-names>
</name>
</person-group>
<source><![CDATA[Can FPGAs beat GPUs in accelerating next-generation deep neural networks?]]></source>
<year>2017</year>
<conf-name><![CDATA[ FPGA 2017 - Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays]]></conf-name>
<conf-loc> </conf-loc>
</nlm-citation>
</ref>
<ref id="B11">
<label>[11]</label><nlm-citation citation-type="confpro">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Qiao]]></surname>
<given-names><![CDATA[S.]]></given-names>
</name>
<name>
<surname><![CDATA[Ma]]></surname>
<given-names><![CDATA[J.]]></given-names>
</name>
</person-group>
<source><![CDATA[FPGA implementation of face recognition system based on convolution neural network]]></source>
<year>2018</year>
<conf-name><![CDATA[ 2018 Chinese Automation Congress (CAC), Xi'an]]></conf-name>
<conf-loc> </conf-loc>
</nlm-citation>
</ref>
<ref id="B12">
<label>[12]</label><nlm-citation citation-type="confpro">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Qiu]]></surname>
<given-names><![CDATA[J.]]></given-names>
</name>
</person-group>
<source><![CDATA[Going deeper with embedded FPGA platform for convolutional neural network]]></source>
<year>2016</year>
<conf-name><![CDATA[ FPGA 2016 - Proceedings of the 2016 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays]]></conf-name>
<conf-loc> </conf-loc>
</nlm-citation>
</ref>
<ref id="B13">
<label>[13]</label><nlm-citation citation-type="confpro">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Suda]]></surname>
<given-names><![CDATA[N.]]></given-names>
</name>
</person-group>
<source><![CDATA[Throughput-optimized OpenCL-based FPGA accelerator for large-scale convolutional neural networks]]></source>
<year>2016</year>
<conf-name><![CDATA[ FPGA 2016 - Proceedings of the 2016 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays]]></conf-name>
<conf-loc> </conf-loc>
</nlm-citation>
</ref>
<ref id="B14">
<label>[14]</label><nlm-citation citation-type="book">
<collab>Xilinx, Inc.</collab>
<source><![CDATA[DNNDK User Guide, technical report]]></source>
<year>2019</year>
<publisher-name><![CDATA[Xilinx, Inc.]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B15">
<label>[15]</label><nlm-citation citation-type="">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Zhang]]></surname>
<given-names><![CDATA[C.]]></given-names>
</name>
</person-group>
<source><![CDATA[Optimizing FPGA-based Accelerator Design for Deep Convolutional Neural Networks, in ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (FPGA)]]></source>
<year>2015</year>
</nlm-citation>
</ref>
</ref-list>
</back>
</article>
