<?xml version="1.0" encoding="ISO-8859-1"?><article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<front>
<journal-meta>
<journal-id>0124-8170</journal-id>
<journal-title><![CDATA[Ciencia e Ingeniería Neogranadina]]></journal-title>
<abbrev-journal-title><![CDATA[Cienc. Ing. Neogranad.]]></abbrev-journal-title>
<issn>0124-8170</issn>
<publisher>
<publisher-name><![CDATA[Universidad Militar Nueva Granada]]></publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id>S0124-81702020000100107</article-id>
<article-id pub-id-type="doi">10.18359/rcin.4194</article-id>
<title-group>
<article-title xml:lang="en"><![CDATA[A Hardware Accelerator for The Inference of a Convolutional Neural Network]]></article-title>
<article-title xml:lang="es"><![CDATA[Acelerador en hardware para la inferencia de una red neuronal convolucional]]></article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname><![CDATA[González]]></surname>
<given-names><![CDATA[Edwin]]></given-names>
</name>
<xref ref-type="aff" rid="Af1"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname><![CDATA[Villamizar Luna]]></surname>
<given-names><![CDATA[Walter D.]]></given-names>
</name>
<xref ref-type="aff" rid="Af2"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname><![CDATA[Fajardo Ariza]]></surname>
<given-names><![CDATA[Carlos Augusto]]></given-names>
</name>
<xref ref-type="aff" rid="Af3"/>
</contrib>
</contrib-group>
<aff id="Af1">
<institution><![CDATA[Universidad Industrial de Santander]]></institution>
<addr-line><![CDATA[ ]]></addr-line>
<country>Colombia</country>
</aff>
<aff id="Af2">
<institution><![CDATA[Universidad Industrial de Santander]]></institution>
<addr-line><![CDATA[ ]]></addr-line>
<country>Colombia</country>
</aff>
<aff id="Af3">
<institution><![CDATA[Universidad Industrial de Santander]]></institution>
<addr-line><![CDATA[ ]]></addr-line>
<country>Colombia</country>
</aff>
<pub-date pub-type="pub">
<day>00</day>
<month>06</month>
<year>2020</year>
</pub-date>
<pub-date pub-type="epub">
<day>00</day>
<month>06</month>
<year>2020</year>
</pub-date>
<volume>30</volume>
<numero>1</numero>
<fpage>107</fpage>
<lpage>116</lpage>
<copyright-statement/>
<copyright-year/>
<self-uri xlink:href="http://www.scielo.org.co/scielo.php?script=sci_arttext&amp;pid=S0124-81702020000100107&amp;lng=en&amp;nrm=iso"></self-uri><self-uri xlink:href="http://www.scielo.org.co/scielo.php?script=sci_abstract&amp;pid=S0124-81702020000100107&amp;lng=en&amp;nrm=iso"></self-uri><self-uri xlink:href="http://www.scielo.org.co/scielo.php?script=sci_pdf&amp;pid=S0124-81702020000100107&amp;lng=en&amp;nrm=iso"></self-uri><abstract abstract-type="short" xml:lang="en"><p><![CDATA[Abstract: Convolutional Neural Networks (CNNs) are becoming increasingly popular in deep learning applications, e.g., image classification, speech recognition, and medicine, to name a few. However, CNN inference is computationally intensive and demands large memory resources. This work proposes a hardware accelerator for CNN inference, implemented in a co-processing scheme. The aim is to reduce the use of hardware resources while achieving the best possible throughput. The design was implemented on the Digilent Arty Z7-20 development board, which is based on the Xilinx Zynq-7000 System on Chip (SoC). Our implementation achieved an accuracy of 97.59% on the MNIST database using only a 12-bit fixed-point format. Results show that the co-processing scheme, operating at a conservative speed of 100 MHz, can identify around 441 images per second, which is about 17% faster than a 650 MHz software implementation. It is difficult to compare our results against other Field-Programmable Gate Array (FPGA)-based implementations because they are not exactly like ours. However, some comparisons regarding logic resources used and accuracy suggest that our work could outperform previous ones. In addition, the proposed scheme is compared with a hardware implementation in terms of power consumption and throughput.]]></p></abstract>
<abstract abstract-type="short" xml:lang="es"><p><![CDATA[Resumen: Las redes neuronales convolucionales (CNN) son cada vez más populares en aplicaciones de aprendizaje profundo, como por ejemplo clasificación de imágenes, reconocimiento de voz y medicina, entre otras. Sin embargo, estas redes son computacionalmente costosas y requieren altos recursos de memoria. En este trabajo se propone un acelerador en hardware para el proceso de inferencia de la red LeNet-5, implementado en un esquema de coprocesamiento hardware/software. El objetivo de la implementación es reducir el uso de recursos de hardware y obtener el mejor rendimiento computacional posible durante el proceso de inferencia. El diseño fue implementado en la tarjeta de desarrollo Digilent Arty Z7-20, la cual está basada en el System on Chip (SoC) Zynq-7000 de Xilinx. Nuestra implementación logró una precisión del 97,59 % para la base de datos MNIST utilizando tan solo 12 bits en el formato de punto fijo. Los resultados muestran que el esquema de coprocesamiento, el cual opera a una velocidad de 100 MHz, puede identificar aproximadamente 441 imágenes por segundo, lo que equivale aproximadamente a un 17 % más rápido que una implementación en software a 650 MHz. Es difícil comparar nuestra implementación con otras similares, porque las implementaciones encontradas en la literatura no son exactamente como la que se realizó en este trabajo. Sin embargo, algunas comparaciones, en relación con el uso de recursos lógicos y la precisión, sugieren que nuestro trabajo supera a trabajos previos.]]></p></abstract>
<kwd-group>
<kwd lng="en"><![CDATA[CNN]]></kwd>
<kwd lng="en"><![CDATA[FPGA]]></kwd>
<kwd lng="en"><![CDATA[hardware accelerator]]></kwd>
<kwd lng="en"><![CDATA[MNIST]]></kwd>
<kwd lng="en"><![CDATA[Zynq]]></kwd>
<kwd lng="es"><![CDATA[CNN]]></kwd>
<kwd lng="es"><![CDATA[FPGA]]></kwd>
<kwd lng="es"><![CDATA[acelerador en hardware]]></kwd>
<kwd lng="es"><![CDATA[MNIST]]></kwd>
<kwd lng="es"><![CDATA[Zynq]]></kwd>
</kwd-group>
</article-meta>
</front><back>
<ref-list>
<ref id="B1">
<label>[1]</label><nlm-citation citation-type="confpro">
<person-group person-group-type="author">
<name>
<surname><![CDATA[LeCun]]></surname>
<given-names><![CDATA[Y.]]></given-names>
</name>
</person-group>
<source><![CDATA[Gradient-based learning applied to document recognition]]></source>
<year>1998</year>
<volume>86</volume>
<numero>11</numero>
<conf-name><![CDATA[ Proceedings of the IEEE]]></conf-name>
<conf-loc> </conf-loc>
<issue>11</issue>
<page-range>2278-324</page-range></nlm-citation>
</ref>
<ref id="B2">
<label>[2]</label><nlm-citation citation-type="confpro">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Szegedy]]></surname>
<given-names><![CDATA[C.]]></given-names>
</name>
</person-group>
<source><![CDATA[Going deeper with convolutions]]></source>
<year>2015</year>
<conf-name><![CDATA[ IEEE Conference on Computer Vision and Pattern Recognition (CVPR)]]></conf-name>
<conf-date>2015</conf-date>
<conf-loc> </conf-loc>
<page-range>1-9</page-range></nlm-citation>
</ref>
<ref id="B3">
<label>[3]</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Dundar]]></surname>
<given-names><![CDATA[A]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[Embedded streaming deep neural networks accelerator with applications]]></article-title>
<person-group person-group-type="editor">
<name>
<surname><![CDATA[Jin]]></surname>
<given-names><![CDATA[J.]]></given-names>
</name>
<name>
<surname><![CDATA[Martini]]></surname>
<given-names><![CDATA[B.]]></given-names>
</name>
<name>
<surname><![CDATA[Culurciello]]></surname>
<given-names><![CDATA[E.]]></given-names>
</name>
</person-group>
<source><![CDATA[IEEE Transactions on Neural Networks and Learning Systems]]></source>
<year>2017</year>
<volume>28</volume>
<numero>7</numero>
<issue>7</issue>
<page-range>1572-83</page-range></nlm-citation>
</ref>
<ref id="B4">
<label>[4]</label><nlm-citation citation-type="confpro">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Ahn]]></surname>
<given-names><![CDATA[B.]]></given-names>
</name>
</person-group>
<source><![CDATA[Real-time video object recognition using convolutional neural network]]></source>
<year>2015</year>
<conf-name><![CDATA[ International Joint Conference on Neural Networks (IJCNN)]]></conf-name>
<conf-date>2015</conf-date>
<conf-loc> </conf-loc>
<page-range>1-7</page-range></nlm-citation>
</ref>
<ref id="B5">
<label>[5]</label><nlm-citation citation-type="confpro">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Yu]]></surname>
<given-names><![CDATA[B.]]></given-names>
</name>
<name>
<surname><![CDATA[Tsao]]></surname>
<given-names><![CDATA[Y.]]></given-names>
</name>
<name>
<surname><![CDATA[Yang]]></surname>
<given-names><![CDATA[S.]]></given-names>
</name>
<name>
<surname><![CDATA[Chen]]></surname>
<given-names><![CDATA[Y.]]></given-names>
</name>
<name>
<surname><![CDATA[Chien]]></surname>
<given-names><![CDATA[S.]]></given-names>
</name>
</person-group>
<source><![CDATA[Architecture design of convolutional neural networks for face detection on an fpga platform]]></source>
<year>2018</year>
<conf-name><![CDATA[ IEEE International Workshop on Signal Processing Systems (SiPS)]]></conf-name>
<conf-date>2018</conf-date>
<conf-loc> </conf-loc>
<page-range>88-93</page-range></nlm-citation>
</ref>
<ref id="B6">
<label>[6]</label><nlm-citation citation-type="confpro">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Xiong]]></surname>
<given-names><![CDATA[Z.]]></given-names>
</name>
<name>
<surname><![CDATA[Stiles]]></surname>
<given-names><![CDATA[M. K.]]></given-names>
</name>
<name>
<surname><![CDATA[Zhao]]></surname>
<given-names><![CDATA[J.]]></given-names>
</name>
</person-group>
<source><![CDATA[Robust ecg signal classification for detection of atrial fibrillation using a novel neural network]]></source>
<year>2017</year>
<conf-name><![CDATA[ Computing in Cardiology (CinC)]]></conf-name>
<conf-date>2017</conf-date>
<conf-loc> </conf-loc>
<page-range>1-4</page-range></nlm-citation>
</ref>
<ref id="B7">
<label>[7]</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Guo]]></surname>
<given-names><![CDATA[K.]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[Angel-eye: A complete design flow for mapping cnn onto embedded fpga]]></article-title>
<person-group person-group-type="editor">
<name>
<surname><![CDATA[Sui]]></surname>
<given-names><![CDATA[L.]]></given-names>
</name>
<name>
<surname><![CDATA[Qiu]]></surname>
<given-names><![CDATA[J.]]></given-names>
</name>
<name>
<surname><![CDATA[Yu]]></surname>
<given-names><![CDATA[J.]]></given-names>
</name>
<name>
<surname><![CDATA[Wang]]></surname>
<given-names><![CDATA[J.]]></given-names>
</name>
<name>
<surname><![CDATA[Yao]]></surname>
<given-names><![CDATA[S.]]></given-names>
</name>
<name>
<surname><![CDATA[Han]]></surname>
<given-names><![CDATA[S.]]></given-names>
</name>
<name>
<surname><![CDATA[Wang]]></surname>
<given-names><![CDATA[Y.]]></given-names>
</name>
<name>
<surname><![CDATA[Yang]]></surname>
<given-names><![CDATA[H.]]></given-names>
</name>
</person-group>
<source><![CDATA[IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems]]></source>
<year>2018</year>
<volume>37</volume>
<numero>1</numero>
<issue>1</issue>
<page-range>35-47</page-range></nlm-citation>
</ref>
<ref id="B8">
<label>[8]</label><nlm-citation citation-type="confpro">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Suda]]></surname>
<given-names><![CDATA[N.]]></given-names>
</name>
</person-group>
<source><![CDATA[Throughput-optimized opencl-based fpga accelerator for large-scale convolutional neural networks]]></source>
<year>2016</year>
<conf-name><![CDATA[ Proceedings of the 2016 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, ser. FPGA '16]]></conf-name>
<conf-loc> </conf-loc>
<page-range>16-25</page-range><publisher-loc><![CDATA[New York, NY, USA ]]></publisher-loc>
<publisher-name><![CDATA[ACM]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B9">
<label>[9]</label><nlm-citation citation-type="confpro">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Zhang]]></surname>
<given-names><![CDATA[C.]]></given-names>
</name>
<name>
<surname><![CDATA[Li]]></surname>
<given-names><![CDATA[P.]]></given-names>
</name>
<name>
<surname><![CDATA[Sun]]></surname>
<given-names><![CDATA[G.]]></given-names>
</name>
<name>
<surname><![CDATA[Guan]]></surname>
<given-names><![CDATA[Y.]]></given-names>
</name>
<name>
<surname><![CDATA[Xiao]]></surname>
<given-names><![CDATA[B.]]></given-names>
</name>
<name>
<surname><![CDATA[Cong]]></surname>
<given-names><![CDATA[J.]]></given-names>
</name>
</person-group>
<source><![CDATA[Optimizing fpga-based accelerator design for deep convolutional neural networks]]></source>
<year>2015</year>
<conf-name><![CDATA[ Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, ser. FPGA '15]]></conf-name>
<conf-loc> </conf-loc>
<page-range>161-70</page-range><publisher-loc><![CDATA[New York, NY, USA ]]></publisher-loc>
<publisher-name><![CDATA[ACM]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B10">
<label>[10]</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Ovtcharov]]></surname>
<given-names><![CDATA[K.]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[Accelerating deep convolutional neural networks using specialized hardware]]></article-title>
<person-group person-group-type="editor">
<name>
<surname><![CDATA[Ruwase]]></surname>
<given-names><![CDATA[O.]]></given-names>
</name>
<name>
<surname><![CDATA[Kim]]></surname>
<given-names><![CDATA[J.-Y.]]></given-names>
</name>
<name>
<surname><![CDATA[Fowers]]></surname>
<given-names><![CDATA[J.]]></given-names>
</name>
<name>
<surname><![CDATA[Strauss]]></surname>
<given-names><![CDATA[K.]]></given-names>
</name>
<name>
<surname><![CDATA[Chung]]></surname>
<given-names><![CDATA[E. S.]]></given-names>
</name>
</person-group>
<source><![CDATA[Microsoft Research Whitepaper]]></source>
<year>2015</year>
<volume>2</volume>
<numero>11</numero>
<issue>11</issue>
<page-range>1-4</page-range></nlm-citation>
</ref>
<ref id="B11">
<label>[11]</label><nlm-citation citation-type="confpro">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Tsai]]></surname>
<given-names><![CDATA[T.]]></given-names>
</name>
<name>
<surname><![CDATA[Ho]]></surname>
<given-names><![CDATA[Y.]]></given-names>
</name>
<name>
<surname><![CDATA[Sheu]]></surname>
<given-names><![CDATA[M.]]></given-names>
</name>
</person-group>
<source><![CDATA[Implementation of fpga-based accelerator for deep neural networks]]></source>
<year>2019</year>
<conf-name><![CDATA[ IEEE 22nd International Symposium on Design and Diagnostics of Electronic Circuits Systems (DDECS)]]></conf-name>
<conf-date>2019</conf-date>
<conf-loc> </conf-loc>
<page-range>1-4</page-range></nlm-citation>
</ref>
<ref id="B12">
<label>[12]</label><nlm-citation citation-type="confpro">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Shen]]></surname>
<given-names><![CDATA[Y.]]></given-names>
</name>
<name>
<surname><![CDATA[Ferdman]]></surname>
<given-names><![CDATA[M.]]></given-names>
</name>
<name>
<surname><![CDATA[Milder]]></surname>
<given-names><![CDATA[P.]]></given-names>
</name>
</person-group>
<source><![CDATA[Maximizing cnn accelerator efficiency through resource partitioning]]></source>
<year>2017</year>
<conf-name><![CDATA[ ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA)]]></conf-name>
<conf-date>2017</conf-date>
<conf-loc> </conf-loc>
<page-range>535-47</page-range></nlm-citation>
</ref>
<ref id="B13">
<label>[13]</label><nlm-citation citation-type="confpro">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Wang]]></surname>
<given-names><![CDATA[Y.]]></given-names>
</name>
</person-group>
<source><![CDATA[Low power convolutional neural networks on a chip]]></source>
<year>2016</year>
<conf-name><![CDATA[ IEEE International Symposium on Circuits and Systems (ISCAS)]]></conf-name>
<conf-date>2016</conf-date>
<conf-loc> </conf-loc>
<page-range>129-32</page-range></nlm-citation>
</ref>
<ref id="B14">
<label>[14]</label><nlm-citation citation-type="confpro">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Feng]]></surname>
<given-names><![CDATA[G.]]></given-names>
</name>
<name>
<surname><![CDATA[Hu]]></surname>
<given-names><![CDATA[Z.]]></given-names>
</name>
<name>
<surname><![CDATA[Chen]]></surname>
<given-names><![CDATA[S.]]></given-names>
</name>
<name>
<surname><![CDATA[Wu]]></surname>
<given-names><![CDATA[F.]]></given-names>
</name>
</person-group>
<source><![CDATA[Energy-efficient and high-throughput fpga-based accelerator for convolutional neural networks]]></source>
<year>2016</year>
<conf-name><![CDATA[ 13th IEEE International Conference on Solid-State and Integrated Circuit Technology (ICSICT)]]></conf-name>
<conf-date>2016</conf-date>
<conf-loc> </conf-loc>
<page-range>624-6</page-range></nlm-citation>
</ref>
<ref id="B15">
<label>[15]</label><nlm-citation citation-type="confpro">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Ghaffari]]></surname>
<given-names><![CDATA[S.]]></given-names>
</name>
<name>
<surname><![CDATA[Sharifian]]></surname>
<given-names><![CDATA[S.]]></given-names>
</name>
</person-group>
<source><![CDATA[FPGA-based convolutional neural network accelerator design using high level synthesize]]></source>
<year>2016</year>
<conf-name><![CDATA[ Proceedings - 2016 2nd International Conference of Signal Processing and Intelligent Systems, ICSPIS]]></conf-name>
<conf-loc> </conf-loc>
<page-range>1-6</page-range></nlm-citation>
</ref>
<ref id="B16">
<label>[16]</label><nlm-citation citation-type="confpro">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Zhou]]></surname>
<given-names><![CDATA[Y.]]></given-names>
</name>
<name>
<surname><![CDATA[Jiang]]></surname>
<given-names><![CDATA[J.]]></given-names>
</name>
</person-group>
<source><![CDATA[An FPGA-based accelerator implementation for deep convolutional neural networks]]></source>
<year>2015</year>
<volume>01</volume>
<conf-name><![CDATA[ Proceedings of 2015 4th International Conference on Computer Science and Network Technology]]></conf-name>
<conf-date>2015</conf-date>
<conf-loc> </conf-loc>
<page-range>829-32</page-range><publisher-name><![CDATA[ICCSNT]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B17">
<label>[17]</label><nlm-citation citation-type="confpro">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Nair]]></surname>
<given-names><![CDATA[V.]]></given-names>
</name>
<name>
<surname><![CDATA[Hinton]]></surname>
<given-names><![CDATA[G. E.]]></given-names>
</name>
</person-group>
<source><![CDATA[Rectified linear units improve restricted Boltzmann machines]]></source>
<year>2010</year>
<conf-name><![CDATA[ Proceedings of the 27th International Conference on International Conference on Machine Learning, ser. ICML'10]]></conf-name>
<conf-loc> </conf-loc>
<page-range>807-14</page-range><publisher-loc><![CDATA[USA ]]></publisher-loc>
<publisher-name><![CDATA[Omnipress]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B18">
<label>[18]</label><nlm-citation citation-type="">
<source><![CDATA[Xilinx, AXI Reference Guide]]></source>
<year></year>
</nlm-citation>
</ref>
</ref-list>
</back>
</article>
