<?xml version="1.0" encoding="ISO-8859-1"?><article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<front>
<journal-meta>
<journal-id>0121-1129</journal-id>
<journal-title><![CDATA[Revista Facultad de Ingeniería]]></journal-title>
<abbrev-journal-title><![CDATA[Rev. Fac. ing.]]></abbrev-journal-title>
<issn>0121-1129</issn>
<publisher>
<publisher-name><![CDATA[Universidad Pedagógica y Tecnológica de Colombia]]></publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id>S0121-11292024000300011</article-id>
<article-id pub-id-type="doi">10.19503/01211129.v33.n69.2024.18076</article-id>
<title-group>
<article-title xml:lang="es"><![CDATA[SMART PRODUCT BACKLOG: CLASIFICACIÓN AUTOMÁTICA DE HISTORIAS DE USUARIO USANDO MODELOS DE LENGUAJE DE GRAN ESCALA]]></article-title>
<article-title xml:lang="en"><![CDATA[Smart Product Backlog: Automatic Classification of User Stories Using Large Language Models (LLM)]]></article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname><![CDATA[Gaona-Cuevas]]></surname>
<given-names><![CDATA[Mauricio]]></given-names>
</name>
<xref ref-type="aff" rid="Af1"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname><![CDATA[Bucheli-Guerrero]]></surname>
<given-names><![CDATA[Víctor]]></given-names>
</name>
<xref ref-type="aff" rid="Af2"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname><![CDATA[Vera-Rivera]]></surname>
<given-names><![CDATA[Fredy]]></given-names>
</name>
<xref ref-type="aff" rid="Af3"/>
</contrib>
</contrib-group>
<aff id="Af1">
<institution><![CDATA[Universidad del Valle]]></institution>
<addr-line><![CDATA[Cali Valle del Cauca]]></addr-line>
<country>Colombia</country>
</aff>
<aff id="Af2">
<institution><![CDATA[Universidad del Valle]]></institution>
<addr-line><![CDATA[Cali Valle del Cauca]]></addr-line>
<country>Colombia</country>
</aff>
<aff id="Af3">
<institution><![CDATA[Universidad Francisco de Paula Santander]]></institution>
<addr-line><![CDATA[Cúcuta Norte de Santander]]></addr-line>
<country>Colombia</country>
</aff>
<pub-date pub-type="pub">
<day>00</day>
<month>09</month>
<year>2024</year>
</pub-date>
<pub-date pub-type="epub">
<day>00</day>
<month>09</month>
<year>2024</year>
</pub-date>
<volume>33</volume>
<numero>69</numero>
<copyright-statement/>
<copyright-year/>
<self-uri xlink:href="http://www.scielo.org.co/scielo.php?script=sci_arttext&amp;pid=S0121-11292024000300011&amp;lng=en&amp;nrm=iso"></self-uri><self-uri xlink:href="http://www.scielo.org.co/scielo.php?script=sci_abstract&amp;pid=S0121-11292024000300011&amp;lng=en&amp;nrm=iso"></self-uri><self-uri xlink:href="http://www.scielo.org.co/scielo.php?script=sci_pdf&amp;pid=S0121-11292024000300011&amp;lng=en&amp;nrm=iso"></self-uri><abstract abstract-type="short" xml:lang="es"><p><![CDATA[RESUMEN En los procesos de desarrollo ágil de software, específicamente de las aplicaciones inteligentes que aprovechan la inteligencia artificial (IA), el Smart Product Backlog (SPB) es un artefacto que incluye funcionalidades implementables tanto con IA como sin esta. En este contexto, existe un trabajo notable en el desarrollo de modelos de Procesamiento del Lenguaje Natural (NLP) en los que aquellos de gran escala (LLM, por sus siglas en inglés) han demostrado un rendimiento excepcional. Sin embargo, surgió la pregunta respecto a si dichos modelos podían utilizarse en tareas de clasificación automática, sin necesidad de una anotación previa, permitiendo la extracción directa del Smart Product Backlog (SPB). En este estudio, se comparó la eficacia de las técnicas de ajuste con los métodos de prompting para esclarecer el potencial de los modelos ChatGPT-4o, Gemini Pro 1.5 y ChatGPT-Mini; se construyó un set de datos con historias de usuario, clasificadas manualmente por un grupo de expertos, que permitió realizar el ensamble de experimentos y, a su vez, construir las tablas de contingencia respectivas; y se evaluaron estadísticamente las métricas de desempeño de la clasificación de cada LLM y se utilizaron métricas de rendimiento, como la exactitud, la sensibilidad y el F1-Score, para determinar la efectividad de cada modelo. 
Este enfoque comparativo buscó destacar las fortalezas y limitaciones de cada LLM en el contexto de estructurar la asistencia en la construcción del SPB de manera eficiente y precisa. El análisis demostró que ChatGPT-Mini tiene limitaciones en el balance entre precisión y sensibilidad. Además, aunque Gemini Pro 1.5 mostró superioridad en la puntuación de exactitud, y ChatGPT también exhibió un rendimiento aceptable, ninguno es lo suficientemente robusto para construir una herramienta completamente automatizada para la clasificación de historias de usuario. Por lo tanto, se identifica la necesidad de desarrollar un clasificador especializado que permita la construcción de una herramienta automatizada para recomendar historias de usuario viables para el desarrollo con IA, apoyando así la toma de decisiones en proyectos de software ágiles.]]></p></abstract>
<abstract abstract-type="short" xml:lang="en"><p><![CDATA[ABSTRACT In agile software development processes, specifically for intelligent applications that leverage artificial intelligence (AI), the Smart Product Backlog (SPB) is an artifact that includes both functionalities implementable with AI and those implemented without it. Significant work has been done on Natural Language Processing (NLP) models, among which Large Language Models (LLMs) have demonstrated exceptional performance. However, whether LLMs can be used for automatic classification tasks without prior annotation, thereby allowing direct extraction of the SPB, remains an open question. In this study, we compared the effectiveness of fine-tuning techniques with prompting methods to determine the potential of ChatGPT-4o, Gemini Pro 1.5, and ChatGPT-Mini. A dataset of user stories manually classified by a group of experts was constructed, which made it possible to assemble the experiments and build the corresponding contingency tables. The classification performance of each LLM was evaluated statistically, using accuracy, sensitivity, and F1-Score to assess the effectiveness of each model. This comparative approach aimed to highlight the strengths and limitations of each LLM in assisting the construction of the SPB efficiently and accurately. The analysis showed that ChatGPT-Mini has limitations in balancing precision and sensitivity. Although Gemini Pro 1.5 achieved the highest accuracy and ChatGPT also performed acceptably, none of the models is robust enough to support a fully automated tool for user story classification. Therefore, we identified the need to develop a specialized classifier that enables the construction of an automated tool to recommend user stories viable for development with AI, thereby supporting decision-making in agile software projects.]]></p></abstract>
<kwd-group>
<kwd lng="es"><![CDATA[backlog de producto inteligente]]></kwd>
<kwd lng="es"><![CDATA[clasificación de historias de usuario]]></kwd>
<kwd lng="es"><![CDATA[especificación de requerimientos software]]></kwd>
<kwd lng="es"><![CDATA[identificador inteligente de historias de usuario]]></kwd>
<kwd lng="es"><![CDATA[inteligencia artificial]]></kwd>
<kwd lng="es"><![CDATA[modelos de lenguaje a gran escala]]></kwd>
<kwd lng="en"><![CDATA[artificial intelligence]]></kwd>
<kwd lng="en"><![CDATA[large-scale language models]]></kwd>
<kwd lng="en"><![CDATA[smart product backlog]]></kwd>
<kwd lng="en"><![CDATA[smart user story identifier]]></kwd>
<kwd lng="en"><![CDATA[software requirements specification]]></kwd>
<kwd lng="en"><![CDATA[user story classification]]></kwd>
</kwd-group>
</article-meta>
</front><back>
<ref-list>
<ref id="B1">
<label>[1]</label><nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Beck]]></surname>
<given-names><![CDATA[K.]]></given-names>
</name>
<name>
<surname><![CDATA[Fowler]]></surname>
<given-names><![CDATA[M.]]></given-names>
</name>
</person-group>
<source><![CDATA[Planning Extreme Programming]]></source>
<year>2001</year>
<publisher-name><![CDATA[Addison Wesley]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B2">
<label>[2]</label><nlm-citation citation-type="confpro">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Sedano]]></surname>
<given-names><![CDATA[T.]]></given-names>
</name>
<name>
<surname><![CDATA[Ralph]]></surname>
<given-names><![CDATA[P.]]></given-names>
</name>
<name>
<surname><![CDATA[Peraire]]></surname>
<given-names><![CDATA[C.]]></given-names>
</name>
</person-group>
<source><![CDATA[The Product Backlog]]></source>
<year>2019</year>
<conf-name><![CDATA[International Conference on Software Engineering]]></conf-name>
<conf-loc>Montreal, Canada</conf-loc>
<page-range>200-11</page-range></nlm-citation>
</ref>
<ref id="B3">
<label>[3]</label><nlm-citation citation-type="confpro">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Dos Santos]]></surname>
<given-names><![CDATA[C. A.]]></given-names>
</name>
<name>
<surname><![CDATA[Bouchard]]></surname>
<given-names><![CDATA[K.]]></given-names>
</name>
<name>
<surname><![CDATA[Petrillo]]></surname>
<given-names><![CDATA[F.]]></given-names>
</name>
</person-group>
<source><![CDATA[AI-Driven User Story Generation]]></source>
<year>2024</year>
<conf-name><![CDATA[International Conference on Artificial Intelligence, Computer, Data Sciences, and Applications (ACDSA)]]></conf-name>
<conf-loc>Victoria, Seychelles</conf-loc>
</nlm-citation>
</ref>
<ref id="B4">
<label>[4]</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Kaur]]></surname>
<given-names><![CDATA[K.]]></given-names>
</name>
<name>
<surname><![CDATA[Kaur]]></surname>
<given-names><![CDATA[P.]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[The application of AI techniques in requirements classification: a systematic mapping]]></article-title>
<source><![CDATA[Artificial Intelligence Review]]></source>
<year>2024</year>
<volume>57</volume>
<numero>3</numero>
<issue>3</issue>
<page-range>1-48</page-range></nlm-citation>
</ref>
<ref id="B5">
<label>[5]</label><nlm-citation citation-type="confpro">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Arulmohan]]></surname>
<given-names><![CDATA[S.]]></given-names>
</name>
<name>
<surname><![CDATA[Meurs]]></surname>
<given-names><![CDATA[M. J.]]></given-names>
</name>
<name>
<surname><![CDATA[Mosser]]></surname>
<given-names><![CDATA[S.]]></given-names>
</name>
</person-group>
<source><![CDATA[Extracting Domain Models from Textual Requirements in the Era of Large Language Models]]></source>
<year>2023</year>
<conf-name><![CDATA[International Conference on Model Driven Engineering Languages and Systems Companion (MODELS-C)]]></conf-name>
<conf-loc>Sweden</conf-loc>
<page-range>580-7</page-range></nlm-citation>
</ref>
<ref id="B6">
<label>[6]</label><nlm-citation citation-type="confpro">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Zhang]]></surname>
<given-names><![CDATA[Z.]]></given-names>
</name>
<name>
<surname><![CDATA[Rayhan]]></surname>
<given-names><![CDATA[M.]]></given-names>
</name>
<name>
<surname><![CDATA[Herda]]></surname>
<given-names><![CDATA[T.]]></given-names>
</name>
<name>
<surname><![CDATA[Goisauf]]></surname>
<given-names><![CDATA[M.]]></given-names>
</name>
<name>
<surname><![CDATA[Abrahamsson]]></surname>
<given-names><![CDATA[P.]]></given-names>
</name>
</person-group>
<source><![CDATA[LLM-Based Agents for Automating the Enhancement of User Story Quality: An Early Report]]></source>
<year>2024</year>
<conf-name><![CDATA[Agile Processes in Software Engineering and Extreme Programming]]></conf-name>
<conf-loc>Germany</conf-loc>
<page-range>117-26</page-range></nlm-citation>
</ref>
<ref id="B7">
<label>[7]</label><nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Rahman]]></surname>
<given-names><![CDATA[T.]]></given-names>
</name>
<name>
<surname><![CDATA[Zhu]]></surname>
<given-names><![CDATA[Y.]]></given-names>
</name>
</person-group>
<source><![CDATA[Automated User Story Generation with Test Case Specification Using Large Language Model]]></source>
<year>2024</year>
<publisher-name><![CDATA[arXiv - Software Engineering]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B8">
<label>[8]</label><nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Chuor]]></surname>
<given-names><![CDATA[P.]]></given-names>
</name>
<name>
<surname><![CDATA[Ittoo]]></surname>
<given-names><![CDATA[A.]]></given-names>
</name>
<name>
<surname><![CDATA[Heng]]></surname>
<given-names><![CDATA[S.]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[User Story Classification with Machine Learning and LLMs]]></article-title>
<source><![CDATA[Lecture Notes in Computer Science]]></source>
<year>2024</year>
<page-range>161-75</page-range><publisher-loc><![CDATA[Berlin, Germany]]></publisher-loc>
<publisher-name><![CDATA[Springer Science and Business Media]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B9">
<label>[9]</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Hong]]></surname>
<given-names><![CDATA[J.]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[Decoding Compressed Trust: Scrutinizing the Trustworthiness of Efficient LLMs Under Compression]]></article-title>
<source><![CDATA[arXiv - Computation and Language]]></source>
<year>2024</year>
</nlm-citation>
</ref>
<ref id="B10">
<label>[10]</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Sun]]></surname>
<given-names><![CDATA[L.]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[TrustLLM: Trustworthiness in Large Language Models]]></article-title>
<source><![CDATA[arXiv - Computation and Language]]></source>
<year>2024</year>
</nlm-citation>
</ref>
<ref id="B11">
<label>[11]</label><nlm-citation citation-type="confpro">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Kumar]]></surname>
<given-names><![CDATA[B.]]></given-names>
</name>
<name>
<surname><![CDATA[Tiwari]]></surname>
<given-names><![CDATA[U. K.]]></given-names>
</name>
<name>
<surname><![CDATA[Dobhal]]></surname>
<given-names><![CDATA[D. C.]]></given-names>
</name>
</person-group>
<source><![CDATA[Classification of NFR based Importance Level of User Story in Agile Software Development]]></source>
<year>2023</year>
<conf-name><![CDATA[9th International Conference on Signal Processing, Communications and Computing]]></conf-name>
<conf-loc>India</conf-loc>
<page-range>264-8</page-range></nlm-citation>
</ref>
<ref id="B12">
<label>[12]</label><nlm-citation citation-type="confpro">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Liu]]></surname>
<given-names><![CDATA[J.]]></given-names>
</name>
</person-group>
<source><![CDATA[Rainier: Reinforced Knowledge Introspector for Commonsense Question Answering]]></source>
<year>2022</year>
<conf-name><![CDATA[Conference on Empirical Methods in Natural Language Processing]]></conf-name>
<conf-loc>Abu Dhabi, United Arab Emirates</conf-loc>
<page-range>8938-58</page-range></nlm-citation>
</ref>
<ref id="B13">
<label>[13]</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Dalpiaz]]></surname>
<given-names><![CDATA[F.]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[Requirements data sets (user stories)]]></article-title>
<source><![CDATA[Mendeley Data]]></source>
<year>2018</year>
<volume>1</volume>
</nlm-citation>
</ref>
</ref-list>
</back>
</article>
