<?xml version="1.0" encoding="ISO-8859-1"?><article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<front>
<journal-meta>
<journal-id>0121-750X</journal-id>
<journal-title><![CDATA[Ingeniería]]></journal-title>
<abbrev-journal-title><![CDATA[ing.]]></abbrev-journal-title>
<issn>0121-750X</issn>
<publisher>
<publisher-name><![CDATA[Universidad Distrital Francisco José de Caldas]]></publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id>S0121-750X2022000300400</article-id>
<article-id pub-id-type="doi">10.14483/23448393.17952</article-id>
<title-group>
<article-title xml:lang="es"><![CDATA[Metodología para obtención y análisis de datos inmobiliarios usando fuentes alternativas: estudio de caso en tres ciudades intermedias de Colombia]]></article-title>
<article-title xml:lang="en"><![CDATA[Methodology for the Collection and Analysis of Real Estate Data Using Alternative Sources: Case Study in Three Medium-Sized Cities of Colombia]]></article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname><![CDATA[Rosso-Mateus]]></surname>
<given-names><![CDATA[Andrés E.]]></given-names>
</name>
<xref ref-type="aff" rid="Aff"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname><![CDATA[Montilla-Montilla]]></surname>
<given-names><![CDATA[Yeimy M.]]></given-names>
</name>
<xref ref-type="aff" rid="Aff"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname><![CDATA[Garzón-Martínez]]></surname>
<given-names><![CDATA[Sonia C.]]></given-names>
</name>
<xref ref-type="aff" rid="Aff"/>
</contrib>
</contrib-group>
<aff id="Af1">
<institution><![CDATA[Universidad Nacional de Colombia]]></institution>
<addr-line><![CDATA[ ]]></addr-line>
<country>Colombia</country>
</aff>
<aff id="Af2">
<institution><![CDATA[Universidad Nacional de Colombia]]></institution>
<addr-line><![CDATA[ ]]></addr-line>
<country>Colombia</country>
</aff>
<aff id="Af3">
<institution><![CDATA[Universidad Distrital Francisco José de Caldas]]></institution>
<addr-line><![CDATA[ ]]></addr-line>
<country>Colombia</country>
</aff>
<pub-date pub-type="pub">
<day>00</day>
<month>12</month>
<year>2022</year>
</pub-date>
<pub-date pub-type="epub">
<day>00</day>
<month>12</month>
<year>2022</year>
</pub-date>
<volume>27</volume>
<numero>3</numero>
<copyright-statement/>
<copyright-year/>
<self-uri xlink:href="http://www.scielo.org.co/scielo.php?script=sci_arttext&amp;pid=S0121-750X2022000300400&amp;lng=en&amp;nrm=iso"></self-uri><self-uri xlink:href="http://www.scielo.org.co/scielo.php?script=sci_abstract&amp;pid=S0121-750X2022000300400&amp;lng=en&amp;nrm=iso"></self-uri><self-uri xlink:href="http://www.scielo.org.co/scielo.php?script=sci_pdf&amp;pid=S0121-750X2022000300400&amp;lng=en&amp;nrm=iso"></self-uri><abstract abstract-type="short" xml:lang="es"><p><![CDATA[Resumen  Contexto: La política pública de Catastro Multipropósito necesita consolidar información inmobiliaria de diferentes fuentes para su análisis, tales como ofertas, transacciones y costos de construcción, entre otros. Las páginas web inmobiliarias forman parte de estas fuentes de información, aunque no han sido incluidas en el análisis comercial. Considerando lo anterior, es necesario revisar una metodología que permita acceder de forma óptima a estas plataformas web y facilite el análisis de las variables que allí se proveen, que son determinantes para el valor comercial de un inmueble. Se realiza un caso de estudio en tres ciudades colombianas: Fusagasugá, Manizales y Villavicencio.  Método: El método se desarrolla en dos etapas: (i) web scraping, que permite obtener los enlaces de la información de páginas web inmobiliarias y descargar sus datos, y (ii) el análisis de datos inmobiliarios mediante el desarrollo de un flujo de trabajo que inicia con la exploración y la limpieza de los datos, continúa con el pre-modelado y finaliza con el modelado de las variables de interés en la determinación del valor de los bienes inmuebles usando técnicas de machine learning.  
Resultados: A partir de la aplicación de técnicas de machine learning, fue posible automatizar la recolección, la limpieza, el almacenamiento y el análisis de datos inmobiliarios provenientes de plataformas web, así como delinear dos modelos (Ridge Regression y Random Forest) que, de acuerdo con su error porcentual medio absoluto (0,34 y 0,35, respectivamente), permiten predecir el valor comercial de un inmueble considerando variables explicativas internas y externas.  Conclusiones: Obtener y analizar los datos inmobiliarios de fuentes alternativas como las plataformas web a través de desarrollos tecnológicos contribuye significativamente a atender la alta demanda de información del catastro del país. No obstante, es necesario ampliar el suministro de esta información a los ámbitos rurales, que cuentan con menos acceso y disponibilidad de la misma.]]></p></abstract>
<abstract abstract-type="short" xml:lang="en"><p><![CDATA[Abstract  Context: The Multipurpose Cadastre public policy needs to consolidate real estate information from different sources, such as offers, transactions, and construction costs, for analysis. Real estate websites are one of these sources, although they have not yet been included in commercial analysis. It is therefore necessary to review a methodology that allows optimal access to these web platforms and facilitates the analysis of the variables they provide, which are decisive for a property's commercial value. A case study was carried out in three Colombian cities: Fusagasugá, Manizales, and Villavicencio.  Method: The method is implemented in two stages: (i) web scraping, which obtains the links to the listings on real estate web pages and downloads their data, and (ii) real estate data analysis, through a workflow that starts with data exploration and cleaning, continues with pre-modeling, and ends with modeling the variables of interest for determining real estate value using machine learning techniques.  Results: By applying machine learning techniques, it was possible to automate the collection, cleaning, storage, and analysis of real estate data from web platforms, as well as to outline two models (Ridge Regression and Random Forest) which, according to their mean absolute percentage error (0.34 and 0.35, respectively), allow predicting the commercial value of a property from internal and external explanatory variables.  Conclusions: Obtaining and analyzing real estate data from alternative sources such as web platforms through machine learning techniques contributes significantly to meeting the high demand for information of the country's cadastre. However, it is necessary to expand the supply of this information to rural areas, where it is less accessible and less available.]]></p></abstract>
<kwd-group>
<kwd lng="es"><![CDATA[Catastro Multipropósito]]></kwd>
<kwd lng="es"><![CDATA[dinámica inmobiliaria]]></kwd>
<kwd lng="es"><![CDATA[mercado inmobiliario]]></kwd>
<kwd lng="es"><![CDATA[valor comercial]]></kwd>
<kwd lng="es"><![CDATA[web scraping]]></kwd>
<kwd lng="en"><![CDATA[Multipurpose Cadastre]]></kwd>
<kwd lng="en"><![CDATA[real estate dynamics]]></kwd>
<kwd lng="en"><![CDATA[real estate market]]></kwd>
<kwd lng="en"><![CDATA[commercial value]]></kwd>
<kwd lng="en"><![CDATA[web scraping]]></kwd>
</kwd-group>
</article-meta>
</front><back>
<ref-list>
<ref id="B1">
<label>[1]</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Ulbricht]]></surname>
<given-names><![CDATA[L.]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[Scraping the demos. Digitalization, web scraping and the democratic project]]></article-title>
<source><![CDATA[Democratization]]></source>
<year>2020</year>
<volume>27</volume>
<page-range>426-42</page-range></nlm-citation>
</ref>
<ref id="B2">
<label>[2]</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Uzun]]></surname>
<given-names><![CDATA[E.]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[A novel web scraping approach using the additional information obtained from web pages]]></article-title>
<source><![CDATA[IEEE Access]]></source>
<year>2020</year>
<volume>8</volume>
<page-range>61726-40</page-range></nlm-citation>
</ref>
<ref id="B3">
<label>[3]</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Bricongne]]></surname>
<given-names><![CDATA[J.-C.]]></given-names>
</name>
<name>
<surname><![CDATA[Meunier]]></surname>
<given-names><![CDATA[B.]]></given-names>
</name>
<name>
<surname><![CDATA[Sylvain]]></surname>
<given-names><![CDATA[P.]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[Web scraping housing prices in real-time: The COVID-19 crisis in the UK]]></article-title>
<source><![CDATA[SSRN Electron. J.]]></source>
<year>2021</year>
</nlm-citation>
</ref>
<ref id="B4">
<label>[4]</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Hillen]]></surname>
<given-names><![CDATA[J.]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[Web scraping for food price research]]></article-title>
<source><![CDATA[Br Food J]]></source>
<year>2019</year>
<volume>121</volume>
<page-range>3350-61</page-range></nlm-citation>
</ref>
<ref id="B5">
<label>[5]</label><nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Morshedi]]></surname>
<given-names><![CDATA[R.]]></given-names>
</name>
<name>
<surname><![CDATA[Chu]]></surname>
<given-names><![CDATA[B.]]></given-names>
</name>
<name>
<surname><![CDATA[Huang]]></surname>
<given-names><![CDATA[E.]]></given-names>
</name>
</person-group>
<source><![CDATA[Web scraping: Applications in infrastructure planning]]></source>
<year>2019</year>
<publisher-name><![CDATA[New South Wales]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B6">
<label>[6]</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Dewi]]></surname>
<given-names><![CDATA[L. C.]]></given-names>
</name>
<name>
<surname><![CDATA[Meiliana]]></surname>
</name>
<name>
<surname><![CDATA[Chandra]]></surname>
<given-names><![CDATA[A.]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[Social media web scraping using social media developers API and regex]]></article-title>
<source><![CDATA[Procedia Comput. Sci.]]></source>
<year>2019</year>
<volume>157</volume>
<numero>1</numero>
<issue>1</issue>
<page-range>444-9</page-range></nlm-citation>
</ref>
<ref id="B7">
<label>[7]</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Krotov]]></surname>
<given-names><![CDATA[V.]]></given-names>
</name>
<name>
<surname><![CDATA[Johnson]]></surname>
<given-names><![CDATA[L.]]></given-names>
</name>
<name>
<surname><![CDATA[Silva]]></surname>
<given-names><![CDATA[L.]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[Tutorial: Legality and ethics of web scraping]]></article-title>
<source><![CDATA[Faculty & Staff Research and Creative Activity]]></source>
<year>2020</year>
<volume>47</volume>
<numero>12</numero>
<issue>12</issue>
<page-range>539-63</page-range></nlm-citation>
</ref>
<ref id="B8">
<label>[8]</label><nlm-citation citation-type="">
<collab>DNP</collab>
<source><![CDATA[Conpes 3958]]></source>
<year>2019</year>
<page-range>79</page-range></nlm-citation>
</ref>
<ref id="B9">
<label>[9]</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Eguino, Huáscar]]></surname>
<given-names><![CDATA[A.]]></given-names>
</name>
<name>
<surname><![CDATA[Erba]]></surname>
<given-names><![CDATA[Diego]]></given-names>
</name>
<name>
<surname><![CDATA[Da Silva]]></surname>
<given-names><![CDATA[Everton]]></given-names>
</name>
<name>
<surname><![CDATA[De Olivera]]></surname>
<given-names><![CDATA[Augusto]]></given-names>
</name>
<name>
<surname><![CDATA[Piumetto]]></surname>
<given-names><![CDATA[Mario]]></given-names>
</name>
<name>
<surname><![CDATA[Rodríguez Ramírez]]></surname>
<given-names><![CDATA[Iturre, Teresa]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[Catastro inmobiliaria y tributación municipal: Experiencias para mejorar su articulación y efectividad]]></article-title>
<source><![CDATA[Inter-American Development Bank]]></source>
<year>2020</year>
</nlm-citation>
</ref>
<ref id="B10">
<label>[10]</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Saurkar]]></surname>
<given-names><![CDATA[A. V.]]></given-names>
</name>
<name>
<surname><![CDATA[Pathare]]></surname>
<given-names><![CDATA[K. G.]]></given-names>
</name>
<name>
<surname><![CDATA[Gode]]></surname>
<given-names><![CDATA[S. A.]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[An overview on web scraping techniques and tools]]></article-title>
<source><![CDATA[Int. J. Comput. Sci. Commun. Eng.]]></source>
<year>2018</year>
<volume>4</volume>
<numero>4</numero>
<issue>4</issue>
<page-range>363-7</page-range></nlm-citation>
</ref>
<ref id="B11">
<label>[11]</label><nlm-citation citation-type="">
<collab>Alcaldía de Manizales</collab>
<source><![CDATA[Información General - Alcaldía de Manizales]]></source>
<year>2020</year>
</nlm-citation>
</ref>
<ref id="B12">
<label>[12]</label><nlm-citation citation-type="">
<collab>Departamento Administrativo Nacional de Estadística - DANE</collab>
<source><![CDATA[¿Cuántos somos?]]></source>
<year>2020</year>
</nlm-citation>
</ref>
<ref id="B13">
<label>[13]</label><nlm-citation citation-type="">
<collab>Instituto Geográfico Agustín Codazzi - IGAC</collab>
<source><![CDATA[Datos Abiertos Catastro - GEOPORTAL]]></source>
<year>2020</year>
</nlm-citation>
</ref>
<ref id="B14">
<label>[14]</label><nlm-citation citation-type="">
<collab>Alcaldía de Villavicencio</collab>
<source><![CDATA[Presentación]]></source>
<year>2020</year>
</nlm-citation>
</ref>
<ref id="B15">
<label>[15]</label><nlm-citation citation-type="">
<collab>Alcaldía de Fusagasugá</collab>
<source><![CDATA[Presentación]]></source>
<year>2020</year>
</nlm-citation>
</ref>
<ref id="B16">
<label>[16]</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Shafiee]]></surname>
<given-names><![CDATA[S.]]></given-names>
</name>
<name>
<surname><![CDATA[Wautelet]]></surname>
<given-names><![CDATA[Y.]]></given-names>
</name>
<name>
<surname><![CDATA[Hvam]]></surname>
<given-names><![CDATA[L.]]></given-names>
</name>
<name>
<surname><![CDATA[Sandrin]]></surname>
<given-names><![CDATA[E.]]></given-names>
</name>
<name>
<surname><![CDATA[Forza]]></surname>
<given-names><![CDATA[C.]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[Scrum versus Rational Unified Process in facing the main challenges of product configuration systems development]]></article-title>
<source><![CDATA[J. Syst. Softw.]]></source>
<year>2020</year>
<volume>170</volume>
<page-range>110732</page-range></nlm-citation>
</ref>
<ref id="B17">
<label>[17]</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Huber]]></surname>
<given-names><![CDATA[S.]]></given-names>
</name>
<name>
<surname><![CDATA[Wiemer]]></surname>
<given-names><![CDATA[H.]]></given-names>
</name>
<name>
<surname><![CDATA[Schneider]]></surname>
<given-names><![CDATA[D.]]></given-names>
</name>
<name>
<surname><![CDATA[Ihlenfeldt]]></surname>
<given-names><![CDATA[S.]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[DMME: Data mining methodology for engineering applications - A holistic extension to the CRISP-DM model]]></article-title>
<source><![CDATA[Procedia CIRP]]></source>
<year>2019</year>
<volume>79</volume>
<page-range>403-8</page-range><publisher-name><![CDATA[Elsevier B.V]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B18">
<label>[18]</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Nylen]]></surname>
<given-names><![CDATA[E. L.]]></given-names>
</name>
<name>
<surname><![CDATA[Wallisch]]></surname>
<given-names><![CDATA[P.]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[Web Scraping]]></article-title>
<source><![CDATA[Neural Data Science]]></source>
<year>2017</year>
<page-range>277-88</page-range><publisher-name><![CDATA[Elsevier]]></publisher-name></nlm-citation>
</ref>
<ref id="B19">
<label>[19]</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Glez-Pena]]></surname>
<given-names><![CDATA[D.]]></given-names>
</name>
<name>
<surname><![CDATA[Lourenço]]></surname>
<given-names><![CDATA[A.]]></given-names>
</name>
<name>
<surname><![CDATA[Lopez-Fernandez]]></surname>
<given-names><![CDATA[H.]]></given-names>
</name>
<name>
<surname><![CDATA[Reboiro-Jato]]></surname>
<given-names><![CDATA[M.]]></given-names>
</name>
<name>
<surname><![CDATA[Fdez-Riverola]]></surname>
<given-names><![CDATA[F.]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[Web scraping technologies in an API world]]></article-title>
<source><![CDATA[Brief. Bioinformatics]]></source>
<year>2013</year>
<volume>15</volume>
<numero>5</numero>
<issue>5</issue>
<page-range>788-97</page-range></nlm-citation>
</ref>
<ref id="B20">
<label>[20]</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Baldominos]]></surname>
<given-names><![CDATA[A.]]></given-names>
</name>
<name>
<surname><![CDATA[Blanco]]></surname>
<given-names><![CDATA[I.]]></given-names>
</name>
<name>
<surname><![CDATA[Moreno]]></surname>
<given-names><![CDATA[A. J.]]></given-names>
</name>
<name>
<surname><![CDATA[Iturrarte]]></surname>
<given-names><![CDATA[R.]]></given-names>
</name>
<name>
<surname><![CDATA[Bernárdez]]></surname>
<given-names><![CDATA[Ó.]]></given-names>
</name>
<name>
<surname><![CDATA[Afonso]]></surname>
<given-names><![CDATA[C.]]></given-names>
</name>
</person-group>
<article-title xml:lang=""><![CDATA[Identifying real estate opportunities using machine learning]]></article-title>
<source><![CDATA[Appl. Sci.]]></source>
<year>2018</year>
<volume>8</volume>
<numero>11</numero>
<issue>11</issue>
<page-range>2321</page-range></nlm-citation>
</ref>
<ref id="B21">
<label>[21]</label><nlm-citation citation-type="">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Wirth]]></surname>
<given-names><![CDATA[R.]]></given-names>
</name>
<name>
<surname><![CDATA[Hipp]]></surname>
<given-names><![CDATA[J.]]></given-names>
</name>
</person-group>
<source><![CDATA[CRISP-DM: Towards a standard process model for data mining, in Proceedings of the 4th International Conference on the Practical Applications of Knowledge Discovery and Data Mining]]></source>
<year>2000</year>
<volume>1</volume>
<publisher-loc><![CDATA[Springer-Verlag London, UK]]></publisher-loc>
</nlm-citation>
</ref>
</ref-list>
</back>
</article>
