<?xml version="1.0" encoding="ISO-8859-1"?><article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<front>
<journal-meta>
<journal-id>0120-5609</journal-id>
<journal-title><![CDATA[Ingeniería e Investigación]]></journal-title>
<abbrev-journal-title><![CDATA[Ing. Investig.]]></abbrev-journal-title>
<issn>0120-5609</issn>
<publisher>
<publisher-name><![CDATA[Facultad de Ingeniería, Universidad Nacional de Colombia.]]></publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id>S0120-56092012000100010</article-id>
<title-group>
<article-title xml:lang="en"><![CDATA[Using hybrid associative classifier with translation (HACT) for studying imbalanced data sets]]></article-title>
<article-title xml:lang="es"><![CDATA[Estudio de conjuntos de datos desbalanceados usando un modelo asociativo con traslación de ejes]]></article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname><![CDATA[Sánchez]]></surname>
<given-names><![CDATA[Laura Cleofas]]></given-names>
</name>
<xref ref-type="aff" rid="A01"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname><![CDATA[Guzmán Escobedo]]></surname>
<given-names><![CDATA[M]]></given-names>
</name>
<xref ref-type="aff" rid="A02"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname><![CDATA[Valdovinos Rosas]]></surname>
<given-names><![CDATA[Rosa María]]></given-names>
</name>
<xref ref-type="aff" rid="A03"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname><![CDATA[Yáñez Márquez]]></surname>
<given-names><![CDATA[Cornelio]]></given-names>
</name>
<xref ref-type="aff" rid="A04"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname><![CDATA[Camacho Nieto]]></surname>
<given-names><![CDATA[Oscar]]></given-names>
</name>
<xref ref-type="aff" rid="A05"/>
</contrib>
</contrib-group>
<aff id="A01">
<institution><![CDATA[Instituto Tecnológico de Toluca]]></institution>
<addr-line><![CDATA[ ]]></addr-line>
<country>Mexico</country>
</aff>
<aff id="A02">
<institution><![CDATA[Instituto Técnico Superior de Hidalgo]]></institution>
<addr-line><![CDATA[ ]]></addr-line>
<country>Mexico</country>
</aff>
<aff id="A03">
<institution><![CDATA[Universidad Autónoma del Estado de México]]></institution>
<addr-line><![CDATA[ ]]></addr-line>
<country>Mexico</country>
</aff>
<aff id="A04">
<institution><![CDATA[Instituto Politécnico Nacional, Centro de Investigación en Computación]]></institution>
<addr-line><![CDATA[ ]]></addr-line>
<country>Mexico</country>
</aff>
<aff id="A05">
<institution><![CDATA[Instituto Politécnico Nacional, Centro de Innovación y Desarrollo Tecnológico en Cómputo]]></institution>
<addr-line><![CDATA[ ]]></addr-line>
<country>Mexico</country>
</aff>
<pub-date pub-type="pub">
<day>00</day>
<month>01</month>
<year>2012</year>
</pub-date>
<pub-date pub-type="epub">
<day>00</day>
<month>01</month>
<year>2012</year>
</pub-date>
<volume>32</volume>
<numero>1</numero>
<fpage>53</fpage>
<lpage>57</lpage>
<copyright-statement/>
<copyright-year/>
<self-uri xlink:href="http://www.scielo.org.co/scielo.php?script=sci_arttext&amp;pid=S0120-56092012000100010&amp;lng=en&amp;nrm=iso"></self-uri><self-uri xlink:href="http://www.scielo.org.co/scielo.php?script=sci_abstract&amp;pid=S0120-56092012000100010&amp;lng=en&amp;nrm=iso"></self-uri><self-uri xlink:href="http://www.scielo.org.co/scielo.php?script=sci_pdf&amp;pid=S0120-56092012000100010&amp;lng=en&amp;nrm=iso"></self-uri><abstract abstract-type="short" xml:lang="en"><p><![CDATA[Class imbalance may reduce classifier performance in several pattern recognition problems. This negative effect is most notable for patterns of the least represented (minority) class. One strategy for handling the problem is to treat the classes involved (majority and minority) separately in order to balance the data sets (DS). This paper studies the high sensitivity to class imbalance shown by an associative classification model, the hybrid associative classifier with translation (HACT), and analyses the impact of imbalanced DS on its performance. The convenience of using under-sampling methods to reduce the negative effects of imbalance on associative memories is also analysed. The feasibility of the proposal is supported by experimental results obtained on eleven real-world data sets.]]></p></abstract>
<abstract abstract-type="short" xml:lang="es"><p><![CDATA[En diversos problemas de reconocimiento de patrones, se ha observado que el desequilibrio de clases puede disminuir el desempeño del clasificador, principalmente en los patrones de las clases minoritarias. Una estrategia para resolver el problema del desbalance, consiste en tratar por separado las clases incluidas en el problema (clase minoritaria o mayoritaria), a fin de equilibrar los conjuntos de datos. En este sentido, la motivación del presente artículo estriba en el hecho de que el modelo asociativo visto como Clasificador Híbrido Asociativo con Traslación (CHAT), es muy sensible al desbalance de las clases. Por ello, se analiza el impacto que los conjuntos de datos desbalanceados pueden tener sobre el rendimiento del CHAT. Adicionalmente, se analiza la conveniencia de utilizar métodos de bajo-muestreo para disminuir los efectos negativos que el modelo asociativo pueda sufrir. La viabilidad de este estudio se sustenta con los resultados experimentales obtenidos de once conjuntos de datos reales. Finalmente, el presente trabajo se considera como una investigación analítica-sintética.]]></p></abstract>
<kwd-group>
<kwd lng="en"><![CDATA[data set]]></kwd>
<kwd lng="en"><![CDATA[associative model]]></kwd>
<kwd lng="en"><![CDATA[under sampling]]></kwd>
<kwd lng="en"><![CDATA[class imbalance]]></kwd>
<kwd lng="en"><![CDATA[pre-processing]]></kwd>
<kwd lng="es"><![CDATA[Modelo asociativo]]></kwd>
<kwd lng="es"><![CDATA[bajo-muestreo]]></kwd>
<kwd lng="es"><![CDATA[clase desbalanceada]]></kwd>
<kwd lng="es"><![CDATA[pre-procesamiento]]></kwd>
</kwd-group>
</article-meta>
</front><body><![CDATA[  <font size="2" face="verdana">     <p align="center"><font size="4"><b>Using hybrid associative classifier with translation (HACT) for studying imbalanced data sets</b></font></p>      <p align="center"><font size="3"><b>Estudio de conjuntos de datos desbalanceados usando un modelo asociativo con traslaci&oacute;n de ejes</b> </font></p>     <p><b>Laura Cleofas S&aacute;nchez<sup>1</sup>, M. Guzm&aacute;n Escobedo<sup>2</sup>, Rosa Mar&iacute;a Valdovinos Rosas<sup>3</sup>, Cornelio Y&aacute;&ntilde;ez M&aacute;rquez<sup>4</sup>, Oscar Camacho Nieto<sup>5</sup></b></p>      <p><sup>1</sup> PhD Candidate in Computer Sciences, Centro de Investigaci&oacute;n en Computaci&oacute;n, Instituto Polit&eacute;cnico Nacional, Mexico. MSc in Computer Sciences, Instituto Tecnol&oacute;gico de Toluca, Mexico. Centro de Investigaci&oacute;n en Computaci&oacute;n, Juan de Dios B&aacute;tiz s/n esq. Miguel Oth&oacute;n de Mendiz&aacute;bal, Unidad Profesional Adolfo L&oacute;pez Mateos, Del. Gustavo A. Madero, Mexico. E-mail: <a href="mailto:laura18cs77@gmail.com">laura18cs77@gmail.com</a>.</p>      <p><sup>2</sup> Computational Systems Engineer, Instituto T&eacute;cnico Superior de Hidalgo, Mexico. E-mail: <a href="mailto:janyne20@hotmail.com">janyne20@hotmail.com</a>.</p>     <p><sup>3</sup> PhD in Computational Sciences, Universitat Jaume I, Spain. Universidad Aut&oacute;noma del Estado de M&eacute;xico, Centro Universitario Valle de Chalco, Mexico. E-mail: <a href="mailto:li_rmvr@hotmail.com">li_rmvr@hotmail.com</a>.</p>     <p><sup>4</sup> PhD in Computational Sciences, Instituto Polit&eacute;cnico Nacional, Mexico. Instituto Polit&eacute;cnico Nacional, Centro de Investigaci&oacute;n en Computaci&oacute;n, Mexico. E-mail: <a href="mailto:cyanez@cic.ipn.mx">cyanez@cic.ipn.mx</a>.</p>     <p><sup>5</sup> PhD in Computational Sciences, Instituto Polit&eacute;cnico Nacional, Mexico. Instituto Polit&eacute;cnico Nacional, Centro de Innovaci&oacute;n y Desarrollo Tecnol&oacute;gico en C&oacute;mputo, Mexico. Corresponding author. E-mail: <a href="mailto:ocamacho@ipm.mx">ocamacho@ipm.mx</a>.</p>      <p><b>Received: August 19th 2011; Accepted: January 26th 2012</b></p>  <hr>    ]]></body>
<body><![CDATA[<p><b> RESUMEN </b></p>      <p>En diversos problemas de reconocimiento de patrones, se ha observado que el desequilibrio de clases puede disminuir el desempe&ntilde;o del clasificador, principalmente en los patrones de las clases minoritarias. Una estrategia para resolver el problema del desbalance, consiste en tratar por separado las clases incluidas en el problema (clase minoritaria o mayoritaria), a fin de equilibrar los conjuntos de datos. En este sentido, la motivaci&oacute;n del presente art&iacute;culo estriba en el hecho de que el modelo asociativo visto como Clasificador H&iacute;brido Asociativo con Traslaci&oacute;n (CHAT), es muy sensible al desbalance de las clases. Por ello, se analiza el impacto que los conjuntos de datos desbalanceados pueden tener sobre el rendimiento del CHAT. Adicionalmente, se analiza la conveniencia de utilizar m&eacute;todos de bajo-muestreo para disminuir los efectos negativos que el modelo asociativo pueda sufrir. La viabilidad de este estudio se sustenta con los resultados experimentales obtenidos de once conjuntos de datos reales. Finalmente, el presente trabajo se considera como una investigaci&oacute;n anal&iacute;tica-sint&eacute;tica. </p>     <p><b>Palabras clave</b>: Modelo asociativo, bajo-muestreo, clase desbalanceada, pre-procesamiento. </p>  <hr>    <p><b> ABSTRACT </b></p>      <p>Class imbalance may reduce classifier performance in several pattern recognition problems. This negative effect is most notable for patterns of the least represented (minority) class. One strategy for handling the problem is to treat the classes involved (majority and minority) separately in order to balance the data sets (DS). This paper studies the high sensitivity to class imbalance shown by an associative classification model, the hybrid associative classifier with translation (HACT), and analyses the impact of imbalanced DS on its performance. 
The convenience of using under-sampling methods to reduce the negative effects of imbalance on associative memories is also analysed. The feasibility of the proposal is supported by experimental results obtained on eleven real-world data sets. </p>     <p><b>Keywords</b>: data set, associative model, under sampling, class imbalance, pre-processing. </p> <hr>      <p><font size="3"><b>Introduction</b></font></p>      <p>Karl Steinbuch introduced the first associative model, called the Lernmatrix, in 1961 (Santiago, 2003); it can be used as a binary pattern classifier. Various associative models have been developed since, for example the HACT, morphological and alpha-beta models (Santiago, 2003). </p>     <p>In pattern recognition, classifier performance depends strongly on two aspects, regardless of the application (Japkowicz, 2002; Huang <I>et al.</I>, 2006): the learning model used by the classifier and the quality of the data set (DS) used for training. Some inherent DS problems are class imbalance, redundant patterns, atypical patterns and high dimensionality (Barandela <I>et al</I>., 2005). This paper focuses on the imbalance problem. </p>     <p>Imbalance occurs when one class (the minority) is heavily under-represented compared to the other classes (the majority) (Weiss, 2004). Real cases (text categorisation, credit analysis) typically have few minority class samples (Tan, 2005; Huang <I>et al</I>., 2006). Low minority class representation complicates classifier learning (Weiss, 2004), and there is currently no universal solution for this problem. Proposed strategies include resampling (<I>over-sampling </I>or <I>under-sampling</I>) and adjusting the training algorithm (Barandela <I>et al</I>., 2005; Chawla <I>et al</I>., 2002). </p>     ]]></body>
<body><![CDATA[<p>This study analyses the performance of an associative model (HACT) under class imbalance from two angles: how model training is affected when an imbalanced DS is used, and whether under-sampling the DS is desirable. </p>      <p><font size="3"><b>The imbalance problem</b></font></p>      <p>The negative effect of imbalance on classifier performance is basically due to the false assumption that classes are evenly distributed (Japkowicz, 2002; Huang <I>et al</I>., 2006). Research in this area can be categorised into three large groups: addressing imbalance at the data level (Japkowicz, 2002) or at the algorithm level (Ezawa <I>et al</I>., 1996); measuring classifier performance in imbalanced domains (Ranawana <I>et al</I>., 2006; Daskalaki <I>et al</I>., 2006); and analysing the relationship between class imbalance and data complexity (Prati <I>et al</I>., 2004a; Prati <I>et al</I>., 2004b). </p>      <p><b><I>Data pre-processing</I></b></p>      <p>Typical resampling proposals for solving class imbalance include majority class under-sampling and minority class over-sampling. Under-sampling aims to balance the classes by eliminating negative patterns, thus reducing majority class cardinality, using strategies such as random selection, cleaning, condensing and genetic algorithms (Barandela <I>et al</I>., 2005; Kuncheva <I>et al</I>., 1999) or unsupervised hierarchical algorithms (Cohen <I>et al</I>., 2006; Batista <I>et al</I>., 2000). </p>     <p>Wilson editing (WE) is the most popular data cleaning algorithm (Wilson, 1972). The idea is to identify and remove noisy or atypical patterns, especially those in the overlap area between two or more classes. It applies the <I>k</I>-nearest-neighbour rule (typically with <I>k </I>= 3) to estimate the class label of each pattern in the training set (TS) and eliminates the patterns whose class label does not match the class of most of their <I>k </I>nearest neighbours. 
The WE method is expressed as follows: </p>      <p>Input: <I>TS</I> = original training set, with patterns <I>x<sub>i</sub></I> </p>     <p>Output: <I>S </I>= edited DS. </p>     <p>Begin </p>     <p>1. <I>S</I> = <I>TS</I> </p>     ]]></body>
<body><![CDATA[<p>2. For each <I>x<sub>i</sub></I> in <I>TS </I>do </p>     <p>3. If <I>x<sub>i</sub></I> is misclassified do // applying the nearest neighbour rule </p>     <p>4. Discard <I>x<sub>i</sub></I> from <I>S </I></p>     <p>5. End If </p>     <p>6. End for </p>     <p>End </p>     <p>Condensing algorithms aim to build a small subset that is representative of the TS. The algorithm uses a type of pruning to remove patterns considered unnecessary (Hart, 1968). The resulting subset is called the selective subset (SSM) and contains the patterns nearest to the decision boundary, which are considered prototypes of the original DS (Barandela <I>et al</I>., 2001). The method for a two-class problem can be described as follows: </p>      <p>Input: <I>T </I>// original training set </p>      <p>Output: <I>SSM </I>// selective subset of <I>T </I></p>      <p>Begin </p>      <p>1. <I>S </I>= <I>T</I>, <I>C</I> = <I>T</I> </p>      ]]></body>
<body><![CDATA[<p>2. <I>D<sub>i</sub></I> = min <I>d</I>(<I>x<sub>i</sub></I>, <I>x<sub>j</sub></I>), class(<I>x<sub>i</sub></I>) &ne; class(<I>x<sub>j</sub></I>), &forall; <I>x<sub>i</sub></I>, <I>x<sub>j</sub></I> &#8712; <I>T</I> </p>      <p>3. <I>y<sub>i</sub></I> = arg min <I>d</I>(<I>x<sub>i</sub></I>, <I>y<sub>j</sub></I>), class(<I>x<sub>i</sub></I>) &ne; class(<I>y<sub>j</sub></I>), &forall; <I>y<sub>j</sub></I> &#8712; <I>T</I> </p>      <p>4. <I>v<sub>i</sub></I> = {<I>x<sub>j</sub></I> &#8712; <I>T</I> | class(<I>x<sub>i</sub></I>) = class(<I>x<sub>j</sub></I>), <I>d</I>(<I>x<sub>i</sub></I>, <I>x<sub>j</sub></I>) &lt; <I>d</I>(<I>x<sub>i</sub></I>, <I>y<sub>i</sub></I>)} </p>      <p>5. While <I>C</I> &ne; &#8709; do: <I>x<sub>k</sub></I> = arg min <I>D<sub>i</sub></I>; <I>C</I> = <I>C</I> &minus; {<I>x<sub>k</sub></I>}; <I>S<sub>k</sub></I> = {<I>x<sub>i</sub></I> &#8712; <I>S</I> | <I>x<sub>k</sub></I> &#8712; <I>v<sub>i</sub></I>} </p>     <p>6. If <I>S<sub>k</sub></I> &cap; <I>S</I> &ne; &#8709; then <I>SSM</I> = <I>SSM</I> &cup; {<I>x<sub>k</sub></I>}; <I>S</I> = <I>S</I> &minus; <I>S<sub>k</sub></I> </p>     <p>7. End if </p>     <p>8. End while </p>     <p>End </p>      <p><font size="3"><b>Associative memories</b></font></p>      <p>An associative memory is constructed from a finite set of associations called a fundamental set, denoted as: </p>     ]]></body>
<body><![CDATA[<p align="left"><a name="ec1"></a><img src="../img/revistas/iei/v32n1/v32n1a10ec1.jpg"></p>      <p>where <I>p </I>is the cardinality of the fundamental set. </p>      <p>Associative models involve two phases (Aldape, 2007): learning and recalling. The associative memory is constructed during the learning phase by forming associations between input and output patterns; the patterns learned are then retrieved during the recalling phase. </p>      <p><b><I>Hybrid associative classifier (HAC) </I></b></p>      <p>HAC is a classifier that combines the linear associator (learning phase) (Santiago, 2003) and the Lernmatrix (recalling phase), eliminating the disadvantages of each. In the Lernmatrix model input patterns must be binary (0 and 1), whereas in the linear associator model input patterns must be orthonormal. To overcome these restrictions, HAC accepts real values in each input pattern component, as described in the following steps (Santiago, 2003): </p>      <p>1. Fundamental set input patterns are real-valued; they have <I>n </I>components and are separated into <I>C</I> classes; </p>     <p>2. Output patterns are ''one-hot'' vectors: every component of an output pattern is zero except the component representing the class, which has a value of one; </p>     <p>3. The learning phase is that of the linear associator: the memory is obtained as the sum of the outer products of the associations in the fundamental set: </p>     <p align="left"><a name="ec2"></a><img src="../img/revistas/iei/v32n1/v32n1a10ec2.jpg"></p>      <p>4. The class of an input pattern is determined during the recalling (Lernmatrix) phase. </p>      ]]></body>
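<body><![CDATA[<p>The four steps above can be sketched in a few lines of Python with NumPy. This is only an illustrative sketch under stated assumptions, not the implementation used in the study: the function names <code>hac_learn</code> and <code>hac_recall</code> and the toy patterns are hypothetical, and the Lernmatrix recall is simplified to choosing the class whose memory row gives the largest response (ties are resolved in favour of the lowest class index). </p>

```python
import numpy as np


def hac_learn(X, labels, n_classes):
    """Linear-associator learning: sum of outer products y (x^T).

    X is a (p, n) array of real-valued input patterns and labels holds
    the class index (0 .. n_classes - 1) of each pattern.
    """
    X = np.asarray(X, dtype=float)
    # One-hot output patterns, shape (p, n_classes).
    Y = np.eye(n_classes)[np.asarray(labels)]
    # Y.T @ X equals the sum over all associations of outer(y, x).
    return Y.T @ X


def hac_recall(M, x):
    """Simplified Lernmatrix recall: index of the strongest class response."""
    response = M @ np.asarray(x, dtype=float)
    return int(np.argmax(response))


# Hypothetical toy fundamental set: two classes, two features.
X_train = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.8]]
y_train = [0, 0, 1, 1]

M = hac_learn(X_train, y_train, n_classes=2)
class_a = hac_recall(M, [0.8, 0.1])  # responds most strongly to class 0
class_b = hac_recall(M, [0.1, 0.9])  # responds most strongly to class 1
```

<p>In this toy case the two classes have patterns of similar magnitude, so the memory separates them; the sketch makes no attempt to handle patterns whose magnitudes differ greatly. </p>]]></body>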
<body><![CDATA[<p>HAC performance suffers when input patterns are grouped in the same quadrant and their magnitudes differ greatly: the HAC then tends to assign patterns of lesser magnitude to the classes whose patterns have greater magnitude, which leads to misclassification. </p>      <p><b><I>Hybrid associative classifier with translation (HACT) </I></b></p>      <p>HACT is an improved version of the HAC associative memory in which a translation of axes solves some of the HAC difficulties (Santiago, 2003). <a href="#f1">Figure 1</a>a shows input patterns grouped in the same quadrant; following translation, the patterns are placed in different quadrants (<a href="#f1">Fig. 1</a>b), which strengthens HAC classification. </p>     <p>HACT considers the following steps for axis translation (Santiago, 2003): </p>     <blockquote>     <p>1) A mean vector is obtained from the input patterns: </p>     <p align="left"><a name="ec3"></a><img src="../img/revistas/iei/v32n1/v32n1a10ec3.jpg"></p>      <p>2) The mean vector is taken as the new centre of the coordinate axes; input and test patterns are translated accordingly: </p>     <p align="left"><a name="ec4"></a><img src="../img/revistas/iei/v32n1/v32n1a10ec4.jpg"></p>      <p>3) The linear associator's learning phase is carried out; and </p>     ]]></body>
<body><![CDATA[<p>4) The Lernmatrix's recalling phase is performed.</p> </blockquote>      <p align="center"><a name="f1"></a><img src="../img/revistas/iei/v32n1/v32n1a10f1.jpg"></p>      <p><font size="3"><b>Methodology</b></font></p>      <p>This section explains the tools, methods and scenarios used in this study. </p>      <p><b><I>Data sets </I></b></p>      <p>The experimental results were obtained on 11 DS with different numbers of classes (Cl) and features (Fe), taken from www.ics.uci.edu/~mlearn (<a href="#t1">Table 1</a>). Five-fold cross-validation was used for each DS. </p>     <p align="center"><a name="t1"></a><img src="../img/revistas/iei/v32n1/v32n1a10t1.jpg"></p>      <p>Several DS having more than two classes were transformed into two-class problems to increase the imbalance level, as follows: </p> <ul>    <li>DS Glass: class 6 was the minority class (24 patterns) and the remaining classes the majority class (150 patterns); </li>     <li>DS Vehic: class 1 was the minority class (170 patterns) and the remaining classes the majority class (508 patterns); and </li>     ]]></body>
<body><![CDATA[<li>DS Satim: class 4 was the minority class (500 patterns) and the remaining classes the majority class (4647 patterns).</li>     </ul>      <p><b><I>Classifier performance evaluation </I></b></p>      <p>Overall accuracy (eq. 5) is usually used for evaluating classifier performance, on the assumption that the cost of error is the same for every class. Under imbalance this assumption has been challenged as unrealistic, because a severely imbalanced DS usually does not have a uniform error cost. For example, in a hypothetical case in which only 0.2% of the patterns are positive, labelling every pattern as negative is 99.8% accurate overall, yet no positive pattern is identified. </p>    <p align="left"><a name="ec5"></a><img src="../img/revistas/iei/v32n1/v32n1a10ec5.jpg"></p>      <p>where <i>n<sub>e</sub></i>  is the number of misclassified examples and <i>n<sub>t</sub></i>  is the total number of examples tested. </p>     <p>The geometric mean is commonly used as a criterion for evaluating classifier performance in imbalanced contexts (&Aacute;lvarez, 1994): </p>     <p align="left"><a name="ec6"></a><img src="../img/revistas/iei/v32n1/v32n1a10ec6.jpg"></p>      <p>where <img src="../img/revistas/iei/v32n1/v32n1a10s1.jpg"> is minority class accuracy and <img src="../img/revistas/iei/v32n1/v32n1a10s2.jpg"> is majority class accuracy. </p>     <p><b><I>Study scenarios </I></b></p>      ]]></body>
<body><![CDATA[<p>This study was aimed at analysing the behaviour of the HACT associative model when working with imbalanced data. The study scenarios were: </p>     <blockquote>    <p>1) E1: DS without pre-processing; </p>     <p>2) E2: DS edited (with WE, <I>k</I> = 3); </p>     <p>3) E3: DS condensed (with the modified selective subset, SS); and </p>     <p>4) E4: DS edited and condensed (WE+SS, <I>k</I> = 3).</p> </blockquote>      <p><font size="3"><b>Results</b></font></p>      <p><a href="#t3">Table 3</a> shows the resulting DS sizes after applying the pre-processing algorithms. </p>     <p align="center"><a name="t3"></a><img src="../img/revistas/iei/v32n1/v32n1a10t3.jpg"></p>      <p><a href="#t3">Table 3</a> shows a considerable reduction in DS size: the condensing method (E3) reduced the DS more than the editing method (E2), and the reduction became much greater when both pre-processing methods were combined (E4). </p>     ]]></body>
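<body><![CDATA[<p>To illustrate how the editing step behind scenarios E2 and E4 shrinks a DS, the following Python sketch applies Wilson editing with <I>k</I> = 3 to a toy set. It is a minimal sketch under stated assumptions rather than the code used for the experiments: the function name <code>wilson_editing</code>, the Euclidean distance and the toy patterns are hypothetical choices made for illustration, and the toy data are unrelated to the DS of Table 3. </p>

```python
import numpy as np


def wilson_editing(X, y, k=3):
    """Return the indices of the patterns kept by Wilson editing.

    Each pattern is classified by a majority vote of its k nearest
    neighbours in the original training set (itself excluded) and is
    discarded when the vote disagrees with its own label.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    # Pairwise squared Euclidean distances between all patterns.
    diff = X[:, None, :] - X[None, :, :]
    dist = np.einsum("ijk,ijk->ij", diff, diff)
    np.fill_diagonal(dist, np.inf)  # a pattern is not its own neighbour
    keep = []
    for i in range(len(X)):
        neighbours = np.argsort(dist[i], kind="stable")[:k]
        classes, votes = np.unique(y[neighbours], return_counts=True)
        # On a tied vote, argmax picks the smallest class label.
        predicted = classes[np.argmax(votes)]
        if predicted == y[i]:
            keep.append(i)
    return np.array(keep, dtype=int)


# Hypothetical toy set: two compact classes plus one class-1 pattern
# lying inside the class-0 region (index 8).
X_toy = [[0, 0], [0, 1], [1, 0], [1, 1],
         [5, 5], [5, 6], [6, 5], [6, 6],
         [0.5, 0.5]]
y_toy = [0, 0, 0, 0, 1, 1, 1, 1, 1]

kept = wilson_editing(X_toy, y_toy, k=3)
X_edited = np.asarray(X_toy, dtype=float)[kept]
```

<p>Only the pattern at index 8 is discarded here, because its three nearest neighbours all belong to the other class; the edited set therefore has eight patterns instead of nine. </p>]]></body>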
<body><![CDATA[<p>Classification was carried out after the reduction step. <a href="#t4">Table 4</a> shows the results (rounded up), expressed as the geometric mean, for the associative memory trained with the original DS (E1) and with the pre-processed DS. Values in brackets indicate the standard deviation and values in bold indicate the best result for each DS. </p>    <p align="center"><a name="t4"></a><img src="../img/revistas/iei/v32n1/v32n1a10t4.jpg"></p>      <p>Among the under-sampling strategies, <a href="#t4">Table 4</a> shows that HACT obtained better results with the edited DS (E2) than with the non-pre-processed DS (E1) for all DS except Glass and Satim. </p>     <p>Since HACT relies on how the information in the DS is spread, the performance observed across the study scenarios may indicate that HACT training is highly susceptible to imbalanced DS, or that HACT requires a well-defined decision boundary to perform properly and is harmed by excessive pattern removal. The best HACT performance was obtained using the Wilson editing method (E2) as the under-sampling strategy. </p>      <p><font size="3"><b>Conclusions</b></font></p>      <p>When DS are imbalanced, HACT performance is affected because the patterns of the least represented classes are not adequately recognised. This study examined two well-known under-sampling algorithms applied to the DS used by an associative model, with the aim of maintaining or increasing accuracy rates on eleven DS. </p>     <p>The results showed that the WE algorithm tended to improve accuracy rates and, as added value, reduced DS size, which lowers computational cost; for example, the Heart database (216 patterns) was reduced to 34 patterns. </p>     <p>The results indicate that Wilson editing was the most conducive method for HACT performance, which raises an interesting point: a clearly defined decision boundary and sufficient class density appear to be needed for good classification. An open line of study is therefore the use of known over-sampling algorithms for low-density DS. 
</p>     <p>Further study with other filtering algorithms, and the incorporation of cost-based functions, represents an alternative for addressing imbalance in future work without affecting class density or a priori probability. </p>      <p><font size="3"><b>Acknowledgements</b></font></p>      ]]></body>
<body><![CDATA[<p>This work was financed by UAEM project 3072/2011, Conacyt 239450, ICyTDF, PIFI 2010055, SIP (20100538, 20100554 and 20101709) and Instituto Polit&eacute;cnico Nacional COFAA. </p>     <p><font size="3"><b>References</b></font></p>     <!-- ref --><p>Aldape-P&eacute;rez, M., Implementaci&oacute;n de los modelos ALFA-BETA con l&oacute;gica reconfigurable., MSc Computer Engineering thesis (digital systems), Centro de Investigaci&oacute;n en Computaci&oacute;n, IPN, 2007. pp. 6-16. &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;[&#160;<a href="javascript:void(0);" onclick="javascript: window.open('/scielo.php?script=sci_nlinks&ref=000115&pid=S0120-5609201200010001000001&lng=','','width=640,height=500,resizable=yes,scrollbars=1,menubar=yes,');">Links</a>&#160;]<!-- end-ref --><!-- ref --><p>&Aacute;lvarez, M., Estad&iacute;stica., ISBN 84-7485-327-3, Universidad de Deusto, Bilbao, 1994, pp.51-63. &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;[&#160;<a href="javascript:void(0);" onclick="javascript: window.open('/scielo.php?script=sci_nlinks&ref=000116&pid=S0120-5609201200010001000002&lng=','','width=640,height=500,resizable=yes,scrollbars=1,menubar=yes,');">Links</a>&#160;]<!-- end-ref --><!-- ref --><p>Barandela, R., Cort&eacute;s, N., Palacios, A., The nearest neighbour rule and the reduction of the training sample size., In Proceedings of the 9th Spanish Symposium on Pattern Recognition and Image Analysis, Universitat Jaume I, Benicasim, Spain, 2001, pp. 103-108. 
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;[&#160;<a href="javascript:void(0);" onclick="javascript: window.open('/scielo.php?script=sci_nlinks&ref=000117&pid=S0120-5609201200010001000003&lng=','','width=640,height=500,resizable=yes,scrollbars=1,menubar=yes,');">Links</a>&#160;]<!-- end-ref --><!-- ref --><p>Barandela, R., Hern&aacute;ndez, J.K., S&aacute;nchez, J.S., Ferri, F.J., Imbalanced training set reduction and feature selection through genetic optimization., Proceeding of the 2005 conference on Artificial Intelligence Research and Development, ACM DL, Amsterdam, The Netherlands, 2005, pp. 215-222. &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;[&#160;<a href="javascript:void(0);" onclick="javascript: window.open('/scielo.php?script=sci_nlinks&ref=000118&pid=S0120-5609201200010001000004&lng=','','width=640,height=500,resizable=yes,scrollbars=1,menubar=yes,');">Links</a>&#160;]<!-- end-ref --><!-- ref --><p>Batista, G. E. A. P.A., Carvalho, A. C. P. L. F., Monard, M. C., Applying one-sided selection to unbalanced datasets., Lecture Notes in Artificial Intelligence, Vol. 1793, 2000, pp. 315-325. &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;[&#160;<a href="javascript:void(0);" onclick="javascript: window.open('/scielo.php?script=sci_nlinks&ref=000119&pid=S0120-5609201200010001000005&lng=','','width=640,height=500,resizable=yes,scrollbars=1,menubar=yes,');">Links</a>&#160;]<!-- end-ref --><!-- ref --><p>Cohen, G., Hilario, M., Hugonnet, S., Geissbuhler, A., Learning from Imbalanced data in surveillance of nosocomial infection. Artificial Intelligence in Medicine, ElSEVIER, Vol. 37, 2006, pp. 7-18. &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;[&#160;<a href="javascript:void(0);" onclick="javascript: window.open('/scielo.php?script=sci_nlinks&ref=000120&pid=S0120-5609201200010001000006&lng=','','width=640,height=500,resizable=yes,scrollbars=1,menubar=yes,');">Links</a>&#160;]<!-- end-ref --><!-- ref --><p>Chawla, V. N., Bowyer, K. W., Hall, L. 
O., Kegelmeyer, W. P., SMOTE: Synthetic minority over-sampling technique., Journal of Artificial Intelligence Research, Vol. 16, 2002, pp. 321-357. &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;[&#160;<a href="javascript:void(0);" onclick="javascript: window.open('/scielo.php?script=sci_nlinks&ref=000121&pid=S0120-5609201200010001000007&lng=','','width=640,height=500,resizable=yes,scrollbars=1,menubar=yes,');">Links</a>&#160;]<!-- end-ref --><!-- ref --><p>Daskalaki, S., Kopanas, I., Avouris, N., Evaluation of classifiers for an uneven class distribution problem., Applied Artificial Intelligence, Vol. 20, 2006, pp. 1-37. &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;[&#160;<a href="javascript:void(0);" onclick="javascript: window.open('/scielo.php?script=sci_nlinks&ref=000122&pid=S0120-5609201200010001000008&lng=','','width=640,height=500,resizable=yes,scrollbars=1,menubar=yes,');">Links</a>&#160;]<!-- end-ref --><!-- ref --><p>Ezawa, K. J., Singh, M., Norton, S. W., Learning goal oriented Bayesian networks for telecommunications risk management., Machine Learning, Proceedings of the 13th International Conference, Ed. Morgan Kaufmann, Bari, Italy,1996, pp. 139-147. &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;[&#160;<a href="javascript:void(0);" onclick="javascript: window.open('/scielo.php?script=sci_nlinks&ref=000123&pid=S0120-5609201200010001000009&lng=','','width=640,height=500,resizable=yes,scrollbars=1,menubar=yes,');">Links</a>&#160;]<!-- end-ref --><!-- ref --><p>Hart, P. E., The condensed nearest neighbour rule., IEEE Transactions on Information Theory, Vol. 14, 1968, pp. 515-516. &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;[&#160;<a href="javascript:void(0);" onclick="javascript: window.open('/scielo.php?script=sci_nlinks&ref=000124&pid=S0120-5609201200010001000010&lng=','','width=640,height=500,resizable=yes,scrollbars=1,menubar=yes,');">Links</a>&#160;]<!-- end-ref --><!-- ref --><p>Huang, Y. M., Hung, C. M., Jiau, H. 
C., Evaluation of neural networks and data mining methods on a credit assessment task for class imbalance problem, Nonlinear Analysis: Real World Applications., Vol. 7, 2006, pp. 720-757. &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;[&#160;<a href="javascript:void(0);" onclick="javascript: window.open('/scielo.php?script=sci_nlinks&ref=000125&pid=S0120-5609201200010001000011&lng=','','width=640,height=500,resizable=yes,scrollbars=1,menubar=yes,');">Links</a>&#160;]<!-- end-ref --><!-- ref --><p>Japkowicz, N., Stephen, S., The class imbalance problem: A systematic study, Intelligent Data Analysis., Vol. 6, 2002, pp. 429-449. &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;[&#160;<a href="javascript:void(0);" onclick="javascript: window.open('/scielo.php?script=sci_nlinks&ref=000126&pid=S0120-5609201200010001000012&lng=','','width=640,height=500,resizable=yes,scrollbars=1,menubar=yes,');">Links</a>&#160;]<!-- end-ref --><!-- ref --><p>Kuncheva, L. O., Jain, L. C., Nearest neighbour classifier: simultaneous editing and feature selection., Pattern Recognition Letters, Vol. 20, 1999, pp. 1149-1156. &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;[&#160;<a href="javascript:void(0);" onclick="javascript: window.open('/scielo.php?script=sci_nlinks&ref=000127&pid=S0120-5609201200010001000013&lng=','','width=640,height=500,resizable=yes,scrollbars=1,menubar=yes,');">Links</a>&#160;]<!-- end-ref --><!-- ref --><p>Prati, R. C., Batista, G. E. A. P. A., Monard, M. C., Class imbalance versus class overlapping: An analysis of a learning system behaviour., Lecture Notes in Computer Science, Vol. 2972, 2004a, pp. 312-321. &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;[&#160;<a href="javascript:void(0);" onclick="javascript: window.open('/scielo.php?script=sci_nlinks&ref=000128&pid=S0120-5609201200010001000014&lng=','','width=640,height=500,resizable=yes,scrollbars=1,menubar=yes,');">Links</a>&#160;]<!-- end-ref --><!-- ref --><p>Prati, R. C., Batista, G. E. A. P. 
A., Monard, M. C., Learning with class skews and small disjuncts., Proceedings of the 17th Brazilian Symposium on Artificial Intelligence, Ed. Springer, S&atilde;o Lu&iacute;s, Maranh&atilde;o - Brazil, 2004b, pp. 1119-1139.<!-- end-ref --><!-- ref --><p>Ranawana, R., Palade, V., A new measure for classifier performance evaluation., Proceedings of IEEE Congress on Evolutionary Computation, IEEE, Vancouver, BC, 2006, pp. 2254-2261.<!-- end-ref --><!-- ref --><p>Santiago, R., Clasificador h&iacute;brido de patrones basado en la Lern-matrix de Steinbuch y el linear associator de Anderson Kohonen., MSc Computer Science thesis, Centro de Investigaci&oacute;n en Computaci&oacute;n, IPN, 2003.<!-- end-ref --><!-- ref --><p>Tan, S., Neighbour-weighted K-nearest neighbour for unbalanced text corpus., Expert Systems with Applications, Vol. 28, 2005, pp. 667-671. 
<!-- end-ref --><!-- ref --><p>Weiss, G. M., Mining with rarity: a unifying framework., ACM SIGKDD Explorations Newsletter, Vol. 6, 2004, pp. 7-19.<!-- end-ref --><!-- ref --><p>Wilson, D. L., Asymptotic properties of nearest neighbour rules using edited data., IEEE Transactions on Systems, Man and Cybernetics, Vol. 2, 1972, pp. 408-421.<!-- end-ref --><p><font size="3"><b>Nomenclature</b></font></p>      <p><I>DS</I> Data set </p>     <p><I>E1</I> DS without pre-processing </p>     <p><I>E2</I> DS edited (with Wilson editing (WE)) </p>     <p><I>E3</I> DS condensed (with modified selective subset (SS)) </p>     <p><I>E4</I> DS edited and condensed (WE+SS) </p>     <p><I>HAC</I> Hybrid associative classifier </p>     <p><I>HACT</I> Hybrid associative classifier with translation </p>     ]]></body>
<body><![CDATA[<p><I>SS</I> Selective subset </p> </font>      ]]></body><back>
<ref-list>
<ref id="B1">
<nlm-citation citation-type="">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Aldape-Pérez]]></surname>
<given-names><![CDATA[M]]></given-names>
</name>
</person-group>
<source><![CDATA[Implementación de los modelos ALFA-BETA con lógica reconfigurable]]></source>
<year></year>
<page-range>6-16</page-range></nlm-citation>
</ref>
<ref id="B2">
<nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Álvarez]]></surname>
<given-names><![CDATA[M]]></given-names>
</name>
</person-group>
<source><![CDATA[Estadística]]></source>
<year>1994</year>
<page-range>51-63</page-range><publisher-loc><![CDATA[Bilbao ]]></publisher-loc>
<publisher-name><![CDATA[Universidad de Deusto]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B3">
<nlm-citation citation-type="confpro">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Barandela]]></surname>
<given-names><![CDATA[R]]></given-names>
</name>
<name>
<surname><![CDATA[Cortés]]></surname>
<given-names><![CDATA[N]]></given-names>
</name>
<name>
<surname><![CDATA[Palacios]]></surname>
<given-names><![CDATA[A]]></given-names>
</name>
</person-group>
<source><![CDATA[The nearest neighbour rule and the reduction of the training sample size]]></source>
<year></year>
<conf-name><![CDATA[ Proceedings of the 9th Spanish Symposium on Pattern Recognition and Image Analysis]]></conf-name>
<conf-date>2001</conf-date>
<conf-loc>Benicasim </conf-loc>
<page-range>103-108</page-range></nlm-citation>
</ref>
<ref id="B4">
<nlm-citation citation-type="confpro">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Barandela]]></surname>
<given-names><![CDATA[R]]></given-names>
</name>
<name>
<surname><![CDATA[Hernández]]></surname>
<given-names><![CDATA[J.K]]></given-names>
</name>
<name>
<surname><![CDATA[Sánchez]]></surname>
<given-names><![CDATA[J.S]]></given-names>
</name>
<name>
<surname><![CDATA[Ferri]]></surname>
<given-names><![CDATA[F.J]]></given-names>
</name>
</person-group>
<source><![CDATA[Imbalanced training set reduction and feature selection through genetic optimization]]></source>
<year></year>
<conf-name><![CDATA[ Proceeding of the 2005 conference on Artificial Intelligence Research and Development]]></conf-name>
<conf-date>2005</conf-date>
<conf-loc>Amsterdam </conf-loc>
<page-range>215-222</page-range></nlm-citation>
</ref>
<ref id="B5">
<nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Batista]]></surname>
<given-names><![CDATA[G. E. A. P. A]]></given-names>
</name>
<name>
<surname><![CDATA[Carvalho]]></surname>
<given-names><![CDATA[A. C. P. L. F]]></given-names>
</name>
<name>
<surname><![CDATA[Monard]]></surname>
<given-names><![CDATA[M. C]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[Applying one-sided selection to unbalanced datasets]]></article-title>
<source><![CDATA[Lecture Notes in Artificial Intelligence]]></source>
<year>2000</year>
<volume>1793</volume>
<page-range>315-325</page-range></nlm-citation>
</ref>
<ref id="B6">
<nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Cohen]]></surname>
<given-names><![CDATA[G]]></given-names>
</name>
<name>
<surname><![CDATA[Hilario]]></surname>
<given-names><![CDATA[M]]></given-names>
</name>
<name>
<surname><![CDATA[Hugonnet]]></surname>
<given-names><![CDATA[S]]></given-names>
</name>
<name>
<surname><![CDATA[Geissbuhler]]></surname>
<given-names><![CDATA[A]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[Learning from Imbalanced data in surveillance of nosocomial infection]]></article-title>
<source><![CDATA[Artificial Intelligence in Medicine]]></source>
<year>2006</year>
<volume>37</volume>
<page-range>7-18</page-range><publisher-name><![CDATA[Elsevier]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B7">
<nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Chawla]]></surname>
<given-names><![CDATA[N. V]]></given-names>
</name>
<name>
<surname><![CDATA[Bowyer]]></surname>
<given-names><![CDATA[K. W]]></given-names>
</name>
<name>
<surname><![CDATA[Hall]]></surname>
<given-names><![CDATA[L. O]]></given-names>
</name>
<name>
<surname><![CDATA[Kegelmeyer]]></surname>
<given-names><![CDATA[W. P]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[SMOTE: Synthetic minority over-sampling technique]]></article-title>
<source><![CDATA[Journal of Artificial Intelligence Research]]></source>
<year>2002</year>
<volume>16</volume>
<page-range>321-357</page-range></nlm-citation>
</ref>
<ref id="B8">
<nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Daskalaki]]></surname>
<given-names><![CDATA[S]]></given-names>
</name>
<name>
<surname><![CDATA[Kopanas]]></surname>
<given-names><![CDATA[I]]></given-names>
</name>
<name>
<surname><![CDATA[Avouris]]></surname>
<given-names><![CDATA[N]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[Evaluation of classifiers for an uneven class distribution problem]]></article-title>
<source><![CDATA[Applied Artificial Intelligence]]></source>
<year>2006</year>
<volume>20</volume>
<page-range>1-37</page-range></nlm-citation>
</ref>
<ref id="B9">
<nlm-citation citation-type="confpro">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Ezawa]]></surname>
<given-names><![CDATA[K. J]]></given-names>
</name>
<name>
<surname><![CDATA[Singh]]></surname>
<given-names><![CDATA[M]]></given-names>
</name>
<name>
<surname><![CDATA[Norton]]></surname>
<given-names><![CDATA[S. W]]></given-names>
</name>
</person-group>
<source><![CDATA[Learning goal oriented Bayesian networks for telecommunications risk management]]></source>
<year>1996</year>
<conf-name><![CDATA[ Machine Learning, Proceedings of the 13th International Conference]]></conf-name>
<conf-loc> </conf-loc>
<page-range>139-147</page-range><publisher-loc><![CDATA[Bari ]]></publisher-loc>
<publisher-name><![CDATA[Morgan Kaufmann]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B10">
<nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Hart]]></surname>
<given-names><![CDATA[P. E]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[The condensed nearest neighbour rule]]></article-title>
<source><![CDATA[IEEE Transactions on Information Theory]]></source>
<year>1968</year>
<volume>14</volume>
<page-range>515-516</page-range></nlm-citation>
</ref>
<ref id="B11">
<nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Huang]]></surname>
<given-names><![CDATA[Y. M]]></given-names>
</name>
<name>
<surname><![CDATA[Hung]]></surname>
<given-names><![CDATA[C. M]]></given-names>
</name>
<name>
<surname><![CDATA[Jiau]]></surname>
<given-names><![CDATA[H. C]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[Evaluation of neural networks and data mining methods on a credit assessment task for class imbalance problem]]></article-title>
<source><![CDATA[Nonlinear Analysis: Real World Applications]]></source>
<year>2006</year>
<volume>7</volume>
<page-range>720-757</page-range></nlm-citation>
</ref>
<ref id="B12">
<nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Japkowicz]]></surname>
<given-names><![CDATA[N]]></given-names>
</name>
<name>
<surname><![CDATA[Stephen]]></surname>
<given-names><![CDATA[S]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[The class imbalance problem: A systematic study]]></article-title>
<source><![CDATA[Intelligent Data Analysis]]></source>
<year>2002</year>
<volume>6</volume>
<page-range>429-449</page-range></nlm-citation>
</ref>
<ref id="B13">
<nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Kuncheva]]></surname>
<given-names><![CDATA[L. I]]></given-names>
</name>
<name>
<surname><![CDATA[Jain]]></surname>
<given-names><![CDATA[L. C]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[Nearest neighbour classifier: simultaneous editing and feature selection]]></article-title>
<source><![CDATA[Pattern Recognition Letters]]></source>
<year>1999</year>
<volume>20</volume>
<page-range>1149-1156</page-range></nlm-citation>
</ref>
<ref id="B14">
<nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Prati]]></surname>
<given-names><![CDATA[R. C]]></given-names>
</name>
<name>
<surname><![CDATA[Batista]]></surname>
<given-names><![CDATA[G. E. A. P. A]]></given-names>
</name>
<name>
<surname><![CDATA[Monard]]></surname>
<given-names><![CDATA[M. C]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[Class imbalance versus class overlapping: An analysis of a learning system behaviour]]></article-title>
<source><![CDATA[Lecture Notes in Computer Science]]></source>
<year>2004</year>
<month>a</month>
<volume>2972</volume>
<page-range>312-321</page-range></nlm-citation>
</ref>
<ref id="B15">
<nlm-citation citation-type="confpro">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Prati]]></surname>
<given-names><![CDATA[R. C]]></given-names>
</name>
<name>
<surname><![CDATA[Batista]]></surname>
<given-names><![CDATA[G. E. A. P. A]]></given-names>
</name>
<name>
<surname><![CDATA[Monard]]></surname>
<given-names><![CDATA[M. C]]></given-names>
</name>
</person-group>
<source><![CDATA[Learning with class skews and small disjuncts]]></source>
<year>2004</year>
<month>b</month>
<conf-name><![CDATA[ Proceedings of the 17th Brazilian Symposium on Artificial Intelligence]]></conf-name>
<conf-loc> </conf-loc>
<page-range>1119-1139</page-range><publisher-loc><![CDATA[São Luís, Maranhão]]></publisher-loc>
<publisher-name><![CDATA[Springer]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B16">
<nlm-citation citation-type="confpro">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Ranawana]]></surname>
<given-names><![CDATA[R]]></given-names>
</name>
<name>
<surname><![CDATA[Palade]]></surname>
<given-names><![CDATA[V]]></given-names>
</name>
</person-group>
<source><![CDATA[A new measure for classifier performance evaluation]]></source>
<year></year>
<conf-name><![CDATA[ Proceedings of IEEE Congress on Evolutionary Computation]]></conf-name>
<conf-date>2006</conf-date>
<conf-loc>Vancouver BC</conf-loc>
<page-range>2254-2261</page-range></nlm-citation>
</ref>
<ref id="B17">
<nlm-citation citation-type="">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Santiago]]></surname>
<given-names><![CDATA[R]]></given-names>
</name>
</person-group>
<source><![CDATA[Clasificador híbrido de patrones basado en la Lern-matrix de Steinbuch y el linear associator de Anderson Kohonen]]></source>
<year></year>
</nlm-citation>
</ref>
<ref id="B18">
<nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Tan]]></surname>
<given-names><![CDATA[S]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[Neighbour-weighted K-nearest neighbour for unbalanced text corpus]]></article-title>
<source><![CDATA[Expert Systems with Applications]]></source>
<year>2005</year>
<volume>28</volume>
<page-range>667-671</page-range></nlm-citation>
</ref>
<ref id="B19">
<nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Weiss]]></surname>
<given-names><![CDATA[G. M]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[Mining with rarity: a unifying framework]]></article-title>
<source><![CDATA[ACM SIGKDD Explorations Newsletter]]></source>
<year>2004</year>
<volume>6</volume>
<page-range>7-19</page-range></nlm-citation>
</ref>
<ref id="B20">
<nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Wilson]]></surname>
<given-names><![CDATA[D. L]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[Asymptotic properties of nearest neighbour rules using edited data]]></article-title>
<source><![CDATA[IEEE Transactions on Systems, Man and Cybernetics]]></source>
<year>1972</year>
<volume>2</volume>
<page-range>408-421</page-range></nlm-citation>
</ref>
</ref-list>
</back>
</article>
