<?xml version="1.0" encoding="ISO-8859-1"?><article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<front>
<journal-meta>
<journal-id>0120-6230</journal-id>
<journal-title><![CDATA[Revista Facultad de Ingeniería Universidad de Antioquia]]></journal-title>
<abbrev-journal-title><![CDATA[Rev.fac.ing.univ. Antioquia]]></abbrev-journal-title>
<issn>0120-6230</issn>
<publisher>
<publisher-name><![CDATA[Facultad de Ingeniería, Universidad de Antioquia]]></publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id>S0120-62302010000500016</article-id>
<title-group>
<article-title xml:lang="en"><![CDATA[A harmony search algorithm for clustering with feature selection]]></article-title>
<article-title xml:lang="es"><![CDATA[Un algoritmo de búsqueda armónica para clustering con selección de características]]></article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname><![CDATA[Cobos]]></surname>
<given-names><![CDATA[Carlos]]></given-names>
</name>
<xref ref-type="aff" rid="A01"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname><![CDATA[León]]></surname>
<given-names><![CDATA[Elizabeth]]></given-names>
</name>
<xref ref-type="aff" rid="A02"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname><![CDATA[Mendoza]]></surname>
<given-names><![CDATA[Martha]]></given-names>
</name>
<xref ref-type="aff" rid="A01"/>
</contrib>
</contrib-group>
<aff id="A01">
<institution><![CDATA[University of Cauca, Electronic and Telecommunications Engineering Faculty, Information Technology Research Group (GTI)]]></institution>
<addr-line><![CDATA[Popayán ]]></addr-line>
<country>Colombia</country>
</aff>
<aff id="A02">
<institution><![CDATA[National University of Colombia, Research Laboratory of Intelligent Systems (LISI)]]></institution>
<addr-line><![CDATA[Bogotá ]]></addr-line>
<country>Colombia</country>
</aff>
<pub-date pub-type="pub">
<day>00</day>
<month>09</month>
<year>2010</year>
</pub-date>
<pub-date pub-type="epub">
<day>00</day>
<month>09</month>
<year>2010</year>
</pub-date>
<numero>55</numero>
<fpage>153</fpage>
<lpage>164</lpage>
<copyright-statement/>
<copyright-year/>
<self-uri xlink:href="http://www.scielo.org.co/scielo.php?script=sci_arttext&amp;pid=S0120-62302010000500016&amp;lng=en&amp;nrm=iso"></self-uri><self-uri xlink:href="http://www.scielo.org.co/scielo.php?script=sci_abstract&amp;pid=S0120-62302010000500016&amp;lng=en&amp;nrm=iso"></self-uri><self-uri xlink:href="http://www.scielo.org.co/scielo.php?script=sci_pdf&amp;pid=S0120-62302010000500016&amp;lng=en&amp;nrm=iso"></self-uri><abstract abstract-type="short" xml:lang="en"><p><![CDATA[This paper presents a new clustering algorithm, called IHSK, with feature selection in a linear order of complexity. The algorithm is based on the combination of the harmony search and K-means algorithms. Feature selection uses both the concept of variability and a heuristic method that penalizes the presence of dimensions with a low probability of contributing to the current solution. The algorithm was tested with sets of synthetic and real data, obtaining promising results.]]></p></abstract>
<abstract abstract-type="short" xml:lang="es"><p><![CDATA[En este artículo se presenta un nuevo algoritmo de clustering denominado IHSK, con la capacidad de seleccionar características en un orden de complejidad lineal. El algoritmo es inspirado en la combinación de los algoritmos de búsqueda armónica y K-means. Para la selección de las características se usó el concepto de variabilidad y un método heurístico que penaliza la presencia de dimensiones con baja probabilidad de aportar en la solución actual. El algoritmo fue probado con conjuntos de datos sintéticos y reales, obteniendo resultados prometedores.]]></p></abstract>
<kwd-group>
<kwd lng="en"><![CDATA[harmony search]]></kwd>
<kwd lng="en"><![CDATA[clustering]]></kwd>
<kwd lng="en"><![CDATA[feature selection]]></kwd>
<kwd lng="es"><![CDATA[búsqueda armónica]]></kwd>
<kwd lng="es"><![CDATA[agrupamiento]]></kwd>
<kwd lng="es"><![CDATA[selección de características]]></kwd>
</kwd-group>
</article-meta>
</front><body><![CDATA[ <p align="center"><font face="Verdana" size="4"> <b>A harmony search algorithm for clustering with feature selection</b></font></p>      <p align="center"><font face="Verdana" size="4"> <b>Un algoritmo de búsqueda armónica para clustering con selección de características</b></font></p>       <p> <font face="Verdana" size="2"><i> Carlos Cobos <sup>1,2 *</sup>, Elizabeth León<sup>2</sup>, Martha Mendoza<sup>1</sup></i> </font></p>      <p><font face="Verdana" size="2"><sup>1</sup> Information Technology Research Group (GTI), Electronic and Telecommunications Engineering Faculty, University of Cauca, Sector Tulcán Office 422 FIET, Popayán, Colombia.    <br>         <br> <sup>2</sup> Research Laboratory of Intelligent Systems (LISI), National University of Colombia, Bogotá, Colombia. </font></p>      <p>&nbsp;</p>   <hr noshade size="1">      <p><font face="Verdana" size="3"> <b>Abstract</b></font></p>      <p><font face="Verdana" size="2">This paper presents a new  clustering algorithm, called IHSK, with feature selection in a linear order of  complexity. The algorithm is based on the combination of the harmony search and  K-means algorithms. Feature selection uses both the concept of variability and  a heuristic method that penalizes the presence of dimensions with a low  probability of contributing to the current solution. The algorithm was tested  with sets of synthetic and real data, obtaining promising results.</font></p>      <p><font face="Verdana" size="2"><b>Keywords: </b>harmony search, clustering, feature selection </font></p>  <hr noshade size="1">      ]]></body>
<body><![CDATA[<p><font face="Verdana" size="3"> <b>Resumen:</b></font></p>      <p><font face="Verdana" size="2">En este art&iacute;culo se presenta un nuevo algoritmo de clustering denominado IHSK, con la capacidad de seleccionar caracter&iacute;sticas en un orden de complejidad lineal. El algoritmo es inspirado en la combinaci&oacute;n de los algoritmos de b&uacute;squeda arm&oacute;nica y K-means. Para la selecci&oacute;n de las caracter&iacute;sticas se us&oacute; el concepto de variabilidad y un m&eacute;todo heur&iacute;stico que penaliza la presencia de dimensiones con baja probabilidad de aportar en la soluci&oacute;n actual. El algoritmo fue probado con conjuntos de datos sint&eacute;ticos y reales, obteniendo resultados prometedores. </font></p>      <p><font face="Verdana" size="2"><b>Palabras clave: </b>b&uacute;squeda arm&oacute;nica, agrupamiento, selecci&oacute;n de caracter&iacute;sticas</font></p>  <hr noshade size="1">      <p><font face="Verdana" size="3"><b>Introduction</b></font></p>      <p><font face="Verdana" size="2">Clustering is the process of partitioning a set of objects into an a priori unknown number of clusters (or groups) while minimizing the within-cluster variability and maximizing the between-cluster variability. Clustering is a challenging task in unsupervised learning. It has been used in many engineering and scientific disciplines such as computer vision (e.g. image segmentation), information retrieval (clustering web documents), biology (clustering of genome data) and market research (market segmentation and data forecasting). Several general clustering algorithm categories or approaches have been proposed in the literature, including hierarchical, partitional, density-based and grid-based algorithms [1, 2]. Partitional clustering has long been the most popular, because it is dynamic, has good performance, and considers the global shape and size of clusters. 
In partitional clustering, each data object is represented by a vector of features. The algorithm organizes the objects into K clusters in such a way that the total deviation of each cluster is minimized and the cluster centers are far away from each other. The deviation between two points can be computed separately using similarity or distance functions. Most partitional algorithms (e.g. K-means, K-medoids) assume all features to be equally important for clustering, but this approach can create difficulties because in reality some features may be redundant, others may be irrelevant, and some can even mislead the clustering process. Feature Selection (FS) is the task of selecting the best feature subset in a high-dimensional data set [3]. FS is a very important task in clustering, because it can improve the performance of the clustering algorithm and can contribute to the interpretability of the models generated. FS is usually done before the clustering process in algorithms commonly referred to as filters, but recently there have been some algorithms (called wrappers) that perform FS simultaneously with the clustering process [3]. In this paper, we put forward a new partitional algorithm for clustering with FS called IHSK. This algorithm is based on the harmony search (HS) [4-6] and K-means algorithms. HS is used as a global approach to optimize solutions of K-means (best local solutions) in which FS based on variance analysis is done.</font>    <br></p>       <p><font face="Verdana" size="2"><b><i>Harmony search algorithm</i></b></font></p> <font face="Verdana" size="2">HS is a meta-heuristic algorithm mimicking the improvisation process of musicians (where music players improvise the pitches of their instruments to obtain better harmony) [4-6]. HS has been successfully applied to many optimization problems (e.g. travelling salesman problem, chaotic systems). 
The steps in the procedure of HS are as follows [4-6]: <ol>   <ol>         <li>Initialize the Problem and Algorithm Parameters: The optimization problem is defined as minimize (or maximize) f(x) subject to x<sub>i</sub> &isin; X<sub>i</sub>, i = 1, 2, ..., N, where f(x) is the objective function, x is the set of decision variables x<sub>i</sub>, N is the number of decision variables, and X<sub>i</sub> is the set of possible values for each decision variable, that is, <sub>L</sub>x<sub>i</sub> &le; X<sub>i</sub> &le; <sub>U</sub>x<sub>i</sub>, where <sub>L</sub>x<sub>i</sub> and <sub>U</sub>x<sub>i</sub> are the lower and upper bounds for each decision variable. In addition, the parameters of the HS are specified in this step. These parameters are the Harmony Memory Size (HMS, a typical value is between 4 and 10), Harmony Memory Considering Rate (HMCR, a typical value is 0.95), Pitch Adjusting Rate (PAR, a typical value is between 0.3 and 0.99), distance BandWidth (BW, the amount of change for pitch adjustments) and the Number of Improvisations (NI) or stopping criterion [4-6].</li>    <br>         <li>Initialize the Harmony Memory: The Harmony Memory (HM) is a memory location where all the solution vectors (sets of decision variables) are stored. The initial HM is generated from a uniform distribution in the ranges [<sub>L</sub>x<sub>i</sub>, <sub>U</sub>x<sub>i</sub>], where 1 &le; i &le; N. This step is carried out as follows: x<sub>i</sub><sup>j</sup> = <sub>L</sub>x<sub>i</sub> + Rand &times; (<sub>U</sub>x<sub>i</sub> - <sub>L</sub>x<sub>i</sub>), where j = 1, 2, ..., HMS, and Rand is a uniformly distributed random number between 0 and 1 (Rand ~ U(0,1)).</li>       ]]></body>
<body><![CDATA[</ol>     </ol> <ol>   <ol start="3">         <li>Improvise a New Harmony: Generating a new harmony is called improvisation. A new harmony vector, x<sup>T</sup> = (x<sub>1</sub><sup>T</sup>, x<sub>2</sub><sup>T</sup>, ..., x<sub>N</sub><sup>T</sup>), is generated based on three rules: memory consideration, pitch adjustment and random selection. In this step, HM consideration, pitch adjustment or random selection is applied to each variable of the New Harmony vector in turn.</li>         <li>Update the Harmony Memory: The New Harmony vector, x<sup>T</sup> = (x<sub>1</sub><sup>T</sup>, x<sub>2</sub><sup>T</sup>, ..., x<sub>N</sub><sup>T</sup>), replaces the worst harmony vector in the HM if its fitness (judged in terms of the objective function value) is better than that of the worst harmony. In that case, the New Harmony vector is included in the HM and the existing worst harmony vector is excluded from it.</li>         <li>Check the Stopping Criterion: If the stopping criterion (e.g. maximum NI) is satisfied, computation is terminated. Otherwise, Steps 3 and 4 are repeated.</li>       </ol>     </ol> </font>     <p><font face="Verdana" size="2">The HMCR and PAR parameters of the HS help the method in searching for globally and locally improved solutions, respectively. PAR and BW have a profound effect on the performance of the HS algorithm. Thus, fine tuning these two parameters is very important. Of these two parameters, BW is more difficult to tune because it can take any value in (0, &infin;).</font>   </p>        <p><font face="Verdana" size="2"><b><i>The K-Means clustering algorithm</i></b></font></p>     <p><font face="Verdana" size="2">K-means is a partitional clustering algorithm. It is the simplest and most commonly used algorithm employing a Sum of Squared Error (SSE) criterion. 
This algorithm is popular because it finds a local minimum (or maximum) in a search space, it is easy to implement, and its time complexity is O(n), where n is the number of objects (registers or patterns). Unfortunately, the quality of the result depends on the initial points, and the algorithm may converge to a local minimum of the criterion function value if the initial partition is not properly chosen [1,2]. The K-means inputs are the number of clusters (K) and a set (table, array or collection) containing n objects (or registers) in a D-dimensional feature space, formally defined by X = {x<sub>1</sub>, x<sub>2</sub>, ..., x<sub>n</sub>} (in our case, x<sub>i</sub> is a row vector, for implementation reasons). The K-means output is a set containing K centers. The steps in the procedure of K-means can be summarized as shown in Table 1.</font></p>      ]]></body>
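<body><![CDATA[<p><font face="Verdana" size="2">Before moving on to Table 1, the five harmony search steps listed earlier can be illustrated with a minimal Python sketch. It is not the implementation used in this work: the function name <i>harmony_search</i>, the <i>sphere</i> test function, the default parameter values, the clamping to the bounds and the pitch adjustment rule (adding a uniform perturbation in [-BW, BW]) are assumptions made only for illustration.</font></p>

```python
import random

def harmony_search(objective, lower, upper, hms=10, hmcr=0.95, par=0.35,
                   bw=0.01, ni=2000, seed=0):
    """Minimize `objective` over the box [lower[i], upper[i]] with basic HS."""
    rng = random.Random(seed)
    n = len(lower)
    # Step 2: fill the harmony memory with uniform random vectors.
    memory = [[lower[i] + rng.random() * (upper[i] - lower[i]) for i in range(n)]
              for _ in range(hms)]
    costs = [objective(h) for h in memory]
    for _ in range(ni):  # Step 5: stop after NI improvisations.
        # Step 3: improvise a new harmony, one decision variable at a time.
        new = []
        for i in range(n):
            if rng.random() < hmcr:
                value = memory[rng.randrange(hms)][i]      # memory consideration
                if rng.random() < par:                     # pitch adjustment
                    value += (2 * rng.random() - 1) * bw
                    value = min(max(value, lower[i]), upper[i])
            else:                                          # random selection
                value = lower[i] + rng.random() * (upper[i] - lower[i])
            new.append(value)
        # Step 4: replace the worst harmony if the new one is better.
        worst = max(range(hms), key=costs.__getitem__)
        cost = objective(new)
        if cost < costs[worst]:
            memory[worst], costs[worst] = new, cost
    best = min(range(hms), key=costs.__getitem__)
    return memory[best], costs[best]

def sphere(x):
    """Hypothetical test objective: sum of squares, minimized at the origin."""
    return sum(v * v for v in x)

best, cost = harmony_search(sphere, [-5.0, -5.0], [5.0, 5.0])
```

<p><font face="Verdana" size="2">Because a harmony enters the memory only when it beats the current worst one, the best cost stored in the memory never gets worse as NI grows.</font></p>]]></body>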
<body><![CDATA[<p><font face="Verdana" size="2"><b>Table 1</b> The K-means algorithm</font></p>     <p align="center"><img src="/img/revistas/rfiua/n55/n55a16t01.gif"><a name="tabla1"></a></p>     <p><font face="Verdana" size="2">Select an Initial Partition: Arbitrarily choose K centers as the initial solution (for example, Forgy suggested selecting these K instances randomly from the data set [7]). These K centers are defined as C = {c<sub>1</sub>, c<sub>2</sub>, ..., c<sub>K</sub>}, and each c<sub>j</sub> is a D-dimensional row vector.    <br>   Re-compute Membership: For all objects in the data set it is necessary to recompute membership according to the current solution. Several similarity or distance measurements can be used. In this work, we used the Euclidean distance, formally defined as (1).</font></p>          <p><img src="/img/revistas/rfiua/n55/n55a16e01.gif"><a name="ecuacion1"></a></p>       <p><font face="Verdana" size="2">Each object is assigned to a specific cluster. This assignment can be hard or soft. In our case, the assignment is hard, defined by P<sub>i,j</sub> equal to 1 if x<sub>i</sub> &isin; c<sub>j</sub>, and equal to 0 otherwise.    <br> Update Centers: For some/all clusters in the current solution it is necessary to update centers according to the new memberships of the objects. Normally, the cluster center is the mean (average) point (formula 2) of all objects in the cluster, where n<sub>j</sub> is the number of objects in cluster j.</font></p>      <p><img src="/img/revistas/rfiua/n55/n55a16e02.gif"><a name="ecuacion2"></a></p>     <p><font face="Verdana" size="2">Until (Stop Criterion): stop if, for example, there is no (or minimal) reassignment of patterns to new cluster centers, or there is a minimal decrease in the SSE. The criterion most often used to detect convergence and to characterize good clusters is based on (3).</font></p>      <p><img src="/img/revistas/rfiua/n55/n55a16e03.gif"><a name="ecuacion3"></a></p>      ]]></body>
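<body><![CDATA[<p><font face="Verdana" size="2">The K-means steps described above (Forgy initialization, hard membership by Euclidean distance, mean update, and a stop criterion based on no reassignment) can be sketched in Python as follows. This is an illustrative sketch only: the names <i>k_means</i> and <i>squared_distance</i>, the <i>max_iter</i> limit, the rule that an empty cluster keeps its previous center, and the toy data are assumptions and are not taken from Table 1.</font></p>

```python
import random

def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length row vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))

def k_means(data, k, max_iter=100, seed=0):
    """Return (centers, labels, sse) for a list of equal-length numeric rows."""
    rng = random.Random(seed)
    # Select an initial partition: K distinct objects drawn from the data (Forgy).
    centers = [list(row) for row in rng.sample(data, k)]
    labels = None
    for _ in range(max_iter):
        # Re-compute membership: hard assignment to the nearest center.
        new_labels = [min(range(k), key=lambda j: squared_distance(row, centers[j]))
                      for row in data]
        if new_labels == labels:      # stop criterion: no reassignment
            break
        labels = new_labels
        # Update centers: mean of the objects assigned to each cluster.
        for j in range(k):
            members = [row for row, lab in zip(data, labels) if lab == j]
            if members:               # assumption: an empty cluster keeps its center
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    # Sum of squared errors of the final partition.
    sse = sum(squared_distance(row, centers[lab]) for row, lab in zip(data, labels))
    return centers, labels, sse

points = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
          [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]]
centers, labels, sse = k_means(points, k=2)
```

<p><font face="Verdana" size="2">On this toy data the two well-separated groups are recovered; on real data the result depends on the initial centers, which is the weakness that the global search in IHSK is meant to address.</font></p>]]></body>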
<body><![CDATA[<p><font face="Verdana" size="2">Return Solution: return the K current centers C = {c<sub>1</sub>, c<sub>2</sub>, ..., c<sub>K</sub>}.    <br> In the literature, various criteria have been used to compare two or more solutions to decide which one is better [8, 9]. The most popular criteria are based on the within-cluster (S<sub>w</sub>, defined by 4) and between-cluster (S<sub>b</sub>, defined by 5) scatter matrices. One criterion is the Trace (S<sub>w</sub><sup>-1</sup>S<sub>b</sub>). Large values of this criterion correspond to high-quality clustering solutions. This criterion is invariant under any non-singular linear transformation [3] and has been widely used for clustering where issues such as FS and the number of clusters do not arise.</font></p>      <p><img src="/img/revistas/rfiua/n55/n55a16e04.gif"><a name="ecuacion4"></a></p>     <p><font face="Verdana" size="2">To calculate the scatter matrices it is necessary to calculate the covariance matrix of the selected features. When the variance of a feature is zero or near zero, that feature is removed from the space of solutions.</font></p>      <p><font face="Verdana" size="2"><b><i>Iterative harmony search K-means algorithm with feature selection</i></b></font></p>     <p><font face="Verdana" size="2">Our algorithm, called Iterative Harmony Search K-means Algorithm (IHSK), uses the HS algorithm as a global search strategy across the whole solution space, and the K-means algorithm as a local strategy for improving solutions. In IHSK, each solution vector used in the HS algorithm has different features, and the objective function of the HS algorithm depends on the location of the centroids in each solution vector and the variability of the selected features.</font></p>       <p><font face="Verdana" size="2"><b><i>Quantitative index for feature selection</i></b></font></p>     <p><font face="Verdana" size="2">Selecting the relevant features in a clustering problem is a key aspect for improving solutions. 
From figure 1, we can understand the importance of selecting relevant features. This figure shows a data set with two evident clusters. Features 1 and 2 give us relevant information to determine two clusters (projecting the data onto the F1 and F2 axes), but feature 3 does not (if we project the data onto the F3 axis, just one cluster appears), so in this case F3 is an irrelevant feature.</font></p>      <p align="center"><img src="/img/revistas/rfiua/n55/n55a16i01.gif"><a name="figura1"></a></p>      <p><font face="Verdana" size="2"><b>Figure 1</b> F1 and F2 are relevant features, while F3 is irrelevant</font></p>     ]]></body>
<body><![CDATA[<p><font face="Verdana" size="2">In the literature, FS adopts two kinds of methods [10]: filters and wrappers. Filters preselect the features and then the clustering algorithm works with the selected feature subset. In other words, filters run before and independently from the clustering process [10]. Wrappers involve a clustering algorithm such as K-means, Expectation-Maximization or K-medoids running on a feature subset, with the feature subset being assessed by the clustering algorithm through an appropriate index [3, 10], in our case, the variance of features. Wrappers can offer a better performance, depending on the incorporated clustering algorithm [11].    <br> IHSK makes FS an integral part of the global clustering search procedure and attempts to identify high-quality solutions for clustering and FS. Similar to Zeng and Cheung in [12], we consider a feature less relevant if the variance of observations in a cluster is closer to the global variance of observations in all clusters. Accordingly, we use the following quantitative index to measure the relevance of each feature:</font></p>      <p><img src="/img/revistas/rfiua/n55/n55a16e06.gif"><a name="ecuacion6"></a></p>     <p><font face="Verdana" size="2">In (6), K is the number of clusters, Variance<sub>Fj</sub> 
is the variance of the j-th cluster projected on the F-th dimension (remember, we are using a data set/table/matrix in a D-dimensional feature space) and Variance<sub>F</sub> is the variance of the F-th dimension.</font></p>      <p><img src="/img/revistas/rfiua/n55/n55a16e07.gif"><a name="ecuacion7"></a></p>     <p><font face="Verdana" size="2">In (7), N<sub>j</sub> is the number of data in the j-th cluster, &micro;<sub>Fj</sub> is the mean (average) of the F-th feature in the j-th cluster, and x<sub>Fz</sub> corresponds to the values of the F-th feature for the data in the j-th cluster.</font></p>      <p><img src="/img/revistas/rfiua/n55/n55a16e08.gif"><a name="ecuacion8"></a></p>     <p><font face="Verdana" size="2">In (8), N is the total number of data, &micro;<sub>F</sub> is the mean (average) of the F-th feature, and x<sub>Fz</sub> corresponds to all values of the F-th feature.    <br> Score<sub>Fj</sub> indicates the relevance of the F-th feature for the j-th cluster. Score<sub>F</sub> indicates the average relevance of the F-th feature to the clustering structure. If Score<sub>F</sub> is close to the maximum value, then all clusters in the current solution are far away from each other on this dimension and hence this feature is very useful for detecting the grouping structure. Otherwise, Score<sub>F</sub> will be close to the minimum value. Unlike Zeng and Cheung in [12], we did not use a feature's Markov Blanket to select the appropriate dimensions. We defined a penalty value for the current solution (current selected features) based on the way Lingo [13] uses the Candidate Label Threshold parameter in the matrix factorization step with Singular Value Decomposition (SVD). Our heuristic method is based on Score<sub>F</sub> values and a new parameter called Percentage of Dimensions (FI). 
First, we calculate the sum of the scores of all dimensions, SS <img width="102" height="22" src="n55a16_clip_image002_0000.gif">&nbsp; Then, we sort all Score<sub>F</sub> values (where F = 1,...,d and d &lt; D) in descending order. Next, we iterate and accumulate each Score<sub>F</sub> value until the FI parameter is reached. The number of iterations before reaching the FI parameter is called the Number of Relevant Dimensions (NRD). Finally, the Penalty for the current solution is given by (9). When the FI parameter is high (50% or more) we promote lower dimensionality solutions, but if the FI parameter is low, we promote high dimensionality solutions (with 0% the algorithm does not do FS).</font></p>        <p><img src="/img/revistas/rfiua/n55/n55a16e09.gif"><a name="ecuacion9"></a></p>      ]]></body>
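<body><![CDATA[<p><font face="Verdana" size="2">The relevance index and the counting of relevant dimensions described above can be sketched in Python. Since equations (6) to (9) are given only as images, the exact formulas used here are assumptions: the score is taken as Score<sub>F</sub> = (1/K) &Sigma;<sub>j</sub> (1 - Variance<sub>Fj</sub> / Variance<sub>F</sub>), following the description of the index of Zeng and Cheung [12], and NRD is taken as the number of top-scoring dimensions needed to accumulate a fraction FI of SS. The Penalty of (9) is not reproduced. All function names are hypothetical.</font></p>

```python
def variance(values):
    """Population variance of a non-empty list of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def feature_scores(data, labels, k):
    """Assumed index: Score_F = (1/K) * sum_j (1 - Variance_Fj / Variance_F).

    Assumes every cluster is non-empty and that zero-variance features
    were already removed during preprocessing.
    """
    scores = []
    for f in range(len(data[0])):
        global_var = variance([row[f] for row in data])
        total = 0.0
        for j in range(k):
            column = [row[f] for row, lab in zip(data, labels) if lab == j]
            total += 1.0 - variance(column) / global_var
        scores.append(total / k)
    return scores

def number_of_relevant_dimensions(scores, fi):
    """Assumed rule: count top scores until a share `fi` of SS is accumulated."""
    ss = sum(scores)
    accumulated, nrd = 0.0, 0
    for score in sorted(scores, reverse=True):
        if accumulated >= fi * ss:
            break
        accumulated += score
        nrd += 1
    return nrd

# Toy data: feature 0 separates the two clusters, feature 1 does not.
data = [[0.0, 1.0], [1.0, 3.0], [10.0, 1.0], [11.0, 3.0]]
labels = [0, 0, 1, 1]
scores = feature_scores(data, labels, k=2)
nrd = number_of_relevant_dimensions(scores, fi=0.3)
```

<p><font face="Verdana" size="2">In this toy example the separating feature scores close to 1 and the uninformative feature scores 0, so a single relevant dimension is counted.</font></p>]]></body>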
<body><![CDATA[<p><font face="Verdana" size="2"><b><i>Description of Iterative harmony search K-means algorithm</i></b></font></p>     <p><font face="Verdana" size="2">IHSK has a main routine that performs three basic steps: initialize the algorithm parameters; initialize the best memory results and call the HSK routine in several iterations; and finally, return the best result. Below, we present these steps in more depth.    <br>    <br> 1. Initialize the algorithm parameters: in our case, the optimization problem is defined as maximizing the product of Trace (S<sub>w</sub><sup>-1</sup>S<sub>b</sub>) and a Penalty function (dependent on FS), called the Fitness function. IHSK needs three specific parameters - Best Memory Results Size (BMRS), Number of clusters desired (K), and Percentage of Dimensions (FI) - as well as other parameters from the HS Algorithm (HMS, HMCR, PAR, BW and NI).    <br>    <br> 2. Initialize the best memory results and call the HSK routine: best memory results (BMR) is a memory location where the best solution vectors are stored. Each row in BMR stores the result of one call to the Harmony Search K-means (HSK) routine, in a basic cycle. Each row vector in BMR has three parts: centroids, a list of dimensions selected and the fitness value of that vector.</font></p>        <p><img width="582" height="148" src="n55a16_clip_image002_0001.gif" align="center">&nbsp; </p>       <p><font face="Verdana" size="2">Before starting the process, we calculate the range of each dimension and store these results in a memory location called &quot;Range of Dimensions&quot;. Also, we remove decision variables with range equal to zero (0) and transform the data with Min-Max Normalization [14]. Other data preprocessing tasks are the responsibility of the researcher. This step can be summarized as shown in Table 2.    <br> 3. Return the best result: find and select the best result from the Best Memory Results (BMR). 
The best  result is the row with the highest fitness value (maximize f (x)). Then return this row as  the best clustering solution (centroids, list of dimensions selected and  fitness).    ]]></body>
<body><![CDATA[<br> The HSK routine is the HS algorithm with some changes, and works as follows:     <ul>         <li>Initialize the Harmony Memory: The HM is a memory location where all the solution vectors are stored. Each solution vector is created with a random number of dimensions (d &lt; D), an initial location of centroids (K centroids chosen with the Forgy strategy, with values in all dimensions) and the fitness of this solution. The initial centroids are selected randomly from the original data set (unlike in the original HS algorithm). The general structure of HM is similar to that of BMR. In this step, we generate HMS solution vectors and then calculate the fitness value of each vector.</li>         <li>Improvise a New Harmony: A new harmony vector is generated. We use a variation of step 3 in the original HS algorithm to create centroids (each dimension value in each centroid) in the current solution. The random selection process is executed from the original data set (Forgy strategy) (see table 3). Next, we execute one cycle of the K-means algorithm (algorithm in table 1, steps 3 and 4) and then calculate the fitness value of this solution.</li>       </ul></font></p>          <p><font face="Verdana" size="2"><b>Table 2</b> Initialize the best memory results and call the HSK routine</font></p>     <p align="center"><img src="/img/revistas/rfiua/n55/n55a16t02.gif"><a name="tabla2"></a></p>      <p><font face="Verdana" size="2"><b>Table 3</b> Improvisation of a New Harmony</font></p>     <p align="center"><img src="/img/revistas/rfiua/n55/n55a16t03.gif"><a name="tabla3"></a></p>     <p><font face="Verdana" size="2">    <ul>    <li>Update the Harmony Memory: The New Harmony vector replaces the worst harmony vector in the HM if its fitness value is better than that of the latter.</li>              ]]></body>
<body><![CDATA[<li>Check  the Stopping criterion: If the maximum number of improvisations (NI) is  satisfied, iteration is terminated. Otherwise, Steps 2 and 3 are repeated.</li>             <li>Select the Best Harmony in HM: We find and select the best  harmony, which has the maximum fitness value. Then, we execute the K-means  algorithm (Algorithm in Table 1 without step 1, because this solution has  information about initial centroids, number of clusters and list of dimensions  selected) and then, we calculate a new value of fitness with the final location  of centroids.</li>             <li>Return the Best Result in Harmony Memory: Return the best harmony  (centroids, list of dimensions selected and fitness) to IHSK.</li>    </ul>   To calculate fitness value, we  use a function shown in table 4.</font></p>      <p><font face="Verdana" size="2"><b>Table 4</b> Routine for calculating fitness value</font></p>     <p align="center"><img src="/img/revistas/rfiua/n55/n55a16t04.gif"><a name="tabla4"></a></p>     <p><font face="Verdana" size="2">The  HSK routine can be summarized as shown in table 5.</font></p>       <p><font face="Verdana" size="2"><b>Table 5</b> Steps in the Harmony Search K-means Routine (HSK)</font></p>     <p align="center"><img src="/img/revistas/rfiua/n55/n55a16t05.gif"><a name="tabla5"></a></p>      <p><font face="Verdana" size="2"><b><i>Complexity</i></b></font></p>     ]]></body>
<body><![CDATA[<p><font face="Verdana" size="2">IHSK repeats the HSK routine BMRS times and then sorts a vector with BMRS rows. The major computational load occurs in each step of the HSK routine. The HSK routine generates HMS solution vectors and then NI improvisations. For each solution vector generated in the HSK routine, we need to process the variance assessment for FS, the Trace (S<sub>w</sub><sup>-1</sup>S<sub>b</sub>) calculation and one step of the K-means algorithm. Finally, the HSK routine finds and selects the best solution, performs the K-means algorithm for this solution and recalculates the fitness value (variance and trace). The variance assessment takes O(n*D) time. The Trace (S<sub>w</sub><sup>-1</sup>S<sub>b</sub>) calculation and one step of the K-means algorithm for a given solution take O(n*K*D) and O(n*K*D) time, respectively. The complete K-means algorithm and the recalculation of the fitness value take O(n*K*D*L) time (where L is the number of iterations taken by the K-means algorithm to converge). Therefore, the overall complexity of the proposed algorithm is <b>O(n*K*D*(L + HMS + NI)*BMRS)</b>.</font></p>      <br>    <p><font face="Verdana" size="2"><b>Experimental</b></font></p>      <p><font face="Verdana" size="2"><b><i>Data sets</i></b></font></p>     <p><font face="Verdana" size="2">Several data sets (three synthetic and three real) have been used in our experiments. The synthetic data sets were generated with different numbers of clusters and noise. The synthetic data sets contain &quot;relevant&quot; and &quot;irrelevant&quot; features. &quot;Irrelevant&quot; features are generated as Gaussian normal random variables. 
Table 6 shows the description of the synthetic data sets.</font></p>      <p><font face="Verdana" size="2"><b>Table 6</b> Description of synthetic data used in our experiments</font></p>     <p align="center"><img src="/img/revistas/rfiua/n55/n55a16t06.gif"><a name="tabla6"></a></p>     <p><font face="Verdana" size="2">Three real data sets were considered: Iris, the Wisconsin diagnostic breast cancer (WDBC), and image segmentation. They are taken from the UCI Machine Learning Repository. Table 7 shows the description of the real data sets. Since we are concerned with unsupervised learning, the class labels in these data sets are used only for evaluation of the clustering results.</font></p>      <p><font face="Verdana" size="2"><b>Table 7</b> Description of real data used in our experiments</font></p>     <p align="center"><img src="/img/revistas/rfiua/n55/n55a16t07.gif"><a name="tabla7"></a></p>      ]]></body>
<body><![CDATA[<p><font face="Verdana" size="2"><b><i>IHSK parameters and measures</i></b></font></p>     <p><font face="Verdana" size="2">All parameter values were equal for all data sets: BMRS equal to 10, HMS equal to 25, HMCR equal to 0.95, PAR equal to 0.35, BW equal to 0.0005 and NI equal to 500. The K value in each data set was fixed to 3, 4, 5, 3, 2 and 7 respectively. FI was set to 0.3 in the first experiments, and then FI was changed.    <br> In our experiments, we try to answer the following questions: Is the number of clusters correctly identified? and Is the selected feature subset relevant? To answer the first question, we compute the Error Classification Percentage (ECP), since we know the &quot;true&quot; clusters or labels of the synthetic and the real data sets. To answer the second question, we use the Recall and Precision concepts from the information retrieval field [14]. In our case, the feature recall (FR) and feature precision (FP) are reported on synthetic data, since the relevant features are known a priori. FR is the number of relevant features in the selected subset divided by the total number of relevant features, and FP is the number of relevant features in the selected subset divided by the total number of features selected. High values of FR and FP are desired. This second question cannot be answered for real data because the relevant features are unknown; in this case, we show only the Number of Features Selected (NFS).</font></p>      <p><font face="Verdana" size="2"><b>Results</b></font></p>     <p><font face="Verdana" size="2">First we conducted a set of experiments on both synthetic and real data to evaluate the proposed algorithm, comparing it with the standard K-means algorithm. We ran the algorithm 10 times and report the averages; these promising results are shown in table 8. IHSK has better results of ECP for both real and synthetic data sets. 
IHSK is effective at selecting the relevant features: the NFS results are good on synthetic data, FR is higher than or equal to 95% and FP is higher than 87%. For synthetic data sets, the K-means algorithm was executed with all features and with only the relevant features (K-means-F), but IHSK presented better results.</font></p>      <p><font face="Verdana" size="2"><b>Table 8</b> ECP, NFS, Feature Recall (FR) and Feature Precision (FP) obtained by the algorithms</font></p>     <p align="center"><img src="/img/revistas/rfiua/n55/n55a16t08.gif"><a name="tabla8"></a></p>     <p><font face="Verdana" size="2">Next, we analyze the FI parameter using the WDBC data set. We ran IHSK with FI equal to 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8 and 0.9. Figure 2 shows the NFS (dashed line with triangles) and the ECP (line with dots) obtained by IHSK with the different values of FI. The figure shows that a high value of FI promotes lower-dimensionality solutions, whereas a low value of FI promotes higher-dimensionality solutions. We can also see that for the WDBC data set, the best solution is obtained with FI equal to 0.8, where the ECP is 7.98% and the NFS is 7.7. We cannot say that high values of the FI parameter promise better solutions, because this depends on the characteristics of the data set or the particular application. This analysis is very important in a supervised learning problem, because IHSK can significantly reduce the feature space of the solution. Finally, we compared the results with two recent algorithms (see table 9): a niching memetic algorithm for simultaneous clustering and feature selection (called NMA_CFS) and an algorithm for feature selection wrapped around the K-means algorithm (called FS-K-Means_BIC), both of them proposed in [15]. These two algorithms perform FS and also find the number of clusters, so the results are not totally comparable, but the comparison is nevertheless a good way of setting a goal for a new version of IHSK. 
The current results are close to this goal in all data sets, but it is necessary to consider including a noise removal procedure in IHSK or using other metrics to compare cluster solutions with different selected features [3].</font></p>      <p align="center"><img src="/img/revistas/rfiua/n55/n55a16i02.gif"><a name="figura2"></a></p>      <p><font face="Verdana" size="2"><b>Figure 2</b> ECP and NFS obtained by IHSK with different values of the FI parameter</font></p>      ]]></body>
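<body><![CDATA[<p><font face="Verdana" size="2">As an illustration of the feature recall (FR) and feature precision (FP) measures reported in table 8, the following minimal Python sketch computes both values from a selected feature subset and a known set of relevant features. The function name, the use of integer feature indices and the convention of returning an FP of 0 for an empty selection are assumptions made for this example; they are not taken from the IHSK implementation.</font></p>

```python
def feature_recall_precision(selected, relevant):
    """Return (FR, FP) for a selected feature subset.

    FR = |selected & relevant| / |relevant|
    FP = |selected & relevant| / |selected|
    """
    selected = set(selected)
    relevant = set(relevant)
    if not relevant:
        raise ValueError("the set of relevant features must not be empty")
    hits = len(selected & relevant)
    recall = hits / len(relevant)
    # Assumption: an empty selection is reported with a precision of 0.0.
    precision = hits / len(selected) if selected else 0.0
    return recall, precision


# Hypothetical example: features 0-3 are relevant; features 0, 1, 2 and 7 were selected.
fr, fp = feature_recall_precision(selected=[0, 1, 2, 7], relevant=[0, 1, 2, 3])
print(f"FR = {fr:.2f}, FP = {fp:.2f}")  # FR = 0.75, FP = 0.75
```

<p><font face="Verdana" size="2">In this hypothetical example, three of the four relevant features are selected together with one irrelevant feature, so both measures are equal to 0.75.</font></p>      ]]></body>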
<body><![CDATA[<p><font face="Verdana" size="2"><b>Table 9</b> ECP and NFS obtained by the three algorithms</font></p>     <p align="center"><img src="/img/revistas/rfiua/n55/n55a16t09.gif"><a name="tabla9"></a></p>       <p><font face="Verdana" size="3"><b>Conclusions and future work</b></font></p>     <p><font face="Verdana" size="2"> We have designed and implemented the IHSK algorithm. IHSK is a wrapper clustering algorithm with a random search strategy, but it can also be used in classification tasks. The improvement of HS and the K-means algorithm with a feature selection process shows promising experimental results. The combination of feature variance, the FI parameter and Trace (S<sub>w</sub><sup>-1</sup>S<sub>b</sub>) offers a new way to find relevant features in a clustering problem with a random search strategy. The overall complexity of IHSK is O(n*K*D*(L + HMS + NI)*BMRS), so IHSK can be used with large data sets. Unfortunately, as with the K-means algorithm, IHSK is sensitive to noise.    <br>    <br> There are several tasks for future work, among them: apply the IHSK algorithm to real data sets with many irrelevant and redundant features; include in IHSK a metric (e.g. the Bayesian Information Criterion [1]) to find the number of clusters automatically; use the global-best harmony search [5] strategy or other improvements of HS; use the K-medoids or Expectation Maximization algorithms instead of the K-means algorithm and compare their results; make IHSK less sensitive to noise; compare IHSK with other initialization techniques for K-means; and finally, use another metric for feature selection (e.g. Trace (S<sub>w</sub><sup>-1</sup>S<sub>b</sub>) normalized using a cross-projection scheme [3]).  
</font></p>      <br>    <p><font face="Verdana" size="3"><b>Acknowledgments</b></font></p>     <p><font face="Verdana" size="2">The work in this paper was supported by a Research Grant from  the University of Cauca under Project VRI-2560 and the National University of  Colombia. We are especially grateful to Guillermo Arenas and Colin McLachlan  for their help in reviewing the English text.</font></p>      <br>     ]]></body>
<body><![CDATA[<p><font face="Verdana" size="2"><b>References</b></font></p>
<!-- ref --><p><font face="Verdana" size="2"> 1. A. K. Jain, M. N. Murty, P. J. Flynn. "Data clustering: a review". ACM Comput. Surv. Vol. 31. 1999. pp. 264-323.<!-- end-ref --><br>
<!-- ref --><br> 2. J. Kogan, C. Nicholas, M. Teboulle. Grouping Multidimensional Data: Recent Advances in Clustering. Ed. Springer-Verlag. New York. 2006. pp. 25-72.<!-- end-ref --><br>
<!-- ref --><br> 3. J. G. Dy, C. E. Brodley. "Feature Selection for Unsupervised Learning". J. Mach. Learn. Res. Vol. 5. 2004. pp. 845-889.<!-- end-ref --><br>
<!-- ref --><br> 4. Z. Geem, J. Kim, G. V. Loganathan. "A New Heuristic Optimization Algorithm: Harmony Search". Simulation. Vol. 76. 2001. pp. 60-68.<!-- end-ref --><br>
<!-- ref --><br> 5. M. G. H. Omran, M. Mahdavi. "Global-best harmony search". Applied Mathematics and Computation. Vol. 198. 2008. pp. 643-656.<!-- end-ref --><br>
<!-- ref --><br> 6. M. Mahdavi, M. Fesanghary, E. Damangir. "An improved harmony search algorithm for solving optimization problems". Applied Mathematics and Computation. Vol. 188. 2007. pp. 1567-1579.<!-- end-ref --><br>
<!-- ref --><br> 7. S. J. Redmond, C. Heneghan. "A method for initialising the K-means clustering algorithm using kd-trees". Pattern Recognition Letters. Vol. 28. 2007. pp. 965-973.<!-- end-ref --><br>
<!-- ref --><br> 8. A. K. Jain, R. C. Dubes. Algorithms for clustering data. Ed. Prentice-Hall Inc. Englewood Cliffs (NJ). 1988. pp. 143-222.<!-- end-ref --><br>
<!-- ref --><br> 9. A. Webb. Statistical Pattern Recognition. 2nd ed. Ed. John Wiley & Sons. Malvern (UK). 2002. pp. 361-408.<!-- end-ref --><br>
<!-- ref --><br> 10. A. L. Blum, P. Langley. "Selection of relevant features and examples in machine learning". Artificial Intelligence. Vol. 97. 1997. pp. 245-271.<!-- end-ref --><br>
<!-- ref --><br> 11. R. Kohavi, G. H. John. "Wrappers for feature subset selection". Artificial Intelligence. Vol. 97. 1997. pp. 273-324.<!-- end-ref --><br>
<!-- ref --><br> 12. H. Zeng, Y. M. Cheung. "A new feature selection method for Gaussian mixture clustering". Pattern Recognition. Vol. 42. 2009. pp. 243-250.<!-- end-ref --><br>
<!-- ref --><br> 13. S. Osinski, J. Stefanowski, D. Weiss. "Lingo search results clustering algorithm based on Singular Value Decomposition". International Conference on Intelligent Information Systems (IIPWM). Zakopane (Poland). 2004. pp. 359-397.<!-- end-ref --><br>
<!-- ref --><br> 14. J. Han, M. Kamber. Data Mining Concepts and Techniques. 2nd ed. Ed. Morgan Kaufmann Publishers. 2006. pp. 71-72.<!-- end-ref --><br>
<!-- ref --><br> 15. W. Sheng, X. Liu, M. Fairhurst. "A Niching Memetic Algorithm for Simultaneous Clustering and Feature Selection". IEEE Transactions on Knowledge and Data Engineering. Vol. 20. 2008. pp. 868-879.</font><!-- end-ref --><p>&nbsp;</p>
<p><font face="Verdana" size="2">(Received April 29, 2009. Accepted April 6, 2010)    <br>       <br>   <sup>*</sup>Corresponding author: telephone: + 57 + 2 + 820 98 00 ext. 2119, fax: + 57 + 2 + 820 98 00 ext. 2102, e-mail: <a href="mailto:ccobos@unicauca.edu.co">ccobos@unicauca.edu.co</a> (C. Cobos)</font></p>      ]]></body><back>
<ref-list>
<ref id="B1">
<label>1</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Jain]]></surname>
<given-names><![CDATA[A. K]]></given-names>
</name>
<name>
<surname><![CDATA[Murty]]></surname>
<given-names><![CDATA[M. N]]></given-names>
</name>
<name>
<surname><![CDATA[Flynn]]></surname>
<given-names><![CDATA[P. J]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[Data clustering: a review]]></article-title>
<source><![CDATA[ACM Comput. Surv]]></source>
<year>1999</year>
<volume>31</volume>
<page-range>264-323</page-range></nlm-citation>
</ref>
<ref id="B2">
<label>2</label><nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Kogan]]></surname>
<given-names><![CDATA[J]]></given-names>
</name>
<name>
<surname><![CDATA[Nicholas]]></surname>
<given-names><![CDATA[C]]></given-names>
</name>
<name>
<surname><![CDATA[Teboulle]]></surname>
<given-names><![CDATA[M]]></given-names>
</name>
</person-group>
<source><![CDATA[Grouping Multidimensional Data: Recent Advances in Clustering]]></source>
<year>2006</year>
<page-range>25-72</page-range><publisher-loc><![CDATA[New York ]]></publisher-loc>
<publisher-name><![CDATA[Ed. Springer-Verlag]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B3">
<label>3</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Dy]]></surname>
<given-names><![CDATA[J. G]]></given-names>
</name>
<name>
<surname><![CDATA[Brodley]]></surname>
<given-names><![CDATA[C. E]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[Feature Selection for Unsupervised Learning]]></article-title>
<source><![CDATA[J. Mach. Learn. Res]]></source>
<year>2004</year>
<volume>5</volume>
<page-range>845-889</page-range></nlm-citation>
</ref>
<ref id="B4">
<label>4</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Geem]]></surname>
<given-names><![CDATA[Z]]></given-names>
</name>
<name>
<surname><![CDATA[Kim]]></surname>
<given-names><![CDATA[J]]></given-names>
</name>
<name>
<surname><![CDATA[Loganathan]]></surname>
<given-names><![CDATA[G.V]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[A New Heuristic Optimization Algorithm: Harmony Search]]></article-title>
<source><![CDATA[Simulation]]></source>
<year>2001</year>
<volume>76</volume>
<page-range>60-68</page-range></nlm-citation>
</ref>
<ref id="B5">
<label>5</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Omran]]></surname>
<given-names><![CDATA[M. G. H]]></given-names>
</name>
<name>
<surname><![CDATA[Mahdavi]]></surname>
<given-names><![CDATA[M]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[Global-best harmony search]]></article-title>
<source><![CDATA[Applied Mathematics and Computation]]></source>
<year>2008</year>
<volume>198</volume>
<page-range>643-656</page-range></nlm-citation>
</ref>
<ref id="B6">
<label>6</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Mahdavi]]></surname>
<given-names><![CDATA[M]]></given-names>
</name>
<name>
<surname><![CDATA[Fesanghary]]></surname>
<given-names><![CDATA[M]]></given-names>
</name>
<name>
<surname><![CDATA[Damangir]]></surname>
<given-names><![CDATA[E]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[An improved harmony search algorithm for solving optimization problems]]></article-title>
<source><![CDATA[Applied Mathematics and Computation]]></source>
<year>2007</year>
<volume>188</volume>
<page-range>1567-1579</page-range></nlm-citation>
</ref>
<ref id="B7">
<label>7</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Redmond]]></surname>
<given-names><![CDATA[S. J]]></given-names>
</name>
<name>
<surname><![CDATA[Heneghan]]></surname>
<given-names><![CDATA[C]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[A method for initialising the K-means clustering algorithm using kd-trees]]></article-title>
<source><![CDATA[Pattern Recognition Letters]]></source>
<year>2007</year>
<volume>28</volume>
<page-range>965-973</page-range></nlm-citation>
</ref>
<ref id="B8">
<label>8</label><nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Jain]]></surname>
<given-names><![CDATA[A. K]]></given-names>
</name>
<name>
<surname><![CDATA[Dubes]]></surname>
<given-names><![CDATA[R.C]]></given-names>
</name>
</person-group>
<source><![CDATA[Algorithms for clustering data]]></source>
<year>1988</year>
<page-range>143-222</page-range><publisher-loc><![CDATA[Englewood Cliffs ]]></publisher-loc>
<publisher-name><![CDATA[Ed. Prentice-Hall Inc]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B9">
<label>9</label><nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Webb]]></surname>
<given-names><![CDATA[A]]></given-names>
</name>
</person-group>
<source><![CDATA[Statistical Pattern Recognition]]></source>
<year>2002</year>
<edition>2</edition>
<page-range>361-408</page-range><publisher-loc><![CDATA[Malvern ]]></publisher-loc>
<publisher-name><![CDATA[Ed. John Wiley & Sons]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B10">
<label>10</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Blum]]></surname>
<given-names><![CDATA[A. L]]></given-names>
</name>
<name>
<surname><![CDATA[Langley]]></surname>
<given-names><![CDATA[P]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[Selection of relevant features and examples in machine learning]]></article-title>
<source><![CDATA[Artificial Intelligence]]></source>
<year>1997</year>
<volume>97</volume>
<page-range>245-271</page-range></nlm-citation>
</ref>
<ref id="B11">
<label>11</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Kohavi]]></surname>
<given-names><![CDATA[R]]></given-names>
</name>
<name>
<surname><![CDATA[John]]></surname>
<given-names><![CDATA[G. H]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[Wrappers for feature subset selection]]></article-title>
<source><![CDATA[Artificial Intelligence]]></source>
<year>1997</year>
<volume>97</volume>
<page-range>273-324</page-range></nlm-citation>
</ref>
<ref id="B12">
<label>12</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Zeng]]></surname>
<given-names><![CDATA[H]]></given-names>
</name>
<name>
<surname><![CDATA[Cheung]]></surname>
<given-names><![CDATA[Y. M]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[A new feature selection method for Gaussian mixture clustering]]></article-title>
<source><![CDATA[Pattern Recognition]]></source>
<year>2009</year>
<volume>42</volume>
<page-range>243-250</page-range></nlm-citation>
</ref>
<ref id="B13">
<label>13</label><nlm-citation citation-type="confpro">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Osinski]]></surname>
<given-names><![CDATA[S]]></given-names>
</name>
<name>
<surname><![CDATA[Stefanowski]]></surname>
<given-names><![CDATA[J]]></given-names>
</name>
<name>
<surname><![CDATA[Weiss]]></surname>
<given-names><![CDATA[D]]></given-names>
</name>
</person-group>
<source><![CDATA[Lingo search results clustering algorithm based on Singular Value Decomposition]]></source>
<year></year>
<conf-name><![CDATA[ International Conference on Intelligent Information Systems (IIPWM)]]></conf-name>
<conf-date>2004</conf-date>
<conf-loc>Zakopane </conf-loc>
<page-range>359-397</page-range></nlm-citation>
</ref>
<ref id="B14">
<label>14</label><nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Han]]></surname>
<given-names><![CDATA[J]]></given-names>
</name>
<name>
<surname><![CDATA[Kamber]]></surname>
<given-names><![CDATA[M]]></given-names>
</name>
</person-group>
<source><![CDATA[Data Mining Concepts and Techniques]]></source>
<year>2006</year>
<edition>2</edition>
<page-range>71-72</page-range><publisher-name><![CDATA[Ed. Morgan Kaufmann Publishers]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B15">
<label>15</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Sheng]]></surname>
<given-names><![CDATA[W]]></given-names>
</name>
<name>
<surname><![CDATA[Liu]]></surname>
<given-names><![CDATA[X]]></given-names>
</name>
<name>
<surname><![CDATA[Fairhurst]]></surname>
<given-names><![CDATA[M]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[A Niching Memetic Algorithm for Simultaneous Clustering and Feature Selection]]></article-title>
<source><![CDATA[IEEE Transactions on Knowledge and Data Engineering]]></source>
<year>2008</year>
<volume>20</volume>
<page-range>868-879</page-range></nlm-citation>
</ref>
</ref-list>
</back>
</article>
