SciELO - Scientific Electronic Library Online

 

Revista Ingenierías Universidad de Medellín

Print version ISSN 1692-3324; online version ISSN 2248-4094

Abstract

AMON, Iván; MORENO, Francisco and ECHEVERRI, Jaime. PHONETIC ALGORITHM TO DETECT DUPLICATE TEXT STRINGS IN SPANISH. Rev. ing. univ. Medellín [online]. 2012, vol.11, n.20, pp.127-138. ISSN 1692-3324.

Data that should be written identically often are not, because of misspellings and typos, variations in word order, and the use of prefixes and suffixes, among other causes. Existing phonetic techniques for duplicate detection are not geared toward the Spanish language, which makes it difficult to identify and correct problems such as spelling errors in texts written in that language. In this paper we propose an algorithm called PhoneticSpanish that detects duplicate text strings while taking into account spelling errors in Spanish. The proposed algorithm was compared with nine other duplicate-detection techniques. The results were satisfactory: the algorithm performed better than the other techniques, which points to opportunities for improved analysis of information in Spanish.
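The abstract does not spell out the rules PhoneticSpanish uses. As a rough illustration of the general idea behind phonetic duplicate detection for Spanish, the sketch below maps spellings that sound alike (b/v, c/s/z before e/i, ll/y, silent h, and so on) to a common key and flags two strings as probable duplicates when their keys match. The rule set and the names `spanish_phonetic_key` and `is_probable_duplicate` are hypothetical assumptions made for illustration; this is not the authors' algorithm.

```python
import re

# Hypothetical, simplified rule set for illustration; NOT the PhoneticSpanish algorithm.
# Accented vowels map to plain vowels; ñ is kept because it is a distinct sound.
_ACCENTS = str.maketrans("áéíóúü", "aeiouu")


def spanish_phonetic_key(text: str) -> str:
    """Return a simplified phonetic key for a Spanish string (illustrative rules only)."""
    words = re.findall(r"[a-zñ]+", text.lower().translate(_ACCENTS))
    keys = []
    for w in words:
        w = w.replace("ch", "X")            # 'ch' is one sound; placeholder protects it from the 'c' and 'h' rules
        w = re.sub(r"qu(?=[ei])", "k", w)   # 'qu' before e/i sounds like k (queso/keso)
        w = re.sub(r"g(?=[ei])", "j", w)    # 'g' before e/i sounds like j (gente/jente)
        w = re.sub(r"gu(?=[ei])", "g", w)   # 'gu' before e/i is a hard g (runs after the g -> j rule)
        w = re.sub(r"c(?=[ei])", "s", w)    # 'c' before e/i sounds like s (seseo)
        w = w.replace("c", "k")             # any remaining 'c' is a k sound
        w = w.replace("z", "s")             # 'z' sounds like s (seseo)
        w = w.replace("v", "b")             # 'b' and 'v' sound the same
        w = re.sub(r"y$", "i", w)           # word-final 'y' sounds like the vowel i (rey/rei)
        w = w.replace("ll", "y")            # 'll' and 'y' sound the same (yeismo)
        w = w.replace("h", "")              # 'h' is silent
        w = re.sub(r"(.)\1+", r"\1", w)     # collapse repeated letters (rr -> r, doubled-letter typos)
        keys.append(w)
    return " ".join(keys)


def is_probable_duplicate(a: str, b: str) -> bool:
    """Flag two strings as likely duplicates when their phonetic keys match."""
    return spanish_phonetic_key(a) == spanish_phonetic_key(b)


if __name__ == "__main__":
    pairs = [("vaca", "baca"), ("calle", "caye"), ("Iván Pérez", "Ibán Peres"), ("casa", "mesa")]
    for a, b in pairs:
        print(a, "|", b, "->", is_probable_duplicate(a, b))
```

Key equality is deliberately coarse: collapsing repeated letters gives `perro` and `pero` the same key, for example, so a practical system would likely combine such a key with a similarity function (for instance, edit distance) rather than rely on exact key matches alone.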

Keywords: Data cleansing; data quality; detection of duplicates; similarity functions; phonetic algorithms.

        · Abstract in Spanish     · Full text in Spanish     · PDF (Spanish)

 

All content in this journal, except where otherwise noted, is licensed under a Creative Commons License.