SciELO - Scientific Electronic Library Online

 


Ingeniería y Universidad

Print version ISSN 0123-2126

Abstract

MOGOLLON PINZON, Christian  y  ROJAS-GALEANO, Sergio. A Web-Forum Free of Disguised Profanity by Means of Sequence Alignment. Ing. Univ. [online]. 2016, vol.20, n.2, pp.239-265. ISSN 0123-2126.  https://doi.org/10.11144/Javeriana.iyu20-2.wffd.

Profanity is the use of offensive, obscene, or abusive words or expressions in public conversations. Digital media such as forums, blogs, and social networks are now a major source of text conversations, and malicious users exploit their worldwide reach to spread profanity aimed at insulting or denigrating opinions, names, or trademarks. Exact lexicon-based comparison is the most common filter used to prevent such attacks in these media; however, ingenious users disguise profanity by transliterating or masking the original word while still conveying its intended meaning (e.g. by writing piss as P!55 or p.i.s.s), thereby defeating the filter. Recent approaches to this problem, inspired by the sequence alignment methods of comparative genomics in bioinformatics, have shown promise in unmasking such disguises. Building upon those techniques, we have developed an experimental Web forum (ForumForte) in which user comments are cleaned of disguised profanity. In this paper we briefly discuss the techniques and the main engineering artefacts obtained during the development of the software. Empirical evidence shows filtering effectiveness between 84% and 97% at word level, depending on the length of the profanity (for profanities of more than four letters), and 86% at sentence level when tested on two sets of real user-generated comments written in Spanish and Portuguese. These results suggest the suitability of the software as a language-independent tool.
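To make the alignment idea concrete, the following is a minimal sketch of how a global alignment score (in the style of Needleman-Wunsch) could flag a disguised token against a lexicon entry. It is not the ForumForte implementation: the `LOOKALIKES` table, the scoring values, the 0.6 threshold, and the function names are all assumptions chosen for illustration.

```python
# Hypothetical table of look-alike characters; not taken from the paper.
LOOKALIKES = {"!": "i", "1": "i", "5": "s", "$": "s", "0": "o", "@": "a", "3": "e"}


def alignment_score(token, word, match=2, lookalike=1, mismatch=-2, gap=-1):
    """Global alignment score between a user token and a lexicon word.

    The scoring values are illustrative assumptions, not the paper's.
    """
    token, word = token.lower(), word.lower()
    rows, cols = len(token) + 1, len(word) + 1
    score = [[0] * cols for _ in range(rows)]
    # Aligning a prefix against nothing costs one gap per character.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            a, b = token[i - 1], word[j - 1]
            if a == b:
                sub = match
            elif LOOKALIKES.get(a) == b:
                sub = lookalike  # transliteration such as '5' for 's'
            else:
                sub = mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # align a with b
                score[i - 1][j] + gap,      # skip a (e.g. a masking '.')
                score[i][j - 1] + gap,      # skip b
            )
    return score[-1][-1]


def is_disguised(token, word, threshold=0.6):
    """Flag the token if its score is close to that of an exact match."""
    best = 2 * len(word)  # score of the word aligned with itself (match=2)
    return alignment_score(token, word) / best >= threshold
```

With these assumed parameters, `is_disguised("P!55", "piss")` and `is_disguised("p.i.s.s", "piss")` both return `True`, while `is_disguised("pass", "piss")` returns `False`. A real filter would need a tuned substitution table and threshold for each language.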

Keywords: web forum; profanity detection; text analysis.


Creative Commons License: All content of this journal, except where otherwise noted, is licensed under a Creative Commons License.