<?xml version="1.0" encoding="ISO-8859-1"?><article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<front>
<journal-meta>
<journal-id>1794-9165</journal-id>
<journal-title><![CDATA[Ingeniería y Ciencia]]></journal-title>
<abbrev-journal-title><![CDATA[ing.cienc.]]></abbrev-journal-title>
<issn>1794-9165</issn>
<publisher>
<publisher-name><![CDATA[Escuela de Ciencias y Humanidades y Escuela de Ingeniería de la Universidad EAFIT]]></publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id>S1794-91652012000200008</article-id>
<title-group>
<article-title xml:lang="en"><![CDATA[Detection of Fraudulent Transactions Through a Generalized Mixed Linear Models]]></article-title>
<article-title xml:lang="es"><![CDATA[Detección de transacciones fraudulentas a través de un Modelo Lineal Mixto Generalizado]]></article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname><![CDATA[Gómez-Restrepo]]></surname>
<given-names><![CDATA[Jackelyne]]></given-names>
</name>
<xref ref-type="aff" rid="A01"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname><![CDATA[Cogollo-Flórez]]></surname>
<given-names><![CDATA[Myladis R]]></given-names>
</name>
<xref ref-type="aff" rid="A02"/>
</contrib>
</contrib-group>
<aff id="A01">
<institution><![CDATA[Universidad EAFIT]]></institution>
<addr-line><![CDATA[Medellín ]]></addr-line>
<country>Colombia</country>
</aff>
<aff id="A02">
<institution><![CDATA[Universidad EAFIT]]></institution>
<addr-line><![CDATA[Medellín ]]></addr-line>
<country>Colombia</country>
</aff>
<pub-date pub-type="pub">
<day>00</day>
<month>07</month>
<year>2012</year>
</pub-date>
<pub-date pub-type="epub">
<day>00</day>
<month>07</month>
<year>2012</year>
</pub-date>
<volume>8</volume>
<numero>16</numero>
<fpage>221</fpage>
<lpage>237</lpage>
<copyright-statement/>
<copyright-year/>
<self-uri xlink:href="http://www.scielo.org.co/scielo.php?script=sci_arttext&amp;pid=S1794-91652012000200008&amp;lng=en&amp;nrm=iso"></self-uri><self-uri xlink:href="http://www.scielo.org.co/scielo.php?script=sci_abstract&amp;pid=S1794-91652012000200008&amp;lng=en&amp;nrm=iso"></self-uri><self-uri xlink:href="http://www.scielo.org.co/scielo.php?script=sci_pdf&amp;pid=S1794-91652012000200008&amp;lng=en&amp;nrm=iso"></self-uri><abstract abstract-type="short" xml:lang="en"><p><![CDATA[The detection of bank frauds is a topic which many financial sector companies have invested time and resources into. However, finding patterns in the methodologies used to commit fraud in banks is a job that primarily involves intimate knowledge of customer behavior, with the idea of isolating those transactions which do not correspond to what the client usually does. Thus, the solutions proposed in literature tend to focus on identifying outliers or groups, but fail to analyse each client or forecast fraud. This paper evaluates the implementation of a generalized linear model to detect fraud. With this model, unlike conventional methods, we consider the heterogeneity of customers. We not only generate a global model, but also a model for each customer which describes the behavior of each one according to their transactional history and previously detected fraudulent transactions. In particular, a mixed logistic model is used to estimate the probability that a transaction is fraudulent, using information that has been taken by the banking systems in different moments of time.]]></p></abstract>
<abstract abstract-type="short" xml:lang="es"><p><![CDATA[La detección de fraudes ha sido uno de los temas en el que muchas compañías del sector financiero han invertido más tiempo y recursos con el fin de mitigarlo y de esta forma mantenerse a salvo; sin embargo, encontrar patrones dentro de las metodologías empleadas para cometer fraude en entidades bancarias es un trabajo que involucra ante todo conocer muy bien el comportamiento del individuo, con la idea de finalmente hallar dentro de todas sus transacciones aquellas que no corresponderían a lo que habitualmente éste hace. De esta forma, las soluciones planteadas hasta la fecha, para este problema se han trasladado únicamente a poder identificar outliers o datos atípicos dentro de la muestra que se está analizando, lo cual no permite analizar cada individuo de manera individual y mucho menos realizar un pronóstico de fraudes. En este trabajo se evalúa el uso de un modelo logístico mixto para la detección de fraudes. Este modelo, a diferencia de los métodos convencionales para detección de fraudes, considera la variabilidad de las transacciones realizadas por cada individuo; lo que permite generar no sólo un modelo global, sino también un modelo por cada individuo que permite estimar la probabilidad de que una transacción realizada sea fraudulenta, teniendo en cuenta su historial de transacciones y las transacciones fraudulentas detectadas previamente.]]></p></abstract>
<kwd-group>
<kwd lng="en"><![CDATA[Generalized linear model]]></kwd>
<kwd lng="en"><![CDATA[transactional history]]></kwd>
<kwd lng="en"><![CDATA[detected frauds]]></kwd>
<kwd lng="en"><![CDATA[outliers detection]]></kwd>
<kwd lng="es"><![CDATA[Modelo lineal generalizado]]></kwd>
<kwd lng="es"><![CDATA[historia transaccional]]></kwd>
<kwd lng="es"><![CDATA[fraudes detectados]]></kwd>
<kwd lng="es"><![CDATA[detección de outliers]]></kwd>
</kwd-group>
</article-meta>
</front><body><![CDATA[  <font face="Verdana, Arial, Helvetica, sans-serif" size="2">     <p align="right">ORIGINAL ARTICLE</p>     <p align="center">&nbsp;</p>     <p align="center"><b><font size="4">Detection of Fraudulent Transactions   Through a Generalized Mixed Linear Models</font></b></p>     <p>&nbsp;</p>     <p align="center"><b><font size="3">Detecci&oacute;n de transacciones fraudulentas a trav&eacute;s de un   Modelo Lineal Mixto Generalizado</font></b></p>     <p>&nbsp;</p>     <p>&nbsp;</p>     <p><b>Jackelyne G&oacute;mez-Restrepo<sup>1</sup> and   Myladis R. Cogollo-Fl&oacute;rez<sup>2</sup></b></p>     <p><sup>1</sup> Mathematical Engineer, <a href="mailto:jgomezr7@eafit.edu.co">jgomezr7@eafit.edu.co</a>, Master's student, Universidad EAFIT, Medell&iacute;n-Colombia.</p>     ]]></body>
<body><![CDATA[<p>   <sup>2</sup> Master of Science in Statistics, Ph.D.(C) in Systems and Computer Engineering, <a href="mailto:mcogollo@eafit.edu.co">mcogollo@eafit.edu.co</a>, professor, Universidad EAFIT, Medell&iacute;n-Colombia.</p>     <p>&nbsp;  </p>     <p>Received: 17-feb-2012, Accepted: 18-oct-2012, Available online: 30-nov-2012</p>     <p>MSC: 62P05 </p>     <p>&nbsp;</p> </font> <hr size="1" /> <font face="Verdana, Arial, Helvetica, sans-serif" size="2">     <p><b>Abstract</b></p>     <p>   The detection of bank frauds is a topic which many financial sector companies   have invested time and resources into. However, finding patterns in   the methodologies used to commit fraud in banks is a job that primarily involves   intimate knowledge of customer behavior, with the idea of isolating   those transactions which do not correspond to what the client usually does.   Thus, the solutions proposed in literature tend to focus on identifying outliers   or groups, but fail to analyse each client or forecast fraud. This paper   evaluates the implementation of a generalized linear model to detect fraud.   With this model, unlike conventional methods, we consider the heterogeneity   of customers. We not only generate a global model, but also a model for each   customer which describes the behavior of each one according to their transactional   history and previously detected fraudulent transactions. 
In particular,   a mixed logistic model is used to estimate the probability that a transaction   is fraudulent, using information that has been taken by the banking systems   in different moments of time.</p>     <p><b>Key words:</b> Generalized linear model, transactional history, detected frauds, outliers detection.</p> </font> <hr size="1" /> <font face="Verdana, Arial, Helvetica, sans-serif" size="2">     <p><b>Resumen</b></p>     <p>La detecci&oacute;n de fraudes ha sido uno de los temas en el que muchas compa&ntilde;&iacute;as   del sector financiero han invertido m&aacute;s tiempo y recursos con el fin de mitigarlo   y de esta forma mantenerse a salvo; sin embargo, encontrar patrones dentro   de las metodolog&iacute;as empleadas para cometer fraude en entidades bancarias es   un trabajo que involucra ante todo conocer muy bien el comportamiento del   individuo, con la idea de finalmente hallar dentro de todas sus transacciones   aquellas que no corresponder&iacute;an a lo que habitualmente &eacute;ste hace. De esta   forma, las soluciones planteadas hasta la fecha, para este problema se han   trasladado &uacute;nicamente a poder identificar outliers o datos at&iacute;picos dentro de   la muestra que se est&aacute; analizando, lo cual no permite analizar cada individuo   de manera individual y mucho menos realizar un pron&oacute;stico de fraudes.   En este trabajo se eval&uacute;a el uso de un modelo log&iacute;stico mixto para la detecci&oacute;n   de fraudes. Este modelo, a diferencia de los m&eacute;todos convencionales para detecci&oacute;n   de fraudes, considera la variabilidad de las transacciones realizadas por   cada individuo; lo que permite generar no s&oacute;lo un modelo global, sino tambi&eacute;n   un modelo por cada individuo que permite estimar la probabilidad de que   una transacci&oacute;n realizada sea fraudulenta, teniendo en cuenta su historial de   transacciones y las transacciones fraudulentas detectadas previamente.</p>     ]]></body>
<body><![CDATA[<p><b>Palabras claves:</b> Modelo lineal generalizado, historia transaccional, fraudes   detectados, detecci&oacute;n de outliers.</p> </font> <hr size="1" /> <font face="Verdana, Arial, Helvetica, sans-serif" size="2">     <p>&nbsp;</p>     <p><b><font size="3">1 Introduction</font></b></p>     <p>   Among the methodologies used to detect fraud committed with magnetic stripe cards are those that look for patterns or anomalies, treating a fraudulent action as an event that is inconsistent with the others; they rely on data mining tools that combine statistics, optimization and large volumes of information. In &#91;1&#93; the authors review the state of the art, from 1997 to 2008, of data mining applications in financial fraud detection. They find that the data mining techniques most commonly applied to detect fraud are classification &#91;2&#93;,&#91;3&#93;,&#91;4&#93;,&#91;5&#93; and clustering &#91;6&#93;,&#91;7&#93;,&#91;8&#93; methods. In &#91;9&#93;,&#91;10&#93; the authors review the statistical techniques used for detecting fraud. Specifically, the most widely used methodology for detecting fraud with magnetic stripe cards is linear discriminant analysis. Similarly, artificial neural networks (ANN) are used to forecast this kind of behaviour: &#91;7&#93; proposes an unsupervised neural network that detects suspicious individual behaviour, and creates criteria to identify it, using the trends and characteristics of individuals. Meanwhile, &#91;11&#93; proposes a supervised network with 3 hidden layers and the back-propagation algorithm to determine patterns of fraud. In &#91;12&#93; an ANN, decision trees and Bayesian networks are compared. With decision trees, the branches could capture almost every abnormal movement, but this kind of model requires an initial analysis of the variables to determine whether or not they are independent. 
The method that performed best was the ANN, followed by the decision tree and finally the Bayesian network. Regarding the variables that should be used for detecting fraud, &#91;13&#93; presents a detailed study on how to choose the variables and the methodology; the authors suggest using the amount, the type (payment, check, etc.), the type of market in which the card was used, the channel and the verification mode (PIN, chip or magnetic stripe). They also propose using aggregated information on each individual so that the whole history is available, which makes it possible to predict the behaviour of each person; a transaction that shows an abnormal pattern is then treated as an alert to be analysed. In many cases, expert knowledge reduces the work of selecting a methodology and leads to hard rules that capture most, but not all, abnormal movements; in &#91;14&#93; different rules are applied to gain knowledge of the patterns of individual transactions. However, as mentioned, this methodology requires vast knowledge of the individual and of the system, since rules based on the history must be created and then used as the criterion for deciding whether a behaviour is suspicious or not (fuzzy logic).</p>     <p>Generally, the proposals in the literature for detecting fraud with magnetic stripe cards are based on classification and clustering techniques or on ANN, in which individuals are classified according to general rules. These techniques assume that individuals have similar variability and a common pattern, and they do not examine the variability of each client's financial transactions. This is a disadvantage and may reduce the quality of detection, because not all individuals behave alike; in real life each individual has a unique behaviour that should be studied as such. 
This paper proposes the use of a mixed logistic model to identify suspicious transactions from the transactional information of individuals. In addition to estimating fraud across the organization, the approach yields a model for each client that takes individual behaviour into account.</p>     <p>This paper is divided into five sections. Section 2 gives an overview of the theory of linear mixed models; Section 3 covers the theory of generalized linear mixed models, with the mixed logistic model as a particular case; Section 4 presents the use of the mixed logistic model to detect fraud; and finally the conclusions and references are presented.</p>     <p>&nbsp;</p>     <p><b><font size="3">2 General Linear Model with Mixed Effects</font></b></p>     <p>   Linear mixed models have become increasingly popular in the applied statistics literature for the health sciences, because they are a powerful tool for analysing data with repeated measures, which are frequently obtained in studies in this area. The existence of repeated measurements requires special attention to the characterization of random variation in the data. In particular, it is important to recognize two levels of variability explicitly: random variation between measures within a particular individual (intra-individual variation) and random variation between individuals (inter-individual variation). The linear mixed model considers both sources of variation and can be defined by the following two steps:</p>     <blockquote>    ]]></body>
<body><![CDATA[<p><b>Step 1</b>: Modelling intra-individual variation.   Suppose that for the <i>i-th</i> of <i>m</i> individuals, <i>n<sub>i</sub></i> responses have been observed   and that a total of <i>N</i> =   <img src="/img/revistas/ince/v8n16/v8n16a08g1.jpg" /><i>n<sub>i</sub></i> data are available. Let <i><b>y</b><sub>i</sub></i> be the vector of responses for the <i>i-th</i> individual, which satisfies </p> </blockquote>     <p align="right"><img src="/img/revistas/ince/v8n16/v8n16a08g2.jpg" /></p> </font>    <p>       <blockquote><font size="2" face="Verdana, Arial, Helvetica, sans-serif">where <b><i>&beta;</i></b> is a (<i>p</i> x 1) vector of parameters that corresponds to the fixed     effects; <i><b>x</b><sub>i</sub></i> is a (<i>n<sub>i</sub></i> x <i>p</i>) matrix for the <i>i-th</i> individual which characterizes the     systematic part of the response; <i><b>&alpha;</b><sub>i</sub></i> is a (<i>k</i> x 1) vector characteristic of     the <i>i-th</i> individual; <b><i>z</i></b><sub><i>i</i></sub> is a (<i>n<sub>i</sub></i> x <i>k</i>) design matrix; and <i><b>e</b><sub>i</sub></i> is the vector     of intra-individual errors. It is assumed that <i><b>e</b><sub>i</sub></i> <img src="/img/revistas/ince/v8n16/v8n16a08i14.jpg" /> <i>N<sub>n<sub>i</sub></sub></i>(0,<i><b>R</b><sub>i</sub></i>), where <i><b>R</b><sub>i</sub></i> is an intra-individual covariance matrix of size (<i>n<sub>i</sub></i> x <i>n<sub>i</sub></i>). So, from model (1):</font></blockquote> </p> <font face="Verdana, Arial, Helvetica, sans-serif" size="2">    <p align="center"><img src="/img/revistas/ince/v8n16/v8n16a08g3.jpg" /></p>     <p>&nbsp;</p> </font>     <blockquote>       <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif"><b>Step 2:</b> Modelling inter-individual variation.     
</font></p>       <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">Suppose that the vector of random effects <b>&alpha;</b><sub><i>i</i></sub> follows a normal     distribution with mean zero and (<i>k</i> x <i>k</i>) dispersion matrix <b><i>D</i></b>; in addition, assume     that the <b>&alpha;</b><sub><i>i</i></sub>, <i>i</i> = 1, ..., <i>m</i>, are mutually independent. Under these     assumptions:</font></p>       <p align="center"><img src="/img/revistas/ince/v8n16/v8n16a08g4.jpg" /></p> </blockquote> <font face="Verdana, Arial, Helvetica, sans-serif" size="2"></font>     ]]></body>
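<body><![CDATA[<p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">The two steps can be summarised in symbols. The following is only a sketch: it assumes that model (1) has the standard linear mixed model form and that the random effects are independent of the intra-individual errors; the notation follows the definitions given in Steps 1 and 2.</font></p>

```latex
\begin{align}
\mathbf{y}_i &= \mathbf{x}_i \boldsymbol{\beta} + \mathbf{z}_i \boldsymbol{\alpha}_i + \mathbf{e}_i,
  \qquad i = 1, \dots, m, \\
\mathbf{e}_i &\sim N_{n_i}(\mathbf{0}, \mathbf{R}_i),
  \qquad \boldsymbol{\alpha}_i \sim N_k(\mathbf{0}, \mathbf{D}), \\
E(\mathbf{y}_i \mid \boldsymbol{\alpha}_i) &= \mathbf{x}_i \boldsymbol{\beta} + \mathbf{z}_i \boldsymbol{\alpha}_i,
  \qquad \operatorname{Var}(\mathbf{y}_i \mid \boldsymbol{\alpha}_i) = \mathbf{R}_i, \\
E(\mathbf{y}_i) &= \mathbf{x}_i \boldsymbol{\beta},
  \qquad \operatorname{Var}(\mathbf{y}_i) = \mathbf{z}_i \mathbf{D} \mathbf{z}_i' + \mathbf{R}_i = \mathbf{V}_i .
\end{align}
```

<p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">Under this form, the marginal covariance splits into a between-individual part that depends on <b><i>D</i></b> and a within-individual part <b><i>R</i></b><i><sub>i</sub></i>.</font></p>
]]></body>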
<body><![CDATA[<p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">       <blockquote> That is, model (1) with the above assumptions for <b><i>e</i></b><i><sub>i</sub></i> and      <b>&alpha;</b><i><sub>i</sub></i> implies   that <b><i>y</i></b><i><sub>i</sub></i> is a multivariate normal random vector of dimension <i>n<sub>i</sub></i> with a   particular form of covariance matrix, that is: <b><i>y</i></b><i><sub>i </sub></i> <img src="/img/revistas/ince/v8n16/v8n16a08i14.jpg" /> <i> N<sub>n<sub>i</sub></sub></i>(<i><b>x</b><sub>i</sub></i><i><b>&beta;</b></i>, <i><b>V</b><sub>i</sub></i>). The   form of <b><i>V</i></b><i><sub>i</sub></i> implies that the model has two different components of variability:   the first refers only to the variation within individuals (<b><i>R</i></b><i><sub>i</sub></i>)   and the second refers to the variation between individuals (<b><i>D</i></b>).   When fitting a mixed model it is common to consider   three components: the estimation of the fixed effects (<i><b>&beta;</b></i>), the estimation   of the random effects (<i><b>&alpha;</b><sub>i</sub></i>) and the estimation of the covariance parameters (<b><i>D</i></b>  and <b><i>R</i></b><i><sub>i</sub></i>) &#91;15&#93;. The standard approach under the multivariate normality   assumption is to use maximum likelihood (ML) or restricted   maximum likelihood (REML). 
Bayesian concepts are   also used to estimate <i><b>&alpha;</b></i><sub>i</sub>.</blockquote> </font></p> <font face="Verdana, Arial, Helvetica, sans-serif" size="2">    <p>The next section gives a brief introduction to generalized linear mixed   models, which extend the models above to the case where the response variable is not continuous.</p>     <p>&nbsp;</p>     <p><b><font size="3">3 Generalized Linear Mixed Model</font></b></p> </font>     <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">Generally, linear mixed models have been used in situations where the response   variable is continuous. However, in practice there are cases where the response is a discrete or categorical variable; for example, the number of heart   attacks suffered by a potential patient during the last year takes the values 0, 1, 2, ...   In these cases, generalized linear mixed models (GLMM) are used. A GLMM   is an extension of the linear mixed model in which the responses are correlated and   can be categorical or discrete variables &#91;16&#93;. A GLMM is defined in two stages:</font></p> <font face="Verdana, Arial, Helvetica, sans-serif" size="2">    <p>       <blockquote>Stage 1: Select a random sample of <i>n</i> individuals from a population of size <i>N</i>.     Attach to the <i>i-th</i> individual a specific parameter <i>&alpha;<sub>i</sub></i>.</blockquote> </p> </font>    <p>       <blockquote><font size="2" face="Verdana, Arial, Helvetica, sans-serif">     Stage 2: According to <i>&alpha;<sub>i</sub></i>, select repetitions of &#91;<i>y<sub>ij</sub>,</i> <i> x<sub>ij</sub></i>&#93;; <i> i</i> = 1, ..., <i>n</i>, <i>j</i> =     1, ... , <i>n<sub>i</sub></i>. Suppose that, for each individual, the repetitions <i>y<sub>i</sub></i> | <i>&alpha;<sub>i</sub></i> are statistically     independent, such that:</font></blockquote>   </p> <font face="Verdana, Arial, Helvetica, sans-serif" size="2">    ]]></body>
<body><![CDATA[<p align="center"><img src="/img/revistas/ince/v8n16/v8n16a08g5.jpg" /></p> </font>     <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">       <blockquote>where <i>b</i>, <i>a</i>, <i>c</i> are known functions, and <i>&psi;</i> is the dispersion parameter,     which may or may not be known. &xi;<i><sub>i</sub></i> is associated with &micro;<i><sub>i</sub></i> = <i>E</i>(<i><b>y</b><sub>i</sub></i> | <i>&alpha;<sub>i</sub></i>), which in turn is associated with the linear predictor &eta;<i><sub>i</sub></i> = <i>&alpha;<sub>i</sub><b>z</b></i><b>'</b><sub>i</sub> + <i><b>x</b><sub>i</sub> </i><b>&beta;</b> through     a link function <i>g</i>(.), such that <i>g</i>(&micro;<i><sub>i</sub></i>) = &eta;<i><sub>i</sub></i>. In this case, <b><i>z</i></b><sub>i</sub> are recorded     variables that represent a random effect for the <i>i-th</i> individual.     Models such as the mixed logistic, mixed Poisson and probit models, among     others, can be obtained with different link functions.</blockquote> </font></p> <font face="Verdana, Arial, Helvetica, sans-serif" size="2">     <p>The proposed methodology is based on a mixed logistic model, obtained with a two-stage sampling scheme in which the <i>y<sub>i</sub></i> | <i>&alpha;<sub>i</sub></i> <img src="/img/revistas/ince/v8n16/v8n16a08i14.jpg" />  <i>Ber</i>(<i>p<sub>i</sub></i>) are assumed i.i.d., with <i>p<sub>i</sub></i> = <i>P</i>(<i>y<sub>ij</sub></i> = 1 | <i>&alpha;<sub>i</sub></i>). The link function   is the logit:</p>     <p align="center"><img src="/img/revistas/ince/v8n16/v8n16a08g6.jpg" /></p>     <p>A logistic model with random intercept is obtained if <i>z<sub>ij</sub></i> = 1.</p>     <p align="center"><img src="/img/revistas/ince/v8n16/v8n16a08g7.jpg" /></p> </font>     <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">Where &alpha;<sub>1</sub>, ... 
, &alpha;<i><sub>n</sub></i> are i.i.d., such that &alpha;<sub>i</sub> <img src="/img/revistas/ince/v8n16/v8n16a08i14.jpg" /> h<sub>&alpha;</sub>(&theta;). </font></p>     <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">Note that this type of model can be extended to the case where <i>y<sub>i</sub></i> | &alpha;<sub>i</sub> <img src="/img/revistas/ince/v8n16/v8n16a08i14.jpg" /> <i>Multinomial</i>(<i>p</i><sub>1</sub>,..., <i>p<sub>k</sub></i>). The model can then predict the probability that   a subject belongs to each of the <i>k</i> groups. The model's predictive ability is   assessed by comparing the observed data with the predicted data: the model   assigns individuals to the groups defined by the dependent variable using   a cutoff point applied to the predicted probabilities, which are computed from the estimated coefficients   and the value taken by each explanatory variable &#91;17&#93;.</font></p> <font face="Verdana, Arial, Helvetica, sans-serif" size="2">    <p>The interpretation of coefficients and the criteria of goodness of fit are:</p>     ]]></body>
<body><![CDATA[<blockquote>    <p> i. Theoretical value and interpretation of coefficients.</p>     <p>   The form of the theoretical value in a logit regression is similar to that   in a multiple regression: it represents a single relationship in which the   coefficients indicate the relative weight of each predictor. The   calculation of a logit coefficient compares the probability of occurrence   of an event with the probability that it does not happen. The coefficients &beta; measure   changes in the odds ratio &#91;18&#93;. Because the coefficients   are on a logarithmic scale, they should be transformed (exponentiated) to be interpreted   correctly, taking into account that a positive coefficient   increases the probability of occurrence while a negative one has the opposite   effect.</p>     <p>ii. Model evaluation.</p>     <p>   Logit models with random intercept, unlike linear models, are not assessed   with <i>R</i><sup>2</sup> or with the AIC, because the methods   for calculating them are complex and computationally expensive and,   in many cases, may fail to converge. Instead, the following rates   and indicators are used to get an idea of the behaviour of the model:</p>     <p>   Misclassification rate: the probability of classifying a 0 as 1 or   vice versa.</p>     <p>   Good classification rate: the probability of classifying a 1 as 1   or a 0 as 0.</p>     <p>   Sensitivity: the probability of classifying an observation as 1 given that it   is 1.</p>     <p>   Specificity: the probability of classifying an observation as 0 given that it   is 0.</p></blockquote>     <p>&nbsp;</p>     ]]></body>
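<body><![CDATA[<p>The rates listed above can be computed from the observed labels and the predicted probabilities once a cutoff point is chosen. The following is a minimal sketch in Python, not the code used in this work; the function name, the toy data and the cutoff of 0.5 are hypothetical. It follows the usual convention in which sensitivity refers to class 1 (fraud) and specificity to class 0.</p>

```python
def classification_rates(y_true, p_hat, cutoff=0.5):
    """Classify each observation with a cutoff and return the four rates."""
    tp = tn = fp = fn = 0
    for y, p in zip(y_true, p_hat):
        pred = 1 if p >= cutoff else 0
        if y == 1 and pred == 1:
            tp += 1          # fraud correctly flagged
        elif y == 0 and pred == 0:
            tn += 1          # legitimate transaction correctly passed
        elif y == 0 and pred == 1:
            fp += 1          # legitimate transaction flagged as fraud
        else:
            fn += 1          # fraud that was missed
    n = tp + tn + fp + fn
    return {
        "misclassification": (fp + fn) / n,
        "good_classification": (tp + tn) / n,
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
    }


# Hypothetical observed labels (1 = fraud) and predicted probabilities.
y_true = [1, 0, 0, 1, 0, 1, 0, 0]
p_hat = [0.9, 0.2, 0.6, 0.4, 0.1, 0.8, 0.3, 0.05]
rates = classification_rates(y_true, p_hat, cutoff=0.5)
```

<p>Lowering the cutoff would typically raise sensitivity at the cost of specificity, which is why the cutoff is usually chosen with the relative cost of missed frauds and false alerts in mind.</p>
]]></body>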
<body><![CDATA[<p><b><font size="3">4 GLMM for detecting fraud transactions</font></b></p>     <p>   According to the literature, classification and clustering techniques   have been proposed for detecting fraud with swipe cards &#91;2&#93;,&#91;3&#93;,&#91;4&#93;,&#91;5&#93;,   &#91;6&#93;,&#91;7&#93;,&#91;8&#93;,&#91;9&#93;,&#91;10&#93;; however, these techniques only create a classification rule that assumes   that all individuals have an average behaviour, so they cannot   estimate (on-line) the probability that a transaction is fraudulent. Moreover, their   theoretical development rests on the assumption that there is only one   observation for each client, so these techniques cannot handle repeated   measures (several observations) of each individual.   In practice, it is known that individuals perform several transactions and that   not all clients have the same pattern of behaviour; for this reason, it is of interest   to apply other techniques that forecast the probability that a transaction is   fraudulent and that also treat each client as an entity whose variability between   transactions defines a unique profile. One of the statistical   techniques designed for this purpose is the mixed logistic model.   In this section, a mixed logistic model is fitted to real data in order   to show the feasibility of this kind of model and its benefits (in   terms of model quality). The results are also compared with those of   a conventional detection technique.</p>     <p><b>4.1 Sample</b></p>     <p>   The methodology proposed in this paper for detecting fraud with magnetic stripe cards   is based on a mixed logistic model with random intercept.   The data are stored in a file that consolidates the daily national transactions   of clients, taking only those that correspond to payments through two selected   channels. 
With this information it is possible to identify the type of   transaction and the date, time and place where it was made. In addition to this   transactions file, there is a file with the fraudulent transactions detected through these channels (which will be   used to construct the variable Marca). These   transactions have been detected and confirmed by the clients, which makes it possible   to build a supervised model such as the logistic model with   random intercept; besides, the volume of transactions provides sufficient information   to develop a model per individual.</p>     <p><b>4.2 Preliminary Analysis</b></p>      <p>Because the variables to be used as regressors have different measurement scales and magnitudes,   it was necessary to transform them (by creating categories and applying   logarithmic functions) in order to bring them to the same level and thus improve the fit of the models.</p>     <p>Subsequently, because the logistic model with random intercept assumes that   the observations are independent, a runs test was carried out; the results   obtained are shown in <a href="#t1">Table 1</a>. Using the mean as the reference value for counting   the runs, the transaction amounts of most individuals were classified as independent (random). In particular, 8 runs were obtained for client 1.</p>     <p align="center"><a name="t1"></a><img src="/img/revistas/ince/v8n16/v8n16a08t1.jpg" /></p>     <p><b>4.3 Selection of variables</b></p>     <p>   The database contains two groups: one with clients who   were victims of fraud during a defined time period, and another with   clients for whom no fraudulent transaction has been detected. Using   these groups, the response variable is defined as <i>y</i>: <i>Marca (fraudulent   transaction)</i>. Note that the observed variable is of Bernoulli type. 
The candidate   independent variables for the model are initially: identification number (ID),   type of ID, month, day and time when the transaction occurred, the device   used to make the transaction, the channel used for the transaction (channel   1 or channel 2), the name and location of the device, the type of transaction   (withdrawal, payment or transfer), the result of the transaction (successful   or not), the transaction amount (amount), the type of individual (individual or   business), the date when the individual was linked to the organization, and the monthly income   and expenses of the individual.</p>     ]]></body>
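<body><![CDATA[<p>As an illustration of how the response variable Marca could be assembled from the two files described in Section 4.1, the following Python sketch marks each transaction as 1 when its identifier appears among the confirmed frauds and 0 otherwise. It is only a sketch under assumptions: the field names (tx_id, client_id, channel, amount), the records and the use of log1p as the logarithmic transformation are hypothetical and are not taken from the data used in this work.</p>

```python
import math

# Hypothetical daily transaction records; all field names are illustrative.
transactions = [
    {"tx_id": "T1", "client_id": "C1", "channel": 1, "amount": 120000.0},
    {"tx_id": "T2", "client_id": "C1", "channel": 2, "amount": 950000.0},
    {"tx_id": "T3", "client_id": "C2", "channel": 1, "amount": 0.0},
    {"tx_id": "T4", "client_id": "C2", "channel": 1, "amount": 40000.0},
    {"tx_id": "T5", "client_id": "C2", "channel": 2, "amount": 700000.0},
]

# Hypothetical identifiers of transactions confirmed as fraud by clients.
confirmed_fraud_ids = {"T2", "T5"}


def add_marca(records, fraud_ids):
    """Return copies of the records with the Bernoulli response 'marca'
    (1 = confirmed fraud, 0 = otherwise) and a log-transformed amount."""
    labelled = []
    for rec in records:
        row = dict(rec)
        row["marca"] = 1 if rec["tx_id"] in fraud_ids else 0
        # log1p is an assumed choice of logarithmic transform; it maps 0 to 0.
        row["log_amount"] = math.log1p(rec["amount"])
        labelled.append(row)
    return labelled


def group_by_client(records):
    """Collect the repeated measures (transactions) of each client."""
    groups = {}
    for rec in records:
        groups.setdefault(rec["client_id"], []).append(rec)
    return groups


labelled = add_marca(transactions, confirmed_fraud_ids)
by_client = group_by_client(labelled)
```

<p>Grouping the labelled records by client keeps the repeated-measures structure that the mixed model relies on.</p>
]]></body>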
<body><![CDATA[<p>Subsequently, the correlation coefficients between these variables were examined.   According to the results, there is a significant correlation between the   ID, the type of ID and the date when the individual was linked to the organization.   This relation is expected, because people of all ages become linked to   the bank after reaching the age of majority; it is also possible to find individuals, businesses and foreign persons, which leads to different types of Nit.</p> </font>     <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">Additionally, the month of the transaction is related to the amount of the   transaction (<i>r</i> = 0.8, <i>p</i>-<i>value</i> &lt; 0.05) and to the existence of fraud (<i>r</i> =   0.862, <i>p</i>-<i>value</i> &lt; 0.05); this is explained by the fact that there are months   in which people must make more payments than in others (start and end of the   year). In addition, according to information provided by the experts, there are months in which fraud is most evident.</font></p> <font face="Verdana, Arial, Helvetica, sans-serif" size="2">    <p>Variables such as the day and hour of the transaction are correlated with the rest of the   variables; however, the relation is not statistically significant. The relation   arises because there are certain days of the month on which transactions are more likely,   and there are individuals who handle sums higher than those of the average person (or vice versa).</p> </font>    <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">The transaction amount is related to the existence of fraud (<i>r</i> = 0.82, <i>p</i>-<i>value</i> &lt; 0.05). This is to be expected, because there is fraud only if money is withdrawn,   which is equivalent to saying that the transaction amount is greater than 0.   
Similarly, this variable depends on the transaction code (<i>r</i> = 0.9, <i>p</i>-<i>value</i> &lt; 0.05) (which is linked directly to the device), so the existence of fraud is related to the device, although this may be because most of the detected frauds were committed through ATMs.</font></p> <font face="Verdana, Arial, Helvetica, sans-serif" size="2"></font>     <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">The values of income and expenses (incoming and outgoing) are related to the amounts of the transactions   (people do not spend more money than they have in their savings   accounts) and to the day (on some days there are more transactions).   Moreover, the variable Marca is related to   the month of the transaction. It also has a significant correlation with   variables such as the day of the transaction (<i>r</i> = 0.93, <i>p</i>-<i>value</i> &lt; 0.05), the hour   of the transaction (<i>r</i> = -0.96, <i>p</i>-<i>value</i> &lt; 0.05), the transaction code and the transaction amount.</font></p> <font face="Verdana, Arial, Helvetica, sans-serif" size="2">    <p>Finally, the explanatory variables related to the absence or presence of fraud   are: channel, device code, transaction amount, month, day and time of the transaction.</p>     <p>However, given that fraud can be categorized using variables such as the type   of device, and therefore the type of transaction, these variables are not   eliminated (they are represented by the transaction code). In contrast, variables such as Total   Debts, Nit type, incoming, outgoing and document type are discarded when fitting the model.</p>     <p><b>4.4 Model</b></p> </font>    <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">The proposed model considers a random intercept per individual, which represents   the variability of each of them. 
This intercept is assumed to be a random variable distributed <i>N</i>(0, &sigma;<sup>2</sup>), so estimating it yields a fraud detection model per person, because the model takes into account the variability of the transactions of each individual. However, all the estimated models share the coefficients associated with the remaining explanatory variables. In this model, the value of a coefficient indicates the importance of its associated variable; for this reason, transactions are classified using the weight (coefficient) associated with each variable: a negative value decreases the probability of fraud (Marca=1), while a positive value increases it &#91;19&#93;.</font></p> <font face="Verdana, Arial, Helvetica, sans-serif" size="2">    <p>The model was fitted using the GLMM package of R, with a database containing 799 transactions detected as fraudulent and 4854 transactions in a defined period of time. The results are shown in <a href="#t2">Table 2</a>.</p>     ]]></body>
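<body><![CDATA[<p>As an illustration of how a random intercept yields one model per customer while the remaining coefficients are shared, the following minimal Python sketch evaluates the logistic form of such a model. It is not the fitted model of this paper: the function name, the variable names and every numeric value are hypothetical placeholders chosen only so that the signs agree with the description in the text (amount positive, the other variables negative); the actual estimates are those reported in Table 2.</p>

```python
import math

def fraud_probability(shared_coefs, covariates, random_intercept, fixed_intercept=0.0):
    """Probability that a transaction is fraudulent (Marca = 1) under a
    random-intercept logistic model:
        logit(p) = fixed_intercept + random_intercept + sum_k beta_k * x_k
    """
    linear = fixed_intercept + random_intercept
    for name, beta in shared_coefs.items():
        linear += beta * covariates[name]
    return 1.0 / (1.0 + math.exp(-linear))

# Hypothetical shared coefficients (NOT the estimates in Table 2).
shared_coefs = {"month": -0.8, "day": -0.1, "hour": -0.2, "code": -0.1, "amount": 1.5}

# One hypothetical, already transformed transaction.
transaction = {"month": 0.5, "day": 0.3, "hour": 0.4, "code": 1.0, "amount": 0.9}

# The same transaction scored for two hypothetical customers who differ
# only in their estimated random intercept.
p_customer_a = fraud_probability(shared_coefs, transaction, random_intercept=-1.0)
p_customer_b = fraud_probability(shared_coefs, transaction, random_intercept=1.0)
print(round(p_customer_a, 4), round(p_customer_b, 4))
```

<p>In this sketch the two customers receive different probabilities for an identical transaction solely through their intercepts, which is the sense in which the text speaks of a model per person.</p>]]></body>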
<body><![CDATA[<p align="center"><a name="t2"></a><img src="/img/revistas/ince/v8n16/v8n16a08t2.jpg" /></p>     <p>To estimate the model, some variables were transformed so that the weights (coefficients) of the variables would be on a consistent scale. According to the results, the variable for the day on which the transaction was made is not significant; however, intuitively, this variable does not exhibit multicollinearity with the others because its standard deviation is not very large, so, under supervision, it is retained for estimating the model. On the other hand, the month and the amount of the transaction are the variables with the largest weights: the month is the variable that most decreases the probability, while the amount increases it. The day and the code of the transaction have the smallest weights, so they contribute less (and negatively). In general terms, the model fitted to the <i>i</i>-th individual is</p>     <p align="center"><img src="/img/revistas/ince/v8n16/v8n16a08g8.jpg" /></p>     <p>To specify the model for individual <i>i</i>, the estimate of its intercept must be considered. The results show that, for any individual, the month, date, time and code of the transaction contribute negatively to the likelihood that the transaction is suspicious, whereas the likelihood increases as the amount of the transaction increases.
<a href="#t3">Table 3</a> shows the estimated probabilities of fraud for some individuals in the database.</p> <a name="t3"></a>    <p align="center"><img src="/img/revistas/ince/v8n16/v8n16a08t3.jpg" /></p>     <p><b>4.5 Model Evaluation</b></p>     <p>As indicated by &#91;1&#93;, when a fraud detection model is estimated, it is necessary to take into account the sensitivity and specificity of its classification.</p>     <p>From the information shown in <a href="#t4">Table 4</a>, the specificity, sensitivity, bad classification rate (bcr) and good classification rate (gcr) were calculated: <i>bcr = 0.009371, gcr = 0.9906, sensitivity = 0.9922, specificity = 0.9812</i>. According to the literature, a low misclassification rate is preferable because it indicates that the model makes few mistakes, while a good classification rate (gcr) close to 1 is preferred. Sensitivity measures how well the model classifies the true positives, and specificity measures how well it classifies the true negatives. The estimated model with random intercept gives very good results, although detected frauds remain subject to verification.</p>     <p align="center"><img src="/img/revistas/ince/v8n16/v8n16a08t4.jpg" /></p>     <p>According to tests conducted with other mixed logistic models using different combinations of variables, the model with the best results in terms of gcr, bcr, sensitivity and specificity was the one proposed in this section.</p>     ]]></body>
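<body><![CDATA[<p>The four rates used in this section can be obtained from a 2x2 confusion matrix. The sketch below assumes the usual definitions, with fraud taken as the positive class; the function name and the counts are hypothetical and are not the values of Table 4.</p>

```python
def classification_rates(tp, fn, fp, tn):
    """Rates from a 2x2 confusion matrix, taking fraud as the positive class.

    tp: frauds classified as fraud          fn: frauds classified as legitimate
    fp: legitimate classified as fraud      tn: legitimate classified as legitimate
    """
    total = tp + fn + fp + tn
    return {
        "gcr": (tp + tn) / total,           # good classification rate
        "bcr": (fp + fn) / total,           # bad classification rate
        "sensitivity": tp / (tp + fn),      # share of true positives recovered
        "specificity": tn / (tn + fp),      # share of true negatives recovered
    }

# Hypothetical counts, for illustration only (NOT the counts of Table 4).
rates = classification_rates(tp=90, fn=10, fp=20, tn=880)
for name, value in rates.items():
    print(f"{name} = {value:.4f}")
```

<p>Under these definitions gcr and bcr add up to 1, which offers a quick consistency check on any set of reported rates.</p>]]></body>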
<body><![CDATA[<p><b>4.6 Comparison with an Artificial Neural Network</b></p> </font>     <p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">   In order to evaluate the performance of the mixed logistic model against traditional fraud detection techniques, an Artificial Neural Network (ANN) was implemented using the variables that were used to estimate the mixed logistic model; different network architectures were also tried. The ANN was selected because it is the conventional method that has shown the best results (see &#91;12&#93;). In this case, the ANN did not perform as well as the mixed logistic model. The best ANN was trained using variables such as Day T, Hour T and Amount T; its rates were <i>gcr</i> = 0.8587, <i>bcr</i> = 0.1515, sensitivity = 0.8048 and specificity = 0.8959, while the rates for the ANN trained with the same variables used in the fitted mixed logistic regression were <i>gcr</i> = 0.8233, <i>bcr</i> = 0.1868, sensitivity = 0.7594 and specificity = 0.8714. The weak results obtained from the ANN may be related to its methodology: it estimates a single general model for all individuals, and it was not possible to obtain a model per person, as the mixed logistic model does by considering the different behaviours of clients.</font></p> <font face="Verdana, Arial, Helvetica, sans-serif" size="2">    <p>&nbsp;</p>     <p><b><font size="3">5 Conclusions</font></b></p>     <p>According to the correlation analysis between variables, the relationships found coincide with the information provided by the experts. Thus, fraud is arguably seasonal, so it has to be analysed taking into account the month, the day and the hour of the transactions.
Besides, the type, amount and channel of the transaction should be used for fitting the model and for determining patterns for different types of fraud.</p>     <p>The generalized linear mixed model produces favourable results; however, the model must be run for each individual because it is built with random intercepts that are unique to each person. Although this can be a computational disadvantage, it is an advantage from a modelling standpoint, since a single representation describes the variability of each individual. Nevertheless, a high volume of historical information is required to build a profile per individual and to estimate more precise models with high-quality outputs.</p>     <p>As future work, we propose estimating the model for groups of individuals that are more susceptible according to their characteristics (females, elderly people); it is also possible to fit a more complex model involving variables such as the type of fraud and other kinds of transactions (not only financial), among others.</p>     <p>&nbsp;</p>     <p><b><font size="3">References</font></b></p>     <!-- ref --><p>   1. E. Ngai, Y. Hu, Y. Wong, Y. Chen, X. Sun, ''The application of data mining techniques in financial fraud detection: A classification framework and an academic review of literature'', <i>Decision Support Systems</i>, vol. 50, no. 3, pp. 559-569, Feb. 2011.<!-- end-ref --> Referenced in 222, 234</p>     <!-- ref --><p>2. P. Chan, W. Fan, Andreas, A. Prodromidis,  S. J. Stolfo, ''Distributed Data Mining in Credit Card Fraud Detection'', <i>IEEE Intelligent Systems</i>, vol. 14, no. 6, pp. 67-74, 1999.    
<!-- end-ref --> Referenced in 223, 228</p>     <!-- ref --><p>   3. J. Dorronsoro, F. Ginel, C. Sanchez, C. Cruz, ''Neural fraud detection in credit card operations'', <i>IEEE Transactions on Neural Networks</i>, vol. 8, no. 4, pp. 827-834, Jul. 1997.<!-- end-ref --> Referenced in 223, 228</p>     <!-- ref --><p>   4. I-C. Yeh, C. Lien, ''The comparisons of data mining techniques for the predictive accuracy of probability of default of credit card clients'', <i>Expert Syst. Appl.</i>, vol. 36, no. 2, pp. 2473-2480, Mar. 2009.<!-- end-ref --> Referenced in 223, 228</p>     <!-- ref --><p>   5. T-S. C. Rong-Chang Chen, ''A new binary support vector system for increasing detection rate of credit card fraud.'', <i>International Journal of Pattern Recognition and Artificial Intelligence (IJPRAI)</i>, vol. 20, no. 2, pp. 227-239, 2006.    
<!-- end-ref -->   Referenced in 223, 228</p>     <!-- ref --><p>   6. Abhinav Srivastava, Amlan Kundu, Shamik Sural, Arun K. Majumdar, ''Credit card fraud detection using hidden Markov model'', <i>IEEE Transactions on Dependable and Secure Computing</i>, vol. 5, no. 1, pp. 37-48, 2008.<!-- end-ref -->   Referenced in 223, 228</p>     <!-- ref --><p>   7. J. Quah, M. Sriganesh, ''Real-time credit card fraud detection using computational intelligence'', <i>Expert Systems with Applications</i>, vol. 35, no. 4, pp. 1721-1732, Nov. 2008.<!-- end-ref --> Referenced in 223, 228</p> </font>     <!-- ref --><p><font size="2" face="Verdana, Arial, Helvetica, sans-serif">   8. Vladimir Zaslavsky, Anna Strizhak, ''Credit Card Fraud Detection Using Self-Organizing Maps'', <i>Information &amp; Security</i>, vol. 18, no. 48, pp. 48-63, 2006.    
<!-- end-ref -->   Referenced in 223, 228</font></p> <font face="Verdana, Arial, Helvetica, sans-serif" size="2">    <!-- ref --><p>   9. R. Bolton, D. Hand, ''Statistical Fraud Detection: A Review'', <i>Statist. Sci.</i>, vol. 17, no. 3, pp. 235-255, Aug. 2002.<!-- end-ref --> Referenced in 223, 228</p>     <!-- ref --><p>   10. Linda Delamaire, Hussein Abdou, John Pointon, ''Credit card fraud and detection techniques: a review'', <i>Banks and Bank Systems</i>, vol. 4, no. 2, pp. 57-68, 2002.<!-- end-ref --> Referenced in 223, 228</p>     <!-- ref --><p>   11. E. Aleskerov, B. Freisleben, and B. Rao, ''CARDWATCH: a neural network based database mining system for credit card fraud detection'', in <i>Computational Intelligence for Financial Engineering (CIFEr), Proceedings of the IEEE/IAFE 1997</i>, 1997, pp. 220-226.    
<!-- end-ref --> Referenced in 223</p>     <!-- ref --><p>   12. E. Kirkos, C. Spathis, Y. Manolopoulos, ''Data Mining techniques for the detection of fraudulent financial statements'', <i>Expert Systems with Applications</i>, vol. 32, no. 4, pp. 995-1003, May 2007.<!-- end-ref --> Referenced in 223, 234</p>     <!-- ref --><p>13. C. Whitrow, DJ. Hand, P. Juszczak, D. Weston, ''Transaction aggregation as a strategy for credit card fraud detection'', <i>Data Mining Knowledge Disc</i>, vol. 18, no. 1, pp. 30-55, 2009.<!-- end-ref --> Referenced in 223</p>     <!-- ref --><p>   14. D. S&aacute;nchez, M. A. Vila, L. Cerda, and J. M. Serrano, ''Association rules applied to credit card fraud detection'', <i>Expert Systems with Applications</i>, vol. 36, no. 2, Part 2, pp. 3630-3640, Mar. 2009.    
<!-- end-ref --> Referenced in 223</p>     <!-- ref --><p>   15. Helen Brown, Robin Prescott. <i>Applied Mixed Models in Medicine</i>, Statistics in Practice, 2001.<!-- end-ref --> Referenced in 225</p>     <!-- ref --><p>   16. ''Mixed Models: Theory and Applications''. &#91;Online&#93;. Available: <a href="http://www.dartmouth.edu/" target="_blank">http://www.dartmouth.edu/</a> &#91;Accessed: Sept. 2011&#93;.<!-- end-ref --> Referenced in 226</p>     <!-- ref --><p>   17. M. Quintana, A. Gallego, M. Pascual, ''Aplicaci&oacute;n del an&aacute;lisis discriminante y regresi&oacute;n log&iacute;stica en el estudio de la morosidad en las entidades financieras. Comparaci&oacute;n de resultados'', <i>Pecunia: revista de la Facultad de Ciencias Econ&oacute;micas y Empresariales</i>, vol. 1, pp. 175-199, 2005.    
<!-- end-ref --> Referenced in 227</p>     <!-- ref --><p>   18. A. Alderete, ''Fundamentos del An&aacute;lisis de Regresi&oacute;n Log&iacute;stica en la Investigaci&oacute;n Psicol&oacute;gica'', <i>Revista Evaluar</i>, vol. 6, pp. 52-67, 2006.<!-- end-ref --> Referenced in 227</p>     <!-- ref --><p>   19. Brady West, Kathleen Welch, Andrzej Galecki. <i>Linear Mixed Models: A practical guide to using statistical software</i>, Chapman &amp; Hall, 2007.<!-- end-ref --> Referenced in 231</p> </font>      ]]></body><back>
<ref-list>
<ref id="B1">
<label>1</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Ngai]]></surname>
<given-names><![CDATA[E]]></given-names>
</name>
<name>
<surname><![CDATA[Hu]]></surname>
<given-names><![CDATA[Y]]></given-names>
</name>
<name>
<surname><![CDATA[Wong]]></surname>
<given-names><![CDATA[Y]]></given-names>
</name>
<name>
<surname><![CDATA[Chen]]></surname>
<given-names><![CDATA[Y]]></given-names>
</name>
<name>
<surname><![CDATA[Sun]]></surname>
<given-names><![CDATA[X]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[The application of data mining techniques in financial fraud detection: A classification framework and an academic review of literature]]></article-title>
<source><![CDATA[Decision Support Systems]]></source>
<year>2011</year>
<month>02</month>
<volume>50</volume>
<numero>3</numero>
<issue>3</issue>
<page-range>559-569</page-range></nlm-citation>
</ref>
<ref id="B2">
<label>2</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Chan]]></surname>
<given-names><![CDATA[P]]></given-names>
</name>
<name>
<surname><![CDATA[Fan]]></surname>
<given-names><![CDATA[W]]></given-names>
</name>
<name>
<surname><![CDATA[Prodromidis]]></surname>
<given-names><![CDATA[Andreas, A]]></given-names>
</name>
<name>
<surname><![CDATA[Stolfo]]></surname>
<given-names><![CDATA[S J]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[Distributed Data Mining in Credit Card Fraud Detection]]></article-title>
<source><![CDATA[IEEE Intelligent Systems]]></source>
<year>1999</year>
<volume>14</volume>
<numero>6</numero>
<issue>6</issue>
<page-range>67-74</page-range></nlm-citation>
</ref>
<ref id="B3">
<label>3</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Dorronsoro]]></surname>
<given-names><![CDATA[J]]></given-names>
</name>
<name>
<surname><![CDATA[Ginel]]></surname>
<given-names><![CDATA[F]]></given-names>
</name>
<name>
<surname><![CDATA[Sanchez]]></surname>
<given-names><![CDATA[C]]></given-names>
</name>
<name>
<surname><![CDATA[Cruz]]></surname>
<given-names><![CDATA[C]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[Neural fraud detection in credit card operations]]></article-title>
<source><![CDATA[IEEE Transactions on Neural Networks]]></source>
<year>1997</year>
<month>07</month>
<volume>8</volume>
<numero>4</numero>
<issue>4</issue>
<page-range>827-834</page-range></nlm-citation>
</ref>
<ref id="B4">
<label>4</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Yeh]]></surname>
<given-names><![CDATA[I-C]]></given-names>
</name>
<name>
<surname><![CDATA[Lien]]></surname>
<given-names><![CDATA[C]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[The comparisons of data mining techniques for the predictive accuracy of probability of default of credit card clients]]></article-title>
<source><![CDATA[Expert Syst. Appl.]]></source>
<year>2009</year>
<month>03</month>
<volume>36</volume>
<numero>2</numero>
<issue>2</issue>
<page-range>2473-2480</page-range></nlm-citation>
</ref>
<ref id="B5">
<label>5</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[C. Rong-Chang Chen]]></surname>
<given-names><![CDATA[T-S]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[A new binary support vector system for increasing detection rate of credit card fraud.]]></article-title>
<source><![CDATA[International Journal of Pattern Recognition and Artificial Intelligence (IJPRAI)]]></source>
<year>2006</year>
<volume>20</volume>
<numero>2</numero>
<issue>2</issue>
<page-range>227-239</page-range></nlm-citation>
</ref>
<ref id="B6">
<label>6</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Srivastava]]></surname>
<given-names><![CDATA[Abhinav]]></given-names>
</name>
<name>
<surname><![CDATA[Kundu]]></surname>
<given-names><![CDATA[Amlan]]></given-names>
</name>
<name>
<surname><![CDATA[Sural]]></surname>
<given-names><![CDATA[Shamik]]></given-names>
</name>
<name>
<surname><![CDATA[Majumdar]]></surname>
<given-names><![CDATA[Arun K]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[Credit card fraud detection using hidden Markov model]]></article-title>
<source><![CDATA[IEEE Transactions on Dependable and Secure Computing]]></source>
<year>2008</year>
<volume>5</volume>
<numero>1</numero>
<issue>1</issue>
<page-range>37-48</page-range></nlm-citation>
</ref>
<ref id="B7">
<label>7</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Quah]]></surname>
<given-names><![CDATA[J]]></given-names>
</name>
<name>
<surname><![CDATA[Sriganesh]]></surname>
<given-names><![CDATA[M]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[Real-time credit card fraud detection using computational intelligence]]></article-title>
<source><![CDATA[Expert Systems with Applications]]></source>
<year>2008</year>
<month>11</month>
<volume>35</volume>
<numero>4</numero>
<issue>4</issue>
<page-range>1721-1732</page-range></nlm-citation>
</ref>
<ref id="B8">
<label>8</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Zaslavsky]]></surname>
<given-names><![CDATA[Vladimir]]></given-names>
</name>
<name>
<surname><![CDATA[Strizhak]]></surname>
<given-names><![CDATA[Anna]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[Credit Card Fraud Detection Using Self-Organizing Maps]]></article-title>
<source><![CDATA[Information & Security]]></source>
<year>2006</year>
<volume>18</volume>
<numero>48</numero>
<issue>48</issue>
<page-range>48-63</page-range></nlm-citation>
</ref>
<ref id="B9">
<label>9</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Bolton]]></surname>
<given-names><![CDATA[R]]></given-names>
</name>
<name>
<surname><![CDATA[Hand]]></surname>
<given-names><![CDATA[D]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[Statistical Fraud Detection: A Review]]></article-title>
<source><![CDATA[Statist. Sci.]]></source>
<year>2002</year>
<month>08</month>
<volume>17</volume>
<numero>3</numero>
<issue>3</issue>
<page-range>235-255</page-range></nlm-citation>
</ref>
<ref id="B10">
<label>10</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Delamaire]]></surname>
<given-names><![CDATA[Linda]]></given-names>
</name>
<name>
<surname><![CDATA[Abdou]]></surname>
<given-names><![CDATA[Hussein]]></given-names>
</name>
<name>
<surname><![CDATA[Pointon]]></surname>
<given-names><![CDATA[John]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[Credit card fraud and detection techniques: a review]]></article-title>
<source><![CDATA[Banks and Bank Systems]]></source>
<year>2002</year>
<volume>4</volume>
<numero>2</numero>
<issue>2</issue>
<page-range>57-68</page-range></nlm-citation>
</ref>
<ref id="B11">
<label>11</label><nlm-citation citation-type="confpro">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Aleskerov]]></surname>
<given-names><![CDATA[E]]></given-names>
</name>
<name>
<surname><![CDATA[Freisleben]]></surname>
<given-names><![CDATA[B]]></given-names>
</name>
<name>
<surname><![CDATA[Rao]]></surname>
<given-names><![CDATA[B]]></given-names>
</name>
</person-group>
<source><![CDATA[CARDWATCH: a neural network based database mining system for credit card fraud detection]]></source>
<year>1997</year>
<conf-name><![CDATA[ Computational Intelligence for Financial Engineering (CIFEr)]]></conf-name>
<conf-date>1997</conf-date>
<conf-loc> </conf-loc>
<page-range>220-226</page-range></nlm-citation>
</ref>
<ref id="B12">
<label>12</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Kirkos]]></surname>
<given-names><![CDATA[E]]></given-names>
</name>
<name>
<surname><![CDATA[Spathis]]></surname>
<given-names><![CDATA[C]]></given-names>
</name>
<name>
<surname><![CDATA[Manolopoulos]]></surname>
<given-names><![CDATA[Y]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[Data Mining techniques for the detection of fraudulent financial statements]]></article-title>
<source><![CDATA[Expert Systems with Applications]]></source>
<year>2007</year>
<month>05</month>
<volume>32</volume>
<numero>4</numero>
<issue>4</issue>
<page-range>995-1003</page-range></nlm-citation>
</ref>
<ref id="B13">
<label>13</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Whitrow]]></surname>
<given-names><![CDATA[C]]></given-names>
</name>
<name>
<surname><![CDATA[Hand]]></surname>
<given-names><![CDATA[DJ]]></given-names>
</name>
<name>
<surname><![CDATA[Juszczak]]></surname>
<given-names><![CDATA[P]]></given-names>
</name>
<name>
<surname><![CDATA[Weston]]></surname>
<given-names><![CDATA[D]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[Transaction aggregation as a strategy for credit card fraud detection]]></article-title>
<source><![CDATA[Data Mining Knowledge Disc]]></source>
<year>2009</year>
<volume>18</volume>
<numero>1</numero>
<issue>1</issue>
<page-range>30-55</page-range></nlm-citation>
</ref>
<ref id="B14">
<label>14</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Sánchez]]></surname>
<given-names><![CDATA[D]]></given-names>
</name>
<name>
<surname><![CDATA[Vila]]></surname>
<given-names><![CDATA[M. A]]></given-names>
</name>
<name>
<surname><![CDATA[Cerda]]></surname>
<given-names><![CDATA[L]]></given-names>
</name>
<name>
<surname><![CDATA[Serrano]]></surname>
<given-names><![CDATA[J. M]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[Association rules applied to credit card fraud detection]]></article-title>
<source><![CDATA[Expert Systems with Applications]]></source>
<year>2009</year>
<volume>36</volume>
<numero>2</numero>
<issue>2</issue>
<page-range>3630-3640</page-range></nlm-citation>
</ref>
<ref id="B15">
<label>15</label><nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Brown]]></surname>
<given-names><![CDATA[Helen]]></given-names>
</name>
<name>
<surname><![CDATA[Prescott]]></surname>
<given-names><![CDATA[Robin]]></given-names>
</name>
</person-group>
<source><![CDATA[Applied Mixed Models in Medicine]]></source>
<year>2001</year>
<publisher-name><![CDATA[Statistics in Practice]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B16">
<label>16</label><nlm-citation citation-type="">
<source><![CDATA[Mixed Models: Theory and Applications]]></source>
<year></year>
</nlm-citation>
</ref>
<ref id="B17">
<label>17</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Quintana]]></surname>
<given-names><![CDATA[M]]></given-names>
</name>
<name>
<surname><![CDATA[Gallego]]></surname>
<given-names><![CDATA[A]]></given-names>
</name>
<name>
<surname><![CDATA[Pascual]]></surname>
<given-names><![CDATA[M]]></given-names>
</name>
</person-group>
<article-title xml:lang="es"><![CDATA[Aplicación del análisis discriminante y regresión logística en el estudio de la morosidad en las entidades financieras. Comparación de resultados]]></article-title>
<source><![CDATA[Pecunia: revista de la Facultad de Ciencias Económicas y Empresariales]]></source>
<year>2005</year>
<volume>1</volume>
<page-range>175-199</page-range></nlm-citation>
</ref>
<ref id="B18">
<label>18</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Alderete]]></surname>
<given-names><![CDATA[A]]></given-names>
</name>
</person-group>
<article-title xml:lang="es"><![CDATA[Fundamentos del Análisis de Regresión Logística en la Investigación Psicológica]]></article-title>
<source><![CDATA[Revista Evaluar]]></source>
<year>2006</year>
<volume>6</volume>
<page-range>52-67</page-range></nlm-citation>
</ref>
<ref id="B19">
<label>19</label><nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[West]]></surname>
<given-names><![CDATA[Brady]]></given-names>
</name>
<name>
<surname><![CDATA[Welch]]></surname>
<given-names><![CDATA[Kathleen]]></given-names>
</name>
<name>
<surname><![CDATA[Galecki]]></surname>
<given-names><![CDATA[Andrzej]]></given-names>
</name>
</person-group>
<source><![CDATA[Linear Mixed Models: A practical guide to using statistical software]]></source>
<year>2007</year>
<publisher-name><![CDATA[Chapman & Hall]]></publisher-name>
</nlm-citation>
</ref>
</ref-list>
</back>
</article>
