International Journal of Psychological Research

Print version ISSN 2011-2084

int.j.psychol.res. vol.15 no.2 Medellín July/Dec. 2022  Epub Mar 05, 2023

https://doi.org/10.21500/20112084.5657 

Research Article

Assessing Language Skills Using Diagnostic Classification Models: An Example Using a Language Instrument


Georgios D. Sideridis1, 2, *

Ioannis Tsaousis3 

Khaleel Al-Harbi4 

1Boston Children’s Hospital, Harvard Medical School, Boston, Massachusetts, USA.

2National and Kapodistrian University of Athens, Navarinou 13A, Athens, Greece.

3National and Kapodistrian University of Athens, Department of Psychology, Athens, Greece.

4Education and Training Evaluation Commission (ETEC), Riyadh, Saudi Arabia.


Abstract

The primary purpose of the present study is to inform and illustrate, using examples, the use of Diagnostic Classification Models (DCMs) for the assessment of skills and competencies in cognition and academic achievement. A secondary purpose is to compare and contrast traditional and contemporary psychometrics for the measurement of skills and competencies. DCMs are described alongside other psychometric models within the Confirmatory Factor Analysis tradition, such as the bifactor model and the known mixture models that are utilized to classify individuals into subgroups. The inclusion of interaction terms and constraints, along with their confirmatory nature, enables DCMs to accurately assess the possession of skills and competencies. The above is illustrated using an empirical dataset from Saudi Arabia (n = 2642), in which language skills measured with the English Proficiency Test (EPT) are evaluated on how they conform to known levels of competency based on the CEFR (Council of Europe, 2001).

Keywords: Cognitive Diagnostic Models; Diagnostic Classification Models; CEFR; bifactor models; language


1. Introduction

Traditional and contemporary psychometrics deal with the ordering of individuals along one or more latent continua of skills and competencies. However, these models fail to describe a person's strengths and weaknesses or fine-grained competencies. Thus, a series of models has been developed, termed Cognitive Diagnostic Models (CDMs) or Diagnostic Classification Models (DCMs). The additional goal of these models, beyond rank-ordering individuals, is the classification of individuals as masters or non-masters of specific attributes tapping single or multiple traits (Liu et al., 2018). The methodology has been utilized across a range of skills and competencies, documenting the potential benefits of the procedure (Alexander et al., 2016; Gorin & Embretson, 2006; Jang, 2009; Kaya & Leite, 2017; McGill et al., 2016). The present paper is organized along the following axes: (a) it describes the logic and reasoning of Diagnostic Classification Models (DCMs) in relation to other known models, and (b) it presents an applied example of the use of DCMs for the assessment of language skills and competencies (Rupp & Templin, 2008; Sessoms & Henson, 2018) with reference to the CEFR framework (e.g., Alderson, 2007).

2. Diagnostic Classification Models (DCM): Description

Based on traditional modeling approaches, a person's score comprises a raw estimate and a standard estimate that describes the person's standing relative to the rest of the population (in normative instruments), using continuous or categorical approaches. For example, as shown in Figure 1, upper panel, a person's score could be the summed estimate of exercises designed to assess basic math skills. Using the proposed Diagnostic Classification Modeling (DCM) approach (Jurich & Bradshaw, 2014; Templin & Bradshaw, 2013; Templin & Hoffman, 2013), as shown in the lower panel of Figure 1, the competencies required to solve the basic math exercise 5 + 1 − 1 are both addition and subtraction. Consequently, estimation of the competencies of addition, subtraction, multiplication, and division requires the estimation of both main effects and interactions in exercises that involve multiple competencies. As a result, the conclusion derived from DCMs is that a person is proficient (or not) in addition, subtraction, etc., but requires work on, for example, division (for an excellent discussion on DCMs, see Kunina-Habenicht et al., 2009). On the other hand, the results from traditional analytical approaches (classical or contemporary) would provide placement information only, such as the person being at the 60th percentile in math or having a pass/fail score under a categorical classification system.
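In log-linear DCM notation (a standard formulation along the lines of Rupp & Templin, 2008, written here for illustration; the λ terms anticipate Tables 2 and 3), the probability of solving 5 + 1 − 1 correctly, given mastery indicators α_add, α_sub ∈ {0, 1}, would be:

P(X = 1 \mid \alpha_{add}, \alpha_{sub}) = \frac{\exp\left(\lambda_{0} + \lambda_{1,(add)}\alpha_{add} + \lambda_{1,(sub)}\alpha_{sub} + \lambda_{2,(add,sub)}\alpha_{add}\alpha_{sub}\right)}{1 + \exp\left(\lambda_{0} + \lambda_{1,(add)}\alpha_{add} + \lambda_{1,(sub)}\alpha_{sub} + \lambda_{2,(add,sub)}\alpha_{add}\alpha_{sub}\right)}

Only examinees who have mastered both addition and subtraction receive the full kernel, including the interaction term, and hence the highest probability of success.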

Figure 1 The logic of diagnostic classification models 

Traditional modeling approaches involved the factor model, exploratory in its early days and confirmatory later (see Figure 2, upper panel, depicting a 3-factor model of three intercorrelated skills). The middle panel of Figure 2 displays the same 3-skill structure by use of Item Response Theory (IRT), with the boxes (item estimates) containing information on item difficulty levels (the crossed line shows intercepts) in addition to their links with the latent factors (slopes). The bottom panel of Figure 2 shows a complex structure, in which items define more than one skill or competency, and the circled latent variable estimates use a split line to denote threshold estimates of categorical variables indicating whether a skill has been acquired or not. This model resembles the exploratory structural equation modeling approach recently put forth by Asparouhov and Muthén (2009).
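As a point of reference (a standard two-parameter IRT formulation, not reproduced from the article), the middle panel corresponds to an item response function in which b_i is the item difficulty (intercept) and a_i the slope linking item i to the latent factor θ:

P(X_{ij} = 1 \mid \theta_j) = \frac{\exp\left(a_i(\theta_j - b_i)\right)}{1 + \exp\left(a_i(\theta_j - b_i)\right)}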

3. Statistical Properties of Diagnostic Classification Models (DCMs)

Cognitive diagnostic models have recently received increased attention, with applications across various disciplines (see Gierl et al., 2010; Tu et al., 2017; Xie, 2017; Walker et al., 2018). The first step in the development of DCMs is the creation of the Q-matrix, which shows which items define which skill(s) (Chen et al., 2015; Köhn & Chiu, 2018; Liu et al., 2017; Bradshaw, 2016; Madison & Bradshaw, 2015). Table 1 shows a Q-matrix for a portion of the English Proficiency Test (EPT), in which 8 items were aligned to the A1 and A2 skills, as based on the Common European Framework of Reference (CEFR) (see Alderson, 2007; Hasselgreen, 2013; Little, 2007; Kusseling & Lonsdale, 2013). These items are dichotomously scored. More information about the instrument is available at https://etec.gov.sa/EN/PRODUCTSANDSERVICES/QIYAS/EDUCATION/EPT/Pages/default.aspx.

Figure 2 Traditional and contemporary models for the assessment of skills and competencies. Horizontal lines within boxes reflect thresholds of categorical variables 

As shown in Table 1, item i19 defines only A1 skills, whereas items i7, i18, i21, and i14 define only A2 skills. Last, items i6, i1, and i20 define both A1 and A2 skills. The next step in the creation of a DCM is to define parameter values for each item. As shown in Table 2, item i19 was defined to assess only the A1 skill. Consequently, it has an intercept parameter λi,0 and a main effect for skill A1, termed λi,1,(1) (and zero terms for skill A2 and for the interaction between the two skills). Item i6, which defines both A1 and A2 skills, contains an intercept term λi,0, a main effect for the A1 skill λi,1,(1), a main effect for the A2 skill λi,1,(2), and an interaction term for A1 and A2, λi,2,(1,2). These terms are estimated within a confirmatory latent class model using the kernels in Table 3, which give the outcome for each of the latent classes (Rupp & Templin, 2008). The classes in Table 3 define each of the four possible outcomes: class C1 contains individuals who possess neither the A1 nor the A2 skill; class C2 contains individuals who possess the A2 skill in the absence of A1 (a potentially undesirable finding); class C3 contains the subgroup of individuals who have achieved A1 levels of proficiency; last, C4 participants are those who possess both the A1 and A2 attributes. As shown in Table 3, each item contains an intercept term, a slope term if it defines either of the two skills, and an interaction term when it defines both skills. For example, item i1 has an intercept term λ3,0; intercept-plus-slope terms λ3,0 + λ3,1,(1) or λ3,0 + λ3,1,(2) when defining the A1 or A2 skill, respectively; and an intercept, two main-effect slopes, and an interaction term, λ3,0 + λ3,1,(2) + λ3,1,(1) + λ3,2,(1,2), when defining acquisition of both skills.

Table 1 Q-Matrix of 8 items belonging to 2 attributes in reading competency using the EPT as based on the CEFR framework 

Items A1 A2
i19 1 0
i6 1 1
i1 1 1
i20 1 1
i7 0 1
i18 0 1
i21 0 1
i14 0 1

Note. Item numbers reflect actual item numbers of the EPT measure.

Table 2. DCM Parameter Values for reading as a second language 

Intercept Main Effect α1 Main Effect α2 2-Way Interaction
Items λi,0 λi,1,(1) λi,1,(2) λi,2,(1,2)
i19 1 1 0 0
i6 1 1 1 1
i1 1 1 1 1
i20 1 1 1 1
i7 1 0 1 0
i18 1 0 1 0
i21 1 0 1 0
i14 1 0 1 0

Note. 1 = parameter included (estimated); 0 = parameter not included (fixed to zero).

Figure 3 Two-factor correlated model for the assessment of A1 and A2 skills of the EPT instrument (upper panel) and bifactor model extending the 2-factor correlated model with the inclusion of a general factor and two specific ones. 

4. Diagnostic Classification Models (DCMs): An Applied Example

As described above, a DCM was fit to data from a language instrument (for the acquisition of English as a second language) for the assessment of A1 and A2 skills based on the CEFR framework. The English Proficiency Test (EPT; https://etec.gov.sa/EN/PRODUCTSANDSERVICES/QIYAS/EDUCATION/EPT/Pages/default.aspx) aims to determine English language competency for individuals wishing to join academic programs taught in English. It is part of a battery of tests related to university admission and comprises 80 multiple-choice questions assessing three domains, namely, language structure (40 items), reading comprehension (20 items), and written analysis (20 items).

Table 3. DCM Kernels for each item and each one of the latent classes in Reading 

C1 C2 C3 C4
αC [0,0] [0,1] [1,0] [1,1]
1. i19 λ1,0 λ1,0 λ1,0+λ1,1,(1) λ1,0+λ1,1,(1)
2. i6 λ2,0 λ2,0+λ2,1,(2) λ2,0+λ2,1,(1) λ2,0+λ2,1,(2)+λ2,1,(1)+λ2,2,(1,2)
3. i1 λ3,0 λ3,0+λ3,1,(2) λ3,0+λ3,1,(1) λ3,0+λ3,1,(2)+λ3,1,(1)+λ3,2,(1,2)
4. i20 λ4,0 λ4,0+λ4,1,(2) λ4,0+λ4,1,(1) λ4,0+λ4,1,(2)+λ4,1,(1)+λ4,2,(1,2)
5. i7 λ5,0 λ5,0+λ5,1,(2) λ5,0 λ5,0+λ5,1,(2)
6. i18 λ6,0 λ6,0+λ6,1,(2) λ6,0 λ6,0+λ6,1,(2)
7. i21 λ7,0 λ7,0+λ7,1,(2) λ7,0 λ7,0+λ7,1,(2)
8. i14 λ8,0 λ8,0+λ8,1,(2) λ8,0 λ8,0+λ8,1,(2)

Note. λ1,0=intercept of item 1. The c-term denotes latent class.
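The kernels in Table 3 are log-odds, so the probability of a correct response within a class follows from the logistic transform (the standard LCDM link; see Rupp & Templin, 2008). For example, for item i6 in class C4:

P(X_{i6} = 1 \mid C_4) = \frac{\exp\left(\lambda_{2,0} + \lambda_{2,1,(2)} + \lambda_{2,1,(1)} + \lambda_{2,2,(1,2)}\right)}{1 + \exp\left(\lambda_{2,0} + \lambda_{2,1,(2)} + \lambda_{2,1,(1)} + \lambda_{2,2,(1,2)}\right)}

In class C1, by contrast, the kernel reduces to λ2,0 alone, so a well-behaved item should yield a low success probability for non-masters.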

Participants were 2642 examinees who took the English Proficiency Test (EPT) as part of their English competency examination. The specific items were designed to assess two attributes or skills, namely, A1 and A2, as per the CEFR framework. The hypothesis put forth was that there would be a distinct group possessing A1 skills and a group having both A1 and A2 skills; more interestingly, the analysis would also examine the presence of a group that possesses neither of the two skills, called the pre-A1 level, which has been observed in certain cultures. Data were analyzed using the Q-matrix in Table 1, the parameters in Table 2, and the LCDM cognitive kernel functions in Table 3 (see also DiBello et al., 2015). Furthermore, data were also analyzed using a variable-based approach and the confirmatory factor models of a 2-factor correlated model and a bifactor model (see Figure 3). All models were analyzed using Mplus 8.7. An annotated Mplus syntax file used for the DCM model in Figure 4 is shown in Appendix A.

As Figure 3 shows, both a 2-factor correlated model and a bifactor model provided a very good model fit using both absolute and relative criteria [2-factor model: χ2(19) = 36.566, p = .009, CFI = .999, TLI = .998, RMSEA = .019; bifactor model: χ2(12) = 20.160, p = .064, CFI = .999, TLI = .998, RMSEA = .016]. Specifically, the 2-factor correlated model showed significant factor loadings for each indicator on the respective latent construct. Furthermore, the bifactor model provided a superior model fit, with the general factor being dominant and the two specific factors losing most of their explanatory power. Subsequently, factor scores reflecting persons' abilities were saved for further scrutiny.

For comparative purposes, data were also analyzed using a 2-class and a 3-class exploratory model with no constraints on the latent class formation; thus, subgroups emerged in an exploratory fashion. Last, when data were analyzed by use of the DCM, results indicated the presence of three distinct subgroups. A group having neither A1 nor A2 skills, termed the pre-A1 level group, comprised 830 participants, representing 31.4% of the sample. A group having A2 skills in the absence of A1 was not observed, as expected, having zero participants. The two groups with A1 and A2 skills comprised 399 and 1,413 participants, accounting for 15.1% and 53.5% of the participants, respectively. These results are shown in Figure 4, with the pre-A1 class showing response probabilities below 50% throughout, A1 participants being successful on only the four items defining the A1 skill, and A2 individuals being successful with a probability greater than 50% on all eight language items. Table 4 presents model comparisons across several competing models. As shown in the table and based on information criteria, the best model fit was linked to the 3-class exploratory model. However, this model was not interpretable with regard to the measurement of specific skills and competencies, that is, A1 and A2 levels, because there was a class with mixed skills, which runs against the logic of mastery put forth by DCM models. Consequently, the 3-class exploratory model was not deemed appropriate. Among the remaining models, a superior model fit emerged for the DCM model with 3 skills, including the interesting pattern [0,0], suggesting the absence of minimum levels of A1 skills. A similar model fit was observed for the 2-class exploratory model and the bifactor model, which bear a close resemblance to the DCM but were inferior in model fit. Last, the worst fit was observed for the 2-factor correlated model.

In an attempt to compare and contrast person-based estimates from the factor model and the DCM, scatterplots were created, as shown in Figure 5. The upper panel of the figure shows factor scores based on the 2-factor correlated model, which were related to the person-based estimates from the latent class representing pattern [1,0]. The relationship between the two estimates was only .487, which is at best modest. The lower panel of the figure shows the relationship between the general factor scores from the bifactor model (reflecting ability estimates at A1 and A2) and the latent class of the DCM representing pattern [1,1]. The relationship between these two person-based estimates was .927, which is very high. However, scores at the A1 skill level were very discrepant between the factor model and the DCM, showing disparate estimates of person ability. Furthermore, no comparison was available for the class lacking A1 skills (i.e., pattern [0,0]), as the factor model could not provide scores for such a subgroup.
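For readers wishing to replicate such comparisons, both sets of person estimates can be exported with the Mplus SAVEDATA command; the file names below are placeholders rather than the authors' actual files:

! CFA/bifactor run: export factor scores for the scatterplot axes
SAVEDATA: FILE = ept_fscores.dat;
          SAVE = FSCORES;

! DCM (mixture) run: export posterior class probabilities and modal class
SAVEDATA: FILE = ept_cprobs.dat;
          SAVE = CPROBABILITIES;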

Table 4. Model Fit Comparison Using Evaluative Criteria 

Model Comparison LL Npar AIC BIC SABIC CAIC WE
2-Factor Correlated −10548.79 17 21131.59 21231.53 21177.52 21248.53 21416.48
2-Class Exploratory −10554.82 17 21143.65 21243.60 21189.58 21260.60 21428.54
Bifactor Model −10540.11 24 21128.21 21269.31 21193.06 21293.31 21530.42
3-Class Exploratory −10433.85 26 20919.69 21072.56 20989.95 21098.56 21355.42
DCM: 3 Skills −10483.34 27 21020.68 21179.42 21093.64 21206.42 21473.16

Note. LL = log-likelihood; Npar = number of estimated parameters; AIC = Akaike information criterion; BIC = Bayesian information criterion; SABIC = sample-size-adjusted BIC; CAIC = consistent AIC.
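The criteria in Table 4 follow directly from the log-likelihood (LL) and the number of parameters k via their standard definitions, with n = 2642; for example, for the 2-factor correlated model:

\mathrm{AIC} = -2LL + 2k = -2(-10548.79) + 2(17) = 21131.58
\mathrm{BIC} = -2LL + k\ln(n) = 21097.58 + 17(7.879) = 21231.53

Both match the tabled values up to rounding.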

Figure 4 Diagnostic cognitive model for the assessment of pre-A1, A1, and A2 skills and competencies of the EPT measure 

5. Conclusions, Limitations and Recommendations for Future Research

The primary purpose of the present study was to inform and illustrate, using examples, the use of Diagnostic Classification Models (DCMs) for the assessment of language skills and competencies. A secondary purpose was to compare and contrast traditional and contemporary psychometrics for the measurement of skills and competencies. The most important finding of the present application was that three distinct language skill groups were observed by use of the DCMs, including the pre-A1 group recently observed across various countries (e.g., Bower et al., 2017). The pre-A1 group comprised a substantial number of participants who did not possess the level of A1 proficiency delineated by the CEFR framework, a group that had not previously been documented in the Kingdom of Saudi Arabia.

The present findings, however, are very important, especially when contrasted with those of traditional methodologies such as the factor model (Gorsuch, 1983). Relationships between person-based estimates of ability and those from the DCM were modest, to say the least, when looking at the A1 skill. Consequently, estimates of person skill acquisition by use of the factor model are clearly inappropriate in light of the advanced knowledge provided by the DCMs. A significant difference between the two is that the factor model addresses the question of degree of acquisition in total, without distinguishing requisite skills from other skills that extend beyond the person's level; all of that information is included in the factor model and contributes to the estimation of persons' skills and competencies. In DCMs, however, a skill is clearly defined as being dependent upon a specific set of competencies and excludes other competencies that could potentially confound the measurement of a person's abilities. For that reason, DCMs provide a more accurate estimate of a person's set of skills.

Figure 5 Comparisons between person estimates of ability as based on the factor model and the DCM model’s estimates. Values on the y-axis are factor scores from the first factor (A1) of the CFA model (upper panel) and in the lower panel, factor scores from the general factor of the bifactor model in CFA 

The present findings have several limitations. One potential limitation reflects the large number of available DCM models and the proper choice among them (Alexander et al., 2016; Bonifay & Cai, 2017; Bozard, 2010; Bradshaw & Madison, 2016; Bradshaw et al., 2014; Davier, 2009). A second limitation, put forth by Raykov, relates to the internal consistency estimates of latent subgroups reflecting specific skill levels (see Huang, 2017). A third potential limitation reflects accounting for complex structures, as well as the presence of covariates in the model, which likely alter persons' estimates of skills (Xia & Zheng, 2018). For example, in a measure of learning disabilities, how would a measure of IQ, entered as a covariate, factor into the model? (see McGill et al., 2016). A fourth potential limitation reflects disparate opinions on what constitutes a proper measure of global fit in DCM models, in light of challenges related to the number of response patterns and, consequently, the degrees of freedom (Hansen et al., 2016). A fifth limitation relates to the estimation of item discrimination parameters (Henson et al., 2018) and of person-based estimates of fit (Emons et al., 2003). Last, issues concerning the measurement of reliability of DCM kernels have been raised (Templin & Bradshaw, 2013). Future research may target empirically investigating how to deal with these potential limitations, as well as methodological extensions such as the use of DCMs in Computerized Adaptive Testing (CAT) environments (Wang, 2013). For example, accounting for complex structures may range from simply modeling random effects to incorporating stratification weights.

References

Alderson, C. (2007). The CEFR and the need for more research. The Modern Language Journal, 91, 659-663. https://doi.org/10.1111/j.1540-4781.2007.00627_4.x

Alexander, G. E., Satalich, T. A., Shankle, W. R., & Batchelder, W. H. (2016). A cognitive psychometric model for the psychodiagnostic assessment of memory-related deficits. Psychological Assessment, 28(3), 279. https://doi.org/10.1037/pas0000163

Asparouhov, T., & Muthén, B. (2009). Exploratory structural equation modeling. Structural Equation Modeling, 16, 397-438. https://doi.org/10.1080/10705510903008204

Bonifay, W., & Cai, L. (2017). On the complexity of item response theory models. Multivariate Behavioral Research, 52(4), 465-484. https://doi.org/10.1080/00273171.2017.1309262

Bower, J., Runnels, J., Rutson-Griffiths, A., Schmidt, R., Cook, G., Lehde, L., & Kodate, A. (2017). Aligning a Japanese university's English language curriculum and lesson plans to the CEFR-J. In F. O'Dwyer, M. Hunke, A. Imig, N. Nagai, N. Naganuma, & M. G. Schmidt (Eds.), Critical, constructive assessment of CEFR-informed language teaching in Japan and beyond (pp. 176-225). Cambridge University Press.

Bozard, J. L. (2010). Invariance testing in diagnostic classification models (Doctoral dissertation). The University of Georgia. https://getd.libs.uga.edu/pdfs/bozard_jennifer_l_201005_ma.pdf

Bradshaw, L., Izsák, A., Templin, J., & Jacobson, E. (2014). Diagnosing teachers' understandings of rational numbers: Building a multidimensional test within the diagnostic classification framework. Educational Measurement: Issues and Practice, 33(1), 2-14. https://doi.org/10.1111/emip.12020

Bradshaw, L. P., & Madison, M. J. (2016). Invariance properties for general diagnostic classification models. International Journal of Testing, 16(2), 99-118. https://doi.org/10.1080/15305058.2015.1107076

Chen, Y., Liu, J., Xu, G., & Ying, Z. (2015). Statistical analysis of Q-matrix based diagnostic classification models. Journal of the American Statistical Association, 110(510), 850-866. https://doi.org/10.1080/01621459.2014.934827

Council of Europe. (2001). Common European Framework of Reference for Languages: Learning, teaching, assessment. Cambridge University Press.

Davier, M. V. (2009). Some notes on the reinvention of latent structure models as diagnostic classification models. Measurement: Interdisciplinary Research and Perspectives, 7(1), 67-74. https://doi.org/10.1080/15366360902799851

DiBello, L. V., Henson, R. A., & Stout, W. F. (2015). A family of generalized diagnostic classification models for multiple choice option-based scoring. Applied Psychological Measurement, 39(1), 62-79. https://doi.org/10.1177/0146621614561315

Emons, W. H., Glas, C. A., Meijer, R. R., & Sijtsma, K. (2003). Person fit in order-restricted latent class models. Applied Psychological Measurement, 27(6), 459-478. https://doi.org/10.1177/0146621603259270

Gierl, M. J., Alves, C., & Majeau, R. T. (2010). Using the attribute hierarchy method to make diagnostic inferences about examinees' knowledge and skills in mathematics: An operational implementation of cognitive diagnostic assessment. International Journal of Testing, 10(4), 318-341. https://doi.org/10.1080/15305058.2010.509554

Gorin, J. S., & Embretson, S. E. (2006). Item difficulty modeling of paragraph comprehension items. Applied Psychological Measurement, 30, 394-411. https://doi.org/10.1177/0146621606288554

Gorsuch, R. (1983). Factor analysis. Lawrence Erlbaum Associates.

Hansen, M., Cai, L., Monroe, S., & Li, Z. (2016). Limited-information goodness-of-fit testing of diagnostic classification item response models. British Journal of Mathematical and Statistical Psychology, 69(3), 225-252. https://doi.org/10.1111/bmsp.12074

Hasselgreen, A. (2013). Adapting the CEFR for the classroom assessment of young learners' writing. The Canadian Modern Language Review, 69, 415-435. https://doi.org/10.3138/cmlr.1705.415

Henson, R., DiBello, L., & Stout, B. (2018). A generalized approach to defining item discrimination for DCMs. Measurement: Interdisciplinary Research and Perspectives, 16(1), 18-29. https://doi.org/10.1080/15366367.2018.1436855

Huang, H. Y. (2017). Multilevel cognitive diagnosis models for assessing changes in latent attributes. Journal of Educational Measurement, 54(4), 440-480. https://doi.org/10.1111/jedm.12156

Jang, E. (2009). Cognitive diagnostic assessment of L2 reading comprehension ability: Validity arguments for Fusion Model application to LanguEdge assessment. Language Testing, 26, 31-73. https://doi.org/10.1177/0265532208097336

Jurich, D. P., & Bradshaw, L. P. (2014). An illustration of diagnostic classification modeling in student learning outcomes assessment. International Journal of Testing, 14(1), 49-72. https://doi.org/10.1080/15305058.2013.835728

Kaya, Y., & Leite, W. L. (2017). Assessing change in latent skills across time with longitudinal cognitive diagnosis modeling: An evaluation of model performance. Educational and Psychological Measurement, 77(3), 369-388. https://doi.org/10.1177/0013164416659314

Köhn, H. F., & Chiu, C. Y. (2018). How to build a complete Q-matrix for a cognitively diagnostic test. Journal of Classification, 35(2), 273-299. https://doi.org/10.1007/s00357-018-9255-0

Kunina-Habenicht, O., Rupp, A. A., & Wilhelm, O. (2009). A practical illustration of multidimensional diagnostic skills profiling: Comparing results from confirmatory factor analysis and diagnostic classification models. Studies in Educational Evaluation, 35(2-3), 64-70. https://doi.org/10.1016/j.stueduc.2009.10.003

Kusseling, F., & Lonsdale, D. (2013). A corpus-based assessment of French CEFR lexical content. The Canadian Modern Language Review, 69, 436-461. https://doi.org/10.3138/cmlr.1726.436

Little, D. (2007). The Common European Framework of Reference for Languages: Perspectives on the making of supranational language education policy. The Modern Language Journal, 91, 645-655. https://doi.org/10.1111/j.1540-4781.2007.00627_2.x

Liu, R., Huggins-Manley, A. C., & Bradshaw, L. (2017). The impact of Q-matrix designs on diagnostic classification accuracy in the presence of attribute hierarchies. Educational and Psychological Measurement, 77(2), 220-240. https://doi.org/10.1177/0013164416645636

Liu, R., Huggins-Manley, A. C., & Bulut, O. (2018). Retrofitting diagnostic classification models to responses from IRT-based assessment forms. Educational and Psychological Measurement, 78(3), 357-383. https://doi.org/10.1177/0013164416685599

Madison, M. J., & Bradshaw, L. P. (2015). The effects of Q-matrix design on classification accuracy in the log-linear cognitive diagnosis model. Educational and Psychological Measurement, 75(3), 491-511. https://doi.org/10.1177/0013164414539162

McGill, R. J., Styck, K. M., Palomares, R. S., & Hass, M. R. (2016). Critical issues in specific learning disability identification: What we need to know about the PSW model. Learning Disability Quarterly, 39(3), 159-170. https://doi.org/10.1177/0731948715618504

Rupp, A. A., & Templin, J. L. (2008). Unique characteristics of diagnostic classification models: A comprehensive review of the current state-of-the-art. Measurement, 6(4), 219-262. https://doi.org/10.1080/15366360802490866

Sessoms, J., & Henson, R. A. (2018). Applications of diagnostic classification models: A literature review and critical commentary. Measurement: Interdisciplinary Research and Perspectives, 16(1), 1-17. https://doi.org/10.1080/15366367.2018.1435104

Templin, J., & Bradshaw, L. (2013). Measuring the reliability of diagnostic classification model examinee estimates. Journal of Classification, 30(2), 251-275. https://doi.org/10.1007/s00357-013-9129-4

Templin, J., & Hoffman, L. (2013). Obtaining diagnostic classification model estimates using Mplus. Educational Measurement: Issues and Practice, 32(2), 37-50. https://doi.org/10.1111/emip.12010

Tu, D., Gao, X., Wang, D., & Cai, Y. (2017). A new measurement of internet addiction using diagnostic classification models. Frontiers in Psychology, 8, 1768. https://doi.org/10.3389/fpsyg.2017.01768

Walker, G. M., Hickok, G., & Fridriksson, J. (2018). A cognitive psychometric model for assessment of picture naming abilities in aphasia. Psychological Assessment, 30(6), 809-826. https://doi.org/10.1037/pas0000529

Wang, C. (2013). Mutual information item selection method in cognitive diagnostic computerized adaptive testing with short test length. Educational and Psychological Measurement, 73(6), 1017-1035. https://doi.org/10.1177/0013164413498256

Xia, Y., & Zheng, Y. (2018). Asymptotically normally distributed person fit indices for detecting spuriously high scores on difficult items. Applied Psychological Measurement, 42(5), 343-358. https://doi.org/10.1177/0146621617730391

Xie, Q. (2017). Diagnosing university students' academic writing in English: Is cognitive diagnostic modeling the way forward? Educational Psychology, 37(1), 26-47. https://doi.org/10.1080/01443410.2016.1202900

*Declaration of data availability: All relevant data are within the article, as well as in the supporting information files.

*How to Cite: Sideridis, G. D., Tsaousis, I., & Al-Harbi, K. (2022). Assessing Language Skills Using Diagnostic Classification Models: An Example Using a Language Instrument. International Journal of Psychological Research, 15(2), 94-104. https://doi.org/10.21500/20112084.5657

Appendix A. Mplus syntax file for DCM model of Figure 4. Model constraint statement includes only the first two items for illustration purposes
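The appendix syntax did not survive in this version of the article. The sketch below follows the general LCDM specification described by Templin and Hoffman (2013), with the kernels of Table 3 written into MODEL CONSTRAINT for items i19 and i6 only; the data file name is a placeholder, and the sketch should be read as an illustration rather than the authors' exact code.

TITLE:    LCDM for the EPT reading items (attributes A1, A2);
DATA:     FILE = ept_reading.dat;        ! placeholder file name
VARIABLE: NAMES = i19 i6 i1 i20 i7 i18 i21 i14;
          CATEGORICAL = i19-i14;
          CLASSES = c(4);                ! C1=[0,0] C2=[0,1] C3=[1,0] C4=[1,1]
ANALYSIS: TYPE = MIXTURE;
          STARTS = 0;
MODEL:
  %OVERALL%
  %c#1%                                  ! class [0,0]
  [i19$1] (t1_19);  [i6$1] (t1_6);
  %c#2%                                  ! class [0,1]
  [i19$1] (t2_19);  [i6$1] (t2_6);
  %c#3%                                  ! class [1,0]
  [i19$1] (t3_19);  [i6$1] (t3_6);
  %c#4%                                  ! class [1,1]
  [i19$1] (t4_19);  [i6$1] (t4_6);
MODEL CONSTRAINT:
  ! Mplus thresholds are the negatives of the LCDM kernels in Table 3
  NEW(l19_0 l19_11 l6_0 l6_11 l6_12 l6_212);
  t1_19 = -(l19_0);                      ! kernel: λ1,0
  t2_19 = -(l19_0);                      ! kernel: λ1,0
  t3_19 = -(l19_0 + l19_11);             ! kernel: λ1,0 + λ1,1,(1)
  t4_19 = -(l19_0 + l19_11);             ! kernel: λ1,0 + λ1,1,(1)
  t1_6  = -(l6_0);                       ! kernel: λ2,0
  t2_6  = -(l6_0 + l6_12);               ! kernel: λ2,0 + λ2,1,(2)
  t3_6  = -(l6_0 + l6_11);               ! kernel: λ2,0 + λ2,1,(1)
  t4_6  = -(l6_0 + l6_11 + l6_12 + l6_212);
  ! monotonicity constraints: mastery cannot lower success probability
  l19_11 > 0;  l6_11 > 0;  l6_12 > 0;
  l6_11 + l6_12 + l6_212 > 0;

The remaining six items follow the same pattern, with one threshold label per class in MODEL and one kernel line per class in MODEL CONSTRAINT.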

Received: October 24, 2021; Accepted: July 20, 2022

Corresponding author: Georgios D. Sideridis. Email: georgios.sideridis@childrens.harvard.edu

Conflict of interests:

The authors have declared that there is no conflict of interest.

This is an open-access article distributed under the terms of the Creative Commons Attribution License.