Introduction
Assessing the effectiveness of a treatment, and subsequently the patient’s evolution, is considered a breakthrough in the field of clinical practice. Traditionally, outcome assessment was based on efficacy research, which evaluated discrete treatment interventions for specific groups of patients (McGlynn, 1996). More recently, however, the focus has moved to the assessment of patients under real-world conditions, which is known as effectiveness research (Sederer et al., 1996).
Moreover, there has been a shift in the measures used to establish outcome in clinical practice. Researchers have used different tools to assess changes at the symptomatic level in experimental trials. However, the patient’s social capacity and adjustment are now considered to be as important as clinical change (Palomo-Vélez et al., 2020; Sederer et al., 1996). Thus, when assessing clinical outcomes, both the symptomatic and the functional dimensions should be considered, given that assessing these different domains helps clinicians to achieve more positive outcomes in their daily practice: it offers a useful method for knowing whether their efforts are consistent with the patient’s psychotherapy process (Lambert, 2012; Sederer et al., 1996).
A further benefit of outcome assessment comes from the feedback it provides, since feedback optimizes the treatment (Hawkins et al., 2004; Miller et al., 2013; Shimokawa et al., 2010). Lambert and his group suggested that regularly assessing the patient’s psychotherapeutic process has beneficial effects on treatment outcomes: they showed that when clinicians receive feedback on their patients’ process, the rate of clinically significant and reliable change doubles, and both deterioration and the number of treatment sessions decrease (Lambert et al., 2001). Feedback is especially helpful in cases where a patient’s treatment is not advancing well (Shimokawa et al., 2010). Several researchers have pointed out (Lambert, 2012; Walfish et al., 2012) that clinicians tend to have an optimistic view of their patients’ processes; feedback allows them to notice the cases that are responding poorly to treatment, offering the therapist the chance to adjust the current treatment to a more adequate one (Lambert, 2013). Likewise, assessment data are usually more reliable than clinical judgment for recognising patients whose treatment is failing (Hannan et al., 2005).
As a consequence of the increasing demand for assessing outcomes in psychotherapy, the Outcome Questionnaire (OQ-45) was developed during the 1990s by Lambert and his group (Lambert et al., 1996). Nowadays, it has become one of the most widely used psychotherapy outcome instruments (Hatfield & Ogles, 2004). It is a 45-item self-report measure designed for repeated administration during the course of a treatment. It is divided into three domains of patient functioning: symptoms and psychological distress (anxiety and depression), interpersonal relationships (problems related to loneliness or conflict with others) and social role performance (daily activities or quality of life) (Lambert et al., 1996). The OQ-45 is a psychometrically sound instrument that has been shown to be sensitive to change in diverse populations over short intervals while remaining relatively stable in untreated individuals (Vermeersch et al., 2000; Vermeersch et al., 2004). Therefore, the OQ-45 can be used by clinicians in their day-to-day work to obtain the patients’ perspective on the course of their treatment (Lambert, 2013). Owing to its growing popularity, the OQ-45 has been translated into and validated in several languages, including Japanese, Italian, German, Dutch and Norwegian. Von Bergen and De la Parra (2002) developed the Spanish version for the Chilean culture. Given that the OQ-45 has not been adapted for the Spanish population, and given the high value it has demonstrated in other countries, the objective of the present work is to adapt and validate the instrument for this culture. Moreover, with the aim of providing an effective tool for assessing outcomes in clinical practice, both a non-clinical and a clinical sample are used. The values of internal consistency are expected to be good and similar to those of previous studies.
In terms of concurrent validity, OQ-45 scores are expected to correlate positively with clinical indicators and negatively with mental and physical health scores. Regarding the factorial structure, different models have been tested in previous research (Lo Coco et al., 2008; Kim et al., 2010). Accordingly, three-correlated-factor, four-factor hierarchical and bi-factor models are tested, with the expectation of obtaining appropriate fit indices for a structure that supports the theoretical solution. Finally, this study provides criteria and norms for interpreting the scores.
Method
Participants
The study included 639 adult participants, divided into two samples. The clinical sample comprised 139 patients who attended mental health centres seeking therapeutic assistance (most of them with affective symptomatology); 88 were female and 51 male, aged between 19 and 61 years (M = 33.45). The non-clinical sample (n = 500) included people from the community between 17 and 82 years of age (M = 44.45); 53.4% were female and 46.6% male.
Instruments
The Outcome Questionnaire (OQ-45; Lambert et al., 1996) is a 45-item self-report measure scored on a five-point Likert scale. The questionnaire generates a general dimension that takes all the items into account (OQ-45 total), as well as three subscales. High scores indicate a worse perceived state. The psychometric properties of the scale have been widely investigated, yielding an adequate internal consistency reliability coefficient (α = .93) and concurrent validity with the SCL-90-R and BDI, with coefficients (Pearson’s r) of about .80 (Lambert et al., 2004). Although support for its factor structure is lacking, users of the OQ-45 consider both the total score and the scores obtained from the three subscales (Kim et al., 2010). The Spanish translation and preliminary analyses were carried out by Iraurgi et al. (2009) and Penas et al. (2017), who found high internal consistency (.88 to .91), adequate concurrent validity, and an appropriate factorial structure.
The Beck Depression Inventory (BDI; Beck et al., 1996) is a 21-item self-report rating instrument that assesses characteristics, attitudes and symptoms of depression. The Spanish version was developed by Sanz and Vázquez (1998), with an internal consistency of .87. The internal consistency in the current study was .90.
The State-Trait Anxiety Inventory (STAI; Spielberger et al., 1970) is a self-report instrument that includes 20 items for assessing trait anxiety and 20 for state anxiety. The instrument was adapted to the Spanish population (Spielberger et al., 1982). In the present study, an alpha coefficient of .95 was obtained for the state dimension and .93 for the trait dimension.
The Perceived Stress Scale (PSS; Cohen et al., 1983) is a 14-item self-report questionnaire that assesses the degree to which individuals consider situations in their lives to be stressful. The Spanish version was developed by Remor (2006), who obtained a reliability of .81. Cronbach’s alpha in the current research was .93.
Finally, the Short-Form Health Survey (SF-12; Ware et al., 1996) is a shortened version of the 36-item Short-Form Health Survey (SF-36), which measures health-related quality of life. Its Spanish version was developed by Vilagut et al. (2008). In the present study, Cronbach’s alpha was .78 for the physical component and .82 for the mental component.
Procedure
All the participants completed the OQ-45 in its Spanish version (Iraurgi et al., 2009). In addition to the OQ-45, the clinical sample completed the BDI, STAI and PSS, whereas the non-clinical sample completed the SF-12. The clinical sample was recruited from people attending outpatient Mental Health Services, whereas the non-clinical sample was composed of university students and their relatives. The characteristics and the purpose of the scale were explained to all of the participants. Informed consent was obtained, and participants were assured that their responses would be confidential and anonymous. Additionally, all of the ethical requirements for conducting this type of study were followed.
Statistical Analyses
The following descriptive statistics of the OQ-45 items were calculated for both samples with SPSS: mean (M), standard deviation (SD), skewness (As), and the value of Cronbach’s alpha if the item was removed from its dimension of the scale (α). Furthermore, the differences between clinical and non-clinical scores were tested with Student’s t-test, with effect sizes estimated by Cohen’s d. Finally, the Feldt test was used to compare the alpha of each dimension with the others. To address concurrent validity, correlations were computed between the OQ-45 total and dimension scores and the scores obtained on the following instruments: BDI, STAI-State, STAI-Trait and PSS for the clinical sample, and SF-12 for the non-clinical one.
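As an illustration of the item-level statistics described above, the following is a minimal sketch in plain Python. It is not the SPSS routine used in the study; the function names and the tiny data sets used to exercise it are hypothetical. It computes Cronbach’s alpha, alpha if an item is removed, and Cohen’s d with a pooled standard deviation.

```python
from math import sqrt
from statistics import mean, variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(rows[0])                      # number of items
    columns = list(zip(*rows))            # one tuple of scores per item
    item_variance_sum = sum(variance(col) for col in columns)
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


def alpha_if_item_deleted(rows):
    """Alpha recomputed after dropping each item in turn."""
    k = len(rows[0])
    return [
        cronbach_alpha([list(row[:j]) + list(row[j + 1:]) for row in rows])
        for j in range(k)
    ]


def cohens_d(group_a, group_b):
    """Cohen's d using the pooled (sample) standard deviation of two groups."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_variance = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_variance)
```

Each row passed to `cronbach_alpha` is one respondent and each column one item, so reverse-keyed items would need to be recoded before calling it.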
Factorial validity was analysed by conducting a confirmatory factor analysis (CFA) using Mplus 7. In this process, the three-correlated-factor model, the four-factor hierarchical model and the four-factor bi-factor model were tested in both samples. The following indicators were used to assess goodness of fit: the chi-square statistic (χ2), the degrees of freedom (df), the Akaike Information Criterion (AIC), the Adjusted Goodness-of-Fit Index (AGFI), the Normed Fit Index (NFI), the Comparative Fit Index (CFI), the Standardized Root-Mean-Square Residual (SRMSR), and the Root-Mean-Square Error of Approximation (RMSEA) with its confidence interval (90% CI).
Furthermore, ROC (Receiver Operating Characteristic; Cerda & Cifuentes, 2012) curves were used to estimate cut-off points between the clinical and non-clinical samples for the total score of the OQ-45 and for each of the three subscales, four cut-off points in all. The procedure searches the curve for the point with the highest combined sensitivity and specificity. This point is determined by the Youden Index (sensitivity + specificity − 1) (Schisterman et al., 2005), which reaches its maximum where the sum of sensitivity and specificity is highest. This methodology also allows the selection of cut-off points that are better suited to the objective of the study, bearing in mind that, along the curve, an increase in sensitivity implies a reduction in specificity, and vice versa.
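The search for the Youden-optimal cut-off can be sketched as follows. This is an illustrative plain-Python version, not the software procedure used in the study; the function name and the example scores are hypothetical. Candidate thresholds are taken as midpoints between consecutive observed scores, which is an assumption of this sketch, and scores above the threshold are classified as clinical because higher OQ-45 scores indicate a worse state.

```python
def youden_cutoff(clinical_scores, non_clinical_scores):
    """Return (cutoff, sensitivity, specificity, J) with the largest Youden J.

    Scores above the cutoff are treated as 'clinical'.
    Returns None when fewer than two distinct scores are observed.
    """
    values = sorted(set(clinical_scores) | set(non_clinical_scores))
    # Midpoints between consecutive observed scores (assumed threshold grid)
    candidates = [(low + high) / 2 for low, high in zip(values, values[1:])]

    best = None
    for cut in candidates:
        # Proportion of clinical cases correctly flagged
        sensitivity = sum(s > cut for s in clinical_scores) / len(clinical_scores)
        # Proportion of non-clinical cases correctly left unflagged
        specificity = sum(s < cut for s in non_clinical_scores) / len(non_clinical_scores)
        j = sensitivity + specificity - 1
        if best is None or j > best[3]:
            best = (cut, sensitivity, specificity, j)
    return best
```

To prioritize sensitivity instead, one could scan the same candidate list and pick a lower threshold whose sensitivity meets a chosen minimum, accepting the corresponding loss in specificity.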
The Reliable Change Index (RCI; Jacobson & Truax, 1991) and the clinically significant change were calculated following the procedure proposed by those authors. The formula is $RCI = (\bar{x}_{clinical} - \bar{x}_{non\text{-}clinical}) / S_{dif}$, where $S_{dif} = \sqrt{2\,(DT\sqrt{1 - R_{xx}})^{2}}$ is the standard error of the difference, $DT$ is the pooled standard deviation of the clinical and non-clinical samples, and $R_{xx}$ is the reliability of the OQ-45. Moreover, considering this formula and a 95% confidence interval, the Minimum Change Score was calculated ($MCS = S_{dif} \times 1.96$). The cut-off point (CP) was calculated with the following formula: $CP = [(\bar{x}_{clinical} \times SD_{clinical}) + (\bar{x}_{non\text{-}clinical} \times SD_{non\text{-}clinical})] / (SD_{clinical} + SD_{non\text{-}clinical})$.
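The reliable-change quantities described above can be sketched in a few lines of Python. The sketch assumes the usual Jacobson and Truax (1991) definitions, in which the standard error of measurement is the standard deviation multiplied by the square root of one minus the reliability, and the standard error of the difference is the square root of twice its square. The function and parameter names are hypothetical, and the cut-off point is deliberately left out.

```python
from math import sqrt


def standard_error_of_difference(pooled_sd, reliability):
    """S_dif = sqrt(2 * SE^2), with SE = SD * sqrt(1 - reliability)."""
    standard_error = pooled_sd * sqrt(1 - reliability)
    return sqrt(2 * standard_error ** 2)


def reliable_change_index(mean_clinical, mean_non_clinical, s_dif):
    """Difference between the two sample means in units of S_dif."""
    return (mean_clinical - mean_non_clinical) / s_dif


def minimum_change_score(s_dif, z=1.96):
    """Smallest change that exceeds measurement error at the 95% level."""
    return z * s_dif
```

For example, with an assumed pooled standard deviation of 20 and a reliability of .91 (illustrative inputs, not the study's values), `standard_error_of_difference(20, 0.91)` is about 8.49 and the corresponding minimum change score is about 16.6 points.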
Results
Table 1 shows the differences between the clinical and the non-clinical scores in the global dimension and in the three subscales; all of the differences are statistically significant (t(165.09) = 12.65, p < .001), with the clinical scores being higher. There were also differences in each item of the scale, most of them statistically significant. Moreover, the effect sizes are considerably high. Regarding the internal consistency of the OQ-45, Cronbach’s alpha coefficients were calculated for the global score (.95) and for the three dimensions. To compare the Cronbach’s alpha values, the Feldt test was calculated; it was statistically significant for the global measure (Feldt = .33, p < .001) and for the three dimensions.
Note. r = the item has been recoded; *Levene’s test of variances was not significant; α = alpha if the item is removed, for each item, and Cronbach’s alpha for the total scale.
The data related to concurrent validity are presented in Table 2. For the clinical sample, all instruments correlated significantly with the OQ-45, except the STAI-State with the symptomatology dimension. In the non-clinical sample, the two components and the eight dimensions of the SF-12 correlated significantly and negatively with the global score of the OQ-45 and with its three dimensions. All of these correlations are in the expected direction: in the clinical sample, higher scores for depression, anxiety and stress are associated with higher OQ-45 scores, while in the non-clinical sample, the better the mental and physical health scores, the lower the dysfunctionality.
Note. Pearson correlation coefficients. $ = not significant; all other coefficients are statistically significant at p < .001. The SF-12 was administered to the non-clinical sample and the BDI, STAI and PSS to the clinical sample.
Table 3 presents the three models estimated through CFA to support construct validity. In both samples, the bi-factor structure fitted best; only the AGFI was slightly below the criterion value: (χ2(900) = 2641.42, p < .001, AGFI = .88, CFI = .92, RMSEA = .049 (.037 to .062)) and (χ2 = 3930.47, p < .001, AGFI = .86, CFI = .91, RMSEA = .061 (.049 to .073)), respectively.
Note. χ2 = chi-square; df = degrees of freedom; AIC = Akaike Information Criterion; AGFI = Adjusted Goodness-of-Fit Index; NFI = Normed Fit Index; CFI = Comparative Fit Index; SRMSR = Standardized Root-Mean-Square Residual; RMSEA = Root-Mean-Square Error of Approximation; CI90% = 90% confidence interval of the RMSEA.
Figure 1 shows the factor structure of the clinical sample graphically. The factor loadings of three items (11, 26 and 32) were lower than .10. These items assess problematic drinking or drug use and, as can be observed in Table 1, they have skewed scores (the majority of the participants scored 0). Nevertheless, each item is included in one of the three subscales. The factor structure obtained from the non-clinical sample is similar to the clinical one.
In Figure 2, the diagonal represents the condition of zero discrimination, so any curve that departs from this diagonal toward the upper left corner, covering a larger area, indicates better diagnostic utility.
Table 4 presents the values obtained for the area under the curve (AUC), sensitivity, specificity, chi-square, Youden Index and the corresponding cut-off points for the global score of the OQ-45 and for its three dimensions. For the global score, the highest value of the Youden Index is reached at a cut-off of 57.5, which offers a sensitivity of .71 and a specificity of .79. Likewise, the cut-off point is 32.5 for the symptomatology dimension, 14.5 for interpersonal relationships and 12.5 for social role. The table also shows the results obtained when the sensitivity prioritization criterion is applied. For the global score, the cut-off point is then 54.5, which offers a sensitivity of .74 and a specificity of .73. Likewise, the cut-off point is 29.5 for the symptomatology dimension, 12.5 for interpersonal relationships and 11.5 for social role.
Moreover, the Reliable Change Index has been calculated for the total score of the OQ-45 (RCI = 3.80 > 1.96), and the Minimum Change Score is 17.56, indicating that an individual’s score needs to change by at least 17.56 points for the change to be considered a statistically significant improvement or deterioration. Finally, the cut-off point calculated with the formula proposed by Jacobson and Truax (1991) is established at 57.43.
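As a rough consistency check on these figures, the reported Minimum Change Score and RCI can be used to back out the standard error of the difference and the mean difference they imply. The short sketch below uses only the values reported in this section; the variable names are hypothetical, and because the inputs are rounded the results are approximate.

```python
# Values reported in the text
minimum_change_score = 17.56
z_95 = 1.96
reported_rci = 3.80

# MCS = S_dif * 1.96, so S_dif can be recovered by division
s_dif = minimum_change_score / z_95

# RCI = (mean difference) / S_dif, so the implied mean difference is RCI * S_dif
implied_mean_difference = reported_rci * s_dif

print(round(s_dif, 2))                    # 8.96
print(round(implied_mean_difference, 2))  # 34.04
```

Under these assumptions, the clinical and non-clinical total-score means would differ by roughly 34 points.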
Discussion
The Outcome Questionnaire should be considered reliable and valid for measuring psychotherapeutic outcomes. No important differences have been found between the original version, studies using other versions (Von Bergen & De la Parra, 2002), and the findings obtained in this research with the Spanish adaptation.
The internal consistency of the total OQ-45 score was satisfactory for both the clinical and the non-clinical populations: .97 and .91, respectively. These Cronbach’s alpha values are similar to those of the research conducted by Lambert and his colleagues (1996). The lower Cronbach’s alpha for the non-clinical sample compared with the clinical sample could be due to the narrower range of scores in that sample. Furthermore, the internal consistency of the three dimensions was high and adequate. Additionally, significant differences were found between the clinical and non-clinical groups in the OQ-45 total score and in its three dimensions. As expected, higher scores were found in the clinical sample (patients who came to a mental health centre in need of therapeutic assistance, with affective symptomatology in most cases). These differences confirm the instrument’s ability to discriminate between clinical and non-clinical populations. Effect sizes confirmed the magnitude of those differences.
Regarding concurrent validity, the instruments used differed depending on the sample. Firstly, in the clinical population, the BDI, STAI and PSS correlated significantly with the three dimensions and the total score of the scale, especially those scales that evaluate the stress and depressive symptomatology expressed by the individual. As expected, the highest correlations were obtained for the symptomatology dimension and the global score, whereas the correlations were lower for the interpersonal relationships and social role areas, since these dimensions are less well covered by those additional instruments (BDI, STAI and PSS). Nonetheless, the state subscale of the STAI behaves differently: its correlation with the symptomatology dimension is non-significant, and its correlations with the interpersonal relationships and social role dimensions, despite being low, are higher, indicating that this subscale of the STAI is more related to these areas of functioning. Secondly, in the non-clinical sample, concurrent validity was measured with the SF-12 questionnaire; all of its subscales correlated negatively with the OQ-45, which means that an increase in OQ-45 scores is associated with a decrease in SF-12 scores. The mental component of the SF-12 is more strongly related to the OQ-45, indicating that the OQ-45 is related to the mental dimension in all of its subscales.
As most of the non-clinical sample was collected in a community setting, it was considered more appropriate to use questionnaires related to quality of life. For the clinical sample, in contrast, scales related to stress, anxiety and depressive symptomatology were used, since they are routinely used in the clinical setting and because they provided additional information about the patient’s needs and problems. Thus, using different instruments suited to each sample, adequate concurrent validity between the OQ-45 and other instruments that measure diverse areas of functioning has been confirmed.
The results of the confirmatory factor analyses for both samples indicated that the OQ-45 is a bi-factorial scale composed of one general factor and three subscales. In this structure, each item loaded on one of the three subscales and also on a general factor of distress. This model offers an appropriate description of the OQ-45 structure, since it fits the interpretation process given by the authors (Lambert et al., 1996). This structure was also reported by Lo Coco et al. (2008) with an Italian sample.
Although the cut-off points that maximize sensitivity and specificity (the highest Youden value) were considered, those that slightly prioritize sensitivity have been adopted. The objective of the current study is to adapt and validate a scale that is going to be used with clinical patients; consequently, it is important to capture all possible cases so as not to leave them unattended. In other words, the clinical criterion has been chosen over the statistical criterion, as the cut-off point suggested by the Youden Index prioritizes specificity over sensitivity. The cut-off point (CP) obtained for the OQ-45 in the present study, 54.5, is lower than that of the original research (CP = 63) (Lambert et al., 1996). The resulting values of sensitivity and specificity after the calibration are also lower in the Spanish adaptation: .74 for sensitivity and .73 for specificity. Timman et al. (2017) also reported a lower cut-off point for the Dutch version (CP = 55) than for the original one.
The cut-off point calculated with the procedure proposed by Jacobson and Truax (1991) is 57.43, about 3 points higher than the one obtained by prioritizing sensitivity. Moreover, the number of points by which an individual’s OQ-45 score needs to change is 17.56, whereas in the American sample 14 points are necessary (Beckstead et al., 2003). Together, the two indices obtained in this research allow the clinician to classify patients as follows: recovered, improved, no change, or deteriorated. Clinicians will also be able to choose the most convenient cut-off point based on their judgment, taking into account the person they are working with.
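The four-way classification mentioned above can be sketched as a small decision rule. This is an illustrative Python function, not a procedure specified in the study: the function name and the handling of boundary cases are assumptions, while the default thresholds are the Minimum Change Score (17.56) and the Jacobson and Truax cut-off (57.43) reported here. Higher OQ-45 scores indicate worse functioning, so improvement is a decrease in score.

```python
def classify_change(pre_score, post_score, minimum_change=17.56, cutoff=57.43):
    """Classify a patient's pre/post OQ-45 total scores.

    'recovered'    : reliable decrease that also crosses from at/above the
                     cut-off to below it
    'improved'     : reliable decrease without crossing the cut-off
    'deteriorated' : reliable increase
    'no change'    : change smaller than the minimum change score
    """
    change = post_score - pre_score
    if change >= minimum_change:
        return "deteriorated"
    if change <= -minimum_change:
        if pre_score >= cutoff and post_score < cutoff:
            return "recovered"
        return "improved"
    return "no change"
```

A clinician preferring the sensitivity-prioritized threshold could pass `cutoff=54.5` instead of relying on the default.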
The differences found in the cut-off points may reflect limitations of the collected sample, especially the size of the clinical sample. Consequently, it would be advisable to increase the number of participants in the clinical sample. For future research, it would be interesting to track patients’ progress and test the instrument’s sensitivity to change.
Based on the data presented, the Spanish adaptation of the OQ-45 has appropriate psychometric properties and can be considered a useful instrument. Moreover, it could be an adequate scale for assessing the functioning of Spanish patients and, consequently, could help clinicians to evaluate treatment efficacy and establish psychotherapy goals.