Introduction
Clinical reasoning is a skill that involves logical thinking and decision-making to effectively diagnose and treat patients in healthcare practice. In physical therapy, it is defined as the application of cognitive and psychomotor skills, as well as reflection and knowledge processes.1,2 The goal of this collaborative, adaptive, and iterative process is to frame the intended outcome within a biopsychosocial framework that considers the perspectives of the patient and the therapist. 3
In this sense, the diagnostic process must be based on adequate clinical reasoning, given that physicians must establish therapeutic objectives and the most appropriate interventions using the information reported by the patient, as well as that obtained from physical examination, medical records, and imaging and laboratory tests.2,4-6 Therefore, clinical reasoning is considered a core skill for health professionals and, consequently, a necessary component of their training, including physical therapists. 5,7
The Objective Structured Clinical Evaluation (OSCE) is a test designed to assess standardized clinical skills and is used for training health professionals. 8 The early incorporation of this type of evaluation in the training of undergraduate health science students allows them to experience a real professional practice environment, which, in turn, promotes the development of skills, reinforces theoretical knowledge, facilitates the clinical reasoning process, and strengthens the students' sense of security and reflective practice. 1,9-12 In view of the foregoing, it has been reported that the OSCE favors self-evaluation and feedback, actions that allow the identification of weaknesses and strengths observed during the evaluation process using this instrument. 13,14
Self-evaluation is defined as the act of assessing oneself in order to make decisions about the next steps to take based on the conclusions obtained, which helps to improve knowledge acquisition and performance, and provides security and motivation. 15 Thus, conducting a self-evaluation using the OSCE encourages students to identify the factors that may be helping or hindering their learning process. 10
On the other hand, several studies have inquired about the opinion of students regarding the OSCE, which has not only facilitated the development of new knowledge to improve its quality, but also provided insights into the value that students give to the use of this instrument in their education. 16,17 Concerning this, it has been found that students with a high OSCE score give greater meaning and relevance to the implementation of this instrument in their training process and value the contents, response times, influence and relevance of this instrument in their development as health professionals, strengthening the link between performance and self-evaluation. 18-22
Providing methodologies to assess clinical competence in a reliable, consistent and valid manner is a challenge in health sciences education. On this point, Yusuf,23 in a convergent parallel mixed-methods study involving 5 groups of final-year students (18 students per group) from a medical school in Lahore (Pakistan) who were assigned to a rotation in gynecology wards, showed that the OSCE meets these needs.
Although some studies, such as that of Figueroa-Arce et al.,24 have reported experiences regarding the implementation of this evaluation instrument in physical therapy students, evidence is scarce. This highlights the need to generate knowledge about the self-evaluation and satisfaction of students with the performance of an OSCE that focuses on evaluating and promoting clinical reasoning in physical therapy, as stated by de la Barra-Ortiz et al.25
Therefore, the objectives of the present study were to determine the correlation between performance and self-evaluation (expected performance) of physical therapy students in an OSCE designed to assess clinical reasoning, and to evaluate their level of satisfaction with this instrument.
Materials and methods
Study type and sample
An analytical cross-sectional study with an exploratory scope was conducted. The study population comprised 163 fourth-semester students (of a total of 10 semesters), aged between 18 and 20 years, who were enrolled in the physical therapy program offered at a university in Santiago (Chile) and were taking the course "Reasoning in physical therapy" during 2018-2. All students completed the OSCE because it was a summative exam of the course; however, 4 students did not sign the informed consent, so the final sample included 159 students.
OSCE
A committee comprising 7 professors of the course Reasoning in Physical Therapy was created and was in charge of designing and implementing the OSCE, including the design of its 11 stations (each one with a complexity that addresses the achievement of the learning objectives of the course), the performance checklists (scenario and standardized patient stations), and the answer sheets (mailbox stations). They also validated its contents using a 4-point Likert scale that evaluated sufficiency, clarity, coherence, and relevance criteria. It should be noted that the members of the OSCE design committee had experience in the application of this type of evaluation instrument.
Subsequently, a pilot test was conducted on 30 volunteer students who had already taken the course and signed an informed consent form authorizing their participation in the test. In this pilot test, the students completed the 11 OSCE stations (plus 2 rest stations). Their performance at each station was evaluated depending on the type of station (checklists for those involving a clinical scenario or a standardized patient and answer sheets for the mailbox stations) by the same 7 professors of the committee mentioned above, who, through a checklist, issued their observations on the adjustments to be made, as follows: improving the wording of the checklist contents, increasing the time allotted to perform the activities at each station, and eliminating a rest station. These adjustments were fully implemented.
Once these modifications were made, the final OSCE consisted of 11 evaluation stations (S1. Hand washing, S2. Safety in health care, S3. Clinical interview, S4. Cardiorespiratory assessment 1, S5. Musculoskeletal assessment, S6. Neurological assessment, S7. Cardiorespiratory assessment 2, S8. Clinical record, S9. Physical therapy diagnosis 1, S10. Physical therapy diagnosis 2, S11. Resolution of clinical case) and a rest period located between the fourth and sixth stations.
The stations were classified into three types: "scenario" station (S1 and S2), in which procedures associated with an intervention are recreated; "standardized patient" station (S3-S7), where an actor with a standardized script simulates being a patient in a clinical situation; and "mailbox" station (S8-S11), in which students must solve a clinical case and leave their answer in a mailbox. At the standardized patient and scenario stations, students were evaluated using checklists, while at the mailbox stations they were evaluated using answer sheets. The time allotted to complete each station was 5 minutes, for a total time of 60 minutes.
At each station, different minimum and maximum scores were assigned based on the defined objectives and the level of complexity of the station, with a total score of 198 points. A student was considered to pass the OSCE (passing score as per the passing criteria of the university's school of physical therapy) if their score was >70% of the maximum grade, i.e., 134 points.
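The pass/fail rule described above can be sketched in a few lines. This is only an illustration: the function and constant names are hypothetical, the 134-point cut-off is taken as reported in the text rather than derived from the percentage, and the strict inequality follows the ">134 points" wording used in the Results.

```python
# Values stated in the text; all names are hypothetical.
MAX_TOTAL_SCORE = 198  # sum of the maximum scores of the 11 stations
PASS_CUTOFF = 134      # cut-off reported in the text, taken as given


def passed_osce(total_score: float) -> bool:
    """Return True when the total score is strictly above the reported cut-off."""
    return total_score > PASS_CUTOFF


def pass_rate(total_scores: list[float]) -> float:
    """Percentage of students whose total score is above the cut-off."""
    n_passed = sum(passed_osce(score) for score in total_scores)
    return 100 * n_passed / len(total_scores)


# Illustrative scores only, not study data.
example_scores = [142, 132, 150, 134]
example_rate = pass_rate(example_scores)
```

Applied to the counts reported in the Results (97 of 159 students above the cut-off), the same percentage calculation gives 61.01%.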
The main characteristics of the 11 OSCE stations are described in Table 1.
Table 1 Stations of the Objective Structured Clinical Evaluation for clinical reasoning in physical therapy.

* World Health Organization.26
† International Classification of Functioning, Disability and Health. 6,27
‡ Rehabilitation Problem-Solving form. 6,27
Source: Own elaboration.
The OSCE was administered on the same day to the entire sample. For this purpose, the students entered in groups of 12 people and each group was allocated to a station (including the rest station). The groups rotated every 5 minutes, ensuring that all participants had the opportunity to be evaluated. It is important to point out that the OSCE was administered to 163 students at the end of the course.
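The rotation can be illustrated with a short sketch. It assumes one student per position, a circular rotation through 12 positions (the 11 stations plus the rest period), and a rest period placed after S4; the exact placement of the rest period and all names in the code are assumptions made for illustration, not details confirmed by the text.

```python
STATION_MINUTES = 5  # time allotted per position, as stated in the text
N_STATIONS = 11      # evaluation stations S1..S11


def build_circuit(rest_after: int = 4) -> list[str]:
    """Return the ordered positions of the circuit.

    rest_after is a hypothetical parameter: the rest period is inserted
    after that many stations (assumed placement).
    """
    stations = [f"S{i}" for i in range(1, N_STATIONS + 1)]
    return stations[:rest_after] + ["Rest"] + stations[rest_after:]


def rotation_schedule(circuit: list[str]) -> list[list[str]]:
    """schedule[r][k] is the position occupied in round r by the student
    who started at position k (circular rotation)."""
    n = len(circuit)
    return [[circuit[(start + r) % n] for start in range(n)] for r in range(n)]


circuit = build_circuit()
schedule = rotation_schedule(circuit)
total_minutes = len(circuit) * STATION_MINUTES  # 12 positions x 5 minutes
```

Under these assumptions, every student in a group of 12 visits each station and the rest period exactly once, and a full circuit takes the 60 minutes mentioned above.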
Self-evaluation and satisfaction surveys
Upon completion of the OSCE, students were asked to answer two surveys in order to obtain data regarding the perception of their performance on the test (self-evaluation of performance) and their level of satisfaction with the instrument. It is worth mentioning that while all students responded to the first survey, this was not the case for the second survey.
For the performance self-evaluation survey, students rated their perceived performance at each of the 11 stations after completing the OSCE, using a 1 to 5 Likert scale (1: very poor, 2: poor, 3: fair, 4: good, and 5: very good), for a maximum score of 55 points.
Furthermore, the satisfaction survey consisted of 5 questions on, among other things, the general structure and usefulness of the OSCE, as well as the relevance of the stations. As in the self-evaluation survey, a 1 to 5 Likert scale (1: strongly disagree, 2: disagree, 3: neither agree nor disagree, 4: agree and 5: strongly agree) was used for each question to establish the level of student satisfaction with the instrument (maximum score of 25 points). Satisfaction was classified into two levels based on the median score obtained in the survey: low: <20 points and high: ≥20 points. 28
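The scoring of both surveys can be sketched as follows. This is a minimal illustration, assuming that each total is the plain sum of the item ratings; the function and constant names are hypothetical.

```python
LIKERT_MIN, LIKERT_MAX = 1, 5
N_SELF_EVALUATION_ITEMS = 11   # one rating per station (maximum 55 points)
N_SATISFACTION_ITEMS = 5       # five questions (maximum 25 points)
HIGH_SATISFACTION_CUTOFF = 20  # cut-off stated in the text (median score)


def total_likert_score(item_scores: list[int], n_items: int) -> int:
    """Sum the item ratings after checking their number and range."""
    if len(item_scores) != n_items:
        raise ValueError(f"expected {n_items} ratings, got {len(item_scores)}")
    if any(not LIKERT_MIN <= score <= LIKERT_MAX for score in item_scores):
        raise ValueError("ratings must be between 1 and 5")
    return sum(item_scores)


def satisfaction_level(total_score: int) -> str:
    """Classify satisfaction as 'low' (<20 points) or 'high' (>=20 points)."""
    return "high" if total_score >= HIGH_SATISFACTION_CUTOFF else "low"


# Illustrative responses only, not study data.
self_evaluation_total = total_likert_score(
    [4, 5, 3, 4, 4, 3, 4, 2, 3, 3, 4], N_SELF_EVALUATION_ITEMS
)
satisfaction_total = total_likert_score([4, 4, 5, 3, 4], N_SATISFACTION_ITEMS)
level = satisfaction_level(satisfaction_total)
```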
Statistical analysis
The scores obtained in the OSCE and the two surveys were systematized in a database created in Microsoft Excel. A descriptive analysis of the data was performed by calculating absolute and relative frequencies for qualitative variables, and medians and interquartile ranges (IQR) with 25th and 75th percentiles (p25-p75) for quantitative variables, since the distribution of the data was not normal (Shapiro-Wilk test).
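The descriptive statistics named above can be sketched with the Python standard library. This is an illustration rather than the procedure run in the study: the function names are hypothetical, the percentile interpolation rule may differ from the one used by the statistical package, and the Shapiro-Wilk test is left out because it is not part of the standard library.

```python
import statistics
from collections import Counter


def median_iqr(values: list[float]) -> tuple[float, float, float]:
    """Return (median, p25, p75) for a quantitative variable."""
    p25, _, p75 = statistics.quantiles(values, n=4, method="inclusive")
    return statistics.median(values), p25, p75


def frequency_table(labels: list[str]) -> dict[str, tuple[int, float]]:
    """Return {category: (absolute frequency, relative frequency in %)}."""
    counts = Counter(labels)
    total = len(labels)
    return {
        label: (count, round(100 * count / total, 2))
        for label, count in counts.items()
    }


# Illustrative values only, not study data.
example_summary = median_iqr([1, 2, 3, 4, 5])
example_frequencies = frequency_table(["high", "high", "low", "high"])
```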
The correlation between OSCE performance (overall score, by station, and by type of station) and perceived performance (self-evaluation) was determined using Spearman's correlation coefficient (Rho) because self-evaluation is an ordinal qualitative variable. The correlation between performance in the OSCE (total score) and self-evaluation of performance was determined based on the level of satisfaction in accordance with the two established categories (low and high). All analyses were performed in the STATA 13.0 statistical package and a significance level of p<0.05 was considered. The graph of the correlation between the global score and the performance self-evaluation was made using the JASP 0.14.1.0 software.
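Spearman's coefficient and its calculation within each satisfaction category can be sketched as below. This is a minimal pure-Python illustration, not the Stata procedure used in the study: Rho is computed as the Pearson correlation of average ranks (which handles ties), p-values are not computed, and all function names are hypothetical.

```python
from math import sqrt


def average_ranks(values: list[float]) -> list[float]:
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation; raises ZeroDivisionError if a variable is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman_rho(x: list[float], y: list[float]) -> float:
    """Spearman's Rho as the Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    return pearson(average_ranks(x), average_ranks(y))


def spearman_by_group(
    x: list[float], y: list[float], groups: list[str]
) -> dict[str, float]:
    """Compute Rho separately within each category (e.g., low/high satisfaction)."""
    result = {}
    for group in sorted(set(groups)):
        idx = [i for i, label in enumerate(groups) if label == group]
        result[group] = spearman_rho([x[i] for i in idx], [y[i] for i in idx])
    return result


# Illustrative values only, not study data.
rho_with_ties = spearman_rho([1, 2, 2, 3], [1, 2, 3, 4])
rho_by_level = spearman_by_group(
    [1, 2, 3, 3, 2, 1], [1, 2, 3, 1, 2, 3], ["high"] * 3 + ["low"] * 3
)
```

Here `x` would hold the OSCE scores and `y` the self-evaluation scores; ranking both variables first is what makes the coefficient suitable for an ordinal variable.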
Ethical considerations
The study was approved by the Bioethics Committee of the Faculty of Rehabilitation Sciences of the Universidad Andrés Bello through Certified Project A.138 of July 2020. It also followed the ethical principles for biomedical research involving human subjects established in the Declaration of Helsinki. 29 All participants signed an informed consent form.
Results
The median total score obtained in the OSCE was 142 points (IQR: 132-150), and 97 (61.01%) students passed the exam (>134 points). The pass rate was ≥50% in 8 stations (S1: 51.57%; S2: 54.09%; S3: 78.62%; S4: 58.49%; S5: 96.86%; S6: 50.94%; S7: 85.53%; and S11: 57.23%), and the pass rate was very low (2.52%) only in S8 (Table 2).
Table 2 Scores obtained by students in each of the 11 stations of the objective structured clinical evaluation (n=159).

IQR: interquartile range.
Source: Own elaboration.
With respect to perceived performance, the median total score of the self-evaluation was 38 points (IQR: 33-42), with S1 being the station with the highest percentage of students (59.75%) who rated their performance as very good. In contrast, S8 was the station in which the highest proportion of students perceived a very poor performance (8.97%) (Table 3).
Table 3 Performance self-evaluation in the objective structured clinical evaluation by station.

Source: Own elaboration.
A positive and weak correlation was observed between the total score obtained in the OSCE and the performance self-evaluation (Rho=0.31; p<0.001) (Figure 1).

Source: Own elaboration.
Figure 1 Correlation between the total score obtained and the performance self-evaluation in the objective structured clinical evaluation.
Regarding the correlations between performance self-evaluation and the scores obtained for each station, the correlation was positive in all stations, being moderate in one station (S2), weak in seven stations (S3, S4, S5, S6, S7, S10, and S11), and very weak in three stations (S1, S8, and S9) (Table 4). Furthermore, these correlations were statistically significant at all stations, except for S1 (Rho=0.02; p=0.818) and S8 (Rho=0.08; p=0.296) (Table 4).
As for the correlations between the self-evaluation and the scores obtained by type of station, the correlation was positive and moderate in the scenario stations (Rho=0.42; p<0.001) and positive and weak in the standardized patient and mailbox stations (both with Rho=0.28; p<0.001).
Table 4 Correlation between the score obtained in the stations of the objective structured clinical evaluation and the performance self-evaluation in each station.

Source: Own elaboration.
On the other hand, the median score on the OSCE satisfaction scale was 20 points. In addition, 91 (57.23%) students reported a high level of satisfaction with the OSCE (Table 5). In the group of students with a low level of satisfaction, the correlation between the total score obtained in the OSCE and the self-evaluation score was positive and very weak (Rho=0.15; p=0.234), while in the group with high satisfaction, the correlation was positive and moderate (Rho=0.48; p<0.001).
Discussion
The present study aimed to determine the correlation between performance and self-evaluation (perceived performance) of physical therapy students from a Chilean university in an OSCE designed to assess clinical reasoning, and to evaluate their level of satisfaction with this instrument.
In this regard, it was found that 57.23% of the participants reported a high level of satisfaction with the OSCE and that a high percentage indicated that they agreed or strongly agreed with the questions in the satisfaction survey, with the exception of question 1, where this proportion slightly exceeded 50%. The students stated that this instrument is useful for their training process and that it is important to evaluate their clinical skills with this type of tool.
These findings are similar to those of other studies in which it has been reported that between 63% and 94.5% of the students have a positive attitude and/or a high level of satisfaction with the OSCE. 25,30-31 For example, in a study conducted between February 2012 and February 2013 in 76 undergraduate medical students at a university in Iran, Khosravi-Khorashad et al.30 reported that 94.5% of the participants had a positive attitude toward a 13-station OSCE and mentioned that this assessment format was more appropriate than other examination methods. In turn, in a 2017 study of 54 medical students at the University of the West Indies (Barbados), Majumder et al.31 found that the majority of participants (63-91%) had a positive perception of the attributes of a 24-station OSCE.
These findings are also confirmed by Doloresco et al., 32 who conducted a study in 2018 on 124 third-year pharmacy students at the University of Buffalo (Buffalo, USA) and found that, based on the mean scores of their responses, students agreed with most of the instrument's quality assessment items, namely: "The OSCE measured application of skills and abilities required in pharmacy practice" (M=5.8, SD=1.1); "The level of difficulty of the OSCE cases today was appropriate" (M=5.3, SD=1.5); "The topics covered in the session today were relevant to me" (M=5.6, SD=1.3); "The OSCE cases today were a fair measure of my communication skills" (M=5.7, SD=1.3); and "I prefer the OSCE over examinations" (M=4.9, SD=2.0).
On the other hand, in the present study, a positive and weak correlation (Rho=0.31) was found between the total score obtained in the OSCE and self-evaluation. Moreover, when this analysis was performed for each station, positive correlations were observed in all stations: moderate in only one station (S2; Rho=0.51), weak in seven (Rho=0.17-0.38), and very weak in three (S1, S8, and S9; Rho=0.02-0.08). These findings are comparable to those reported by de la Barra-Ortiz et al.,25 who, in a study conducted on 111 Chilean physical therapy students enrolled in the Physical Agents course, found positive correlations between objective performance (score) and perceived performance (self-evaluation) in the 7 stations implemented in the OSCE; this correlation was very weak in 3 stations (Rho=0.11-0.19), weak in 2 (Rho=0.36), and moderate in 2 (Rho=0.47-0.56).
However, these findings differ from those reported by Garza et al., 33 who, in a study of 733 pharmacy students at a U.S. university who completed a 5-station OSCE designed to assess new prescription counseling skills, found a significant but moderate correlation between self- and peer-evaluation and scores obtained in the information gathering, management strategies, and monitoring and follow-up domains (r=0.43-0.51), but not in the communication domain (r=0.12).
It should be noted that, in the present study, the correlations between the scores obtained in the OSCE and the performance self-evaluation in S1 and S8 were very weak and not statistically significant. This may be because the contents and procedures of these stations were covered in the early stages of the course. Being familiar with them, students may have responded in an automated way, relying on repetition and reflecting less, which could have led them to perceive their performance as good. 34,35
In contrast, standardized patient stations require ongoing reflection and greater attention from the students, making them more aware when making decisions in clinical situations. 34,36 This is supported by the conclusions of another study in which an OSCE was administered to 32 third-semester physical therapy students at the Federal University of Minas Gerais (Belo Horizonte, Brazil) who had previous experience with this type of evaluation. 37 In that study, the objective was to measure the correlation between performance levels and self-perception. For this purpose, 3 stations were created with an integrative and simulation approach (which could be comparable to the standardized patient stations of the present study), and a moderate correlation was reported (r=0.475; p=0.007) between OSCE and self-perception scores. This result was attributed to the fact that the OSCE is an instrument that makes it possible to address realistic situations and to receive feedback on physiological and psychological status, which contributes to the development of a positive self-perception. 37 Therefore, incorporating standardized patient stations that allow for an integration of the physiological and psychological components would facilitate decision making, thus improving students' confidence and self-perception.
Finally, it was observed that the correlation between performance and self-evaluation was higher in the group of students with a high level of satisfaction with the OSCE than in the group with a low level of satisfaction. This finding should be confirmed in future studies, which should also use qualitative methodologies to explore the possible reasons for a greater congruence between the observed and perceived performance of students taking a test such as the OSCE.
One of the limitations of the present study was that variables such as the communication skills and emotional state (e.g., anxiety symptoms and stress) of the students during the OSCE, which could influence performance and self-perception of performance, were not considered. In this regard, several studies have described a high proportion of students reporting emotional distress, although the effect on OSCE scores is inconclusive given the presence of protective factors such as self-confidence and self-efficacy. 37,38 Consequently, further studies could explore potential factors associated with the consistency between the performance obtained and the performance expected by physical therapy students in the OSCE, as well as their impact on the evaluative processes, considering communicational and emotional variables.