Take-home message

• Psychometric validation of the Dutch Residency Educational Climate Test (D-RECT) in a Danish context resulted in a valid and reliable Danish Residency Educational Climate Test (DK-RECT).

• Analysis showed that 2–7 trainees are required to provide reliable educational climate data.

• Analysis of the 11 subscales showed that Feedback had the lowest score, while Supervision and Peer collaboration had the highest.

• By measuring the educational climate, DK-RECT can support quality improvement initiatives, evaluate the effectiveness of future interventions, and facilitate conversations about the educational climate.

Background

Educational climate in hospitals and postgraduate programmes is essential for delivering high-quality training, professional development, and patient care [1].

In this paper, the term educational climate refers to “trainee perceptions of the formal and informal aspects of education” [2], including the prevailing tone in the clinical educational environment [3]. The educational climate can broadly indicate how well clinical postgraduate education functions [4]. The educational context, e.g., organization, setting, coaching, assessment, peer collaboration, practices, and procedures, is also of importance. There is ample evidence that a supportive educational climate in medical postgraduate education is beneficial to professional development. The educational climate is intertwined with the curriculum, and if the climate is not supportive, it will be difficult for trainees to complete the training successfully [3, 5,6,7]. In Denmark, postgraduate programmes in all specialties are based on a well-structured, competence-based curriculum.

Workplace-based training is fundamental in postgraduate specialist training [4], since learning to become a medical specialist involves working and acting as a specialist [8]. In addition to a strong focus on patient care, there should also be attention to education, and it is key to ensure that education is not overshadowed by patient care duties [9].

Research shows that the educational climate affects learner motivation and self-confidence, influencing outcomes such as academic achievement [10]. A positive educational climate supports the optimal application of knowledge, effective learning, and prevention of stress and burnout [11, 12]. Moreover, improving the quality of the educational climate may lead to improved quality of life as well as professional performance in trainees [11, 13].

It is challenging to describe and evaluate educational climate [4], and it is difficult to distinguish from culture, with the two terms often used interchangeably. Glisson, a leading researcher in the field, differentiates between organizational culture and organizational climate: the former refers to the shared behavioural expectations and norms in a work environment and the collective view of the way work is done, while the latter represents staff perceptions of the impact of the work environment on the individual, i.e., how it feels to work at the department, e.g. whether it is supportive or stressful [14]. This paper deliberately focuses on educational climate, which we define, in alignment with Glisson's organizational climate, as trainees' perceptions of their educational climate. The instrument we chose had to measure trainees' daily experiences and reflections on a continuum from training to daily work, including choices made regarding personal educational needs. Thus, based on Glisson's definition, examining organizational culture is beyond the scope of this paper.

In order to apply a psychometrically solid and valid evaluation method, we chose the Dutch Residency Educational Climate Test (D-RECT), which was developed de novo primarily to evaluate the postgraduate educational climate, using various methodological approaches, e.g. qualitative research, a Delphi panel, and questionnaires (Figure 1). D-RECT is also based on research from a variety of specialties with different levels of specialized patient care [9], as is the case at Copenhagen University Hospital, Rigshospitalet, where the present study was conducted. Several other instruments [9] also evaluate the educational climate, for example the Postgraduate Hospital Education Environment Measure (PHEEM), but its theoretical foundation is not clearly described and its underlying factor structure is disputed [15].

Fig. 1 Development of the Dutch Residency Educational Climate Test in 2009. Modified from Boor et al. 2011. Labels of the items and subscales are visualised in Table 3

To date, D-RECT has been used to evaluate the educational climate in studies in the Netherlands [11, 16], Ireland [17], Germany [18], Colombia [19], the Philippines [20], Saudi Arabia [21], Morocco [22], and Iran [23], as well as by gynaecologic oncologists in Europe [24].

Validated in the Dutch setting [25], D-RECT has been used extensively for evaluation and research purposes [26,27,28]. Several adjustments to the original structure have been published [26], but both versions are capable of measuring the educational climate, which is why we chose to use the original 50-item D-RECT questionnaire.

For our project, the main objective was to examine the educational climate at Copenhagen University Hospital, Rigshospitalet, Denmark, which had never been done systematically before, since no suitable instrument was previously available. D-RECT was developed and validated in the Netherlands [4]. To use the 50-item D-RECT instrument in a Danish setting, it was necessary to translate and validate it in the Danish context.

Methods

Aim

The aim of this study was to validate the 50-item D-RECT in a Danish setting and describe and evaluate the educational climate among postgraduate medical trainees at Rigshospitalet, a tertiary hospital.

Adopting D-RECT involved a three-step process: 1) translation of D-RECT into Danish; 2) psychometric validation; and 3) evaluation of educational climate using the Danish Residency Educational Climate Test (DK-RECT).

We added questions on demographics, as this information was necessary for the further analysis: the sex and age of the trainee, when the trainee graduated from medical school (in Denmark or a foreign country), the specialty, the length of employment in the department, and the educational level in postgraduate training.

Setting and inclusion criteria

Trainees from 31 of Rigshospitalet’s 33 specialties were included and completed the questionnaire, while trainees from psychiatric and forensic medicine were excluded due to differences in educational structure. All participating trainees were in clinical rotations at Rigshospitalet during their postgraduate training programme (Table 1).

Table 1 Medical education and training in Denmark

To avoid small sample sizes that might compromise anonymity and make the statistical analyses inconclusive, trainees were divided into four groups: surgery, medicine, anaesthesiology, and auxiliary. This grouping was done in accordance with the standard curriculum in the specialties and the hospital's educational organization. According to the East Denmark Regional Board of Postgraduate Medical Education, Rigshospitalet continually has about 400 trainees. In September 2019, the Board invited trainees, identified via an ongoing formal evaluation and quality assurance programme in postgraduate medical training, to complete DK-RECT online. Non-responders were excluded after two e-mail reminders.

Statistics – testing and validation

Descriptive statistics

Descriptive statistics were used for each DK-RECT item, including the range of inter-item correlations and the correlation with the total subscale score and with the scores of other items in the subscale. Inter-item correlations should be neither too low nor too high, indicating that an item is representative of its subscale while still capturing something unique, i.e., that items do not duplicate content in other subscales. Item-total correlation examines whether items correlate well with the total score; a high item-total correlation is acceptable. A minimal sketch of these statistics is shown below.
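As a minimal sketch, the corrected item-total correlation can be computed by correlating each item with the sum of the remaining items in its subscale. The data frame and item names below are hypothetical, not the actual DK-RECT variables:

```r
# Hypothetical subscale data, one column per item, scored 1-5
items <- data.frame(
  item01 = c(4, 5, 3, 4, 2),
  item02 = c(4, 4, 3, 5, 2),
  item03 = c(3, 5, 2, 4, 3)
)

# Range of inter-item correlations within the subscale
r <- cor(items, use = "pairwise.complete.obs")
range(r[lower.tri(r)])

# Corrected item-total correlation: each item against the sum of the
# remaining items, so the item is not correlated with itself
item_total <- sapply(names(items), function(i) {
  cor(items[[i]], rowSums(items[setdiff(names(items), i)]),
      use = "pairwise.complete.obs")
})
item_total
```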

Trainees, divided into the four aforementioned groups (surgery, medicine, anaesthesiology, and auxiliary), were compared with the Kruskal-Wallis test, where P < 0.05 indicates an overall statistical difference between the groups.
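A minimal sketch of this comparison in base R, assuming a long-format data frame with one subscale score per trainee; the variable names and values are illustrative:

```r
# One subscale score per trainee and a four-level specialty group factor
scores <- data.frame(
  subscale_score = c(3.8, 4.1, 3.2, 4.5, 2.9, 3.6, 4.0, 3.3),
  group = factor(c("surgery", "surgery", "medicine", "medicine",
                   "anaesthesiology", "anaesthesiology",
                   "auxiliary", "auxiliary"))
)

# Kruskal-Wallis rank sum test across the four specialty groups;
# P < 0.05 indicates an overall difference between the groups
kruskal.test(subscale_score ~ group, data = scores)
```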

Confirmatory factor analysis (CFA)

CFA, which tests the fit between observed responses and the proposed structure, was used to test whether the items fit the subscale structure. To analyze the five-point Likert scale items, we used CFA for ordinal items based on polychoric correlations. This methodology required rescoring of items (collapsing categories 1-3 and 4-5 on the Likert scale) whenever a response category was not observed in a subgroup. We further tested the multidimensional CFA model by studying invariance across sex and across a categorised version of the variable length of employment.
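An ordinal CFA of this kind can be sketched with the R package lavaan used in this study. The model syntax below spells out only two of the 11 subscales, and the item-to-subscale mapping shown is hypothetical (the actual mapping is given in Table 3); `dkrect` is an assumed data frame with one row per trainee:

```r
library(lavaan)

# Hypothetical subscale structure; the remaining 9 subscales would be
# defined the same way
model <- '
  supervision         =~ it01 + it02 + it03
  coaching_assessment =~ it04 + it05 + it06 + it07
'

# Declaring the items as ordered makes lavaan estimate polychoric
# correlations and use a WLSMV-type estimator suited to Likert items
fit <- cfa(model, data = dkrect, ordered = TRUE, estimator = "WLSMV")

# Fit indices; CFI/TLI >= 0.95 is commonly taken to indicate good fit
fitMeasures(fit, c("cfi", "tli", "rmsea"))
```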

We were unable to test the fit across different specialties or the four main groups because some groups were too small and some categories were missing, which made it impossible to collapse them. We then merged the four main groups into two subgroups: medicine - auxiliary and surgery - anaesthesiology. The Supplementary tables and materials provide additional information about the CFA model and analysis.
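Invariance across subgroups can then be checked by fitting the model with and without equality constraints and comparing fit. This sketch reuses `model` and `dkrect` from above; `sex` is an assumed column name:

```r
# Fit the model separately per group, then with loadings and thresholds
# constrained to be equal across groups
fit_free  <- cfa(model, data = dkrect, ordered = TRUE,
                 estimator = "WLSMV", group = "sex")
fit_equal <- cfa(model, data = dkrect, ordered = TRUE,
                 estimator = "WLSMV", group = "sex",
                 group.equal = c("loadings", "thresholds"))

# A non-significant scaled chi-square difference supports invariance
lavTestLRT(fit_free, fit_equal)
```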

Reliability analysis (internal consistency)

Generalisability theory was used to address validation and reliability, allowing estimation of the size of the relevant influences affecting the measurement [29]. We performed a reliability analysis for the mean total score and for each separate subscale to estimate the number of trainees or specialties needed for reliable scores at department level. Similar to Boor et al., we treated the total number of items as fixed, while the number of trainees within a single department and the number of departments were allowed to vary [4]. We required a standard error of measurement (SEM) for a single specialty of <0.26, so that the width of the 95% confidence interval (1.96 × 0.26 × 2 ≈ 1.0) would not exceed 1.0 point on the five-point Likert scale, which we regarded as the maximum admissible noise level.
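The SEM threshold can be reconstructed as follows; the variance-component expression for a department mean over n trainees is a sketch of the G-study setup, with the sigma-squared terms denoting assumed estimated variance components, not the exact model used:

```latex
\begin{align*}
2 \times 1.96 \times \mathrm{SEM} &\le 1.0
  && \text{(95\% CI at most 1 point wide)}\\
\Rightarrow\ \mathrm{SEM} &\le \frac{1.0}{2 \times 1.96} \approx 0.26\\
\mathrm{SEM}(\bar{X}_n) &=
  \sqrt{\frac{\hat{\sigma}^2_{\text{trainee}} + \hat{\sigma}^2_{\text{residual}}}{n}}
  \;\Rightarrow\;
  n \ge \frac{\hat{\sigma}^2_{\text{trainee}} + \hat{\sigma}^2_{\text{residual}}}{0.26^2}
\end{align*}
```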

The CFA was conducted using the R package lavaan [30], while SAS 9.4 was used for other statistical analyses.

Results

Translation into Danish

Two medical consultants bilingual in Danish and Dutch, who also had in-depth knowledge of trainee programmes and educational traditions in both countries, translated the 50-item D-RECT [31]. The forward-backward translation model described by Eremenco et al. was implemented in a modified and simplified approach, because only four persons were responsible for the translation process. The two Danish-Dutch consultants were well educated, trained, and possessed the relevant knowledge about the postgraduate curriculum in both countries and the educational organization in Denmark. One translated forward from Dutch to Danish, while the other back-translated the result from Danish to Dutch. After the two versions were compared, the Danish educational terminology was adjusted to eliminate misleading phrasing. Five trainees subsequently pilot-tested the questionnaire.

Description of participants

Population (Table 2): Questionnaires were manually sorted. Of the 445 trainees contacted, 378 returned DK-RECT; 74 tests were excluded because two thirds or more of the questions (items) were unanswered. The second column (N) in Table 3 lists the number of items answered. Overall, 304 tests were suitable for analysis (68% response rate), with all 31 specialties represented.
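The exclusion rule can be expressed as a short filter; `raw` and the item column names below are assumptions for illustration, not the actual dataset:

```r
# Drop a returned test when two thirds or more of the 50 items are
# unanswered; `raw` holds one row per returned questionnaire
item_cols <- paste0("it", sprintf("%02d", 1:50))
missing_per_test <- rowSums(is.na(raw[, item_cols]))
analysable <- raw[missing_per_test < (2 / 3) * length(item_cols), ]
nrow(analysable)  # 304 tests were suitable for analysis in this study
```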

Table 2 Baseline characteristics of participants
Table 3 Validation of 50-item DK-RECTa with 11 subscales and evaluation of postgraduate educational climate

The response rate per item was at least 82%. During data collection most trainees (79%) had completed half of their postgraduate training and had worked at Rigshospitalet for an average of 13.1 months (SD: 10.3 months).

Development and validation

Generally, the item-total correlation was high (Table 3), with low correlation (<0.6) for only two items (10 and 11) in the Coaching and assessment subscale. Overall, our multidimensional CFA model showed that participant answers were reliable and independent of sex and length of employment (Supplementary tables and materials). The CFA showed that the 50-item DK-RECT fit the data (Comparative Fit Index and Tucker-Lewis Index were ≥0.95), confirming the validity of D-RECT's factor structure [4].

Our generalisability analysis showed that, for one department, 2-7 trainees had to respond to the questionnaire for the SEM based on the overall score for the 11 subscales to support a reliable inference within one point (Supplementary tables and materials). In one department, seven trainees were needed for reliable outcomes on every subscale. For groups of specialties, eight different specialties, each with two trainees, were required for a reliable total score (Table 4).

Table 4 Reliability analysis of Danish Residency Educational Climate Test of number of trainees and specialties required for reliable resultsa

DK-RECT results

The mean DK-RECT score was 3.8 (SD 1.1; median 4.0, IQR: 3.0-5.0), while the median individual item score was 3.0 (Table 3), except for item 10 (supervisors occasionally observe when patient medical histories are taken), which was 2.0 (IQR: 1.0-3.0). Table 5 shows the overall rating for each subscale for the specialty groups. The educational climate for the 11 subscales was acceptable (median score 3.0), with only subscale 3 (Feedback) rated lower (median 3.0, IQR: 2.3-3.8). The non-parametric tests showed significant differences across the four specialty groups for Feedback, Coaching and assessment, Teamwork, Professional relationship between supervisors, and Work adapted to trainee skill level.

Table 5 Evaluation of educational climate among the four specialty groups

Discussion

Main findings

This study indicates that the 50-item DK-RECT is a reliable instrument for examining the educational climate of medical trainees in a Danish tertiary hospital. DK-RECT's internal consistency is high, and the psychometric analysis demonstrated robustness and validity. Moreover, only 2-7 trainees were required from each specialty for reliable results. With a high overall mean rating score, DK-RECT showed that the educational climate was good, but that some specialties had potential for improvement, particularly in Feedback and Coaching and assessment.

DK-RECT development and validation

CFA of DK-RECT showed that the content of each item was representative and captured unique features. Validation of the DK-RECT questionnaire was acceptable, consistent with Boor et al. (2011) for the 50-item D-RECT and with Silkens et al. (2016) for the 35-item D-RECT, despite the fact that the 35-item instrument was slightly different and validated in a different context.

There was acceptable homogeneity among individual items and DK-RECT as a whole; no items were unnecessary. The high response rate per item indicates that the trainees engaged with all items and used the full five-point Likert scale. The wording of the questionnaire was important, and although the dual, in-depth translation process was time-consuming, the adaptation turned out to be an essential prerequisite for the analysis and psychometric validation.

As in the Dutch study [4], the CFA showed that items did not display differential item functioning (Supplementary tables and materials), indicating that neither the sex of the trainees, the number of years of work, nor whether they belonged to the medicine or surgery group was predictive of a pattern in how they answered the questions.

Notably, the generalisability analysis (Table 4) showed that 2-7 replies per subscale in one specialty were sufficient to achieve a reliable inference within one point. This is highly comparable to Boor et al.'s results (2011). Hence, including additional specialties or more trainees from each specialty offers no added benefit in achieving reliable results. This means that specialties with few trainees can assess their educational climate without identification of the responders, confirming the feasibility of DK-RECT. However, ensuring the anonymity of trainees in small departments can be challenging, which is why adding trainees from two or more departments may be necessary. The present study reported specialties in four groups to maintain the full anonymity of respondents.

The 50-item D-RECT instrument, which is now 15 years old, was validated after its introduction in several contexts and performed well, which strengthens the argument for its use. However, it also indicates how important it is to perform validation studies before implementing a tool. One well-known revision was Silkens et al.'s shorter 35-item version [25], developed because some items in the original 50-item version performed poorly and seemed outdated. They concluded that the nine-factor model with 35 items fitted better, with an improvement in the Comparative Fit Index and Tucker-Lewis Index [25]. But even though the exact number of items and the clustering might vary, the 50-item D-RECT is still valid and the results reliable.

Researchers must be aware that the educational climate can change over time and that trainee perceptions and expectations of the ideal educational climate may vary with new and younger trainees. These factors indicate that instruments must be re-validated over time and items revised, since they may lose relevance, for example due to organizational changes or the impact of new developments or initiatives designed to strengthen the educational climate.

Evaluation of postgraduate educational climate

A comparison of DK-RECT with the Dutch mean scores showed that most trainees evaluated the educational climate positively, which is comparable with the Dutch results: 3.8 (SD 0.3) [4]; 3.5 (SD 0.4) [11]; and 3.9 (SD 0.4) [25], even allowing for the fact that the instrument differed slightly in Silkens et al.'s 2016 study.

Most DK-RECT subscales were positively evaluated, while those with mean scores <3.9 mainly concerned the organization of the education (Teamwork, Professional relationship between supervisors, and Work adapted to trainee skill level) and supervisor behaviour (Coaching and assessment, and Feedback). The lowest scores were for Feedback (item 13: Structured forms are used to provide feedback, and item 14: Structured observation forms are used to clarify my progress), with a median score of 3.0 (IQR: 2.0-4.0), and Coaching and assessment (item 10: Supervisors occasionally observe when patient medical histories are taken), with a median score of 2.0 (IQR: 1.0-3.0). These two subscales in particular showed significant differences across specialty groups.

The Feedback subscale scored lowest overall, with specialty groups differing significantly (Table 5). This should be a matter of concern: even though providing feedback is a complex, subtle interaction influenced by multiple factors such as the supervisor, the message, the delivery method, and the supervisor-learner relationship [32], immediate, specific, and frequent feedback is clearly vital to successful trainee progression and professional development. Feedback should therefore optimally take place at department level and be benchmarked against other departments at the hospital. Thus, detecting outliers in DK-RECT measurements offers opportunities to provide remediation in low-performing departments and to learn from high performers.

Work adapted to trainee skill level was also rated low, with specialty groups differing significantly (Table 5). Raising awareness about trainee educational programmes is beneficial because a good educational climate positively affects teaching faculty [1]. This awareness can involve aligning trainee and supervisor expectations concerning complex patient cases, increasing their familiarity with learning objectives, and acknowledging how time-consuming learning new skills is.

Strengths and limitations

It is a strength that the response rate was high (68%), likely because the survey was web-based, written in respondents' native language, and anonymous. Also, data collection was user-friendly, and the heads of education in each specialty communicated frequently about the response rate.

Non-responder bias nonetheless remains an issue but was not examined further.

Finally, there are some caveats to consider. First, identifying participants who met our criteria was difficult for all specialties, regardless of the time of data collection, since rotations affected whether a trainee worked in a specialty for only a few weeks or months. Although the average length of employment was 13.1 months, the SD was broad, indicating that some clinical rotations lasted only a few months, possibly influencing trainee perception of the educational climate compared to those employed for over a year. This issue can be addressed either by conducting the survey at a time fixed in relation to clinical rotations or by analysing the results according to the duration of the clinical rotations. Second, the term supervisor can potentially cause confusion in the translated questionnaire. Trainee comments indicated difficulties in answering questions because the term was used to describe both the clinical supervisor and the head of education. The next version of DK-RECT must address this issue. Third, specialties with little direct patient contact questioned the relevance of certain items on, e.g., patient care, especially item 10 (Supervisors occasionally observe when patient medical histories are taken), as subsequently reflected in the low item-total correlation and the floor effect (Table 3). This contrasts with a previous study [25], which argues that psychometric validation means the test can be used in various postgraduate settings, in teaching and non-teaching hospitals, and with and without patient care-related aspects. Thus, auxiliary specialties need greater attention in future discussions on the revision of residency educational climate tests.

Conclusions, perspectives, implications, and future research

This study addresses the lack of validated instruments available in Danish with a documented high internal consistency for supporting the development of the educational climate, including formative evaluation. The validated DK-RECT is useful for measuring the educational climate in postgraduate training by offering a standardised, objective method for comparing various educational contexts, e.g., whether differences in subscales and items between specialties are associated with better training and better patient outcomes. DK-RECT allows accurate evaluation of whether curricular and educational changes lead to improvement. We chose to use Boor et al.'s original 50-item D-RECT instrument [4], but Silkens et al.'s 35-item instrument [25] could also have been applied, though it should be validated again if used in future studies in Denmark.

One issue for further exploration is how frequently evaluation of the educational climate should be done. Too frequent evaluation can cause a substantial decline in response rates due to participant response fatigue. Positive results and progress may simply reflect the increased attention paid to the quality of the educational climate [25], i.e., the Hawthorne effect [33]. Even if departments work continually to improve the educational climate, curriculum changes may not result in convincing improvements for years. Of note, departments with lower scores may feel pressured to improve the educational climate, whereas higher scores may be a disincentive [25]. The first version of DK-RECT provides the opportunity to establish educational climate benchmarks, allowing comparisons between hospitals in Denmark and abroad.

When D-RECT was developed in 2009, today's information and communication technology, social media, digital learning, and simulations were not yet available. Patient involvement in medical education also represents a valuable way to improve learning. Consequently, updating and further developing D-RECT and DK-RECT is warranted.