1 Introduction

In the last decade, calls to adopt a wider socio-cultural stance have led to a reconceptualization of skills-focused digital literacy in favour of broader digital competency models that recognise the more diverse knowledge, capabilities and dispositions needed to succeed in education (Falloon, 2020; Peters et al., 2021). This is hardly surprising given that the European Commission (2006) has identified digital competence as one of the eight key life competencies in its recommendations for lifelong learning. Concomitantly, the number of studies employing self-assessment tools to gauge the level of digital competence in tertiary education has grown considerably. This has led scholars (e.g., Peters et al., 2021; Spante et al., 2018; Starkey, 2020) to caution against the limitations of self-reported data and to concur on the need for future studies to advance the field of digital competence beyond self-assessment.

Although causal relationships have not been conclusively determined, evidence suggests that gaps occur between self-perceived competence and performance in practice in the field of digital competence (Maderick et al., 2016). Despite this caveat, studies continue to employ self-assessment as a standalone method to identify the changes needed to improve and determine education policy vis-à-vis digital competence in higher education (e.g., Cabero-Almenara et al., 2020a; Mora-Cantallops et al., 2022). However, few of these studies support the subjective viewpoint of perceived digital competence with objective measures that corroborate the actual level of competence demonstrated through task performance. Unless we can demonstrate that self-reported data are a true reflection of students’ and educators’ actual level of digital competence at university, implementing changes to education policy purely on the basis of the perceptions of students and educators is a speculative undertaking.

Accordingly, this study develops and validates two innovative rubric-based frameworks that examine digital competence from a broader model that goes beyond self-assessment, an approach that, to the best of our knowledge, has not been previously considered. To enable future researchers to obtain a more accurate measure of digital competence within higher education teaching and learning, these rubrics serve to compare self-assessment with external observation, with the aim of providing new evidence upon which to determine empirical education policy. First, we review some of the most prominent research designs and frameworks to date that have made a significant contribution to the field of digital competence for both university students and educators. We then highlight the need to go beyond self-assessment towards performance evaluation, and present some suitable models that can be combined with self-assessment tools to provide a more accurate picture of digital competence at university level. Having outlined the relevant theoretical constructs, we explain how these models were adapted to develop two rubric-based frameworks that enable future researchers to gain new ground in this field. The validation process is then described, and the results are discussed. Lastly, we conclude with some general implications and recommendations on the future implementation of these validated rubrics.

2 A brief review of digital competence frameworks in higher education

In the European Union, the geographical context of this study, the Digital Competence Framework for Educators (DigCompEdu; Redecker, 2017) and the Digital Competence Framework for Citizens (DigComp; Ferrari, 2013), two self-assessment frameworks published by the Publications Office of the European Union, are two of the most consolidated and significant digital competence research frameworks to date (Cabero-Almenara et al., 2020b). A search on Scopus in 2023 for peer-reviewed articles that mention either the DigCompEdu or DigComp framework returns over 2,500 articles, 275 of which specifically mention these frameworks in their title, abstract, and/or keywords. Both the DigCompEdu and DigComp frameworks have been structured upon the following definition of digital competence:

Digital competence involves the confident, critical and responsible use of, and engagement with, digital technologies for learning, at work, and for participation in society. It includes information and data literacy, communication and collaboration, media literacy, digital content creation (including programming), safety (including digital well-being and competences related to cybersecurity), intellectual property related questions, problem solving and critical thinking (European Commission Directorate-General for Education, Youth, Sport and Culture, 2019: 10).

Hinged on this definition, these frameworks establish 12 areas of digital competence for educators and citizens. The DigComp framework for citizens (Ferrari, 2013), revised in 2016 and 2017 (Carretero et al., 2017), examines five dimensions of digital competence: 1. Information and Data Literacy; 2. Communication and Collaboration; 3. Digital Content Creation; 4. Safety; and 5. Problem Solving. The 2.2 update of this framework published by Vuorikari et al. (2022) consolidates previously released publications and user guides within these dimensions. The DigCompEdu framework, specifically published for educators (Redecker, 2017), evaluates seven dimensions of educator digital competence: 1. Professional Engagement; 2. Digital Resources; 3. Teaching and Learning; 4. Assessment; 5. Empowering Learners; 6. Facilitating Learners’ Digital Competence; and 7. Open Education (Mora-Cantallops et al., 2022). Both DigComp and DigCompEdu have had a noteworthy influence on European education policy and the expansion of research developing scales and self-assessment instruments for measuring digital competence (Basilotta-Gómez-Pablos et al., 2022; Mattar et al., 2022; Muammar et al., 2023).

Although these frameworks serve as a benchmark to identify changes that may be conducive to greater digital competence in higher education, they do not address the gap that has been shown to occur between self-reporting and performance in practice (Maderick et al., 2016; Starkey, 2020; Willermark, 2018). Specifically, they do not account for the way in which individuals face reality through their personal and subjective vision, referred to in the literature as competence idealisation (Cabero-Almenara et al., 2020a). Recent systematic reviews of the literature exploring digital competence at tertiary level (e.g., Hew et al., 2019; Peters et al., 2021; Spante et al., 2018) express a real concern about the quality of the conduct and reporting of research. In these reports, scholars underscore a primary need to reorient away from basic forms of research, driven by teacher and student self-perceptions, towards more robust forms of data collection that contribute to theory advancement. Their observations stress an urgent need to support self-reported findings with other methods that will help us learn something new about the theories being applied to the concept of digital competence. The development of a pedagogically-sound instrument that enables us to compare self-reported levels of digital competence with actual levels evidenced in task performance would provide new data by challenging and expanding the explanatory ability of self-perception in the field of digital competence. Furthermore, in a recent review of institutional self-assessment instruments, Volungevičienė et al. (2021: 22) concluded that the DigComp and DigCompEdu frameworks should be used in combination with other measures to provide a deeper or more complete assessment of digital competence. Their findings suggest that a dialogical, “pick and mix” approach may be more productive in terms of future efforts to support and scaffold critical self-assessments that lead to real and transformative change in higher education institutions. In light of this, a review of the literature furnished 13 empirically-tested models (Table 1) that could be used in combination with the DigComp and DigCompEdu self-assessment frameworks to evaluate performance in practice, and in this way advance theoretical knowledge and the field beyond self-assessment.

Table 1 Empirically-tested models and European Commission frameworks

To start with, five models provide suitable criteria for the external evaluation of student and educator digital competence as defined by the European frameworks. The first relates to the JISC (2017) Digital Capabilities framework (DigiCap), an internationally recognised model that has been a major source of reference in many studies (e.g., Handley, 2018; Varga-Atkins, 2020). Predominantly based around the concept of literacy, its most recent framework adopts a wider digital information literacy lens that extends to identifying current strengths and areas for development in relation to how students and educators evaluate, develop, and share digital content (JISC, 2017). In like manner, the Society of College, National and University Libraries (SCONUL) and Sheffield Hallam University have also made significant strides in the development of frameworks that regulate and assess the knowledge and competences involved in digital information literacy (Austen et al., 2016; Handley, 2018). An important recent development was the publication of the SCONUL7 Pillars of Information Literacy through a Digital Literacy ‘lens’ (SCONUL, 2016), which specifically measures digital content creation, responsible use of, and engagement with digital information for teaching and learning. Given the effective contribution of SCONUL7 to the development of student (Siddall, 2022) and librarian (Mawson & Haworth, 2018) digital proficiencies in the UK, it would be valuable to apply this model to other European contexts and educator digital competence. The Guidelines for Developing Digitally-capable Teaching Excellence (TEL) published by Sheffield Hallam University (Austen et al., 2016) also offer good practice guidelines for the effective integration of digital capability and teaching excellence into a unitary construct for UK universities. What is more, the guidelines are purposely broad to enable wide transferability across the global sector and meaningful application at subject and discipline level in relation to the selection and evaluation of digital information and resources by both students and teachers.

More recently, Anderson & Krathwohl’s (2001) Revised Bloom’s Taxonomy and Maton’s (2013) Legitimation Code Theory (LCT) have been applied to examine new processes and actions associated with digital competence. Bloom’s two-dimensional classification system has served to evaluate how students and teachers interact with digital content at both knowledge and cognitive process levels (e.g., Alaoutinen, 2012; Amin & Mirza, 2020; Husain, 2021; Vavilina, 2020). Fundamentally, the model, which goes from simple to more complex and challenging types of thinking (lower-order thinking skills: remembering, understanding, applying; higher-order thinking skills: analysing, evaluating, creating), can help to determine whether digital content and information used in teaching and learning activates critical-thinking skills. From a semantics perspective, the concept of LCT ‘semantic waves’ (Maton, 2013) has proven apt in deciphering knowledge-building practices within digital academic content (González-Mujico & Lasagabaster, in press). A semantic wave structure can be achieved when abstract language and technical concepts that need to be covered are unpacked using concrete contexts and simpler language, and these ideas are then repacked by linking them back to the abstract concepts and technical language students need to master (Maton, 2013). Recurrent shifts between unpacking and repacking of knowledge have been shown to be conducive to enabling learners to build their mastery of a subject (e.g., Curzon et al., 2020; Maton, 2019). For this reason, the ability to demonstrate effective meaning-making practices needs to be considered when assessing the use and development of digital content.

Secondly, a further eight models provide suitable criteria at the level of educator digital competence. In the geographical context of this study, Spain, the Department for Education in the Generalitat of Catalonia has established the Methodological Digital Competence (MDC) framework for teachers. This model adopts 27 descriptors across five dimensions (Generalitat de Catalunya, 2018) according to the standards established by the Accreditation on Competence in Information and Communication Technologies (ACTIC), for which accreditation can be obtained (Generalitat de Catalunya, 2016). These standards provide essential criteria to measure educator digital competence in relation to digital content development and assessment strategies. In addition, Lázaro & Gisbert’s (2015) frame of reference further reinforces areas in the realm of digital content creation and fostering students’ digital competence that need to be assessed, while Fernández & Pérez’s (2018) model consolidates additional guidelines to appraise assessment strategies. In the UK, Edinburgh Napier University published a practical guide on Pedagogy and Learning Technology (PaLT) (Smyth & Mainka, 2010) and the 3E Framework (Smyth et al., 2011) as a point of reference to promote a shared ethos around the incorporation of digital learning, teaching and assessment across their university. In line with the models published in Spain, both models identify standards that should be acknowledged when evaluating how educators develop digital content and assessment strategies, and foster students’ digital competence.

Benchmarked standards designed to assess the quality of higher education online teaching and learning also provide suitable criteria to support DigCompEdu competences externally. The Technology Enhanced Learning Accreditation Standards (TELAS) are a set of internationally benchmarked standards that can be applied to assess how educators develop digital content and foster students’ digital competence within the tertiary sector (TELAS, 2020). Analogously, the Online Learning Consortium (OLC) is a US collaborative community of higher education leaders and innovators dedicated to advancing quality digital teaching and learning experiences. The intent of the OLC quality framework is to help institutions identify goals and measure progress towards them based on a suite of five quality scorecards (Online Learning Consortium, 2022). These scorecards provide practical guidelines for developing digital content, fostering students’ digital competence, designing digital assessment strategies, and using open educational resources.

In sum, the literature review presents a synthesis of existing instruments at the national and European levels that can be used in combination with DigComp and DigCompEdu to measure student and educator digital competence beyond self-assessment. Although there is notable consensus on the areas of competence that need to be assessed, studies continue to implement different instruments in a mutually exclusive manner, without combining them with other measures that would provide a deeper or more complete assessment. Collectively, however, these instruments provide suitable criteria that can advance our theoretical understanding of the ability of self-perception to measure digital competence, and thus address the current research gap. As there are merits to adopting a combined approach (Volungevičienė et al., 2021), this study examined how the DigComp and DigCompEdu self-assessment frameworks can be supported with other measures to provide a deeper and more complete assessment of student and educator digital competence. Against this background and upon reviewing instruments published to date, this study examines the following research questions:

  • Research Question 1: How can existing instruments in the literature be combined to develop assessment rubrics that support DigComp and DigCompEdu self-assessment data with external evaluation?

  • Research Question 2: Do these rubrics have validity and reliability as demonstrated by the judgement of experts in the field and their subsequent implementation?

3 Method

Against this background and upon reviewing existing instruments, the author of this study developed two rubric-based frameworks to support DigComp and DigCompEdu self-assessment data with external evaluation of task performance. The project encompassed a two-stage mixed-methods approach involving the development of two assessment rubrics and the validation and implementation of these rubrics. First, the author conducted a systematic review of the scientific literature to canvass validated instruments in the field that could evaluate performance in practice of DigComp and DigCompEdu competences. To capture the broadest possible range of sources relevant to the first research question, a search was conducted on the databases Scopus, Web of Science, and Google Scholar using the keywords digital competence, digital literacy, ICT/Information and Communication Technology, framework, model, and rubric. A total of 15 frameworks were selected (see Table 1) to develop two rubrics based on their empirical ability to assess DigComp and DigCompEdu self-assessment competences from a task performance perspective. Second, the Delphi technique was used to solicit the input of seven educational experts in the field to establish the internal validity of both instruments. Lastly, both rubrics were implemented at a Spanish university to test the reliability of the items included. Due to the qualitative nature of the assessment rubrics, a small convenience sample of 20 university lecturers and 26 students was used. Ethics approval for the study was obtained from the participating institution, and informed consent was obtained from all participants.

4 Results

4.1 RQ1. How can existing instruments in the literature be combined to develop assessment rubrics that support DigComp and DigCompEdu self-assessment data with external evaluation?

In line with the current literature’s recommendations, a combined approach was chosen to support and scaffold critical assessment of digital competence, as both perspectives are equally important and should be taken into consideration. As Schaper points out, a framework grounded on the theory of self-perception acknowledges the pedagogical necessity to make normative decisions, while empirical evidence founded on task performance assures its relevance for practical problems (2009: 177). Indeed, the recontextualizing of problems as part of a broader texture of academic experiences, habits, and perceptions can foster a deeper sense of ourselves and of our practices (Maderick et al., 2016; Shapiro, 2010). To provide a more multifaceted insight into student and educator digital competence, the DigComp and DigCompEdu self-assessment tools were taken as a starting point given their confirmed relevance as self-assessment frameworks and their noteworthy influence on European education policy and research in this field, particularly in Spain (Basilotta-Gómez-Pablos et al., 2022; Carretero et al., 2017). To compare whether students and educators evidenced the same level of digital competence in their work as reported in their self-assessments, criteria from the 15 frameworks selected from the systematic review of the literature (Table 1) were combined to develop two rubrics that assessed task performance relative to DigComp and DigCompEdu areas of competence. A student rubric was designed to assess these areas of digital competence based on students’ written and oral submissions. For educators, the rubric was aimed at evaluating digital competence demonstrated during lectures and in the design of assessment materials. As to the structure of the rubrics, both adhered to the premise that a rubric is a tool used in the process of assessing student work that usually includes Popham’s (1997) three essential features: evaluative criteria, quality definitions for those criteria at particular levels, and a scoring strategy with specific indicators. The rubric-based framework for students is outlined first, followed by the rubric-based framework for educators.

4.1.1 Rubric-based framework for students

DigComp offers a self-assessment tool to gauge and improve citizens’ digital competence based on five dimensions and 21 competences (Carretero et al., 2017). To advance this model beyond self-evaluation, an analytic, task-specific rubric (Jonsson & Svingby, 2007) was designed to compare students’ DigComp self-assessment responses with digital competence evidenced in their oral and written academic submissions. Evaluative criteria for the students’ rubric framework were underpinned by three dimensions and six competences from the DigComp self-assessment tool (see Table 2). These dimensions and competences were chosen as they specifically address areas of ability that an external evaluator can assess through students’ oral or written task submissions. The quality definitions for these criteria at particular levels reflected the three self-assessment DigComp levels of user knowledge and practice for each aforementioned competence: level 1 (basic), level 2 (intermediate), and level 3 (advanced). Lastly, a scoring strategy with specific indicators was included for an external assessor to evaluate each competence through students’ oral or written performance, and subsequently compare these scores with students’ DigComp self-assessment scores. A total of 14 scoring indicator items were identified based on criteria adapted from seven scales reviewed in the previous section (see Table 2). Scales were applied according to the suitability of criteria in these models to measure the area of ability defined by each competence descriptor through oral or written evaluation. Several scales were applied to the same item when the same criteria appeared in more than one framework (e.g., item 2).

Table 2 Dimensions and competences included in the rubric framework for students

Each competence was attributed a range of scoring indicator items (referred to as items, henceforth). The cumulative capacity to develop these items to a lesser or greater extent was then applied as a scoring strategy to corroborate DigComp perceived level of competence reported by students. To facilitate analytic scoring, items were designed to measure digital competences from a micro and macro perspective: items 1 to 11 referring to the individual analysis of digital content and information (DCI) included in students’ work; items 12 to 14 relating to the analysis of DCI as a whole in students’ submissions (refer to Appendix 1 for a detailed breakdown).
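To illustrate this structure, the sketch below (an editor's illustration rather than the published instrument; class and field names are assumptions) encodes each DigComp competence together with its quality definitions and the scoring indicator items mapped to it, following the breakdown detailed in the next paragraph and Appendix 1.

```python
# Illustrative encoding of the student rubric structure (assumed names; not the
# study's implementation). Each DigComp competence carries its quality definitions
# (levels 1-3) and the scoring indicator items used to corroborate it.
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Competence:
    code: str            # DigComp competence code, e.g. "1.1"
    descriptor: str      # evaluative criterion
    items: List[int]     # scoring indicator items (1-14) mapped to this competence
    levels: Dict[int, str] = field(
        default_factory=lambda: {1: "basic", 2: "intermediate", 3: "advanced"})


# Excerpt following the item breakdown given below (see Appendix 1)
student_rubric = [
    Competence("1.1", "Browsing, searching and filtering data, information and digital content", [1]),
    Competence("1.2", "Evaluating data, information and digital content", list(range(2, 10))),
    Competence("2.2", "Sharing through digital technologies", [10, 11]),
    Competence("3.1/3.3/3.4", "Digital content creation (macro-level items)", [12, 13, 14]),
]
```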

To corroborate competence 1.1 (‘Browsing, searching and filtering data, information and digital content’), the number of clearly-defined DCI sources was calculated based on the cumulative abstract/title relevance score (ARS) of each DCI source included (item 1). The higher the aggregate ARS score, the greater the level of competence corroborated: up to 1 point = level 1; 2 points = level 2; 3-4 points = level 3. To corroborate competence 1.2 (‘Evaluating data, information and digital content’), items 2 to 9 were graded on a demonstrable scale (i.e., whether the item was present or not). The total cumulative presence of these items was converted to a percentage to corroborate competence: 0-29% = level 1; 30-69% = level 2; 70-100% = level 3. Item 2 included a qualitative descriptive scale that rendered additional information on the variety of DCI used. Including two types of DCI was scored at level 1; between 3 and 4 types of DCI, level 2; and 5 or more types of DCI, level 3. Items 3 and 4 provided more qualitative information on the development and criticality of DCI present. Bloom’s revised taxonomy and ‘semantic waves’ were applied to measure DCIs individually (items 3-4). The aggregate total of higher-order thinking skills activated in students’ work was converted to a percentage to corroborate competence as follows: 0-29% = level 1; 30-69% = level 2; 70-100% = level 3. The aggregate total of ‘semantic wave codes’ was converted to corroborate competence based on the following scoring scale: 1-2 wave codes present = level 1; 3 wave codes = level 2; 4 wave codes = level 3. Two items (10-11) were considered to corroborate competence 2.2 (‘Sharing through digital technologies’). Both items were adapted to assess DCI individually, scored on a demonstrable scale, and aggregated as previously outlined for items 5-9. Items 12-14 corroborated Digital Content Creation competences 3.1, 3.3, and 3.4. These items were designed to assess DCI comprehensively and were assigned a scoring level in line with DigComp levels of competence.
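A minimal sketch of these scoring conversions is given below (illustrative only, not the study's own code; function names and the handling of values outside the stated ranges are assumptions).

```python
# Sketch of the student-rubric conversion rules described above (assumed names).

def level_from_ars(aggregate_ars: int) -> int:
    """Competence 1.1: aggregate abstract/title relevance score (ARS) -> DigComp level."""
    if aggregate_ars <= 1:
        return 1   # basic
    if aggregate_ars == 2:
        return 2   # intermediate
    return 3       # advanced (3-4 points)


def level_from_percentage(pct: float) -> int:
    """Items graded on a demonstrable (present/absent) scale, converted to a percentage."""
    if pct < 30:
        return 1
    if pct < 70:
        return 2
    return 3


def level_from_dci_variety(n_types: int) -> int:
    """Item 2: number of distinct types of DCI included."""
    if n_types <= 2:
        return 1
    if n_types <= 4:
        return 2
    return 3


def level_from_wave_codes(n_codes: int) -> int:
    """Items 3-4 (semantic waves): number of wave codes present."""
    if n_codes <= 2:
        return 1
    if n_codes == 3:
        return 2
    return 3


# Example: a submission evidencing 6 of the 8 demonstrable items -> 75% -> level 3
presence = [1, 1, 1, 0, 1, 1, 0, 1]
pct_present = 100 * sum(presence) / len(presence)
assert level_from_percentage(pct_present) == 3
```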

4.1.2 Rubric-based framework for educators

DigCompEdu is a pedagogically-sound framework that describes what it means for educators to be digitally competent. It provides a general reference frame to support the development of educator-specific digital competences in Europe. It offers a self-assessment tool to gauge and improve educators’ digital competence based on seven dimensions and 25 competences (Mora-Cantallops et al., 2022; Redecker, 2017). To advance this framework beyond self-reported data, an analytic, task-specific rubric was developed to compare educators’ DigCompEdu self-assessment responses with actual digital competence demonstrated in lectures and course assessment materials. Evaluative criteria for the educator rubric were underpinned by five dimensions and 10 competences from the DigCompEdu self-assessment tool (see Table 3). These dimensions and competences were chosen as they specifically target areas of ability an external evaluator can assess through teaching practice in lectures and the design of course assessment materials. The quality definitions for these criteria at particular levels reflected the six self-assessment DigCompEdu levels of user knowledge and practice for each aforementioned competence: A1 (Newcomer), A2 (Explorer), B1 (Integrator), B2 (Expert), C1 (Leader), and C2 (Pioneer). Lastly, a scoring strategy with specific indicators was included for an external assessor to evaluate digital competence based on demonstrated performance in lectures and the design of assessment materials, and subsequently compare these scores with DigCompEdu self-perceived levels of digital competence reported by educators. A total of 20 scoring indicator items were identified based on criteria adapted from 15 scales reviewed in the previous section (Table 3). As previously, scales were applied according to the suitability of criteria in these frameworks to measure the area of ability defined by each competence descriptor through lectures and assessment materials. Several scales were applied to the same item when the same criteria appeared in more than one framework. When scales could not be furnished from the literature, items were developed based on DigCompEdu criteria alone (e.g., item 17).

Table 3 Dimensions and competences included in the rubric framework for educators

Each competence was attributed a range of items to evaluate digital competence demonstrated in lectures and the design of assessment materials. The cumulative capacity to develop these items to a lesser or greater extent was then applied as a scoring strategy to corroborate level of competence. To facilitate analytic scoring, items were designed to measure digital competences from a micro and macro perspective. Items 1 to 9 were adapted to individually assess digital content and information (DCI) used in lecture content by educators, whereas items 10 to 20 evaluated DCI as a whole in lecture content and assessment materials (refer to Appendix 2 for a detailed breakdown).

Twelve items were included to evaluate competences related to Digital Resources. To corroborate competence 2.1 (‘Selecting digital resources’), the number of clearly-defined items of DCI was calculated based on the cumulative relevance score of each DCI presented in lecture content (item 1). The higher the aggregate average relevance score, the greater the level of competence corroborated: under 1 point = level A2; 1 to 1.9 points = B1; 2 to 2.9 points = B2; 3 to 3.9 points = C1; 4 points = C2. To corroborate competence 2.2 (‘Creating and modifying digital resources’), items 5 to 9 were graded on a demonstrable scale. The total aggregate presence of these items was converted to a percentage to corroborate competence: 0-16% = level A1; 17-33% = A2; 34-50% = B1; 51-67% = B2; 68-84% = C1; 85-100% = C2. Item 2 included a qualitative descriptive scale that rendered additional information on the variety of DCI used by educators in their lecture content. Including two types of DCI was scored at level A2 (33%); 3 types at level B1 (50%); 4 types at level B2 (67%); 5 types at level C1 (84%); 6 types or more at level C2 (100%). Items 3 and 4 provided more qualitative information on the degree of interaction and criticality of each DCI used. The aggregate total of higher-order thinking skills activated in lecture content was converted to a percentage to corroborate competence as follows: 0-16% = level A1; 17-33% = A2; 34-50% = B1; 51-67% = B2; 68-84% = C1; 85-100% = C2. The aggregate total of ‘semantic wave codes’ was converted to corroborate competence as follows: 1-2 wave codes present = level A2; 3 wave codes present = level B2; 4 wave codes present = level C2. Items 10 to 12 evaluated DCI comprehensively in lecture content in relation to the ease and intuitiveness of navigation, design and layout, and whether digital content was well-written. The following level of competence was attached to the cumulative percentage of item points present in lectures: 0-16% = level A1; 17-33% = A2; 34-50% = B1; 51-67% = B2; 68-84% = C1; 85-100% = C2.
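The same conversion logic can be sketched for the educator rubric (again an editor's illustration with assumed names, not the published instrument; the six percentage bands and the relevance-score bands follow the thresholds stated above).

```python
# Sketch of the educator-rubric conversions described above (assumed names).

def level_from_percentage(pct: float) -> str:
    """Demonstrable-scale items converted to a percentage -> DigCompEdu level."""
    if pct <= 16:
        return "A1"
    if pct <= 33:
        return "A2"
    if pct <= 50:
        return "B1"
    if pct <= 67:
        return "B2"
    if pct <= 84:
        return "C1"
    return "C2"


def level_from_relevance(avg_score: float) -> str:
    """Competence 2.1 (item 1): aggregate average relevance score of lecture DCI."""
    if avg_score < 1:
        return "A2"
    if avg_score < 2:
        return "B1"
    if avg_score < 3:
        return "B2"
    if avg_score < 4:
        return "C1"
    return "C2"


def level_from_wave_codes(n_codes: int) -> str:
    """Semantic-wave coding of lecture DCI."""
    if n_codes <= 2:
        return "A2"
    if n_codes == 3:
        return "B2"
    return "C2"


# Example: 5 of 6 demonstrable points present -> ~83% -> C1
print(level_from_percentage(100 * 5 / 6))
```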

To corroborate competences related to Teaching and Learning, Assessment, Facilitating Learners’ Digital Competence, and Open Education, eight items were designed to evaluate lecture content and/or assessment materials from a macro perspective using a demonstrable scale. Item 13 was assigned a scoring level in line with DigCompEdu levels of competence. Items 14 to 20 were corroborated based on the cumulative total of points present. Items that included three demonstrable points were scored as follows: 0 to 1 = level A2; 2 = B2; 3 = C2. Items that included six demonstrable points were attributed the following level of competence: 0-1 point = A1; 2 points = A2; 3 points = B1; 4 points = B2; 5 points = C1; 6 points = C2.
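For completeness, the two macro-level point-to-level mappings described in this paragraph can be sketched as follows (illustrative only; function names are assumptions).

```python
# Sketch of the point-to-level mappings for the macro-level educator items (items 14-20).

def level_from_three_points(points: int) -> str:
    """Items comprising three demonstrable points."""
    if points <= 1:
        return "A2"
    if points == 2:
        return "B2"
    return "C2"


def level_from_six_points(points: int) -> str:
    """Items comprising six demonstrable points."""
    mapping = {0: "A1", 1: "A1", 2: "A2", 3: "B1", 4: "B2", 5: "C1", 6: "C2"}
    return mapping[points]
```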

4.2 RQ2. Do these rubrics have validity and reliability as demonstrated by the judgement of experts in the field and their subsequent implementation?

As in previous studies (Merma-Molina et al., 2017), the expert judgement technique, which forms the basis of the Delphi method, was used to determine the reliability of the rubrics. The Delphi method was developed at the RAND Corporation in the late 1950s and 1960s as an effective means of collecting and synthesizing expert judgements (Gordon & Pease, 2006). The objective of the Delphi methodology (Dalkey & Helmer, 1963) is to obtain the most reliable consensus of opinion of a group of experts who analyse a problem or an instrument. Participants are carefully chosen for their expertise in some aspect of the issue under study and are promised anonymity with respect to their answers. However, one of its main drawbacks is the iterative nature of the feedback collected over several rounds, which can substantially lengthen the time it takes to complete a study. To improve the speed of the process, Gordon & Pease (2006) developed the real-time (RT) Delphi, which reduces these multiple iterations to one round in synchronous studies with a small number of experts. In this study, seven qualified experts (known to the author) were selected based on discipline, years of teaching and assessment experience using digital technologies in higher education (i.e., more than 5 years), and geographical context (i.e., representing the geographical context of this study and beyond). All experts came from the same discipline as the university lecturer and student being evaluated in the validation process (i.e., Arts & Humanities: Linguistics). Five specialists had over 10 years’ teaching and assessment experience using digital tools in higher education contexts; two had over 5 years’ experience. Experts were drawn from Spanish (2), UK (2), and US (3) universities. Each participant was asked to individually evaluate the same written task using the rubric-based framework for students, and the same lecture content and course assessment materials using the rubric-based framework for educators. The evaluation process was conducted synchronously online via a facilitator who collected each expert’s individual responses and their justification. Feedback was also obtained from each expert at the end of the evaluation process.

To statistically verify the obtained results, a Fleiss’ kappa test was conducted. This test measures the degree of agreement between the expert raters beyond that expected by chance. A Fleiss’ kappa value equal to or greater than 0.7 is generally considered to indicate a reliable instrument. The level of agreement between the seven experts using Fleiss’ kappa revealed high levels of inter-rater reliability (rubric-based framework for students: κ = 0.864; rubric-based framework for educators: κ = 0.806). In addition, an F test for equal variances also revealed high intraclass correlation coefficients for both the student rubric (F(14, 89.2) = 18.5, p < 0.001) and the educator rubric (F(36, 216) = 231, p < 0.001), confirming consistency across raters. As in previous studies (Merma-Molina et al., 2017), these results indicate that the strength of agreement across raters is consistent and therefore that both instruments are reliable.
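As an illustration of how such an agreement check can be computed (an editor's sketch, not the study's analysis script; the ratings matrix below is hypothetical), Fleiss' kappa can be obtained from the experts' level assignments as follows.

```python
# Hypothetical inter-rater agreement check: rows = rubric items, columns = the seven
# experts, values = the competence level each expert assigned to that item.
import numpy as np
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

ratings = np.array([
    [3, 3, 3, 2, 3, 3, 3],   # item 1
    [2, 2, 2, 2, 3, 2, 2],   # item 2
    [1, 1, 1, 1, 1, 2, 1],   # item 3
    # ... one row per rubric item
])

table, _ = aggregate_raters(ratings)      # category counts per item
kappa = fleiss_kappa(table, method="fleiss")
print(f"Fleiss' kappa = {kappa:.3f}")     # values >= 0.7 taken to indicate reliability
```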

Comments garnered from experts regarding the evaluation process were mostly positive and supported the implementation of both rubrics. Specialists agreed that the items included in both rubrics were relevant and thought-provoking. They felt that the entire process prompted them to reflect on their own teaching and learning practice, particularly in relation to the importance of clarity when designing a rubric, the nature and relevance of evaluation, and the time needed to present information in lectures and course assessment effectively. Feedback included the following remarks:

‘The teacher and student rubrics have made me reflect on the nature of evaluation and what’s important about it. Clarity is a huge issue. It’s also made me think about how much time is needed to present information, especially when conveying assessment materials to my students.’ (Expert 3, USA)

‘Accessibility is becoming more important and there needs to be a contingency plan for when things don't work or students can't do them. The rubric evaluation items include a number of areas that I need to start integrating into my own teaching practice.’ (Expert 7, UK)

‘Definitely prompted some food for thought. What is the purpose of essay writing and evaluation in general? Undoubtedly, the rubric criteria bring these questions to the fore and shed some light on what the goals and purpose of writing and teaching should be.’ (Expert 1, Spain)

One expert underscored that they would have preferred to complete the rubrics asynchronously, at their own pace. However, this was not possible, as all experts needed to evaluate the tasks under the same conditions for quality assurance purposes.

‘The video needs to be watched a couple of times to answer some of the rubric questions. Getting the documents in advance and having more time to look at the materials and video beforehand would have been helpful. It's hard to take in the information only watching the video once.’ (Expert 6, USA)

Another expert also commented on the length of time it took to complete each rubric. Although the RT Delphi method reduced the validation process to one round of feedback, the qualitative nature of the items included to assess each area of ability made the process cumbersome at times for this participant. Once again, it was suggested that this aspect could be improved by allowing the rubrics to be completed asynchronously, at one’s own pace, rather than in one sitting.

‘I started flagging at some points when items required quite a bit of attention to complete, not to mention that it would have been useful to have some time to reflect on my answers and maybe review them again at a later stage. This was particularly noticeable when I was evaluating the teacher. I guess you’re always more susceptible when it comes to evaluating peers.’ (Expert 4, UK)

Upon validation of both rubrics, the instruments were implemented at a Spanish university to further test the reliability of each instrument, examine the rubric items more closely, and allow the resolution of any potential difficulties. Due to the qualitative nature of the assessment criteria included in the rubrics, a small convenience sample of 26 students and 20 university lecturers was considered sufficient for assessing digital competence at university, as in previous studies (e.g., Hobart et al., 2012). Both lecturers and students were selected to represent a range of undergraduate disciplines at the designated institution (Arts & Humanities, Science, Social & Legal Sciences, and Architecture & Engineering). The author of this study first applied the student rubric to assess digital competence demonstrated in students’ final year undergraduate written projects. The educator rubric was then used to evaluate lecturers’ digital competence based on digital presentations used in lectures and the design of assessment materials included in the undergraduate modules to which those lectures belonged. Given that some areas of competence were measured based on the aggregate score of several items, an exploratory factor analysis was conducted to examine the factor structure of the items per competence for each rubric. For the student rubric, the Cronbach’s alpha coefficient of the total scale was .70 and the alpha coefficients of the 6 sub-scales of digital competence ranged from .58 to .72, indicating that the student rubric has a high degree of internal consistency. For the educator rubric, the Cronbach’s alpha coefficient of the total scale was .85 and the alpha coefficients of the 10 sub-scales of digital competence ranged from .81 to .86, indicating that the educator rubric has a high degree of internal consistency (refer to Table 4).

Table 4 Reliability of the student and educator rubrics
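The internal-consistency statistic reported above can be reproduced on any item-score matrix with a short routine such as the following (an editor's sketch with hypothetical data, not the study's analysis script).

```python
# Cronbach's alpha for a matrix of rubric item scores (rows = submissions, columns = items).
import numpy as np


def cronbach_alpha(item_scores: np.ndarray) -> float:
    scores = np.asarray(item_scores, dtype=float)
    k = scores.shape[1]                               # number of items in the (sub-)scale
    item_variances = scores.var(axis=0, ddof=1)
    total_variance = scores.sum(axis=1).var(ddof=1)   # variance of the summed scale
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)


# Hypothetical scores for five submissions on a three-item sub-scale
scores = np.array([[2, 3, 2],
                   [1, 1, 2],
                   [3, 3, 3],
                   [2, 2, 2],
                   [1, 2, 1]])
print(round(cronbach_alpha(scores), 2))
```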

5 Discussion

With a view to exploring how existing instruments in the scientific literature can be combined to develop assessment rubrics that support DigComp and DigCompEdu self-assessment data with external evaluation, this study yields two key findings. First, and in line with Volungevičienė et al.’s (2021) results, adopting a dialogical, “pick and mix” approach served to support and scaffold the DigComp and DigCompEdu self-assessment tools in more depth with actual task performance. By using a combination of 15 existing empirical models, it was possible to externally assess a broad range of DigComp and DigCompEdu competences based on students’ written submissions and university educators’ lecture content and assessment materials. Second, the fact that data from the expert panel and from the implementation with students and university lecturers showed a high level of agreement and internal consistency, confirming the rubrics’ validity and reliability, further supports the combined approach advocated in the literature (e.g., Hew et al., 2019; Peters et al., 2021; Spante et al., 2018). Moreover, since the panel of experts was drawn from different tertiary contexts worldwide, the suitability of these instruments could extend beyond EU higher education contexts. In response to some of the observations made by the expert panel, however, it would be worthwhile to test whether conducting the RT Delphi method asynchronously has an impact on the internal validity of the rubrics.

6 Conclusion

This study develops and validates two rubric-based frameworks to address the drawbacks of data collection that measures digital competence based on self-assessment alone, identified as a research gap in this study’s review of the literature. The aim is to provide researchers and educators with two instruments that attempt to corroborate perceived level of competence through actual task performance evidenced in teaching and learning. Given the evidence that gaps occur between self-perceived competence and performance in practice in the field of digital competence (Maderick et al., 2016), future education policy needs to consider whether the recommendations offered in studies of self-perception alone align with those supported by demonstrated practice. The results of this study confirm that the two rubrics developed by the author are reliable, as internal consistency proved to be strong both across external raters and across rubric items. Thus, these rubrics can be implemented in practice within higher education contexts to measure digital competence based on task performance in comparison to how students and educators self-assess these competences using the DigComp and DigCompEdu tools. By challenging and expanding the explanatory ability of self-perception, this study contributes to theory advancement with two instruments that can provide new evidence upon which to determine empirical education policy in this field. In other words, both rubrics can serve to corroborate the competences that 21st century students and educators need to develop in order to improve their educational practice by comparing self-perception with task performance. As to the limitations of this study, the small sample used to test the internal consistency of the items included in both rubrics may prevent the findings from being extrapolated. Future studies should be conducted with a larger sample to further validate the reliability of both rubrics.