1 Introduction

We live in an age of information explosion. Knowledge is available in a variety of forms, with the ease of a click-of-a-button or a voice-activated digital agent, almost anytime, anywhere. This abundance of information has driven a search for new modes of meaningful engagement with information, exploring new ways to access, assess and use data. In recent years it has become almost impossible to ignore 'buzz' words, such as ‘Big Data’, ‘Artificial Intelligence (AI)’, ‘Machine Learning’ and ‘Data Science’—all manifestations of humanity’s efforts to deal with the information overload.

Into this ‘sea of knowledge’, Wikidata, a "free and open knowledge base that can be read and edited by both humans and machines" (Wikidata.org), was launched in 2012. Wikidata serves as a central, multilingual and free storage for structured, linked data, which is drawn from Wikipedia, its sister projects, and other external sources. Wikidata has been growing exponentially and is maintained by a community of over 25,000 editors (https://w.wiki/PaP). In October 2022, Wikidata celebrated its 10th anniversary, crossing the threshold of 100 million items, and to date it is the largest semantic knowledge base in existence. As such, it seems Wikidata holds "many exciting possibilities" (Erxleben et al., 2014), and opens the door for a variety of new research opportunities and "potential applications across all areas of sciences, technology and cultures" (Vrandečić & Krötzsch, 2014).

But what potential does Wikidata hold for its users, especially educators and researchers? How can it be used as a life-long learning platform? And what are the benefits and challenges of engaging with this platform? This exploratory research investigates the phenomenon of Wikidata by analyzing multiple use case studies, focusing specifically on its value in educational contexts. To investigate Wikidata’s value for education, seven semi-structured, in-depth interviews were conducted with early adopters of the platform, from which 10 distinct case studies were extracted. A qualitative analysis approach of multiple case studies was used to explore, code and categorize the projects. A thematic analysis of patterns was used to extract main uses, benefits and challenges. The case studies, uses, benefits and challenges are described and discussed herein.

2 Background

2.1 From Web 1.0 to Web 3.0: The emergence of the semantic web

While Web 1.0 used static HTML pages that users could only consume, Web 2.0 allowed users to create and share information through platforms such as social networks and blogs. In recent years we have witnessed another evolution of the web, the emergence of Web 3.0 (Footnote 1), which among other things includes the Semantic Web, or Linked Data. In 1980, “semantic webbing” was described as organizing information and relationships by visually displaying them (Freedman & Reynolds, 1980). Berners-Lee presented the idea of using typed links as a semantics tool, calling it “Semantic Web” (Guns, 2013). After describing its roadmap in 1998, he introduced the modern idea of Semantic Web in 2001 (Berners-Lee et al., 2001). Bizer, Heath and Berners-Lee explain that "Linked Data realizes the vision of evolving the Web into a global data common, allowing applications to operate on top of an unbounded set of data sources, via standardized access mechanisms" (Bizer et al., 2009). According to this vision, "the traditional Web… should be extended to a Web of Data where not only documents and links between documents exist, but [also links among] any entity and any relation" (Färber et al., 2015), in such a way that "machines would be able to participate and help" humans (Footnote 2).

This new technology focuses on data rather than on applications and is known as Web 3.0 (Hendler, 2009). In the Social Web (Web 2.0), value is created by aggregating information and knowledge from users and online communities. In the Semantic Web, however, value is created by integrating structured data from many sources (Gruber, 2008) and meaningfully connecting pieces of information. At the heart of the Semantic Web is structured, linked data that "makes substantial reuse of existing ontologies and data" (Shadbolt et al., 2006). This new web theoretically allows both humans and machines to harness the power of a high quality, up-to-date and well-referenced knowledge base of linked data. "Linked Data principles and practices have been adopted by an increasing number of data providers, resulting in the creation of a global data space on the Web containing billions of RDF triples" (Hernández et al., 2016). This brings us closer to fulfilling the prediction that "Linked Data will enable a significant evolutionary step in leading the Web to its full potential" (Bizer et al., 2009).

2.2 Semantic networks as learning platforms

The Semantic Web offers structured linked data that both humans and machines can tap into as a resource. This also means it could be used for learning, not only in the classroom, but also as a platform for ‘lifelong learning’, a term with a variety of definitions documented in academic literature (Aspin & Chapman, 2010; Collins, 2004; Dabbagh & Castaneda, 2020; Field, 2000; Laal, 2012; Laal & Salamati, 2012). In this paper, we refer to it as an ongoing, self-motivated pursuit of knowledge, skills, literacies or competencies throughout one’s life, whether for professional or personal reasons. As early as 2003, educators started exploring the Semantic Web in relation to education, e-learning and lifelong learning (Anderson & Whitelock, 2004; Koper, 2004; Naeve et al., 2006). That research focused mainly on using the technology to advance education, rather than exploring the types of learning and uses the technology enabled, mainly because the Semantic Web vision had not yet been realized on a scale that would allow such exploration. It took another decade until Wikidata was launched, marking an important milestone in realizing the Semantic Web dream at scale, and a few additional years for it to mature to a point at which educators began experimenting with it in the classroom, examining its potential as a learning platform. Meanwhile, Web 2.0 platforms have matured, driving a profusion of academic research related to various applications, among them Wikipedia, as learning platforms and pedagogical frameworks that support learning (Evenstein Sigalov & Nachmias, 2017). These experiences with Web 2.0, and specifically with Wikipedia, would later affect the engagement with Semantic platforms and the use of Wikidata in educational and research contexts, leading to its exploration as a learning platform. Let us therefore explore some theories and frameworks for Web 2.0 platforms that may be relevant to the Semantic Web.

2.3 Web 2.0 theories, frameworks & literacies relevant for the semantic web

A wealth of literature exists on Web 2.0 applications as learning platforms, specifically on engaging with Wikipedia in educational contexts. Conversely, almost none deals with Web 3.0 and the Semantic Web as learning platforms. While awaiting new research and innovative pedagogical frameworks to emerge, three Web 2.0-related educational theories, frameworks or paradigms (specifically for Wikipedia as a learning platform) have been identified as relevant for the Semantic Web.

The first is Constructivism. Rooted in the works of Dewey, Mead and Piaget, its paradigm "describes how learning happens" (Parker & Chao, 2007). Knowledge and meaning are "constructed rather than given" (Parker & Chao, 2007), through a "discussion with peers and teachers, and through reflection" (Higgs & McCarthy, 2005). “The focus on real, authentic problems… force[s] learners to… develop capacity for effective problem-solving behaviours” (Anderson, 2016). Such learning should be "cooperative, collaborative, and conversational, providing students with opportunities to interact…, to clarify and share ideas, to seek assistance, to negotiate problems, and discuss solutions” (Boulos et al., 2006). As Anderson puts it, “multiple perspectives and sustained dialogue lead to effective learning” (Anderson, 2016). To sum up, engaging with a community of learners allows learners to sharpen or gain new skills, and motivates them to attach meaning to what is learned, which results in construction of new knowledge that is better retained.

The second framework is collaborative learning. Per Wheeler et al., engaging deeply with "learning objects" and web-based discussion communities brings significant benefits for the "development of professional practice" (Boulos et al., 2006; Wheeler et al., 2005). Parker & Chao, as well as Boulos, Maramba and Wheeler, asserted that using a technological collaborative platform encourages a deeper engagement with learning materials. Collaborative learning, then, leads to positive interdependence of group members, individual accountability, and appropriate use of collaborative skills (Parker & Chao, 2007; Schaffert et al., 2006). Collaborative learning also stimulates higher levels of thought and cognitive work, and longer information retention (Galway et al., 2014; Johnson & Johnson, 1986; Parker & Chao, 2007; Schaffert et al., 2006). To conclude, by engaging with a community, users develop collaborative learning skills and knowledge, which highlights the importance of the technology as a platform that enables learning.

The third framework is self-directed learning and, more recently, Heutagogy, which was developed by Hase and Kenyon in 2000 and named after the Greek word for “self”. With strong roots in self-directed learning, Heutagogy shifts the focus and control from the teacher to the learner (Anderson, 2016). The educational focus shifts from instructing and testing competencies, towards learning in new and unfamiliar contexts, as a life-long process (Blaschke, 2021; Hase & Blaschke, 2021; Moore, 2020). As Hase and Kenyon put it, “heutagogy looks to the future in which knowing how to learn will be a fundamental skill given the pace of innovation and the changing structure of communities and workplaces” (Hase & Kenyon, 2000). “Heutagogy thus emphasizes self-direction and focuses on the development of efficacy in utilizing the online tools and information available” (Anderson, 2016). While the focus on learners is positive, this shift puts pressure on the learner. As Kop and Hill explain, in order to succeed, the learner needs to be not only capable, but also highly motivated to engage in self-directed learning (Kop & Hill, 2008). Motivation and skills are therefore highlighted as significant influences on a successful learning process.

These three educational theories and frameworks have been used by educators to promote acquisition of not only knowledge, but also skills and literacies required for lifelong learning in the digital age, including Digital Literacy (Pangrazio et al., 2020; Reddy et al., 1 C.E., n.d.; Spante et al., 2018) and Data Literacy (Gummer & Mandinach, 2015; Koltay, 2015; Mandinach & Gummer, 2013; Mandinach et al., 2015; Schield, 2004; Stephenson & Caravello, 2007; Wang et al., 2019). Both terms have multiple definitions in literature, and it is outside the scope of this paper to fully explore them. For Digital Literacy we have relied on a systematic review of the term by Spante et al. (2018). While the original definition by Gilster (1997) was “the ability to understand and use information in multiple formats from a wide range of sources when it is presented via computers”, the term has evolved over time (Spante et al., 2018). They note that later researchers suggest that the term “originates in a skill-based understanding of the concept and thus relates to the functional use of technology and skills adaptation”; and that, “definitions of digital literacy point towards cognitive skills and competences” (Spante et al., 2018). For Chan et al. (2017), it is “the ability to understand and use information in multiple formats with emphasis on critical thinking rather than information and communication technology skills”. They also note that at times, the term is used in the plural, “digital literacies”, which “acknowledges new and diverse social practices” and “emphasizes the non-generic and multiply situated nature of the term” (Spante et al., 2018). Finally, they note that some researchers expand the definition to “new textual landscape”, including social media and social practices (Spante et al., 2018). 
Such definitions include Martin’s definition (2006), also used by Tang & Chaw (2016) and adopted for this research, as “the awareness, attitude and ability of individuals to appropriately use digital tools and facilities to identify, access, manage, integrate, evaluate, analyze and synthesize digital resources, construct new knowledge, create media expressions, and communicate with others, in the context of specific life situations, in order to enable constructive social action; and to reflect upon this process” (Spante et al., 2018).

As for Data Literacy, some basic definitions refer to the ability to “understand, use and manage data” (Qin & D’Ignazio, 2010), or “the ability to understand and use data effectively to inform decisions” (Mandinach & Gummer, 2013). In a world of Information Explosion, Big Data, AI and Machine Learning, it is essential to assist learners in developing critical thinking related not only to digital, online spaces, but more specifically, to data, as the backbone of digital environments. The educational theories and frameworks, as well as the literacies they promote, seem to be relevant not only to Web 2.0, but also to Web 3.0 and Semantic platforms such as Wikidata, as will be demonstrated later. Before delving into Wikidata, we will therefore examine what engaging with Wikipedia has taught us and how it has informed the later experimentation with Wikidata.

2.4 Wikipedia as a learning platform

In the last decade, a growing number of educators have been using Wikipedia and integrating it into the curricula (Aibar et al., 2015; Dooley, 2010; Evenstein Sigalov & Nachmias, 2017). Initially, Wikipedia was used to teach better information consumption skills and then started to be utilized as a platform for collaborative knowledge construction. But what can research reveal about its benefits as a teaching and learning platform? Wikipedia strives for quality, up-to-date, neutral and well-referenced articles, and offers unique educational opportunities for both teachers and learners (Evenstein Sigalov & Nachmias, 2017; Herbert et al., 2015; Konieczny, 2007, 2016). As a Web 2.0 platform that allows users not only to consume information, but also to create and share knowledge, Wikipedia’s pedagogical potential has long been investigated. Educators and researchers have focused on its ability to actively and collaboratively involve learners in the construction of knowledge (Aibar et al., 2013, 2015; Boulos et al., 2006; Evenstein Sigalov & Nachmias, 2017; Konieczny, 2016; LaFrance & Calhoun, 2012; Mareca & Bordel, 2019; Mendes et al., 2021; Minguillón et al., 2017). This new model led to for-credit, semester-long, elective courses, in which adding content to Wikipedia has been used as a main assessment model.

2.5 The case of Wikidata

With deep roots in the Semantic Web community, Wikidata came into existence in 2012 when several Wikimedians, including Dr. Denny Vrandečić, tried to answer a question that a Google search failed to accurately address: "What are the 10 largest cities with a female mayor?" (Erxleben et al., 2014; Krötzsch et al., 2007). Vrandečić felt that free and open knowledge should include data that can be searched, analyzed and reused, and in response developed Wikidata (Vrandečić & Krötzsch, 2014). Wikidata provides a rich, free and multilingual dataset that is constantly improved by users and machines (a detailed explanation of Wikidata can be found in Appendix 1). Vrandečić’s statement that Wikidata has exceeded his expectations may be explained by the new learning opportunities it offers its users, as will be explored hereafter. As far as we know, the first experimentations with Wikidata in academic settings began in 2015. As in the case of Wikipedia, we first encountered it as an exploratory, informative addition to other courses, then as an alternative assessment in various courses; and in 2018, for the first time, Wikidata became a main assessment in an academic course. Additionally, outside academia, researchers, industries, and cultural and governmental institutions began experimenting with Wikidata, resulting in various types of learning beyond the classroom.

2.6 Types of engagement with Wikidata

While multiple methods of interacting with Wikidata may induce learning and acquisition of knowledge and skills, two main user interactions and their learning opportunities were considered for this research—data curation, the process of adding information to Wikidata; and data extraction, the process of querying and extracting information from Wikidata. Data curation is performed via four main methods:

  1) direct, manual edits made in Wikidata’s interface.
  2) the Wikidata Game (https://tools.wmflabs.org/wikidata-game/) and Distributed Games (https://tools.wmflabs.org/wikidata-game/distributed/), both of which allow micro-contributions to Wikidata by playing simple games.
  3) Quick Statements, a tool that allows users to add multiple statements to multiple items (https://tools.wmflabs.org/wikidata-todo/quick_statements.php).
  4) Mass-uploads of metadata donations from external institutions via dedicated tools or bots.
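
To give a sense of what batch curation with a tool like Quick Statements involves, the following minimal sketch renders a small table of statements as tab-separated command lines (item, property, value), which is our understanding of the tool's simplest input format. The function name and the sample rows are hypothetical and purely illustrative; the rows target the Wikidata sandbox item and are not taken from any of the projects described in this paper.

```python
def to_quickstatements(rows):
    """Render (item, property, value) triples as tab-separated command lines.

    Assumption: one statement per line, fields separated by tabs, with
    items written as Q-identifiers and properties as P-identifiers.
    """
    lines = []
    for item, prop, value in rows:
        # Basic sanity check on the identifiers before emitting a command.
        if not (item.startswith("Q") and prop.startswith("P")):
            raise ValueError(f"unexpected identifiers: {item!r}, {prop!r}")
        lines.append("\t".join((item, prop, value)))
    return "\n".join(lines)


# Hypothetical batch: two placeholder statements about the sandbox item.
batch = [
    ("Q4115189", "P31", "Q5"),        # instance of (P31): human (Q5)
    ("Q4115189", "P21", "Q6581072"),  # sex or gender (P21): female (Q6581072)
]
commands = to_quickstatements(batch)
print(commands)
```

The resulting text would then be pasted into the tool, which applies each line as a separate edit. Generating the lines from a spreadsheet or database export is what makes this approach practical for larger batches.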

Data extraction is achieved via three main methods:

  1) querying through a service, such as the built-in one (https://query.wikidata.org/). Most services require knowledge of a special coding language called SPARQL (https://w.wiki/N3J), while others do not, such as VizQuery (http://tools.wmflabs.org/hay/vizquery/), and Platypus (https://askplatyp.us/; https://blog.wikimedia.de/2015/02/23/platypus-a-speaking-interface-for-wikidata/).
  2) Third party applications that explore data and visualize the results (https://w.wiki/PaQ) (three examples are available in Appendix 2).
  3) Data extraction via Wikidata API (Malyshev et al., 2018), an advanced method that will not be discussed as it is outside the scope of this paper.
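
As an illustration of the first extraction method, the sketch below expresses the question of the ten largest cities with a female mayor as a SPARQL query and prepares it for the public query service. It is a minimal sketch, not the query used by any participant: the helper names (`build_url`, `rows_from_json`, `run_query`) and the User-Agent string are our own assumptions, and the query relies on the Wikidata identifiers for city (Q515), head of government (P6), sex or gender (P21), female (Q6581072), population (P1082) and end time (P582).

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "https://query.wikidata.org/sparql"

QUERY = """
SELECT DISTINCT ?cityLabel ?mayorLabel ?population WHERE {
  ?city wdt:P31/wdt:P279* wd:Q515 ;         # instance of (a subclass of) city
        wdt:P1082 ?population ;             # population
        p:P6 ?term .                        # head-of-government statement
  ?term ps:P6 ?mayor .
  ?mayor wdt:P21 wd:Q6581072 .              # sex or gender: female
  FILTER NOT EXISTS { ?term pq:P582 ?end }  # no end time: still in office
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
ORDER BY DESC(?population)
LIMIT 10
"""


def build_url(query, endpoint=ENDPOINT):
    """Encode the query as a GET request URL asking for JSON results."""
    params = urllib.parse.urlencode({"query": query, "format": "json"})
    return endpoint + "?" + params


def rows_from_json(payload):
    """Flatten the standard SPARQL JSON result format into plain dicts."""
    return [
        {name: cell["value"] for name, cell in binding.items()}
        for binding in payload["results"]["bindings"]
    ]


def run_query(query):
    """Send the query to the endpoint (requires network access)."""
    request = urllib.request.Request(
        build_url(query),
        # Illustrative User-Agent; replace with one identifying your project.
        headers={"User-Agent": "wikidata-example-sketch/0.1 (illustrative)"},
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        return rows_from_json(json.load(response))
```

Calling `run_query(QUERY)` would return a list of dictionaries with the keys `cityLabel`, `mayorLabel` and `population`. The same query text can also be pasted directly into the built-in query service, which is how most learners first encounter SPARQL. Results change as the underlying data is edited, and a city with several recorded population values may appear more than once.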

2.7 Wikidata users and early adopters

Early adopters of Wikidata include Wikimedians (https://w.wiki/PaP); industries that use the database to offer various services; institutions that also donate their metadata (Kapsalis, 2019; Klein & Kyrios, 2013; Snyder et al., 2020; Tharani, 2021); and researchers experimenting and conducting various types of research with the platform (Farda-Sarbas & Müller-Birn, 2019; Heftberger et al., 2020; Hernández et al., 2015; Lemus-Rojas & Lee, 2019; Steiner, 2014).

In the past decade the research community has shared various types of research papers dealing with Wikidata, which could be categorized in a variety of ways. A review of the literature found that the majority of existing research focuses on either technological or ontological aspects of the platform. More specifically, the review illustrated that researchers use Wikidata to conduct new types of research (Amaral et al., 2021; Colla et al., 2021; Ferradji & Benchikha, 2021; Good et al., 2016; Kaffee, 2016; Konieczny & Klein, 2018; Lemus-Rojas & Odell, 2018; Li et al., 2022; Meier, 2022; Mietchen et al., 2015; Morshed, 2021; Neelam et al., 2022; Rasberry & Mietchen, 2021; Shenoy et al., 2022; Taveekarn et al., 2019; Waagmeester et al., 2020, 2021; Zhang et al., 2022). Researchers also use Wikidata to conduct new types of academic analysis in a variety of disciplines (Arnaout et al., 2021; Burgstaller-Muehlbacher et al., 2016; Kaffee et al., 2017; Klein et al., 2016; Lemus-Rojas, n.d.; Pfundner et al., 2015; Putman et al., 2017; Rutz et al., 2021; Scharpf et al., 2021a, b; Turki et al., 2019, 2022a, b). Finally, at times researchers use Wikidata to demonstrate new types of visualizations (Hernández et al., 2016; Metilli et al., 2019; Nielsen et al., 2017; Nielsen, 2016a, b).

In a meta-review study conducted by Farda-Sarbas & Müller-Birn in 2019, 67 peer-reviewed articles from journals and conference proceedings were classified and categorized (Farda-Sarbas & Müller-Birn, 2019). The researchers divided existing academic research on Wikidata into 5 main categories: 1) Data Oriented Research, including “data quality issues”, and “tools & datasets” (22 articles in total); 2) Knowledge Graph Oriented Research, including “comparison of knowledge graphs”, “common issues of knowledge graphs”, and “Wikidata as linked data provider” (15 articles); 3) Community-oriented Research, including “design decisions”, “WD community”, and “multilingualism” (14 articles); 4) Engineering-oriented Research, including “enhancement features and vandalism detection” (9 articles); and 5) Application Use Cases, including “medical & biological data”, and “linguistics” (7 articles). In their conclusion, the researchers explain that while Wikipedia has been studied in a variety of disciplines, this is not the case with Wikidata, despite the platform having “the competence to be used in different disciplines” (Farda-Sarbas & Müller-Birn, 2019). They recommend that further investigations take place “to find out whether Wikidata can be beneficial in the same areas where Wikipedia was used” (Farda-Sarbas & Müller-Birn, 2019). As they note, while their analysis revealed usage of Wikidata in various contexts, the use cases “come from the biomedical domain and linguistics mainly” (Farda-Sarbas & Müller-Birn, 2019). They conclude by suggesting, “It might be valuable to see more use cases from other disciplines, such as social sciences or humanities. It might be valuable, for example, to use Wikidata in educational or museum settings” (Farda-Sarbas & Müller-Birn, 2019). Similar conclusions are to be found in another systematic review of the Wikidata-related literature, conducted by Mora-Cantallops et al. (2019).
It appears, then, that the promise that Wikidata holds for education and research is yet to be fully explored and examined. This potential exploration includes what could be learned from existing interactions with the platform, practical uses of the platform, and the benefits and challenges users experience throughout various interactions.

3 The study

3.1 Research goals

It appears that humanity is just beginning to explore the potential of Semantic Web platforms, and more specifically, Wikidata’s potential for education and research. As Müller-Birn et al. found, "Peer-production communities addressing the development of structured data have not as yet attracted much attention from the research community" (Müller-Birn et al., 2015). For this reason, questions relating to different processes of interaction with the platform from a user's learning perspective have remained unexplored by academic research. As Wikidata is still relatively young, the results of its continued progress are difficult to predict. However, given its close connection to Wikipedia, it is apparent that exciting possibilities of both understanding how to contribute to it and how to utilize its data "remain to be explored" (Erxleben et al., 2014). As Müller-Birn and colleagues put it, "Wikidata provides the prototype of a system that allows even non-technical experts to create and manage semantic data", with the potential to be "the nucleus for a completely new type of system" (Müller-Birn et al., 2015). Erxleben and his collaborators conclude that "It remains for the community of researchers and practitioners in semantic technologies and linked data to show the added value Wikidata can bring about" (Erxleben et al., 2014).

Considering Wikidata’s potential, the main purpose of this paper is to investigate its value for education and research, a topic yet to be properly covered by academic research, as noted in the systematic reviews of existing research (Farda-Sarbas & Müller-Birn, 2019; Mora-Cantallops et al., 2019). More specifically, this paper aims to inform educators and researchers about new learning opportunities enabled via this semantic platform by examining early adopter projects in educational, research and cultural institutions, to shed light on the main aspects that make Wikidata valuable to all disciplines, and to demonstrate its power as a potential learning platform in diverse contexts.

3.2 Research questions

Bearing in mind the research goals, the main research questions are:

  1) What are some of the distinct projects using Wikidata?
  2) Considering these projects, what are the main uses of Wikidata that induce learning in the context of education or research?
  3) Based on the projects and uses, what are the main benefits and challenges when using Wikidata in the context of learning and engaging with data?

4 Methodology

4.1 Research design & strategy

We investigate Wikidata’s value for its users via multiple projects’ case studies. This methodology requires an in-depth examination that draws on multiple sources for information (Creswell, 1998). However, the Semantic Web, and specifically Wikidata, is a relatively new phenomenon, which has not yet been explored in the context of learning (Farda-Sarbas & Müller-Birn, 2019; Mora-Cantallops et al., 2019). Sources of information are still hard to find, and literature on the topic is almost non-existent. To better understand the relatively new phenomenon of utilizing Wikidata for learning, the study, approved by the university’s Ethics Committee, engaged the international community of early adopters and for this article included seven semi-structured, in-depth user interviews that had four goals: 1) document the different projects and interactions with Wikidata; 2) gain a deeper understanding of the different uses of Wikidata based on these projects; 3) explore the benefits and challenges of using the platform; and 4) document workflows, with an emphasis on identifying specific features or characteristics of Wikidata that promote, induce or result in learning.

4.2 Participants

When reaching out to the global Wikidata community, we sought participants who could share “success stories”, with new, unique or groundbreaking projects involving Wikidata. We strove for diversity, particularly in four main aspects of projects: 1) geographic location and languages used, attempting to go beyond English-centric examples; 2) discipline / type of institution, striving to include examples from a variety of institution types (educational / cultural / governmental / research / industry); 3) types of interactions, attempting to describe different types of interactions, whether data curation, data extraction, or both; and 4) types of uses, looking for a cohort in which different projects reveal different aspects and possible uses of Wikidata.

Six participants were affiliated with educational and research institutions. Four were either affiliated with or worked with cultural institutions (GLAMs). The participants came from England, Scotland, USA, Israel, Brazil, Germany and Australia. Native languages included English (4), Portuguese/French (1), German (1), and Hebrew (1). Only one participant was female, reflecting a known gender gap in Wikimedia projects (Ford & Wajcman, 2017; Hargittai & Shaw, 2015; Klein et al., 2016; Wagner et al., 2015). While most participants’ data will remain anonymous, some information is shared, either because it was already public or by explicit consent.

4.3 Data collection

As three types of institutions were targeted (education, research and culture), three interview protocols were developed, which share key questions with appropriate adaptations.

Interviews were conducted online between January 2019 and June 2021, via platforms such as “Hangout on Air” and “StreamYard”. Interviews lasted 60–180 min, with most averaging 90–120 min.

4.4 Data analysis

Interviews were transcribed and thematically coded through the “Dedoose” software, and then analyzed. The coding & analysis included an iterative process – enabling the researchers to reflect on the themes, categories and data collected. Next, similar codes were converged and categorized to reach a final category tree. Since one interview focused on a project directed by one of the authors, to avoid a conflict of interest and strive for neutrality, this project was only included in the descriptive response to the first research question and excluded thereafter. All other interviews were mapped using a bottom-up thematic analysis, followed by quantitative comparisons. First, each statement was coded (coding was not exclusive so statements could be attributed to several categories). Then, an iterative process was used to group similar codes and refine the category tree. To ensure inter-rater reliability of the coding, 30% of the statements were additionally analyzed by a second coder. Agreement level was high, Cohen's Kappa = 0.94. The data collected was classified into categories, sub-categories, and at times, sub-sub-categories. By analyzing these statistically, insights were derived regarding the uses, benefits and challenges of Wikidata as a learning platform. Specific characteristics of interaction with Wikidata that induce learning were highlighted and discussed, as well as implications for education from a wider perspective of life-long learning.
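
For readers unfamiliar with the agreement statistic reported above, the following sketch shows how Cohen's Kappa can be computed as (p_o − p_e) / (1 − p_e), where p_o is the observed agreement between two coders and p_e is the agreement expected by chance. It is a simplified illustration rather than the procedure used in this study: it assumes exactly one code per statement (whereas coding here was not exclusive), and the labels are hypothetical.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's Kappa for two coders who each assign one label per statement."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both coders must label the same non-empty set")
    n = len(labels_a)
    # Observed agreement: share of statements given the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each coder's label proportions, summed.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        # Both coders used a single identical label; agreement is trivial.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical codes for four statements; the coders disagree on the last one.
coder_1 = ["use", "use", "benefit", "benefit"]
coder_2 = ["use", "use", "benefit", "use"]
print(cohens_kappa(coder_1, coder_2))  # prints 0.5
```

In this toy example the coders agree on three of four statements (p_o = 0.75), but half of that agreement would be expected by chance (p_e = 0.5), which yields a Kappa of 0.5.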

5 Findings

The seven interviews depict ten distinct case-studies or projects, that demonstrate learning opportunities in a variety of disciplines, contexts, locations and institutions. The projects are first presented in context of their value for educators, researchers and learners, and then main uses, benefits and challenges of interacting with Wikidata are presented.

5.1 The projects

5.1.1 Bodleian Libraries, University of Oxford, UK: The Astrolabe Explorer

In 2015, the Bodleian Libraries hired a “Wikimedian-in-Residence”, longtime Wikimedian Dr. Martin Poulter. Poulter was to “undertake academic and public outreach work to encourage understanding and development of Wikimedia projects and improve access to the libraries’ collections” (Footnote 3). For over 4 years Poulter focused on making the libraries’ collections more visible in Wikimedia projects; exposing academics, students and the public to the benefits of working with Wikimedia projects; and proactively assisting in closing the gender gap. As Poulter explained, in addition to Wikipedia he increasingly focused on Wikidata, and on how to “tell compelling stories with data”. One example is a collection of antique astrolabes, historic devices used for navigation. To make this collection available to the public in an efficient, engaging, and interactive way, Poulter imported all the astrolabe data into Wikidata and then, over a lunch break, created a website, as shown in Figure 1. Each tab showcased automatically generated content from Wikidata, focusing on different aspects of the astrolabes in a variety of languages (https://tinyurl.com/yc97qmqq). This example was a “proof of concept”: simply replace astrolabes with any collection, upload it to Wikidata, and easily generate a similar website telling the story of that collection. When new items are added to Wikidata, the website is automatically updated.

Fig. 1 The Astrolabe Explorer website

5.1.2 University of Edinburgh, Scotland: The witch-hunts project

In 2014, Ewan McAndrew was hired to serve as the University of Edinburgh “Wikimedian-in-Residence”. While widespread in cultural, historical, medical and governmental institutions, this position was a first for a university. One project that showcases contributions to students’ learning is the upload of a scholarly database about the 16th–17th century Scottish Witch Hunts to Wikidata. The database was inaccessible to the public and no longer maintained, but held high-quality data from reliable sources curated by researchers. After the data was imported into Wikidata, old location names were matched with current names and coordinates were added. Once this was completed, queries were used to display results on an interactive map. A new website was created to tell the story of the witch trials in an engaging, visual and interactive way, as shown in Figure 2 (https://witches.is.ed.ac.uk/timeline/). McAndrew explained that using Wikidata allowed them to “breathe new life into it. From a forgotten and unused database… (to) new possibilities for faculty, students and the general public.” The project gained much media attention, including a Smithsonian article (https://tinyurl.com/y7dvjsdh). Both faculty and students appreciated working with a real dataset, with actual impact. The project encouraged others to add information to Wikidata, and has inspired similar related projects. The university now recognizes Wikidata as a platform that enhances skills, capacities and literacies, and is exploring new ways, additional courses and more databases that could be enhanced by Wikidata.

Fig. 2 The Scottish Witch Hunts website

5.1.3 The Metropolitan Museum of Art, USA: The Portrait of Madam X

The Metropolitan Museum of Art is a leading “encyclopedic” museum, aspiring to showcase the breadth of all human art in a universally accessible way. In 2017, under a new open access policy, the Met released 375,000 images from its collection of over 2 million works under a free license (CC-0), and a Wikimedian-in-Residence supported adding the Met’s metadata into Wikidata. In 2018, a new Wikimedian Data Strategist assisted with an upload of 600,000 artifacts into Wikidata. The goal was to explore how semantically representing the Met’s collection in Wikidata can help the Met’s physical and virtual visitors explore and learn from a collection of such scale. It was also an investigation of how new technologies can assist in making sense of large data sets. As part of the collaboration, three noteworthy efforts, relevant to education and learning, were undertaken. The first effort, discussed here, was unraveling links, connections and relationships that were not known before. Figure 3 depicts the graphic results of a Wikidata query that demonstrates a connection between the painting “Portrait of Madam X” and a dress worn by Rita Hayworth in the film “Gilda”, whose creation the painting inspired. This connection between painting and dress was previously unknown to the curators at the Met; it was revealed only once the painting’s metadata was expressed in a structured, linked way on Wikidata.

Fig. 3 Visualizing the connections for the painting “Portrait of Madam X”

5.1.4 The Metropolitan Museum of Art, USA: The Met’s Dashboard

The second notable effort is the creation of the museum’s Dashboard, a.k.a. the Met Open Access Portal on Wikidata. This portal (https://w.wiki/Q9J) allows users to explore the Met’s collection both statistically and visually. The portal uses a tool called InteGraality (https://tools.wmflabs.org/integraality/) to track the completeness of the collection. The tool automatically generates query-based statistical reports according to specific criteria. Such tools help users explore large-scale data collections, with visual representation of both included and missing items. A potential academic assignment could see students adding missing data to collections in any field (Fig. 4).
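The kind of completeness report described above can be illustrated with a minimal sketch. It does not reproduce InteGraality's implementation; it only assumes that each item is available as a dictionary keyed by Wikidata property ID (P170 "creator", P571 "inception", P18 "image"), and the sample records are hypothetical rather than real Met data.

```python
from typing import Dict, List


def property_completeness(items: List[Dict[str, object]],
                          properties: List[str]) -> Dict[str, float]:
    """Return, per property, the percentage of items that carry a value for it."""
    total = len(items)
    report: Dict[str, float] = {}
    for prop in properties:
        # An item counts as "filled" when the property is present and not empty.
        filled = sum(1 for item in items if item.get(prop) not in (None, "", []))
        report[prop] = round(100.0 * filled / total, 1) if total else 0.0
    return report


# Hypothetical sample records (assumption: not taken from the Met's collection).
sample_items = [
    {"P170": "Q_ARTIST_1", "P571": "1884", "P18": "a.jpg"},
    {"P170": "Q_ARTIST_2", "P571": None, "P18": "b.jpg"},
    {"P170": None, "P571": None, "P18": "c.jpg"},
    {"P170": "Q_ARTIST_3", "P571": "1650"},
]

report = property_completeness(sample_items, ["P170", "P571", "P18"])
for prop, pct in report.items():
    print(f"{prop}: {pct}% of items have a value")
```

A student assignment of the kind suggested above would amount to raising these percentages by adding the missing statements, each backed by a reliable source.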

Fig. 4 The Metropolitan Museum Dashboard on Wikidata

5.1.5 The Metropolitan Museum of Art, USA: The Depiction Game

Launched in 2019, the “Depiction” Wikidata Game allows users to make micro-contributions to Wikidata. For this project, the Met’s image collection was ingested, in collaboration with Microsoft Research, by an Artificial Intelligence (AI) system trained with Met images (https://tinyurl.com/y7aaurpj). The AI algorithm suggests what is depicted in a picture, for instance, a horse. A human playing the “Depiction” game confirms or rejects the AI suggestion, as shown in Figures 5 and 6, and if a horse is indeed present, a statement to this effect is automatically added to Wikidata. The game covers a variety of depicted objects, such as musical instruments, animals, flowers and vases. This, in turn, allows users to explore Met paintings that portray such objects. In the broader perspective, this allows users to get accurate answers to new types of questions, therefore allowing new types of research that were not possible before Wikidata.
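The confirm-then-record loop described above can be sketched as follows. This is an illustrative outline only, not the game's actual code: the `Suggestion` class, the `review` function, the confidence threshold and all item IDs are hypothetical. The only real identifier is P180, the Wikidata property "depicts".

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

DEPICTS = "P180"  # Wikidata property "depicts"


@dataclass(frozen=True)
class Suggestion:
    item_id: str       # placeholder ID of the artwork item
    label_id: str      # placeholder ID of the thing the model believes is depicted
    confidence: float  # illustrative model score between 0 and 1


def review(suggestions: List[Suggestion],
           confirm: Callable[[Suggestion], bool],
           min_confidence: float = 0.5) -> List[Tuple[str, str, str]]:
    """Keep only suggestions a human confirms; return (item, property, value) triples."""
    statements = []
    for suggestion in suggestions:
        if suggestion.confidence < min_confidence:
            continue  # too uncertain to show to a player at all
        if confirm(suggestion):
            statements.append((suggestion.item_id, DEPICTS, suggestion.label_id))
    return statements


# Hypothetical suggestions and a stand-in for the human player's decision.
queue = [
    Suggestion("Q_PAINTING_1", "Q_HORSE", 0.92),
    Suggestion("Q_PAINTING_2", "Q_HORSE", 0.71),
    Suggestion("Q_PAINTING_3", "Q_VASE", 0.30),
]
player_says_yes = lambda s: s.item_id == "Q_PAINTING_1"

confirmed = review(queue, player_says_yes)
print(confirmed)
```

The design point is that the machine only proposes; nothing becomes a statement without a human decision.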

Fig. 5 Playing the “Depicts” Wikidata Game at the Met. Credit: Fuzheado, CC-BY-SA 4.0

Fig. 6 The Depiction Wikidata game

5.1.6 Tel Aviv University, Israel: An academic course featuring Wikidata

In 2018 a new course opened at Tel Aviv University (TAU): "From Web 2.0 to Web 3.0, from Wikipedia to Wikidata". This for-credit, elective course, the first of its kind worldwide, is available to all undergraduate students at TAU. It focuses on Wikipedia and Wikidata and encourages active learning, while promoting digital literacy, data literacy and academic skills, as well as raising awareness of knowledge gaps and online bias and battling fake news. One of the course’s two main assignments is a Wikidata project, which involves curating and extracting data from Wikidata, while presenting it in a visual way. This exposes students to issues such as ontologies, data modeling and basic querying skills, as well as data visualization, gaps, bias, sourcing and completeness, thus strengthening the students’ data literacy. A Wikidata project could have students explore gender equality among faculty members by creating a query that checks how many female faculty members are included in Wikidata, finding gaps, and adding missing data based on reliable sources; they would then re-run the query and watch the added data appear visually. For its second iteration, in 2020, the course was revised based on students’ feedback, faculty insights, and the impact of COVID-19. The course is now virtual and more focused on Wikidata (Fig. 7).
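A query of the kind used in the gender-gap exercise might look like the sketch below. It is not taken from the course materials; it assumes that faculty members are modeled with the "employer" property (P108) and "sex or gender" (P21), and the helper name and the QID passed in are placeholders. The returned string is intended to be pasted into the Wikidata Query Service.

```python
import re


def gender_count_query(employer_qid: str) -> str:
    """Build a SPARQL query counting people employed by an institution, per gender.

    employer_qid is the Wikidata item ID of the institution (placeholder in the
    example below; look up the real ID before running the query).
    """
    if not re.fullmatch(r"Q[1-9][0-9]*", employer_qid):
        raise ValueError(f"not a Wikidata item ID: {employer_qid!r}")
    return f"""
SELECT ?genderLabel (COUNT(DISTINCT ?person) AS ?people) WHERE {{
  ?person wdt:P108 wd:{employer_qid} .        # employer
  OPTIONAL {{ ?person wdt:P21 ?gender . }}    # sex or gender, may be missing
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en" . }}
}}
GROUP BY ?genderLabel
ORDER BY DESC(?people)
""".strip()


# "Q12345" is a placeholder, not the real item ID of any university.
query = gender_count_query("Q12345")
print(query)
```

The `OPTIONAL` clause keeps people with no recorded gender in the result, so the gap in the data itself becomes visible; re-running the query after adding sourced statements shows the counts change.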

Fig. 7 Example from a TAU class exercise: mapping relationships of 9 items connected to the course in a graph format using a Wikidata query

5.1.7 The School of Journalism, Faculdade Cásper Líbero, Brazil: The Municipal elections case

João Alexandre Peschanski, a professor at the School of Journalism, Faculdade Cásper Líbero, São Paulo, and a researcher at the Center for Neuromathematics at the University of São Paulo, worked with his students to answer the question: “How can we efficiently and effectively improve content on municipal elections in Brazil on Wikipedia?” While creating election-related Wikipedia articles is important, editing these articles can be tedious and boring, and is therefore susceptible to human error. Bots, however, can do this work easily, and the result was a tool that automatically generated Wikipedia articles based on structured data in Wikidata. These articles include not only tables but also textual paragraphs that were automatically generated by a template. Each final article included 2 empty sections, ready for humans to add details to. Peschanski stated that these articles have already been viewed over 50 million times, and proved to be much needed and impactful (Fig. 8).

Fig. 8 Wikipedia article auto-generated by Wikidata-based templates

A simpler version of this technique was previously used in order to auto-generate simpler Wikipedia articles about Works of Art, Museums, Libraries, Archives, Theaters, Books, Movies, Earthquakes and Newspapers (https://w.wiki/Peg). The technological tool that enables this is called MBabel (https://w.wiki/NAq). It was adapted from a project originally done at the Metropolitan Museum, and improved upon by the Brazilian community.
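The template-driven generation described in this section can be illustrated with a small sketch. It is not MBabel's code: the template wording, the field names, the two empty section titles and the sample record are all hypothetical, and the record stands in for values that would be fetched from Wikidata.

```python
from string import Template

# Hypothetical sentence template; each $name is filled from structured data.
INTRO = Template(
    "The $year municipal election in $municipality was held on $date. "
    "$winner ($party) was elected mayor with $votes votes."
)

# Hypothetical titles for the sections that are left empty for human editors.
EMPTY_SECTIONS = ["Campaign", "Aftermath"]


def render_article(record: dict) -> str:
    """Render a wikitext stub: a generated intro plus empty sections."""
    fields = dict(record)
    fields["votes"] = f"{record['votes']:,}"  # 12345 -> "12,345"
    intro = INTRO.substitute(fields)          # raises KeyError if a field is missing
    sections = "".join(f"\n\n== {title} ==\n" for title in EMPTY_SECTIONS)
    return intro + sections


# Invented record for illustration only.
record = {
    "year": 2016,
    "municipality": "Exampleville",
    "date": "2 October 2016",
    "winner": "Maria Silva",
    "party": "Example Party",
    "votes": 12345,
}

article = render_article(record)
print(article)
```

Because `substitute` fails loudly on a missing field, incomplete Wikidata items surface as errors rather than as articles with silent gaps.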

A more advanced version of this technology now allows Peschanski and his students to automatically generate a semantic WikiBook, an open textbook (an open educational resource, or OER for short), about a collection in one of Brazil’s museums. The WikiBook is created mainly by contributing data about pictures in the museum’s catalogue, or by playing a simple Wikidata game, which in turn contributes to the data curated in this open educational resource. The data added to Wikidata is then extracted using queries and added into templates that generate the WikiBook (https://w.wiki/J$R) (Fig. 9).

Fig. 9 A Wikibook generated via Wikidata

Using Wikidata as a means to contribute to other Wikimedia projects, such as Wikipedia and WikiBooks, demonstrates that Wikidata could be used to generate a much more structured Wikipedia. In this sense, every single article about a notable personality in all 300 language versions should have the same datum of “date of birth”. As far as Peschanski is concerned, everything that could be automatically generated by a bot should be done that way, so that volunteers or students who write articles can focus on more exhilarating work and on details that are not yet structured. This is especially important for smaller language communities that do not have enough volunteers to generate Wikipedia articles and other Open Educational Resources in their own language.

5.1.8 School of Journalism, Faculdade Cásper Líbero, Brazil: Reconciling data from heterogeneous databases case

Peschanski and his students also used Wikidata to reconcile data from heterogeneous databases. Since databases external to Wikimedia can hold disagreeing data, humans examined the work of the bots, making informed decisions. One example is a highly ranked page on Portuguese Wikidata that automatically curates a list of all the people who were killed or went missing during the military dictatorship (https://w.wiki/JvF). The page is auto-generated via Listeria bot, based on a query from Wikidata, and the table hosts references for each personality. In Figs. 10 and 11, two different references provide conflicting information. Wikidata curates and displays the different pieces of information, alleviating potential misinformation.

Fig. 10 Wikidata-generated list of killed or missing individuals, demonstrating data based on 4 sources with 2 different death dates

Fig. 11 Wikidata statement about conflicting dates, with their sources

The aggregated data in Wikidata enables research about the reliability of the sources, statistically examining the quality and accuracy of different databases. Bots flag disagreeing sources, and students intervene and determine the right answer. The skills gained in such projects are an important part of information and data literacy, which equip students to be better consumers of information in a reality of “fake news” and “post-truth”.
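The flagging step described above can be sketched in a few lines. This is a simplified illustration, not the bots' actual logic: the function names are hypothetical, and the sample claims are invented to mirror the situation in Fig. 10 (4 sources, 2 different death dates).

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Claim = Tuple[str, str]  # (claimed value, name of the source that states it)


def group_by_value(claims: List[Claim]) -> Dict[str, List[str]]:
    """Group sources under the value they support, keeping every value visible."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for value, source in claims:
        grouped[value].append(source)
    return dict(grouped)


def needs_review(claims: List[Claim]) -> bool:
    """Flag an item for human review when its sources disagree."""
    return len({value for value, _ in claims}) > 1


# Invented death-date claims for one person (assumption: illustrative only).
death_date_claims = [
    ("1971-05-14", "Source A"),
    ("1971-05-14", "Source B"),
    ("1971-05-17", "Source C"),
    ("1971-05-14", "Source D"),
]

grouped = group_by_value(death_date_claims)
flagged = needs_review(death_date_claims)
print(grouped, flagged)
```

Note that the sketch does not pick a winner by majority; it only surfaces the disagreement, leaving the judgment to a human reviewer, as in the project described.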

5.1.9 Brazil: Digitally recreating lost Museum artifacts

In 2018 Brazil’s national museum was completely destroyed in a fire. The museum’s collection was not entirely digitized, and the files were also consumed in the fire. The loss for Brazilian culture was so immense that a group of Wikimedia volunteers started a process, referred to as “Data Archaeology”, to recreate the museum digitally. First, a crowd-sourcing technique was used, asking the public to upload pictures taken at the museum. Then, Wikidata was used to curate information on lost objects via a tool called “tabernacle” (https://tools.wmflabs.org/tabernacle/), which curates structured data in a tabular, multi-lingual, and visual way (https://tinyurl.com/4msa67pc). This digital recreation reveals another use of Wikidata that can be critical for educators and researchers of cultural heritage (Fig. 12).

Fig. 12 The digitally reconstructed Afro-Brazilian room

5.1.10 Germany, Australia, Brazil: Tracking the COVID-19 pandemic with Wikidata

A COVID-19 portal on English Wikipedia (https://w.wiki/QNe) showcases a table tracking the disease’s progress in different countries. The portal exists in many languages, and in Portuguese it is automatically generated and updated according to data added into Wikidata. This process pulls information from diverse databases and sources; the data is curated in Wikidata and then utilized by Wikipedia in all languages. A Google search also directs users to Wikipedia, which sources its data from Wikidata. This is probably one of the most accurate and reliable online sources on COVID-19, and leads the way toward an online ecosystem that produces query-based digital items (Fig. 13).

Fig. 13 Tracking the pandemic’s progress

Another example of creating a new digital object based on data curated in Wikidata is a collection of COVID-19-related queries created by an Australian academic. The queries allow users to explore notable cases by occupation, age distribution, and birthplace map, in a visual and engaging way (https://tinyurl.com/y7xbk7ul) (Fig. 14).

Fig. 14 A query of notable cases of COVID-19 by occupation visualized in a bubble chart

5.2 The uses

Our second research question aimed to extract and map different Wikidata uses in educational, research and cultural contexts by early adopters. The goal was to unravel patterns of current uses and examine Wikidata’s potential as a learning platform for education and research in all disciplines. A thematic analysis was performed on 6 interviews, covering 9 out of the 10 projects (excluding the project in 5.1.6). Coding and analysis of these projects revealed eight main uses of Wikidata that induce learning (n = 435, 41% of the total statements):

  1. Connecting, modeling and cataloguing data from separate sources – this is the most prevalent use and core ability of Wikidata, allowing users to describe items using rich data, based on a variety of sources. In doing so, Wikidata serves as a hub of information, aggregating information from a variety of never-before-connected sources.

  2. Using Wikidata to make knowledge & culture freely accessible – this use, almost as frequent, touches on the basic ability of the platform to make knowledge freely accessible to everyone. Due to the use of a Creative Commons license, CC-0, Wikidata is one of the largest Open Educational Resources in existence, a kind of “big data” reservoir that is freely available to the public.

  3. As an educational platform for teaching & learning – this use showcases Wikidata’s utilization in the classroom, enhancing students’ skills and generating meaningful work with social impact. In this case, Wikidata is used as a learning platform, similarly to Wikipedia: a platform through which learners gain subject-matter-relevant knowledge, as well as skills, literacies and capacities.

  4. Creating new digital objects that did not exist before – once structured linked data exists, it can be queried and visualized, thus creating new digital objects that did not exist before. Examples include the Wikibook and the list of killed and disappeared.

  5. Answering new questions & surfacing unknown connections – with structured, linked data, we can answer hard, or even previously impossible, questions and reveal previously unknown connections between pieces of information. Each datum is described separately, but because it is linked, queries can help reveal unknown relationships between pieces of information, much like in the case of “Portrait of Madam X”.

  6. Salvaging data that otherwise would be lost – “data archaeology” can be used to reconstruct lost physical items, or databases on the verge of extinction, presenting a sustainable, central solution to salvaging data and giving it new life, as was the case with the burnt Brazilian museum, or the witch hunts in Scotland.

  7. Using Wikidata to improve external databases – Wikidata games use the ‘wisdom of the crowd’ to map items and improve, or correct, institutional metadata. As many institutions do not have the resources to properly map and update the metadata related to their collections, Wikidata games that allow users to make “micro-contributions” through gameplay are an important step in engaging the public in helping institutions improve their metadata. Players do not need to know anything about Wikidata, but their contributions not only help improve the databases but also allow new types of research through Wikidata.

  8. Auto-generating new content – least frequent, but important nonetheless, is Wikidata’s ability to assist in constructing Wikipedia articles or WikiBooks. The system creates an article outline, with volunteers adding aspects that cannot be addressed by machines (Table 1).

Table 1 Mapping uses per project

A chi-square goodness-of-fit test, which compared the observed sample distribution with the expected probability distribution based on the proportion of statements in each sub-category, was statistically significant, χ²(7) = 141.75, p < 0.001. The discrepancy between the observed and expected frequencies is used to determine which cells within the contingency table generate residual scores that are larger in magnitude than might be expected by chance (Hadad et al., 2021; Sharpe, 2015). The standardized residual presented in Table 2 shows the degree to which an observed chi-square cell frequency differs from the value expected in the interviews based on their data.

Table 2 Uses in numbers

As indicated in Table 2, significant differences were found between the proportion of statements in each sub-category. While the combined top three sub-categories cover over 63% of the coded statements depicting uses, the rest of the sub-categories were much less frequent. Considering the small sample size and the innovative nature of several projects, it is expected that some uses will be less frequent. Frequency, then, does not imply significance or importance of use, as will be discussed below.

5.3 Benefits and challenges

Analyzing the interviews and projects revealed benefits of using Wikidata that encourage and support learning, and challenges that should be addressed or considered. Benefits included 485 statements (46% of total). Challenges were less frequent (as expected) and included 132 statements (13%). A chi-square goodness-of-fit test examining the benefits and challenges was statistically significant, χ²(1) = 200.82, p < 0.001. Three main categories emerged and repeated for both the benefits and challenges: outreach-, education- and platform-related statements, though in a different order of frequency.

5.3.1 Benefits

Examining the benefits of interacting with Wikidata (N = 485, 46%), the order was: outreach-related (n = 230, 47.42% of statements), education-related (n = 165, 34.02%) and platform-related (n = 90, 18.56%). A chi-square goodness-of-fit test was statistically significant, χ²(3) = 65.86, p < 0.001. Additional chi-square goodness-of-fit tests were performed on each sub-category and sub-sub-category, and were statistically significant.

Outreach-related benefits appeared most frequently, attesting to the need for demonstrating the advantages of Wikidata to institutions and stakeholders. Sub-categories included: highlighting the benefits of making collections accessible to increase the completeness, quality and reliability of data, allow a positive social impact, and offer a multitude of uses and applications; allowing the discovery of information that otherwise would be hidden or inaccessible; using an external platform as a more cost-effective solution, compared to developing one in house; and the ability to extract specific details and tell compelling stories with data.

Education-related benefits showcased the significant educational potential this platform has for educators and learners. Sub-categories revealed that being able to engage with data motivated users to contribute, especially knowing that their work will last and benefit others. They also revealed that engagement with Wikidata helped improve different skills and highlighted new opportunities for faster collaboration in academia.

Platform-related benefits were the least frequent, focusing on data visualization and the ability to easily explore information. Additional benefits were the multilingual nature of the platform, overcoming language barriers and even engaging with machines; the power of a cross-disciplinary, global community to work with; the ability to flexibly model items and reconcile different sources of information; and the tools built around Wikidata, which allow users to work more efficiently and scale their efforts.

The category tree for all benefits of interacting with Wikidata is presented in Table 3. The standardized residual score shows the degree to which an observed chi-square cell frequency differs from the value expected in the interviews, based on their data for the categories, sub-categories and sub-sub-categories.

Table 3 Benefits in numbers

5.3.2 Challenges

Examining the challenges of interacting with Wikidata (N = 132, 13%), the order of the main categories was: platform-related (n = 77, 58.33% of statements), outreach-related (n = 45, 34.09%), and education-related (n = 10, 7.58%). A chi-square goodness-of-fit test was statistically significant, χ²(2) = 51.05, p < 0.001. Additional chi-square goodness-of-fit tests were performed on each sub-category and were statistically significant for the platform-related and outreach-related sub-categories. A chi-square test was not performed on the education-related sub-category, as its values did not meet the assumptions of the test.
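As a worked check, the statistic for the three main challenge categories can be recomputed from the counts given above. The sketch assumes equal expected frequencies across the categories (an assumption, since the text does not state the expected distribution); under that assumption the result agrees with the reported value. The residual formula used, (observed − expected) / √expected, is one common definition of a standardized residual and may differ from the one used for the tables.

```python
from math import sqrt
from typing import List, Tuple


def goodness_of_fit(observed: List[int]) -> Tuple[float, int, List[float]]:
    """Chi-square goodness-of-fit against a uniform expected distribution.

    Returns the statistic, the degrees of freedom, and per-cell residuals
    computed as (observed - expected) / sqrt(expected).
    """
    expected = sum(observed) / len(observed)
    chi2 = sum((o - expected) ** 2 / expected for o in observed)
    residuals = [(o - expected) / sqrt(expected) for o in observed]
    return chi2, len(observed) - 1, residuals


# Platform-, outreach- and education-related challenge statements (N = 132).
chi2, df, residuals = goodness_of_fit([77, 45, 10])

print(f"chi-square({df}) = {chi2:.2f}")  # chi-square(2) = 51.05
print([round(r, 2) for r in residuals])
```

The large positive residual for platform-related statements and the large negative one for education-related statements show which categories drive the significant result.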

The high frequency of platform-related challenges, especially considering that platform-related statements formed the least frequent benefits category, attests to the complexity of this relatively new platform. It seems that there is still a high threshold for engagement, requiring specific skills to use the platform to its full potential. Modeling issues, specifically trying to model challenging items or addressing biases, as well as the platform’s own limitations, emerged as additional challenges.

The majority of outreach-related challenges related to the need to persuade others of the benefit of Wikidata in order to implement its use. Additional challenges included the need to track the incompleteness of datasets to clarify which portion of topic mapping has been achieved; fears expressed by experts from cultural institutions or academia of “losing control” of their contributions; and the mental burden on volunteers in less-resourced countries who felt they had to do all the work themselves or it would never be done.

Education-related challenges were the least frequent. It seems that users found more benefits than challenges for incorporating Wikidata as a learning platform. Sub-categories highlighted challenges with: students’ motivation to invest in Wikidata; the complex and time-consuming task of implementing Wikidata into academic curriculum; the slow pace of changes to course design; and the need to address a variety of, sometimes conflicting, needs of different stakeholders (students, faculty, institutions, Wikidata community).

The category tree for all challenges of interacting with Wikidata is presented in Table 4.

Table 4 Challenges in numbers

6 Discussion

For years, the captivating idea of a Semantic Web inspired various attempts to realize this dream of a ‘web of data’, one that both humans and machines can access and make use of. But the Semantic Web is no longer a dream. Wikidata, Wikibase (the open-source platform Wikidata is based on, similarly to MediaWiki, Wikipedia’s platform), and similar Semantic or Linked Data projects, have forever changed the interactions between humans and knowledge, creating new learning opportunities for their users, both in and outside of the classroom. Considering the plea for action from the research community to explore the potential of the semantic web, specifically, the lack of imperative research about semantic networks and Linked Data platforms as learning platforms, this study aimed to investigate Wikidata’s value for education and research in its broad sense – not only in the classroom, but rather as a lifelong learning platform for diverse disciplines, contexts and narratives.

We examined noteworthy projects from around the world that showcase how early adopters are interacting with Wikidata and using it as a learning platform. Thematic analysis exposed different uses of the platform, as well as benefits and challenges that emerged from two main interactions: Data Curation, adding data into Wikidata in order to curate, salvage, and enrich datasets; and Data Extraction, querying Wikidata to answer difficult (or, until now, impossible) questions, visually examining data in an engaging way, and exploring relationships and exposing connections previously unknown. The analysis also revealed two additional interactions. The first was Data Creation or Auto-generation, using Wikidata to create new digital objects based on scattered external data, as well as auto-generating content on other Wiki projects, thus freeing humans to work on less technical tasks. The other interaction was Teaching with Wikidata: using Wikidata as a teaching and learning platform, not only sharpening learners’ digital literacy, but also promoting Data Literacy, which included touching on skills like data modeling, ontologies, critical thinking, and data analysis. This aspect of Wikidata helps fight misinformation, disinformation and fake news, for example, by reconciling contradicting sources. Finally, Wikidata assists in teaching related topics such as “Semantic Web”, “Linked Open Data” and “Digital Humanities”.

Analysis revealed that the four interactions described led to the eight different uses explored above, out of which various benefits and challenges of engaging with the platform were mapped, as described in the findings. Considering the uses and benefits in light of the three pedagogical frameworks presented above, constructivism, collaborative learning and self-directed learning / heutagogy, it appears that Wikidata is an ideal platform to induce learning. First, Wikidata is built through a collaborative effort of a global community. As suggested by Constructivists, knowledge and meaning are “constructed rather than given” (Parker & Chao, 2007), through a "discussion with peers… and through reflection" (Higgs & McCarthy, 2005). The focus on solving real-life problems helps users “develop capacity for effective problem-solving behaviors” (Anderson, 2016). Adding data to Wikidata requires users to engage in constant dialogue and negotiation over how to correctly describe the world, while taking into account the multitude of global perspectives. In order to be equitable and inclusive in describing our diverse and complex world, users in the community constantly rethink the ontology, the modeling schemes for certain items, and how to better represent complex knowledge.

As suggested by the Collaborative Learning framework, it is specifically the engagement with a technological collaborative platform that encourages deeper engagement with information and stimulates higher-level thinking and longer information retention (Boulos et al., 2006; Galway et al., 2014; Johnson & Johnson, 1986; Parker & Chao, 2007; Schaffert et al., 2006; Wheeler et al., 2005). The process of extracting information from Wikidata relies not only on the technology, but also on the community, and specifically on collaborating with and learning from others, as new users of the platform seldom know SPARQL, the Semantic Web query language used to query Wikidata. Users of the platform often rely on existing query examples, as well as on community experts, to learn how to write the required queries and gain insights from this vast knowledge base.

The Self-directed learning and Heutagogy frameworks also suggest that Wikidata is a platform that promotes learning. As noted in the literature, there seems to be a shift in focus from instructing and testing competences, toward equipping learners with skills and literacies that teach them how to learn (Anderson, 2016). In a world where the structure of communities and workplaces is constantly changing and new knowledge is rapidly emerging, more efforts are invested in learners gaining skills, competencies and literacies that allow them to engage with information and data in unfamiliar contexts as a lifelong process (Hase & Kenyon, 2000). Anderson stresses that self-direction and the focus on developing skills are highly connected with “utilizing the online tools and information available” (Anderson, 2016). And indeed, no matter the type of interaction, engaging with Wikidata drives users to engage with its ecosystem of technological tools, which improve various workflows. Researchers also highlight self-motivation as a key prerequisite for successful engagement. While a detailed exploration of Wikidata users’ motivation is outside our scope and will be explored in future research, an examination of the uses and benefits suggests high levels of self-motivation and engagement in self-directed learning. Specifically, it seems that the ecosystem of additional tools is important to users in improving and enhancing various workflows. To sum up, the collaborative, technology-based, tools-reliant and self-motivated effort to engage with the platform makes Wikidata an effective and propitious learning platform that allows users to gain both knowledge and skills on an ongoing basis.

Further examining the different uses and their benefits, it seems that Wikidata has some key features or characteristics that encourage and enable learning, as well as the improvement of digital and data literacies. The first notable feature is Data Visualization. The different projects suggest that some of Wikidata’s relevance and value for education stems first from the ability to get accurate answers to questions previously difficult or impossible to address. More specifically, findings suggest that the ability to visualize the results is one of the most important features of Wikidata for education, research and learning. While “Data visualized and easily explored” was coded in only 45 statements (9.28%), many of the benefits described rely heavily on data visualization, including “Advocacy for Open” (75, 15.46%), “discoverability” (70, 14.43%), “storytelling” (42, 8.66%), “engagements” (77, 15.88%), “motivations” (“fun to engage with”, 18, 3.71%), and “improved Data Literacy and other skills” (29, 5.98%). Combined, these benefits add up to 356 statements, or 73% of all benefits (34% of all statements). Data Visualization, then, appears to be a key element in Wikidata’s power as a teaching and learning tool – it allows us to explore not only what is there, but also what is missing, as well as to learn through context. Visualizations of structured, linked data allow us to tell stories in new and engaging ways, making sense of the abundance of data, and in turn, of our world.

Another characteristic that emerged from the findings is that using Wikidata promotes higher-order & critical thinking. While only 29 statements (5.98%) were directly coded as “Improved Data Literacy and other skills”, interviews and thematic analysis revealed that other benefits either rely on, or result in, higher-order & critical thinking, such as contemplating or dealing with “completeness, quality & reliability” (29, 5.98%), “social impact” (25, 5.15%), “diverse uses and applications for different stakeholders” (21, 4.33%), “discoverability of info” (70, 14.43%), “storytelling” (42, 8.66%), “engagements” (77, 15.88%), “motivations” (52, 10.72%), “overcoming language barriers” (22, 4.54%), “flexible modeling and reconciling sources of info” (8, 1.65%) and “human–machine collaborations and use of tools to scale” (5, 1.03%). Combined, these add up to 380 statements, or 78% of all benefits (36% of all statements). Thus, interacting with Wikidata (whether via curation, extraction, creation or teaching) drives learners to engage in higher-order thinking and questioning of a given topic.
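The two aggregates reported in this discussion can be re-derived from the statement counts quoted in the text. The short check below uses only those counts; the overall total of 1,052 statements is an inference from the reported uses (435), benefits (485) and challenges (132), and the variable names are illustrative.

```python
from typing import List

TOTAL_BENEFITS = 485
TOTAL_STATEMENTS = 435 + 485 + 132  # uses + benefits + challenges (inferred total)

# Benefits said to rely on data visualization, including the 45 statements
# coded directly as "Data visualized and easily explored".
visualization_related = [45, 75, 70, 42, 77, 18, 29]

# Benefits tied to higher-order and critical thinking, including the 29
# statements coded directly as "Improved Data Literacy and other skills".
critical_thinking_related = [29, 29, 25, 21, 70, 42, 77, 52, 22, 8, 5]


def share(counts: List[int], total: int) -> int:
    """Percentage of `total` covered by `counts`, rounded to a whole percent."""
    return round(100 * sum(counts) / total)


for name, counts in [("visualization", visualization_related),
                     ("critical thinking", critical_thinking_related)]:
    print(name, sum(counts),
          f"{share(counts, TOTAL_BENEFITS)}% of benefits,",
          f"{share(counts, TOTAL_STATEMENTS)}% of all statements")
```

Note that the two groupings overlap (for example, “discoverability”, “storytelling” and “engagements” appear in both), so the two percentages should not be added together.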

It seems that various data-related issues, such as data modeling, data verification, systematic bias, data manipulation, data access, and data completeness, become clearer to learners as they see data visualized, for example, in a map with missing areas, or a timeline with missing information. Simply put, the abilities to answer questions and visualize data seem to encourage users to apply critical thinking regarding the results they encounter. It is worth noting that while many of the projects highlight learning benefits that result from extracting data and visualizing it, some benefits emerged from curating information in Wikidata. These benefits include addressing ontological issues, such as how to best model items, how to make sure the hierarchies of information make sense, and how similar objects can be consistently represented in the database. A deeper understanding of data modeling was also reported to assist in critically analyzing query results. Dealing with the modeling of items made users acutely aware of querying limitations, as they realized that the exact way one models an item affects whether the data would, or would not, be discoverable in a query or in auto-generated content.

Both of the characteristics discussed above, Data Visualization and Higher-order & critical thinking, seem to promote improved digital and data literacies. This could include, among other things, issues relating to data modeling, data analysis, data verifiability, data completeness, and systematic data bias. Some examples include: 1) dealing with mass-uploads of data, which involves other capacities such as ‘data wrangling’ (transforming and mapping data from one "raw" data form into another), the need to “clean” datasets, and preparing them for upload in a structured, linked way; and 2) modeling items on Wikidata and working on ontological issues, which drives users to find the right hierarchies for information, and to deconstruct and analyze the world, only to reconstruct it in a structured, yet flexible way. This, in turn, makes users wiser, more informed consumers of knowledge, leading to better digital citizens. Interacting with Wikidata, therefore, plays an important role in stimulating structural and organized thinking as well as critical thinking, one of the key skills for survival in the digital age. Educators must consider that modern learners do not have to work hard to get answers; they simply ask Siri, Alexa and other AI-based digital agents. But do users ever stop to evaluate the answers they get? In a world of ‘post-truth’, in which dealing with ‘fake news’ and even ‘deep fake’ is part of being a digital citizen, it is essential to equip users with skills to evaluate and analyze data. It appears that Wikidata can help learners develop these necessary skills.

It is important to note that despite the many benefits, interacting with Wikidata and other Semantic platforms is far from perfect and can pose challenges that may hinder learning. As the findings suggest, criticisms of the platform include: a high threshold for newcomers in terms of modeling, tools and programming knowledge; problematic and inconsistent modeling; missing data; missing references; inability to rate good sources of information; data bias; poor documentation of tools; and lack of platform interactivity. That said, despite these growing pains, the uses, benefits and key features presented in this research suggest that interacting with Wikidata offers a variety of learning opportunities. The platform appears to drive users toward improving critical thinking and acquiring a higher level of Data Literacy.

7 Conclusion

A famous anonymous proverb, inspired by a text by Antoine de Saint-Exupéry, says, “If you want to build a ship, don’t drum up the men and women to gather wood, divide the work, and give orders. Instead, teach them to yearn for the vast and endless sea.” It seems that Tim Berners-Lee’s vision of the Semantic Web has inspired such yearning for a world in which humans and machines can make use of the vastness of data available in new, more informed ways that advance humanity. Wikidata cannot fulfil the dream of a Semantic Web by itself, as that dream requires an ecosystem of structured-data-driven websites that are connected to each other. That said, it stands as an important step toward a reality in which humans and machines can have easier and more meaningful access to data. Additionally, Wikidata has its limitations in terms of what can be structured in it. Wikidata cannot contain everything, and not everything can be structured or modeled in it. Nevertheless, despite various challenges, it is still an important milestone for humanity, one that keeps inspiring further technological developments, including recent AI advancements such as ChatGPT.Footnote 4

One important aspect of Wikidata that has not been fully explored in this paper is its open license, meaning that data modeled in it is considered an Open Educational Resource (OER). The term OER was defined by UNESCO back in 2002 as “teaching, learning, or research materials that are in the public domain or released with an intellectual property license that allows free use, adaption, and distribution” (UNESCO, 2002). Thus, an OER is defined primarily (though not exclusively) by its license, with Creative Commons licenses being the most widespread. For some educators, the main incentive for using OERs is minimizing the cost of textbooks, still a financial burden in many countries (Hegarty, 2015; Lin, 2019). For some, it is the desire to create a ubiquitous, mobile learning experience by accessing materials anywhere, anytime (Hegarty, 2015; Lin, 2019). For others, the preference for OERs is part of a wider pedagogical, if not ideological, perception that values OERs not only as a means of knowledge equity, but also as a means to acquire relevant skills, competencies, capacities and literacies in a world where learners are also digital citizens (Cronin & MacLaren, 2018; Evenstein Sigalov & Nachmias, 2017; Hegarty, 2015; Lin, 2019; Wiley & Hilton, 2018). Using emerging open technologies, for both knowledge acquisition and knowledge creation, entails gaining skills relevant to 21st century learners.

Using Wikidata to create OERs and improve skills also connects to UNESCO’s framework introduced in 2015, called the “Sustainable Development Goals” (SDGs).Footnote 5 The SDGs are a collection of 17 global goals that were designed to be a “blueprint to achieve a better and more sustainable future for all”, and were approved by the UN’s General Assembly in 2017.Footnote 6 Out of the 17, goal number 4 is focused on “Quality Education”, with the full title being “Ensure inclusive and equitable quality education and promote lifelong learning opportunities for all” (UNESCO, SDG 4). Goal number 4, then, highlights the importance of open and equal access to education and educational resources (UNESCO, SDG 4). More specifically, it highlights the role of Open Education (OE), sometimes referred to as Open Education Practices (OEP), Open Pedagogy (OP), or simply OERs, in achieving this SDG (Jha et al., 2019; Lane, 2017; Ossiannilsson, 2019; Tlili et al., 2020; Urbančič et al., 2019). Fulfilling the 4th SDG, Quality Education, through OERs ties strongly to the final characteristic or key feature highlighted while reviewing the uses and benefits: Wikidata seems to support knowledge equity by empowering less established communities. This is done by offering opportunities to undertake projects with positive social impact, and to semi-automate the creation of content in local languages, especially those with fewer volunteers to support them, thus overcoming knowledge gaps and language barriers.

The latter is especially important considering the emergence of Abstract Wikipedia and Wikifunctions. Abstract Wikipedia, approved in July 2020, is “a strategic effort and a new Wikimedia project”.Footnote 7 It is “an extension to Wikidata that aims to create a language-independent version of Wikipedia using its structured data” (from Wikipedia). It aims to expand the range of what can be expressed with Wikidata. This helps overcome language barriers by structuring larger portions of Wikipedia articles, thus enabling auto-generation of translated content in languages with smaller communities. Wikifunctions, announced in December 2020, is “a collaboratively edited catalog of computer functions that aims to allow the creation, modification, and reuse of source code, closely related to Abstract Wikipedia” (from Wikipedia). It is meant to help express, in a structured way, facts that are currently impossible to express via Wikidata due to its structural limitations. An example of such a structural limitation was given by Vrandečić, who is leading the development of Abstract Wikipedia and Wikifunctions, during an interview on the future of Wikidata and Abstract Wikipedia. He notes that in the case of Marie Curie, the introduction of most Wikipedia articles would usually mention that she is the only person to have received two Nobel Prizes, in Physics and Chemistry. But it is not as simple to structure this fact on Wikidata. While the platform allows users to structure the fact that she received a Nobel Prize in both these disciplines, it is currently impossible to demonstrate in a structured way the uniqueness and importance of her double win. This is something Wikifunctions is expected to allow users to do in the future. Wikidata, then, is at the heart of these, and many other, future advancements. It is an important part of the data ecosystem and a catalyst for the Semantic Web and Linked Data initiatives, especially with Wikibase being increasingly adopted by other institutions.

To conclude, in its ten years of existence, Wikidata has shown great potential that we are merely beginning to explore. The implications, from Education, through Research, to actual applications for industries, are at a burgeoning phase, and though they appear propitious, additional research is required to fully explore them. It is hoped that despite its limitations, this research will be a stepping stone in investigating learning with semantic networks. It is also hoped that this research will inspire educators to experiment with Semantic Web and Linked Data platforms and applications as learning tools, implementing them into the academic curriculum. Finally, it is hoped that the findings of this research will encourage further investigation by researchers, institutions and industries, contributing to future semantic applications and leading towards a more sophisticated future, where existing data is better utilized for the benefit of learners and the general public.

8 Research limitations

This research aims to shed light on the potential and value Wikidata has for educators and learners around the world. Only 7 users and 10 use cases or projects were discussed in this specific paper. When a small sample is concerned, there is always a chance that some descriptive elements may be generalized from the specific to the general in an inaccurate way. Moreover, even though an emphasis was put on finding diverse examples, some users or existing cases were not discussed and fell out of scope for this research. It is especially challenging to determine whether the diversity reached in the sample represents the larger population of Wikidata users, mainly since most users are unknown or might be unreachable. Therefore, while there is value in describing this phenomenon, larger-scale research is needed, which might analyze the topic through a quantitative lens. Finally, this field of research is rapidly evolving, constantly changing and influenced by other technological advancements. It is not unlikely that at some point there will be technological breakthroughs that may change the relevance or accuracy of some of this research's findings.