Introduction

In the last decade, there has been a significant rise in the number of research articles, projects, and initiatives around the use of Big Data and artificial intelligence in education (Kastrati et al., 2021). Some of these studies have been developed in higher education, including online learning and hybrid or blended learning contexts. In these educational settings, Virtual Learning Environments (VLE) such as Moodle are very common. Since the COVID-19 pandemic, an extensive amount of students’ data has been stored and exchanged through these educational platforms.

With the aim of understanding and optimizing learning, different techniques for data analysis, measurement, collection, and reporting about learners’ variables (e.g., time spent using the VLE, number of messages exchanged or type of interactions) are developed and grouped under the umbrella of Learning Analytics (LA) (Buckingham Shum & Ferguson, 2012). These techniques have been extensively studied in higher education in relation to performance and dropout predictions (Iglesias i Estradé, 2019), and assessment (Kastrati et al., 2021).

Sentiment Analysis (SA) is one of these LA techniques, defined by Mite-Baidal et al. (2018) as the contextual mining of unstructured text from documents, so that structured and insightful knowledge can be obtained and used for different goals. From the analysis of how students express themselves, especially in asynchronous environments such as texts, forums, wikis or debates, teachers and students can gain more information about students’ satisfaction and reactions within a learning activity. Hence, teachers can adapt their teaching appropriately and in a timely manner, improving the quality of learning processes, while students can develop self-reflection. However, most SA techniques demand a high level of programming skill to be implemented (Kastrati et al., 2021). Additionally, they provide complex results in return, which hampers their use by most teachers and students. Therefore, it becomes necessary to explore which techniques have been developed, in which educational contexts, and which of them integrate frontend modules or visual software to facilitate the interpretation of SA results among higher education teachers and students.

On the other hand, research on human communication has identified the existence of gendered patterns, which can differ across specific areas. For instance, Çoban et al. (2021) showed that male users on social media posted more positive messages as they grew older, while the opposite was observed for females. Unfortunately, these patterns do not seem to have been integrated into big data techniques. Thelwall (2018) argues that these techniques (including SA) can be gender biased. To promote a fair and inclusive education, it becomes necessary to explore how SA techniques can be made more gender sensitive, and to systematically study previous contributions in order to gain a better perspective on how SA can be applied inclusively in higher education.

This systematic literature review explores how SA has been implemented as a tool for assessment in online higher education research, in order to identify useful tools and techniques and whether these have been applied with a gender perspective. To this end, this paper presents a brief literature review of the main conceptual points, the aims and research questions of the systematic literature review, and the methodology, based on the PRISMA statements (Moher et al., 2009). The results and discussion section shares the outcomes of the systematic literature review and raises the different challenges identified from the analysis. The conclusions highlight the major contributions of our work, considering its limitations and future implications.

Literature review

SA for learning assessment in online higher education

Assessment of learning involves a process of verification, evaluation, and decision-making with the purpose of optimizing the teaching–learning process (**er et al., 2018). Formative and final assessment can increase students’ motivation and involvement and provide opportunities for the correction of errors (Gikandi et al., 2011). Formative assessment represents a learning experience in itself, developing students’ autonomy, communication, and self-reflection and, in consequence, improving academic achievement (Martínez Cámara et al., 2016). Adequate formative assessment therefore becomes essential for online learning environments in higher education.

Different formats can be used for providing formative assessment in online learning environments, such as reflection papers/diaries, quizzes, wikis, discussion forums, blogs and e-portfolios (McLaughlin & Yan, 2017; Vonderwell & Boboc, 2013). Among these, discussion forums are the most common formative assessment tools (Xiong & Suen, 2018).

Traditionally, in online higher education, teachers carry out formative assessment from a one-to-one perspective. However, there is a consensus in the pedagogy literature that individuals learn not only from student–teacher interactions but also from social, student–student interactions (Borokhovski et al., 2012). From this perspective, different studies have investigated how online formative assessment can also integrate all these interactions and relate them to successful learning outcomes (Onan, 2021).

In the learning process, it is of particular interest how the affective domain may influence and be influenced by interaction with peers. According to Buckingham Shum and Ferguson (2012), online learners may use the affective domain of learning to clarify their intentions, ground their learning, and engage in learning conversations. Kashy-Rosenbaum et al. (2018) measured significantly higher academic achievement in classrooms characterized by a positive emotional environment, and significantly lower achievement in classrooms characterized by a negative one. In VLE, participants can share additional insights regarding course topics and provide their impressions and affective states through online forums and debates (Moreno-Marcos et al., 2019), easing the storage of such information.

However, the enormous amount of information exchanged through VLE can exceed an affordable workload for teachers (McCarthy, 2017). The difficulty of digesting a large amount of textual data in a relatively short time may seriously jeopardize the understanding of social online learning and, in consequence, the quality of the periodic formative assessment provided (McLaughlin & Yan, 2017).

For these reasons, different SA techniques have been developed in educational settings as a proxy to measure students’ sentiments in real time (Yadegaridehkordi et al., 2019). SA techniques have mostly been used to improve the understanding of educational processes, study participants’ satisfaction, and make performance and dropout predictions (Iglesias i Estradé, 2019). These techniques are designed and trained with a collection of textual data, and their application usually requires a high level of programming skill. Some researchers are starting to develop complementary front-end solutions, such as edX-CAS or RAMS (Cobos et al., 2019; Elia et al., 2019), to ease the management of the techniques and the interpretation of results, but there is no global picture of the different solutions available in the literature or of which of them are best suited for higher education. Therefore, to leverage the benefits of formative assessment using SA, it becomes necessary to know what automated tools exist to help teachers efficiently process and understand data on students’ sentiments, so that they can enhance the timely assessment of learning.

Gender Bias in SA, higher education and assessment

Unfortunately, SA techniques have paid little attention to potential differences in communication related to personal characteristics of participants, such as cultural differences, language barriers, age, and gender (Yadegaridehkordi et al., 2019). However, researchers have identified gendered patterns of communication in participants’ use of virtual environments. In VLE, females usually interact more than their male peers (Oreski & Kadoic, 2018; Van Horne et al., 2018), and they usually participate to a lesser extent in the proposed activities, with contributions that contain fewer mistakes (Kickmeier-Rust et al., 2014). Moreover, Shapiro et al. (2017) observed that females expressed more negative views about their progress and self-perceived evaluation. These differences in communication patterns have not been considered in the development of SA techniques. This can not only jeopardize the effectiveness of SA as a generalized assessment tool, but also introduce bias and inequality into formative assessment.

Previous systematic literature analyses about SA

To establish the context for the present review, prior research must be considered. Relevant literature in the field of SA in education was examined, and three previous systematic literature reviews about SA were found: Kastrati et al. (2021), Mite-Baidal et al. (2018), and Zhou and Ye (2020). Mite-Baidal et al. (2018) studied the application of SA in e-learning in higher education. These authors focused their analysis on the resources and techniques used in relation to different SA analytical approaches, and on the major benefits of SA. These benefits were “learning process improvement, performance improvement, reduction in course abandonment, teaching process improvement, and satisfaction with a course” (Mite-Baidal et al., 2018, p. 292), evidencing that the literature analysis had been carried out from the perspective of improving the design of e-learning systems and teaching efficiency. The second review, carried out by Zhou and Ye (2020), followed a similar approach, but unlike the previous review, these researchers identified that SA could also be used to help students perceive their emotions through visual presentation and feedback. However, apart from the improved ability to perceive and adjust one’s emotions, the authors did not discuss how SA could be a tool to strengthen students’ formative assessment through the development of self-reflective skills. Finally, Kastrati et al. (2021) carried out a more technical review, focused on the analysis of the datasets, the solutions, and the emotional expression and detection of the SA techniques used; they did not consider how SA techniques are used within the teaching–learning process.

Previous systematic literature reviews have contributed to understanding which SA techniques and features have been used for educational purposes, as well as which applications for improving teaching and course design have been identified in recent years. They evidence that the solutions are usually too technical, and that there is a lack of research effort on the use of general-purpose visualization solutions (Kastrati et al., 2021). These reviews also identify a lack of research on how SA can be a tool for leveraging the quality of the formative assessment provided and how students’ self-reflective skills and metacognitive abilities can be developed, as Treceñe (2019) highlights, as well as on how SA can be used with a more inclusive approach. Therefore, this article focuses on the analysis of the use of SA as a tool to improve students’ assessment and learning processes from a gender-perspective approach, as a step in the development of fair SA techniques based on previous research evidence.

Aim and research questions

The body of literature lacks a review that systematically classifies the research on, and results of, the application of SA in the higher education domain related to students’ feedback and assessment, considering possible gender differences in students’ communication and offering practical solutions for teachers. Hence, this literature review aims to explore how Sentiment Analysis (SA) has been applied in higher education in order to identify the best strategies for fostering gender-inclusive online learning environments. In particular, the following questions and sub-questions arise:

  • (RQ1) Which are the different SA techniques used in formative assessment in higher education?

  • (RQ2) What frontend software based on SA has been developed to provide teachers and students with timely and easy-to-understand SA results?

  • (RQ3) Which are the general contributions of SA as a tool for students’ assessment in higher education?

      ◦ (RQ3.1) From these general contributions, how has SA been used in higher education from a gender perspective?

Methodology

This literature review was conducted in three phases, based on Kitchenham (2004): planning, conducting, and reporting. Since this method requires an established search protocol and rigorous criteria for the screening and selection of relevant publications, we used the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines, as described in Liberati et al. (2009). These guidelines were used for the description of the eligibility criteria, information sources, data collection process, data items, and synthesis of results, as summarized in Fig. 1.

Fig. 1 Graphic summary of the literature review procedure, based on the PRISMA statement (Liberati et al., 2009)

Study search strategy

The following search string was used to search in ERIC, Scopus, and Web of Science databases:

(“distance education” OR “distance learning” OR “virtual learn*” OR “virtual classroom” OR “online learning” OR “Web-based learning” OR “electronic learning” OR blended OR hybrid OR “computer-assisted instruction”) AND (“sentiment*” OR “sentiment analysis” OR “learning analytic*”) AND (“assessment” OR “feedback*” OR “evaluation*” OR “formative assessment”).
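For reproducibility, this boolean string can also be assembled programmatically so that the same term groups can be reused or adapted for each database interface. The following is a minimal sketch (the use of Python is an arbitrary choice here; the term lists are those of the string above):

```python
# Sketch: build the boolean search string from its three AND-groups so the
# same term lists can be reused (or adapted) across ERIC, Scopus and WoS.
context_terms = [
    '"distance education"', '"distance learning"', '"virtual learn*"',
    '"virtual classroom"', '"online learning"', '"Web-based learning"',
    '"electronic learning"', 'blended', 'hybrid',
    '"computer-assisted instruction"',
]
technique_terms = ['"sentiment*"', '"sentiment analysis"', '"learning analytic*"']
assessment_terms = ['"assessment"', '"feedback*"', '"evaluation*"', '"formative assessment"']

def or_group(terms):
    """Join a list of terms into a parenthesised OR clause."""
    return "(" + " OR ".join(terms) + ")"

search_string = " AND ".join(
    or_group(group) for group in (context_terms, technique_terms, assessment_terms)
)
print(search_string)
```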

Studies considered for the identification phase could come from publicly available scholarly articles, book chapters, technical reports, dissertations, or presentations at scholarly meetings from Educational Sciences disciplines. We restricted the search to study title, abstract and keywords.

As the results of this first search strategy came mostly from computer science disciplines, manual searches were performed in 21 educational research journals, identified from the first search strategy and from the references in the three SA reviews mentioned in the introduction. The journals’ websites were visited, and the same search string was used to search for articles related to higher education and teacher training as a complementary search.

Inclusion and exclusion criteria

Table 1 shows the inclusion and exclusion criteria used in the screening, eligibility, and inclusion phases. All studies that met one or more exclusion criteria were excluded. The timeframe for publication dates ran from January 2006 to June 2021. The year 2006 is considered the beginning of the “learning analytics age”, according to Picciano (2012). The end date was set in June, as this study was conducted between May and August 2021.

Table 1 Inclusion and exclusion criteria

Conducting the review

Before starting the review, the authors discussed and agreed on the research questions related to the information to be extracted from the publications. From this, the authors defined the search strategy and the inclusion and exclusion criteria.

Identification phase

In this phase, the documents were retrieved from all the databases using the search string. 402 publications were retrieved from the three databases and 134 publications were added from additional outstanding journals, identifying a total of 536 potential publications. We extracted information about the publication itself (i.e., title, authors and year of publication), learning outcomes, SA techniques and tools or software, and gender analysis, and incorporated it into a database for further analysis. This process identified and removed 14 duplicates and selected 518 publications for the screening phase.

Screening phase

In the screening phase, documents were excluded by title, applying the exclusion criteria (Table 1). We filtered out 247 publications that met at least one exclusion criterion. A total of 271 publications were selected for the eligibility phase.

Eligibility phase

The abstracts of the 271 publications were read to assess whether they satisfied the inclusion and exclusion criteria (Table 1). We excluded 213 publications, mostly because they did not focus on education (111), did not focus on higher education (70), or, although the research context was educational, they focused only on technical aspects of SA (32). 58 publications were selected for the inclusion phase.

Inclusion phase

The 58 remaining publications were examined in full, applying the inclusion and exclusion criteria (Table 1). Finally, 31 publications were excluded: 4 of them were not accessible, and 27 did not meet the following inclusion criteria: SC1 (7), SC2 (6), SC3 (12), and SC4 (2). We applied the SC5 criterion when selecting references but still kept those publications not meeting it, in order to have a broader sample of final publications. 22 publications were finally selected. These publications were studied, based on the information in the database, by undertaking a qualitative synthesis to answer the research questions. Results are presented in the next section, following each of the three research questions.

Results and discussion

Dataset characteristics

A sample of 22 studies met all the inclusion criteria for RQ1, RQ2, and RQ3. A first analysis of the search results evidences that, although SA for evaluation in higher education is receiving increasing attention from the educational and technological research fields year by year (as can be seen in Fig. 2), it is still an emerging field. Hence, 11 of the 22 studies were conference publications, which could be interpreted as a first step in development before publishing in journals.

Fig. 2 Evolution of the number of references in the last 9 years. Note. Results for 2021 were included only until June

Regarding the origin of the publications, most of the references came from the American continent (eight references): USA (5), México (2) and Brazil (1). Six references were European: Greece (2), Spain (1), Portugal (1), Italy (1), and Finland (1). Five references were from Africa and the Middle East: Nigeria (1), South Africa (1), Saudi Arabia (1), Morocco (1), and Egypt (1). Only two references were from Asia: China (1) and Taiwan (1), and only one reference was from Australia.

Most of the studies focused on computer science (9 references; 41%), while other studies focused on science (3), engineering (1), and teacher training (2). Seven references did not specify the area of study, but mentioned undergraduate (2) or postgraduate students (1) and learning communities (4). Moreover, in 11 studies (50%) the authors used SA as an assessment tool in hybrid learning contexts, and in 11 (50%) in online learning. These results show that SA researchers usually come from STE(M) areas and from higher education contexts where online learning plays an important role.

Regarding methodology, 14 studies were quantitative (ranging from big-data analysis to longitudinal studies), 4 were qualitative, and 4 used a mixed-methods approach. Because of this diversity in methodological approach, sample sizes varied from 28 to 2600 direct participants and, when statements or sentences were used as the sample, from 120 to 171,430 statements.

A first analysis evidences how the research papers have evolved: while the first studies mainly focused on the methodological part (a computer science perspective), more recent studies consider educational applications for higher education. The order of the research questions is intended to facilitate the presentation of the major contributions; for this reason, the emergent themes resulting from the analysis of the 22 publications considered are described below.

RQ1. SA techniques used in formative assessment in higher education

Major contributions to answer RQ1 are structured around three topics: automatic data collection, information extraction, and sentiment measurement, as follows:

Automatic data collection

There is an evolution towards the automatization of the stage prior to SA, namely how data are collected and preprocessed before conducting SA. Major changes are identified specifically in data collection, which was carried out manually in the first studies, as in Abdulsalami et al. (2017) and Zhang et al. (2012). Different authors evidence the lack of suitable integration between educational platforms and SA processing modules, which makes information retrieval a tedious task for teachers. To foster the adoption of SA in higher education, recent studies have worked on more complex systems integrating agents that collect students’ posted messages from diverse discussion forums or other considered sources, as in Alencar and Netto (2020). Within this trend, future studies might develop methods to integrate students’ productions in formats other than text (Elia et al., 2019). The ethical implications that could emerge from a massive and systematic collection of students’ productions need to be considered, especially regarding the transparency of institutions, the re-purposing of data analysis, and the meaningful alternatives available to participants who do not allow their data to be shared (Andreotta et al., 2021).
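The agent-based collection described above can be approximated with a periodic script that pulls new forum posts through a VLE web-service endpoint and passes the raw texts to the SA module. The sketch below is purely illustrative: the endpoint URL, token, parameters and response structure are assumptions, not the API of any particular VLE or of the reviewed systems.

```python
# Hypothetical sketch of a collection agent that periodically retrieves new
# forum posts from a VLE REST endpoint for later sentiment analysis.
# Endpoint, token and payload layout are assumptions, not a real VLE API.
import requests

VLE_URL = "https://vle.example.edu/api/forum_posts"   # placeholder endpoint
API_TOKEN = "REPLACE_WITH_INSTITUTIONAL_TOKEN"        # placeholder credential

def fetch_new_posts(forum_id: int, since_timestamp: int) -> list:
    """Return posts created after `since_timestamp` for one discussion forum."""
    response = requests.get(
        VLE_URL,
        params={"forum_id": forum_id, "since": since_timestamp},
        headers={"Authorization": f"Bearer {API_TOKEN}"},
        timeout=30,
    )
    response.raise_for_status()
    # Assumed payload: [{"author_id": ..., "created": ..., "message": ...}, ...]
    return response.json()

if __name__ == "__main__":
    posts = fetch_new_posts(forum_id=42, since_timestamp=1_700_000_000)
    texts = [post["message"] for post in posts]  # raw texts handed to the SA module
    print(f"Collected {len(texts)} new messages")
```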

Information extraction

Forum messages and emails are the main data sources for learning assessment in online higher education (Alencar & Netto, 2020; Camacho & Goel, 2018; Chaabi et al., 2019; Le et al., 2018; Osorio Angel et al., 2020). In blended scenarios, other data sources such as video transcriptions and readings have been used (Cobos et al., 2019). Two major methodological approaches to SA are used to detect potential sentiments and opinions in texts: the lexicon-based approach, also called Natural Language Processing (NLP), and the Machine Learning (ML) approach. The most frequent approach is NLP (ten studies), which consists of extracting meaning from human language using existing libraries (lexicons) as a reference. NLP does not need to take context into account, and it can draw on a number of open and free linguistic libraries available online (Zhang et al., 2012). It also enables a broader scope for making inferences, finding patterns in textual data, and inferring an emotion (Mostafa, 2020), with high precision but low recall. Hence, although this type of SA technique can be more accessible to set up, the lack of discipline-specific libraries is an important limitation, as some specialized terms can be wrongly associated with emotional states (Nunez, 2020). For this reason, some authors argue that using only a lexicon-based approach is not the best method of sentiment analysis, since important information may be missed (Alblawi & Alhamed, 2017).
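As an illustration of this lexicon-based approach, the sketch below scores two student-style forum messages with the open VADER lexicon shipped with NLTK. VADER is used here only as an example of a freely available lexicon, not as the tool employed in the reviewed studies.

```python
# Minimal lexicon-based (NLP) sentiment scoring of forum messages using the
# open VADER lexicon bundled with NLTK.
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer

nltk.download("vader_lexicon", quiet=True)  # one-off download of the lexicon
sia = SentimentIntensityAnalyzer()

messages = [
    "I really enjoyed this week's activity, the examples were clear.",
    "I am lost, the assignment instructions make no sense to me.",
]
for msg in messages:
    scores = sia.polarity_scores(msg)  # {'neg': ..., 'neu': ..., 'pos': ..., 'compound': ...}
    print(round(scores["compound"], 2), msg)
```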

The ML approach, used in five of the 22 references, is a subset of Artificial Intelligence (AI) in which the algorithm progressively “learns” to identify positive and negative sentiments, allowing more accurate pattern prediction than NLP. However, ML needs supervised analysis and a larger amount of data to train the AI and make better predictions (Alencar & Netto, 2020), which is the main reason for its less frequent use. Nevertheless, from our perspective, ML approaches might be more respectful of cultural diversity in higher education institutions: not only would they be more feasible to implement in institutions using a regional or minority language as the vehicular language, but they would also respect cultural differences in how students from countries that share a language express themselves (e.g., Latin American students vs. Spanish students). The remaining seven studies use both techniques (NLP and ML) to build a more solid SA process, as the two methods are non-exclusive and can be used in a complementary way, as in Alblawi and Alhamed (2017).
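For contrast with the lexicon-based sketch above, a minimal supervised ML pipeline could look as follows, assuming a small set of manually labelled forum messages (the messages and labels here are invented for illustration; as noted above, real studies need far more training data).

```python
# Minimal supervised ML sentiment classifier: TF-IDF features plus logistic
# regression, trained on a toy, invented set of labelled forum messages.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

train_texts = [
    "Great session, I finally understood the topic.",
    "Thanks, the feedback on my draft was really helpful.",
    "I am completely lost with this module.",
    "The deadline is impossible and the instructions are confusing.",
]
train_labels = ["positive", "positive", "negative", "negative"]

model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
model.fit(train_texts, train_labels)

# Predict the sentiment of a new, unseen message.
print(model.predict(["The forum discussion was confusing this week."]))
```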

Among these studies, an increasing trend of using third-party SA software emerges. Tools such as MeaningCloud (Bilro et al., 2022) or RStudio packages (Okoye et al., 2020) facilitate the use of SA in broader educational contexts and help to better explore its potential contributions in this field. However, we caution against the massive adoption of these solutions, as they might not be successful enough in addressing the current challenges of SA in highly formal contexts (e.g., the use of specialized vocabulary, the interpretation of negations and idioms, or conducting SA at topic level within a text), causing significant methodological flaws in research results.

Measuring sentiment

A final consideration is needed regarding the epistemological grounding of SA techniques. The most frequent approach in SA is to classify sentiments based on the measurement of polarity or valence (fourteen studies). Three studies measure sentiment polarity in a binary form (positive or negative), as in Abdulsalami et al. (2017) and Elia et al. (2019); eight studies classify sentiments on a three-level scale, as in Camacho and Goel (2018) and Cobos et al. (2019); and three studies use polarity scales with more than three levels, as in Alblawi and Alhamed (2017) and Gkontzis et al. (2020). There is an apparent consensus on a three-level polarity scale as the output of SA. However, although the positive, negative, or neutral scale has been reported to be the most frequent association within three-level polarity scales (Chiarello et al., 2020), a high dispersion in the measurements emerges: for example, measuring extremely positive, positive, and non-positive sentiments (Zhang et al., 2012), or using a normalized compound score between − 1 (extremely negative) and 1 (extremely positive), as in Camacho and Goel (2018), Dehbozorgi et al. (2020), and Okoye et al. (2020).
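A common way to move from such a normalized compound score in [−1, 1] to the three-level scale reported by most studies is a simple thresholding rule. The ±0.05 cut-offs in the sketch below follow the usual VADER convention and are an assumption, since the reviewed studies do not all report their thresholds.

```python
# Map a normalized compound polarity score in [-1, 1] onto the three-level
# scale (negative / neutral / positive) most frequently reported in the review.
def polarity_label(compound: float, threshold: float = 0.05) -> str:
    """Cut-offs of +/-0.05 follow the common VADER convention (an assumption here)."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"

for score in (-0.72, 0.01, 0.48):
    print(score, "->", polarity_label(score))
```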

Several studies relate sentiment polarity to students’ motivation: learners with a positive attitude are more confident and motivated to learn (Mostafa, 2020; Weston et al., 2020, in Okoye et al., 2020). However, we argue for a more grounded interpretation of these SA results. The vast majority of publications perform SA at word or sentence level (i.e., attributing a sentiment score to each word or sentence and then computing the overall result), but motivation is a particular mental state, directed towards a particular topic and usually related to behavioral responses. For this reason, we argue that measuring sentiment polarity provides information about the emotional climate of a group or the general emotional state of a student (Usart et al., 2022), rather than information about a particular mental or emotional state. Hence, attributing polarity results to a mental state may carry the risk of falling into spurious relationships. An improvement in this direction involves performing SA at topic level (i.e., identifying the different topics of a text/source and attributing a sentiment score to each one), but this technique still remains a challenge.
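As a rough illustration of what topic-level aggregation involves, the sketch below first assigns sentences to topics with a naive keyword match (purely illustrative; real systems would use topic modelling) and then averages a sentence-level polarity score per topic instead of over the whole message.

```python
# Naive sketch of topic-level SA: assign each sentence to a topic by keyword
# match, then average a sentence-level polarity score per topic.
from statistics import mean

TOPIC_KEYWORDS = {                # toy topic definitions, an assumption
    "assignment": {"assignment", "deadline", "submission"},
    "content": {"lecture", "reading", "video"},
}

def topic_of(sentence: str) -> str:
    words = set(sentence.lower().split())
    for topic, keywords in TOPIC_KEYWORDS.items():
        if words & keywords:
            return topic
    return "other"

def topic_level_sentiment(sentences, score_fn):
    """`score_fn` is any sentence-level scorer returning a value in [-1, 1]."""
    by_topic = {}
    for sentence in sentences:
        by_topic.setdefault(topic_of(sentence), []).append(score_fn(sentence))
    return {topic: mean(scores) for topic, scores in by_topic.items()}

# Example with a dummy scorer (replace with a lexicon- or ML-based one):
dummy_score = lambda s: -0.6 if "confusing" in s else 0.5
print(topic_level_sentiment(
    ["The video lecture was great.", "The assignment deadline is confusing."],
    dummy_score,
))
```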

Although polarity is the most widely accepted approach in SA, it might be too limited for teaching and learning. Complementarily, of the 22 studies considered, four identify sentiments or emotions from existing classifications, mostly as a complement to polarity. Sentiment identification is based on different psychological models. For example, Alencar and Netto (2020) and Nunez (2020) identify emotions based on Plutchik's (1984) model (anger, fear, sadness, disgust, surprise, anticipation, trust, and joy); Featherstone and Botha (2015) use Ekman's (1992) six affect categories (joy, surprise, fear, sadness, anger, disgust); and Osorio Angel et al. (2020) characterize a set of emotions arising in learning along six axes, based on Kort et al. (2001). Integrating those models into SA can contribute to evidencing the relationship between a significant number of emotional states and learning outcomes. In fact, SA would improve in accuracy if emotions were considered.
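The sketch below illustrates the emotion-category approach with a tiny hand-made word list organized around some of the Plutchik-style categories named above; it is a toy stand-in for the full emotion lexicons or trained models used in the cited studies.

```python
# Toy emotion tagging based on a few Plutchik-style categories; the word lists
# are invented stand-ins for a full emotion lexicon.
from collections import Counter

EMOTION_LEXICON = {
    "joy": {"enjoyed", "great", "happy", "love"},
    "fear": {"worried", "afraid", "anxious"},
    "sadness": {"disappointed", "sad", "lost"},
    "anger": {"unfair", "annoyed", "frustrating"},
}

def emotion_counts(text: str) -> Counter:
    words = [word.strip(".,!?") for word in text.lower().split()]
    counts = Counter()
    for emotion, vocabulary in EMOTION_LEXICON.items():
        counts[emotion] = sum(word in vocabulary for word in words)
    return counts

print(emotion_counts("I enjoyed the group task but I am worried about the exam."))
```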

Finally, in some studies, as in Yu et al. (2018), human raters are involved. While this can be a good practice to ensure the accuracy and validity of the SA method, we also suggest considering the involvement of some of the participants themselves in future controlled experiments. In formal settings such as higher education platforms, the scarcity of students’ messages or the communication methods used can involve significant limitations.
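When human raters are used in this way, agreement between human and automatic labels is typically quantified with a chance-corrected coefficient; a minimal example using Cohen's kappa is sketched below (the labels are invented).

```python
# Agreement between a human rater and the SA output on the same five messages,
# quantified with Cohen's kappa (toy labels, for illustration only).
from sklearn.metrics import cohen_kappa_score

human_labels = ["positive", "negative", "neutral", "negative", "positive"]
model_labels = ["positive", "negative", "neutral", "neutral", "positive"]
print(round(cohen_kappa_score(human_labels, model_labels), 2))
```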

RQ2. Frontend software based on SA developed

Only six of the 22 studies designed specific software for visually representing SA results. From these, two different types of visual tools emerge. The first type is designed for researchers’ use (two studies), as in Alencar and Netto (2020) and Nunez (2020). These representations are based on group information, involve the use of complex bar, pie, and sentiment graphs, and are not timely (the analysis is performed once the activity has ended).

The second type of visual tool (four studies) is designed and implemented for final users (teachers or students) and provides timely information with complementary analysis. For example, through the SA architecture described in Cobos et al. (2019) (Fig. 3), these authors developed a tool that provides visual information on SA at the individual learner level (Fig. 4) and allows the results to be downloaded. Elia et al. (2019) designed RAMS (Rapid monitoring of learners’ satisfaction), a visual analytics software (pie charts with polarity trends) for LA and SA, with a dashboard at course and individual student levels that also performs cluster analysis. Wang and Zhang (2020) present students’ sentiment tendencies through network and text representations. These tools perform timely analysis and, although they offer complementary functionalities such as cluster analysis or downloadable outputs, the amount of information presented is still large and can be overwhelming for teachers. We believe that future frontend solutions might need to display information clearly and understandably, allow teachers to process data in real time, and show trends in data over time. Based on these features, the proposal made by Yu et al. (2018), presenting a dynamic diagnostic and self‐regulated system in a speedometer-style visual dashboard combining SA and LA at the student level, can constitute a good example of a frontend application for educational purposes (see Fig. 5).

Fig. 3 Architecture of EdX-CAS for SA, adapted from Cobos et al. (2019)

Fig. 4 Example of the results of the visual tool EdX-CAS showing the polarity of a learner’s opinions in different courses, adapted from Cobos et al. (2019). Note. Teachers can click on any course id (at the left side of each chart) to display the particular polarity results

Fig. 5 Dynamic diagnostic and self‐regulated (DDS) system (dashboard on top left; diagnostic and suggestion report on top right; weekly status in the middle; emotional valence value at the bottom), translated from Yu et al. (2018)
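A minimal sketch of the kind of timely, trend-oriented view argued for above is given below: the average weekly polarity of a course forum plotted over time. The values are invented; a real dashboard would read from the SA pipeline and refresh as new messages arrive.

```python
# Toy sketch of a trend view for a teacher-facing dashboard: average weekly
# sentiment polarity of course forum messages (values invented for illustration).
import matplotlib.pyplot as plt

weeks = list(range(1, 9))
avg_polarity = [0.21, 0.35, 0.10, -0.05, -0.22, 0.02, 0.18, 0.30]  # toy values

plt.figure(figsize=(6, 3))
plt.plot(weeks, avg_polarity, marker="o")
plt.axhline(0, linewidth=0.8)                  # neutral reference line
plt.xlabel("Course week")
plt.ylabel("Mean polarity (-1 to 1)")
plt.title("Weekly emotional climate of the course forum")
plt.tight_layout()
plt.show()
```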

Hence, although only four studies present a frontend solution, many other studies analyzed in this research acknowledge that the next steps in SA should go towards the development of easier-to-use visual analytics tools for teachers and students, both at the individual and at the social and collaborative levels, which would help instructors provide effective feedback interventions (Dehbozorgi et al., 2020; Le et al., 2018). This development would benefit from integrating usability tests with teachers so that SA tools can be better tailored to personalize the information displayed to the user, for instance through dynamic filters (by students’ group, by each student, by academic year…), dynamic grouping (so teachers can group their own charts and add notes) and dynamic data labels (so teachers can read the data more easily).
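Such filtering and grouping features map naturally onto simple aggregation operations over the SA output; a minimal pandas sketch with invented records is shown below.

```python
# Sketch of the dynamic filtering/grouping a frontend could expose: per-group
# and per-student weekly polarity summaries computed from invented records.
import pandas as pd

records = pd.DataFrame({
    "student":  ["ana", "ana", "ben", "ben", "carla"],
    "group":    ["G1", "G1", "G1", "G2", "G2"],
    "week":     [1, 2, 1, 2, 2],
    "polarity": [0.4, -0.1, 0.2, -0.3, 0.5],
})

# Filter by group (what a "by students' group" control would do)...
g1 = records[records["group"] == "G1"]
# ...and aggregate by student and week for the chart behind the dashboard.
print(g1.groupby(["student", "week"])["polarity"].mean())
```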

RQ3. General contributions of SA as a tool for students’ assessment in higher education

Three main contributions of SA are discussed, based on the findings:

Assessing the emotional climate of students about an educational intervention

SA techniques are mostly used to assess the emotional climate of students in higher education, as in Spatiotis et al. (2018), or in relation to one particular topic, such as the use of mobile phones in learning, as in Abdulsalami et al. (2017). Following this approach, in some studies the emotional climate assessment is carried out to evaluate the impact of an educational intervention, such as the use of gamification strategies in higher education in Bilro et al. (2022), Featherstone and Botha (2015) and Mostafa (2020), the implementation of practical and hands-on activities in Suwal and Singh (2018), or the comparison between online and hybrid learning in Camacho and Goel (2018). These studies evidence that, although SA still needs to overcome the important limitations mentioned in the previous subsections, it can help higher education teachers and researchers obtain quick results on the implementation of educational innovations, complementary to traditional students’ gradings, releasing research teams from the burden of analyzing a large amount of data in a first screening phase. Moreover, as SA measurements can easily be performed periodically, the evolution of the emotional climate or state can be studied to identify behavioral patterns, as in Chaabi et al. (2019) and Osorio Angel et al. (2020).

Results also corroborate that measuring the emotional climate through SA techniques can be a useful way to identify gender biases. Hence, Abdulsalami et al. (2017) and Nunez (2020) described that male students expressed more positive comments than their female peers in VLE, while Okoye et al. (2020) found that students evaluated their teachers differently depending on the teachers’ gender. These results evidence how SA can contribute to the promotion of more equitable assessment practices in higher education institutions (not only in relation to gender, but to any personal factor of discrimination, such as race). However, this application is still underdeveloped, as only three articles included gender as a variable in their study.

Improving the prediction of students’ learning performance

Complementarily, in four studies SA has been used to predict learning performance and to support the early detection of possible withdrawals or dropouts. Dehbozorgi et al. (2020) established a relationship between students’ performance and a positive emotional climate. However, no correlation between students’ negative sentiments and individual performance was measured, suggesting a need to develop more sophisticated predictive models. To this end, Alblawi and Alhamed (2017), Gkontzis et al. (2020), and Yu et al. (2018) developed predictive models combining SA of unstructured data, such as students’ comments or productions, with structured data, such as attendance, homework completion or previous grades, as in the model by Gkontzis et al. (2020) displayed in Fig. 6.

Fig. 6 Process of SA for predicting learning outcomes, based on the work of Gkontzis et al. (2020)

These three studies show that the absence of either unstructured or structured data weakens the predictive ability of the models. In particular, Yu et al. (2018) discuss that the use of SA significantly improves this predictive ability in the earlier stages of a course of instruction, as at these stages structured data might still be too limited to provide enough information for prediction. Moreover, Gkontzis et al. (2020) advocate for the inclusion of teachers’ data (e.g., number of forum interactions, SA…) to increase the effectiveness of predictive models. Therefore, SA predictive models need to be built from complex algorithms processing different types of data, which implies a challenging development task that many higher education institutions might not be able to assume. Finally, ethical considerations about the use of students’ data would also need to be addressed, following some of the reflections shared above.
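As a rough illustration of how such models combine SA output with structured records, the sketch below trains a classifier on invented features (mean forum polarity, attendance rate, homework completion) to flag at-risk students. It is a toy stand-in under assumed data, not the pipeline used in the cited studies.

```python
# Toy predictive model combining an unstructured feature (mean forum polarity
# from SA) with structured features (attendance, homework completion) to flag
# students at risk; data, labels and model choice are illustrative assumptions.
from sklearn.ensemble import RandomForestClassifier

# Each row: [mean_forum_polarity, attendance_rate, homework_completion]
X = [
    [0.45, 0.90, 0.95],
    [0.10, 0.75, 0.80],
    [-0.30, 0.50, 0.40],
    [-0.55, 0.35, 0.20],
]
y = [0, 0, 1, 1]  # 1 = at risk of dropping out (invented labels)

model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
print(model.predict_proba([[-0.20, 0.60, 0.55]]))  # risk estimate for a new student
```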

Enhancing teaching methods and feedback

Finally, SA was specifically used in two studies to improve teaching methods and feedback. Le et al. (2018) developed a module to notify teachers when a student was experiencing difficulties, arguing that negative, distressed, and questioning sentiments can indicate when a student is experiencing such difficulties. Complementarily, the SA of teachers’ productions (e.g., videos or pdf documents) developed by Cobos et al. (2019) could help them make informed decisions about whether or not it would be useful to modify their educational materials. Drawing from previous consensus in the literature, both feedback and assessment are equally important in the development of students’ competencies (**er et al., 2018) and, therefore, it might be advisable to implement both approaches when using SA. In this sense, future studies might consider assessing the effect of SA techniques on teachers’ methods to better understand which type of SA information, and how it is presented, can best assist teachers.
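A simple version of the notification logic described by Le et al. (2018) could be a rolling-average rule over each student's recent message scores; the window and threshold in the sketch below are assumptions chosen for illustration, not values reported in that study.

```python
# Sketch of a teacher-alert rule: flag a student when the rolling mean polarity
# of their recent messages stays below a threshold (window/threshold assumed).
from collections import deque

def should_alert(history: deque, new_score: float,
                 window: int = 5, threshold: float = -0.3) -> bool:
    """`history` holds the student's most recent message polarity scores."""
    history.append(new_score)
    if len(history) > window:
        history.popleft()
    return (
        len(history) == window
        and sum(history) / window < threshold
    )

history = deque()
for score in (-0.2, -0.5, -0.4, -0.6, -0.35):
    if should_alert(history, score):
        print("Notify teacher: sustained negative sentiment detected")
```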

Conclusions

The aim of this SLR was to understand the state of the art of SA in online and hybrid learning environments in higher education in relation to assessment. Findings from this review show that there is a growing field of research on SA. Most of the papers are written from a technical perspective and published in journals related to digital technologies. This research focuses on teacher and institutional evaluation, especially targeting online learners’ assessment and collaborative environments.

Reviewed studies mainly assess the polarity of sentiments and emotions, providing information about students’ emotional climate or state. Two major approaches to SA emerge regarding techniques: NLP and ML. While ML methods report higher classification accuracy, NLP is the most used due to the number of open and free linguistic libraries available online, in spite of its significant limitations in highly specialized communication channels such as VLE in higher education. Results show that a hybrid approach is more appropriate because it grants an accurate analysis of each language, and of learning contexts in particular, overcoming the limitations of each method in isolation. However, research conducted in other languages and with more data will be useful to build a stronger corpus and make more precise predictions, as well as to deepen the measurement of sentiments following different models, such as that of Kort et al. (2001). Moreover, automatic data collection and preprocessing might be one of the targets to ensure appropriate integration into VLE.

Further, and most relevant, steps in this field are aligned towards the development of visual tools that present SA results to teachers and learners. At this moment, few studies have already developed and implemented frontend software in higher education contexts, showing processed results in a more visual and understandable manner, and there is still room for improvement. These tools evidence the development of new feedback systems that could be better applied to tracing and tracking students’ learning needs over time.

Summing up, the major contributions of SA in higher education assessment are directed towards assessing the emotional climate or state in relation to an educational intervention, so that teachers can have timely information to improve students’ performance (Dehbozorgi et al., 2020; Md Faridee & Janeja, 2019), predict students’ grades (Alblawi & Alhamed, 2017), and have early signs with which to identify at-risk students (Zhang et al., 2012). We argue that including a gender perspective in developing SA techniques should contribute to the promotion of more equitable assessment. Although studies evidence different gender patterns in how students express themselves, this aspect remains understudied.

The limitations of this study relate to the SLR methodology, such as the time span covered and the reliance on indexed database searches. Applying this methodology may have omitted other sources, such as technical reports, given that SA is a recent topic in learning contexts. This study makes a significant contribution to research, providing practical information to higher education teachers and administrators related to the assessment of students in online and hybrid learning. More specifically, it calls for the urgent development of easy-to-use, timely SA frontend tools that help teachers improve students’ feedback and awareness of their sentiments in the VLE, in highly specialized and more equitable learning contexts, which might result in higher learning performance.