1 Introduction

Since the turn of the century, Sentiment Analysis (SA) has been one of the most fruitful research subjects in Natural Language Processing (NLP), driven by the availability of millions of people's opinions on the web. Research on SA has spanned a wide range of fields, including economics, politics, and medicine. Automatically analysing online user evaluations provides valuable insight into the efficacy and adverse effects of medications, which can be used to enhance pharmacovigilance systems. With the rapid growth of the internet and the e-commerce industry, product reviews have become a vital deciding factor in purchases worldwide. People all over the world have become accustomed to reading reviews and blogs before deciding to purchase an item. Apart from reviews, the internet also provides recommender systems that suggest products to users based on their interests and requirements. These systems analyse the sentiment of customer reviews and use it to make recommendations.

The primary goal of NLP’s SA task is to determine how people feel about specific things (such as brands, people, organisations, and more) [1]. SA at the aspect level is a more granular task that determines the polarity of the sentiment associated with a particular component of a text [2]. In the sentence “This drug works better for water retention, but its side effect is high,” the two aspects “water retention” and “its side effect” carry clearly positive and negative polarity, respectively.

The reviews of medicines are less thoroughly researched than those of other products. Since multi-class categorization using text mining can be unreliable, drug reviews are typically used to label drugs as either favourable or bad. The proposed study will make it easier to divide medications into five distinct classes, each of which is defined by its relative efficacy. Knowing whether or not a drug is effective and whether or not it has serious side effects is information that can help both consumers and producers make informed decisions [3].

With the advent of neural networks (NNs) in NLP, NN-based techniques such as LSTM [4] and GRU [5] have been applied to aspect-level sentiment classification with promising results. Sentiment classification at the aspect level relies heavily on the interplay between sentiment words, targets, degree words, and negation words. In the right settings, bidirectional neural networks [6] (BiLSTM, BiGRU) can yield significantly improved outcomes. Nevertheless, neural network models have their own set of issues.

The research problem in Drug Review Classification in Sentiment Analysis is to design and implement effective computational methods that can accurately classify the sentiment expressed in user-generated drug reviews. This involves addressing challenges such as the identification of subtle nuances in language, handling unstructured text data, managing domain-specific terminology, and accounting for variations in sentiment across different drugs and individual experiences. The objective is to develop robust algorithms or models that can provide valuable insights into the effectiveness and tolerability of medications, aiding healthcare professionals and patients in making informed decisions.

To glean useful information from user-generated reviews of drugs, aspect-based classification using hybrid models is essential. To further understand its value and potential uses, consider the following instances or scenarios:

Side Effect Identification: A patient discusses their experience with a medicine in a review, including both the positive effects and the negative ones. The hybrid model utilizes aspect-based classification to precisely identify and classify the drug’s positive (efficacy) and bad (side effects) aspects, giving healthcare practitioners a complete picture of the drug’s pros and cons.

Effectiveness for Specific Conditions: Patients frequently talk about how well their drugs are working for their particular health issues. As an example, a user may share their experience with an antidepressant, praising its ability to alleviate depression but expressing disappointment with how well it controls anxiety. To better assist doctors in customizing treatment strategies, hybrid models can analyze these reviews and classify feelings of efficacy for various diseases.

Comparative Analysis: It is common for patients to compare various drugs they have used for the same ailment. The efficacy, safety, and user experience of two pain medications are some of the factors that might be considered in a review. Healthcare practitioners and academics can benefit from hybrid models’ ability to assess comparison reviews by extracting feelings about different features of each drug. This allows for easier comparative effectiveness research.

Safety and Tolerability: Healthcare providers and patients alike must have a firm grasp of the safety and tolerability profiles of medicines. Aspect-based classification can reveal sentiments linked to adverse reactions, drug interactions, and tolerance difficulties, which supports more informed decisions about a medication’s use in clinical practice and a more nuanced assessment of its safety profile.

The following is a brief overview of the major results of this study:

  1) We worked on two datasets, from druglib.com and drugs.com.

  2) Data preprocessing steps such as tokenization, lemmatization, stemming, and stop word removal are performed.

  3) Feature extraction: TF-IDF and Word2Vec.

  4) Feature selection: Ant Colony Optimization (ACO) is used as a viable approach for feature selection within the framework of sentiment analysis for drug reviews.

  5) A novel model, RoBERTa-Bi-LSTM, is proposed. To begin, it hierarchically transfers the learned weights from the short text-level drug review corpus to the aspect-level task. Then, it employs the attention mechanism to learn the hidden semantics of the sentence and the target, using two BiLSTM networks to emphasise the crucial role of the target and to generate powerful target-specific representations for sentences.

  6) The experimental findings validate the dataset's ability to support a thorough investigation into the SA of drug reviews from a variety of perspectives. Besides, the RoBERTa-Bi-LSTM model may increase. The performance comparison with various baselines demonstrates that our model makes full use of the semantic representation of the target.

The concept of distant supervision served as the basis for the development of a novel approach that is proposed in this study to extract and classify aspects pertaining to the drug domain. The task involves identifying and categorizing aspects within sentences from the test set. To enhance performance, we utilize an annotated dataset for fine-tuning RoBERTa specifically for the aspect classification task. Evaluation outcomes demonstrate the effectiveness of the suggested model, showcasing strong performance in achieving accurate aspect classification.

This paper is structured as follows: Section II provides the related work for the study, Section III outlines the algorithms employed for drug sentiment categorization, Section IV presents the study’s findings, and Section V highlights the contributions made by the study.

Figure 1 illustrates aspect sentiment analysis using deep learning techniques.

Fig. 1 Deep learning based Aspect Sentiment Analysis

Using a Long Short-Term Memory (LSTM) framework for aspect-based sentiment classification leverages the ability of LSTM networks to capture long-range dependencies in sequential data such as text. Word embeddings are initialised either from pre-trained vectors such as Word2Vec or GloVe or learned from scratch during training, and each word in the vocabulary is mapped to its corresponding embedding vector. To identify aspects in addition to classifying sentiment, one can train a separate model or use multi-task learning to learn aspect identification and sentiment classification jointly. Once the aspects are identified, sentiment classification is performed for each aspect mentioned in the input text.

2 Related work

Several Deep Learning (DL) architectures, including CNNs and LSTM recurrent neural networks, were compared head-to-head as a benchmark in a study by Cristóbal et al. [7]. Several potential model combinations are proposed, and the impact of various pre-trained word embedding models is investigated. Although cutting-edge results have been achieved by using transformers in the NLP field, they also investigate the use of BERT with a Bi-LSTM for the analysis of sentiment in drug reviews. Their research demonstrates that using BERT yields the best outcomes but at the expense of a significant amount of training time. CNN, on the other hand, produces respectable outcomes with significantly less time spent in training.

Using Double BiGRU as a foundation, Yue et al. [8] established a Pretraining and Multi-task Learning model (PM-DBiGRU). The weights of PM-DBiGRU were first pretrained on data from a short text-level drug review SA task. Target and drug review semantic representations are generated using two BiGRU networks, and an attention mechanism is then employed to obtain a target-specific representation for aspect-level drug review SA. They then apply multi-task learning to the short text-level drug review corpus to transfer useful domain knowledge.

To improve the precision with which medications can be classified according to their efficacy, MN Uddin et al. [3] suggested using tokenization and lemmatization to identify relevant phrases. On the drug review dataset obtained from the UCI ML repository, four ML methods were used for binary classification and one for multiclass classification. The naive Bayes classifier, random forest (RF), SVC, and multilayer perceptron were employed for binary classification, while linear SVC was used for multiclass classification. A comparison of the results of the four binary classifiers showed that the random forest performs best. For the linear SVC technique, a class 2 AUC of 0.82 indicates an improvement.

KL Tan et al. [9] introduced a novel hybrid model for SA that draws on the best features of the Transformer model, represented by the Robustly Optimised BERT approach (RoBERTa), and the RNN, represented by the GRU. The RoBERTa model’s attention mechanism makes it easier to project texts into a discriminative embedding space, while the GRU model does a better job of capturing long-range dependencies in the embeddings and mitigates the vanishing gradient problem. To address the problem of imbalanced datasets in SA, the research suggests using data augmentation with word embeddings to over-sample the minority classes. This method boosts the model’s representation capacity, enhancing its robustness and accuracy in sentiment classification. The proposed RoBERTa-GRU model achieved accuracy rates of 94.63%, 89.59%, and 91.52% on three well-established SA datasets (IMDb, Sentiment140, and Twitter US Airline, respectively).

MP Geetha et al. [10] conducted sentiment analysis on consumer review data with the aim of categorizing sentiments into positive and negative expressions. They applied NB classification, LSTM, and SVM across various models. Many existing SA techniques applied to online customer product reviews suffer from low accuracy and prolonged training times. To address these challenges, they introduced the BERT Base Uncased model, a powerful DL model. With the BERT model they obtained improved performance and more accurate predictions than with the other ML approaches.

JA Kumar et al. [11] proposed a multilabel ABSA model for user evaluations of the pharmaceutical product Abilify. First, they use preprocessing methods to enhance the quality of their data. Second, a bag of words (BoW) is used to extract the TF-IDF features. Third, features that fit both labels are chosen using a joint feature selection (JFS) technique that incorporates Information Gain (IG). Multilabel classification challenges can be addressed with problem transformation methods, customised algorithm methods, and ensemble methods. Having classified Abilify user reviews into a collection of aspect term sentiments (ATS), they investigate the problem transformation methodologies of binary relevance (BR), classifier chains (CC), and label powerset (LP). NB, DT, and SVM are used as the baseline classifiers for both feature sets. Multilabel performance measures were used to assess the effectiveness of the suggested technique.

RS Jayale et al. [12] proposed a drug prediction framework for COVID-19 patients based on how proteins interact with each other and which drugs are available. As part of the framework, machine learning models are set up to examine the protein–protein interactions (PPIs) between some viruses and specific receptors. These PPIs are validated using biomedical simulations. The classification methods are consistent with known sequence-based physical properties, such as amino acid composition, pseudo amino acid composition, and conjoint triads. Finally, they test the system with several different machine learning algorithms and report how well the proposed systems work.

M Imani et al. [13] presented a novel approach for picking out terms that signify an aspect in English user reviews of medications. They used distant supervision to automatically build a training set out of sentences and phrases tagged with aspect classes in the drug domain. Their technique outperforms state-of-the-art aspect extraction methods in identifying aspects in the test set, achieving an F-measure of 74.4%.

P Durga et al. [14] proposed a deep sentiment model to assess the sentiments in the given datasets, using an integrated method based on BERT-large-cased (BLC) for training on the dataset and D-RNN for classification of the aspects. Optimization algorithms such as SGD can be used to further refine the model. Fine-tuning entails re-training the pre-trained model on a targeted SA task to enhance performance. The deep sentiment analysis (DSA) based classification classifies the sentiments using an aspect and priority model to produce better results. To achieve more precise findings than preexisting models, they focused on aspect- and priority-based SA.

S Feng et al. [15] developed the new ABSA model AG-VSR. The final classification is carried out using two different representations: A2GR and VSR. The GCN component takes as input a dependency tree that has been refined by the attention method and outputs A2GR. In addition, a VAE-like encoder-decoder structure learns a distribution from which VSR is sampled. Their AG-VSR model achieves competitive performance.

A. H. Sweidan et al. [16] created a hybrid ontology-XLNet sentiment analysis classification method for sentences. Their approach aims to identify user social data using context-based sentiment inferences. They examine how extracting indirect linkages in user social data with the lexicalized ontology improves ABSA. They used the XLNet model to extract nearby contextual meaning and concatenate it with each word embedding. The authors employed Bi-LSTM networks to classify aspects in online user reviews. Multiple indicators are used to evaluate the performance of the suggested technique on six drug-related datasets. Compared with prior methods, their approach performed significantly better at feature extraction and achieved good accuracy.

A. H. Sweidan et al. [17] created a hybrid feature learning strategy for ABSA to detect and classify unlabelled social data. They predicted context words and learned phrase and document vectors using BERT with Latent Dirichlet Allocation (LDA). They classified extracted sentiment using Bi-LSTM. As a case study, they test their technique on different social media datasets of adverse drug reactions (ADRs). They had 95.4% average accuracy, 0.935 AUC score, and 94% F-measure.

Research Gaps:

Research in drug review classification has made significant strides, but several gaps remain in the existing literature. The following deficiencies are commonly noted in established models:

Incorporation of Contextual Information: Many machine learning models used for drug review classification treat each review as an independent instance, ignoring the contextual information provided by surrounding reviews. There’s a gap in research focusing on methods to effectively incorporate contextual information, such as review sequences or user profiles, into the classification process to improve accuracy.

Capturing Aspect-Level Sentiment: Existing machine learning models for drug review classification often focus on overall sentiment polarity (positive, negative, or neutral), neglecting aspect-level sentiment analysis. There’s a gap in research focusing on methods to accurately identify and classify sentiments related to specific aspects of drugs (e.g., effectiveness, side effects, dosage) within reviews.

Handling Noisy and Ambiguous Text: Drug reviews often contain noisy and ambiguous text, including misspellings, abbreviations, and colloquial language. Traditional machine learning approaches may struggle to handle such noise and ambiguity, leading to reduced classification performance. Research is needed to develop robust methods for preprocessing and cleaning text data to improve the quality of input features for classification models.

Model Interpretability and Transparency: Many machine learning models used for drug review classification, such as deep neural networks, are often perceived as “black boxes” due to their complex architectures and internal workings. There’s a gap in research focusing on methods to enhance the interpretability and transparency of classification models, allowing users to understand the rationale behind model predictions and trust the model's output.

Cross-Domain Generalization: Machine learning models trained on drug review data from one domain (e.g., prescription medications) may not generalize well to other domains (e.g., over-the-counter medications or herbal supplements). There’s a gap in research focusing on methods to improve cross-domain generalization, allowing models to adapt and perform well on new and unseen drug review datasets.

Addressing these research gaps can lead to the development of more effective, interpretable, and generalizable machine learning approaches for drug review classification, ultimately improving the quality of medication-related insights derived from user-generated reviews.

Combining RoBERTa, a state-of-the-art deep learning model for natural language processing, with Ant Colony Optimization (ACO), a metaheuristic optimization algorithm, in a hybrid model presents an innovative approach to overcoming challenges in drug review classification. RoBERTa eliminates the need for manual feature engineering by automatically extracting informative features from unstructured text data. This reduces the burden on researchers to handcraft features and allows the hybrid model to learn relevant representations directly from the input text, improving classification performance. RoBERTa can be fine-tuned for aspect-level sentiment analysis by training it on annotated datasets that include aspect-level sentiment labels. By fine-tuning RoBERTa on such data, the hybrid model can learn to classify sentiments associated with specific aspects of drugs (e.g., effectiveness, side effects) within reviews, enabling more granular analysis.

3 The proposed approach: aspect-based classification system

In this research, we propose a RoBERTa-BiLSTM-based aspect sentiment analysis methodology for Drug Reviews (DRs), which comprises four stages: pre-processing, feature extraction, feature selection, and sentiment classification. A high-level overview of the suggested method is shown in Fig. 2.

Fig. 2 Overall Proposed Framework

The combination of RoBERTa and Bi-LSTM leverages the strengths of both models. RoBERTa provides a strong foundation for capturing contextual information and understanding language, while the Bi-LSTM enhances the model’s ability to capture sequential patterns and dependencies within the text.

This hybrid approach can be particularly effective for NLP tasks that benefit from both the deep contextual embeddings provided by transformer models and the sequential modelling capabilities of recurrent neural networks such as LSTM. However, more complex models may require more computational resources for training and inference. It is also important to fine-tune and experiment with different architectures to find the best combination for the classification task at hand.

3.1 Pre-processing

Lower casing

Convert all text to lowercase to ensure uniformity and avoid treating words with different cases as distinct.

Example: “The Quick Brown Fox” → “the quick brown fox”.

Tokenization:

Break the text into individual words or tokens. This step facilitates the analysis of the text at a more granular level.

Example: “I love OpenAI” → [“I”, “love”, “OpenAI”].

Removing Punctuation:

Eliminate punctuation marks from the text since they typically do not contribute significant sentiment information.

Example: “Hello, World!” → “Hello World”.

Removing Stop Words:

Remove common words (e.g., “and,” “the,” “is”) known as stop words. These words don’t usually carry much sentiment and can be excluded to reduce dimensionality.

Example: [“I”, “love”, “OpenAI”] → [“love”, “OpenAI”].

Stemming and Lemmatization:

Stemming strips suffixes heuristically, while lemmatization maps each word to its base or dictionary form (its lemma) using vocabulary and morphological analysis.

Stemming Example: “running” → “run”.

Lemmatization Example: “better” →  “good”.

Removing Special Characters and Numbers:

Eliminate non-alphabetic characters and numerical digits, as they may not contribute significantly to sentiment.

Example: “Hello123!” → “Hello”.

Handling Emoticons and Emoji:

Depending on the context, emoticons and emojis can carry sentiment information. Consider preserving or converting them to text.

Example: “:)” → “happy”.

Handling Negations:

Modify the representation of negated words to capture changes in sentiment. For example, replace “not good” with “not_good.”

Removing HTML Tags:

In web-based sentiment analysis, remove any HTML tags that may be present in the text.

Example:

Original: “<p>Hello</p>”.

Processed: “Hello”.
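The steps above can be strung together in a few lines. The following is a minimal sketch using only the Python standard library, not the pipeline used in this study: the stop-word list and the emoticon map are small illustrative assumptions, and stemming and lemmatization are left out because they normally rely on an external library.

```python
import html
import re
from typing import List

# Tiny illustrative stop-word list (an assumption; real pipelines use a fuller list).
STOP_WORDS = {"i", "a", "an", "and", "the", "is", "it", "this", "was", "for", "my", "but"}
# Illustrative emoticon-to-word map (an assumption).
EMOTICONS = {":)": "happy", ":(": "sad"}


def preprocess(review: str) -> List[str]:
    """Apply the cleaning steps of Sect. 3.1 and return a list of tokens."""
    text = html.unescape(review)                      # decode entities such as &amp;
    text = re.sub(r"<[^>]+>", " ", text)              # remove HTML tags
    for emoticon, word in EMOTICONS.items():          # convert emoticons to words
        text = text.replace(emoticon, f" {word} ")
    text = text.lower()                               # lower casing
    text = re.sub(r"\bnot\s+(\w+)", r"not_\1", text)  # "not good" -> "not_good"
    text = re.sub(r"[^a-z_\s]", " ", text)            # drop punctuation, digits, special characters
    tokens = text.split()                             # whitespace tokenization
    return [t for t in tokens if t not in STOP_WORDS]  # stop word removal


tokens = preprocess("<p>This drug is NOT good :( but I slept 8 hours!</p>")
print(tokens)
```

Negations are joined before punctuation is stripped so that the underscore survives the character filter; the order of the other steps is a design choice rather than a requirement.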

The label in this research is the rating and it is simplified from multiclass 1–10 into 3 classes:

  • 1–4—Bad review

  • 5–7—Moderate review

  • 8–10—Good review
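The label simplification listed above amounts to a small mapping function. The sketch below is illustrative; the function name `rating_to_label` and the label strings are assumptions, while the thresholds follow the three ranges given in the text.

```python
def rating_to_label(rating: int) -> str:
    """Map a 1-10 user rating to one of the three review classes."""
    if not 1 <= rating <= 10:
        raise ValueError(f"rating must be between 1 and 10, got {rating}")
    if rating <= 4:
        return "bad"        # ratings 1-4
    if rating <= 7:
        return "moderate"   # ratings 5-7
    return "good"           # ratings 8-10


labels = [rating_to_label(r) for r in (1, 5, 9)]
print(labels)
```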

3.2 Feature extraction

Feature extraction in sentiment analysis refers to the process of converting raw text data into a structured format that can be used as input for ML algorithms. The goal is to represent the data in a more compact and meaningful way, highlighting important aspects that are relevant to the task at hand. Aspect-based sentiment classification involves extracting features related to specific aspects or facets of a given text and then classifying the sentiment associated with each aspect. Here are two common feature extraction techniques for aspect-based sentiment classification.

Following these procedures, we matched the Word2Vec model we had constructed against the test sentences to extract phrases expressing an aspect from the review set. After the aspect-expressing phrases were extracted, our model was trained on the T dataset so that it could classify the test set’s sentences into one of five categories. Specifically, each expression of an aspect in the test set was labelled with one of the following categories: condition/reason, cost, dosage/duration, efficacy, and side effects. Because labels are assigned per aspect expression, a single test text can carry several aspect tags.

Word2Vec is a feature extraction method used in sentiment analysis, where words are represented as dense vectors in a continuous vector space. The vectors are learned from the contexts in which words occur, so words used in similar contexts receive similar vectors. The following is a simplified illustration of obtaining Word2Vec features for sentiment analysis:

For a given word \({w}_{i}\) in a text document: Word2Vec \(({w}_{i})\)

For example, consider the sentence “The movie is excellent.”

Tokenize the sentence: [“The”, “movie”, “is”, “excellent”].

For each word, obtain its Word2Vec embedding using a pretrained Word2Vec model:

$$\text{Word2Vec}(\text{"The"}) = [0.2, -0.4, 0.7, \ldots]$$
$$\text{Word2Vec}(\text{"movie"}) = [0.5, 0.9, -0.1, \ldots]$$
$$\text{Word2Vec}(\text{"is"}) = [-0.3, 0.6, 0.2, \ldots]$$
$$\text{Word2Vec}(\text{"excellent"}) = [0.8, 0.7, -0.5, \ldots]$$

The Word2Vec features for the sentence “The movie is excellent” can be represented as a matrix:

$$\text{Features} = \begin{bmatrix} 0.2 & -0.4 & 0.7 & \cdots \\ 0.5 & 0.9 & -0.1 & \cdots \\ -0.3 & 0.6 & 0.2 & \cdots \\ 0.8 & 0.7 & -0.5 & \cdots \end{bmatrix}$$

These Word2Vec features can then be used as input to a sentiment analysis model. The model will learn to associate the semantic meaning of words with sentiment labels (positive, negative, or neutral) based on the training data.
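As a rough illustration of the lookup described above, the sketch below builds the sentence matrix from a toy embedding table holding the first three values of each vector in the example. The table, the function names, and the mean-pooling step are assumptions made for illustration; in practice the vectors would come from a trained Word2Vec model, and averaging is only one common way to obtain a fixed-length sentence vector.

```python
from typing import Dict, List, Sequence

# Toy 3-dimensional vectors copied from the example; real Word2Vec vectors
# are much longer (commonly 100-300 dimensions).
TOY_EMBEDDINGS: Dict[str, List[float]] = {
    "the": [0.2, -0.4, 0.7],
    "movie": [0.5, 0.9, -0.1],
    "is": [-0.3, 0.6, 0.2],
    "excellent": [0.8, 0.7, -0.5],
}


def sentence_matrix(tokens: Sequence[str],
                    embeddings: Dict[str, List[float]],
                    dim: int = 3) -> List[List[float]]:
    """Stack one embedding row per token; unknown words map to a zero vector."""
    zero = [0.0] * dim
    return [list(embeddings.get(token.lower(), zero)) for token in tokens]


def mean_pool(matrix: List[List[float]]) -> List[float]:
    """Average the rows to get a single fixed-length sentence vector."""
    if not matrix:
        return []
    return [sum(column) / len(matrix) for column in zip(*matrix)]


features = sentence_matrix(["The", "movie", "is", "excellent"], TOY_EMBEDDINGS)
sentence_vector = mean_pool(features)
print(features)
print(sentence_vector)
```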

3.3 Feature selection

3.3.1 ACO-based feature selection

Ant Colony Optimization (ACO) is a metaheuristic algorithm inspired by the foraging behaviour of ant colonies to determine the shortest path to a food source. In this simulation, ants deposit pheromones as they search for food, creating a chemical trail that attracts other ants to follow suit. The likelihood of an ant selecting a path is influenced by the concentration of pheromones along that route. Paths with higher pheromone levels are more appealing to ants, and as they traverse these paths, the pheromone intensity increases. This positive feedback loop enhances the probability that other ants will choose the same path. ACO leverages this collective intelligence to generate multiple solutions or paths, aiming to discover the optimal solution based on the reinforcement of pheromone-rich routes [18].

ACO can play a crucial role in the feature selection process within a hybrid model for drug review classification. Here’s how ACO can be applied and the rationale for its use over other feature selection methods:

The goal is to select the most relevant and informative features (words or tokens) from the raw text data for drug review classification. Based on feature significance and classification performance, ACO iteratively picks the best set of features. Artificial ants deposit pheromone on features in proportion to the quality (fitness) of the solutions that contain them, so important features receive more pheromone because they improve classification performance. Over many iterations, the pheromone trails left by preceding ants reinforce the better feature subsets. ACO balances exploration and exploitation in feature selection: ants initially select features randomly or using heuristics, which explores the feature space globally, and later exploit promising features by following the pheromone trails of previous ants, which focuses the search locally on high-quality features. In this way the ants converge on optimal or near-optimal feature subsets that maximise classification performance.

A flow chart of how the Ant Colony Optimization algorithm works is shown in Fig. 3.

Fig. 3 ACO Flow chart

ACO can be used for feature selection in the context of drug review sentiment analysis. Feature selection is the process of choosing a subset of the most relevant features from the dataset to improve the performance of ML/DL models. ACO can be applied to find an optimal subset of features that best represent the data and contribute to sentiment analysis accuracy (Figs. 4, 5, 6, 7, 8, 9).

Fig. 4 Word cloud Drug Names

Fig. 5 Top 10 drugs with 10/10 rating

Fig. 6 Top 10 drugs with 1/10 rating

Fig. 7 Count of ratings

Fig. 8 Top 10 Conditions

Fig. 9 Representation of Sentiments

An ant will move from node i to node j with probability

$$P_{ij} = \frac{\tau_{ij}^{\alpha}\,\eta_{ij}^{\beta}}{\sum_{l}\tau_{il}^{\alpha}\,\eta_{il}^{\beta}}$$
(1)

where \(\tau_{ij}\) is the amount of pheromone on edge (i, j),

α is a parameter to control the influence of \(\tau_{ij}\),

\(\eta_{ij}\) is the desirability of edge (i, j),

β is a parameter to control the influence of \(\eta_{ij}\), and the sum in the denominator runs over the nodes l that the ant can still reach from node i.
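The transition rule can be checked numerically with a short sketch. The function name and the default values of α and β below are illustrative assumptions; `tau` and `eta` hold the pheromone and desirability of the candidate edges leaving the current node.

```python
from typing import List, Sequence


def transition_probabilities(tau: Sequence[float],
                             eta: Sequence[float],
                             alpha: float = 1.0,
                             beta: float = 2.0) -> List[float]:
    """Probability of moving along each candidate edge: tau^alpha * eta^beta, normalised."""
    weights = [(t ** alpha) * (e ** beta) for t, e in zip(tau, eta)]
    total = sum(weights)
    if total == 0:
        # No pheromone or desirability information: fall back to a uniform choice.
        return [1.0 / len(weights)] * len(weights)
    return [w / total for w in weights]


# Three candidate edges: the third has the highest desirability.
probs = transition_probabilities(tau=[1.0, 2.0, 1.0], eta=[0.5, 0.5, 1.0])
print(probs)
```

With these inputs the unnormalised weights are 0.25, 0.5, and 1.0, so the third edge is chosen with probability 4/7 even though the second edge carries more pheromone, which shows how β shifts the balance toward desirability.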

Amount of pheromone is updated according to the equation

$$\tau_{ij} \leftarrow (1 - \rho)\,\tau_{ij} + \Delta\tau_{ij}$$
(2)

where

\({\tau }_{ij}\) is the amount of pheromone on a given edge i, j

ρ is the rate of pheromone evaporation.

\(\Delta {\tau }_{ij}\) is the amount of pheromone deposited, typically given by

$$\Delta\tau_{ij}^{k} = \begin{cases} \dfrac{1}{L_{k}} & \text{if ant } k \text{ travels on edge } (i, j) \\ 0 & \text{otherwise} \end{cases}$$
(3)

where \(L_{k}\) is the cost of the kth ant’s tour.

$$\tau_{ij} \leftarrow (1 - \rho)\,\tau_{ij} + \sum_{k=1}^{m}\Delta\tau_{ij}^{k}$$
(4)

where

ρ is the evaporation rate,

m is the number of ants, and

\(\Delta\tau_{ij}^{k}\) is the quantity of pheromone laid on edge (i, j) by the kth ant:

$$\Delta\tau_{ij}^{k} = \begin{cases} \dfrac{1}{L_{k}} & \text{if ant } k \text{ travels on edge } (i, j) \\ 0 & \text{otherwise} \end{cases}$$
(5)

where \(L_{k}\) is the tour length of the kth ant.
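A small worked example makes the evaporation-plus-deposit update of Eqs. (4) and (5) concrete. This is a sketch under stated assumptions: edges are represented as tuples, each tour is a list of distinct edges, and the function name is hypothetical.

```python
from typing import Dict, Hashable, List, Sequence

Edge = Hashable


def update_pheromone(tau: Dict[Edge, float],
                     tours: Sequence[List[Edge]],
                     lengths: Sequence[float],
                     rho: float = 0.1) -> Dict[Edge, float]:
    """Evaporate all trails by (1 - rho), then let each ant k deposit 1 / L_k on its edges."""
    new_tau = {edge: (1.0 - rho) * value for edge, value in tau.items()}
    for tour, length in zip(tours, lengths):
        for edge in tour:
            new_tau[edge] = new_tau.get(edge, 0.0) + 1.0 / length
    return new_tau


tau = {("a", "b"): 1.0, ("b", "c"): 1.0}
tours = [[("a", "b")], [("a", "b"), ("b", "c")]]   # ant 1 and ant 2
lengths = [2.0, 4.0]                                # L_1 and L_2
updated = update_pheromone(tau, tours, lengths, rho=0.5)
print(updated)
```

Edge (a, b) is used by both ants, so it ends at 0.5 + 1/2 + 1/4 = 1.25, whereas edge (b, c) is used only by the second ant and ends at 0.5 + 1/4 = 0.75.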

Ants in the Ant Colony System (ACS) variant choose the next node using the pseudorandom proportional rule.

In addition, each ant applies a local pheromone update only to the last edge it traversed:

$$\tau_{ij} \leftarrow (1 - \varphi)\,\tau_{ij} + \varphi\,\tau_{0}$$
(6)

where φ ∈ (0, 1] is the pheromone decay coefficient and \(\tau_{0}\) is the initial value of the pheromone.

Offline pheromone update equation

$$\tau_{ij} \leftarrow (1 - \rho)\,\tau_{ij} + \rho\,\Delta\tau_{ij}^{best}$$
(7)

where

$$\Delta\tau_{ij}^{best} = \begin{cases} \dfrac{1}{L_{best}} & \text{if the best ant travels on edge } (i, j) \\ 0 & \text{otherwise} \end{cases}$$
(8)

The best tour length \(L_{best}\) can be taken either from the best ant of the current iteration or from the best solution found since the beginning of the process.
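The local update of Eq. (6) and the offline update of Eqs. (7) and (8) can be sketched as two small functions. The function names and the tuple representation of edges are assumptions made for illustration.

```python
from typing import Dict, Hashable, Iterable

Edge = Hashable


def local_update(tau_ij: float, phi: float, tau0: float) -> float:
    """Eq. (6): pull the pheromone of the last traversed edge back toward tau0."""
    return (1.0 - phi) * tau_ij + phi * tau0


def offline_update(tau: Dict[Edge, float],
                   best_tour: Iterable[Edge],
                   best_length: float,
                   rho: float) -> Dict[Edge, float]:
    """Eqs. (7)-(8): evaporate every edge; only edges on the best tour receive rho / L_best."""
    best_edges = set(best_tour)
    return {
        edge: (1.0 - rho) * value + rho * (1.0 / best_length if edge in best_edges else 0.0)
        for edge, value in tau.items()
    }


after_local = local_update(tau_ij=1.0, phi=0.1, tau0=0.5)
after_offline = offline_update(
    {("a", "b"): 1.0, ("b", "c"): 1.0}, best_tour=[("a", "b")], best_length=2.0, rho=0.1
)
print(after_local, after_offline)
```

The local update lowers the pheromone on edges that have just been used, which encourages later ants in the same iteration to explore other edges, while the offline update rewards only the best tour.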

3.3.2 Pseudocode for S-ACO

Initialize the ants \(m_i\), where i = 1, 2, 3, …, n, the number of iterations iter, the set of edges {i to j}, the pheromone matrix \({\tau }_{ij}\), the pheromone concentration Pc, and the best solution \({L}_{best}\), and evaluate the fitness value of each ant. The parameters are \({\tau }_{0}\): initial pheromone level on all paths, α: pheromone influence factor, β: desirability influence factor, and ρ: pheromone evaporation rate.

figure a

The algorithm iterates for a specified number of iterations. In each iteration, multiple ants construct an aspect subset based on the relevance of aspects in the context of drug reviews. The ‘select_next_node’ function guides each ant’s aspect selection based on pheromone levels and aspect desirability.

The step that evaluates the performance of ‘current_solution’ on drug reviews assesses the quality of the aspect subset by considering how well it represents the different aspects in the drug reviews and how much it contributes to understanding those aspects.
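The loop described above can be sketched as follows. This is only an illustrative outline under stated assumptions: pheromone is stored per aspect (node) rather than per edge, the deposit is proportional to a non-negative subset score, and the names `s_aco`, `evaluate`, `eta`, and `relevance` are hypothetical. Only `select_next_node` and `current_solution` are names taken from the text.

```python
import random

def select_next_node(candidates, tau, eta, alpha, beta, rng):
    """Choose the next aspect with probability proportional to tau^alpha * eta^beta."""
    weights = [(tau[c] ** alpha) * (eta[c] ** beta) for c in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]

def s_aco(n_aspects, subset_size, evaluate, eta, n_ants=5, n_iter=20,
          alpha=1.0, beta=1.0, rho=0.1, tau0=1.0, seed=0):
    """Assumed sketch: ants build aspect subsets, good subsets reinforce pheromone."""
    rng = random.Random(seed)
    tau = [tau0] * n_aspects              # initial pheromone on every aspect
    best_subset, best_score = None, float("-inf")
    for _ in range(n_iter):
        solutions = []
        for _ in range(n_ants):
            candidates = list(range(n_aspects))
            current_solution = []
            while len(current_solution) < subset_size:
                node = select_next_node(candidates, tau, eta, alpha, beta, rng)
                current_solution.append(node)
                candidates.remove(node)   # an aspect is selected at most once
            score = evaluate(current_solution)
            solutions.append((current_solution, score))
            if score > best_score:
                best_subset, best_score = sorted(current_solution), score
        tau = [(1.0 - rho) * t for t in tau]   # evaporation
        for subset, score in solutions:        # deposit (score assumed >= 0)
            for aspect in subset:
                tau[aspect] += score
    return best_subset, best_score

# Hypothetical desirability values for four candidate aspects.
relevance = [0.1, 0.9, 0.2, 0.8]
best_subset, best_score = s_aco(
    n_aspects=4, subset_size=2,
    evaluate=lambda subset: sum(relevance[a] for a in subset),
    eta=relevance,
)
```

In practice, `evaluate` would be replaced by a measure of how well the selected aspects support classification on the drug reviews; the sum of relevance scores used here is only a stand-in.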

3.4 Sentiment classification

The RoBERTa with Bi-LSTM model for aspect-based drug review classification is a deep learning architecture designed to analyse drug reviews and classify sentiments associated with specific aspects mentioned in the reviews. This model combines the power of RoBERTa, a pre-trained transformer-based language model, with a BiLSTM layer to capture both contextual information and sequential dependencies in the input data.

RoBERTa can be used for feature extraction, encoding the semantic information and contextual relationships within the drug review text. The output representations generated by RoBERTa capture rich semantic features from the input sequence. Bi-LSTM can be applied on top of the RoBERTa representations to further model sequential dependencies and capture nuanced patterns in the drug review text. The bidirectional processing of Bi-LSTM enhances its ability to capture long-range dependencies, allowing it to refine the representations generated by RoBERTa. The predictions from both RoBERTa and Bi-LSTM can be combined using ensemble learning techniques such as averaging or stacking. This ensemble approach leverages the diversity of the two models to improve robustness and generalization performance, resulting in more accurate drug review classification. By integrating RoBERTa and Bi-LSTM within a hybrid model, the proposed approach can leverage the strengths of both deep contextualized representations and sequential modeling techniques to achieve state-of-the-art performance in drug review classification tasks.

3.4.1 RoBERTa model

The RoBERTa (Robustly optimized BERT approach) [19] model, developed by researchers at Facebook AI, is based on the transformer architecture. Like BERT, RoBERTa is a transformer-based language model that uses self-attention to process input sequences and generate contextualized representations of the words in a sentence, and it is designed specifically for NLP tasks. It builds upon the BERT model, introducing several modifications to enhance performance and robustness.

A significant distinction between RoBERTa and BERT lies in the scale of their training data and the effectiveness of their training procedures. Notably, RoBERTa was trained on an extensive dataset comprising 160 GB of text, surpassing the size of the dataset used for BERT by more than tenfold. We can observe from the GLUE leaderboard [20] that RoBERTa performs better than BERT. Furthermore, RoBERTa employs a dynamic masking technique in its training process and removes the next sentence prediction objective to enhance the model’s capacity to acquire more robust and versatile word representations [21].

The RoBERTa model is composed of numerous layers of transformer blocks, with each block featuring a multi-head self-attention mechanism and position-wise feedforward networks.

Embedding Layer: Tokenized input is converted into embeddings using an embedding layer. Unlike BERT, which uses WordPiece tokenization, RoBERTa uses byte-level Byte-Pair Encoding (BPE) to handle a large vocabulary effectively.

Transformer Architecture: It consists of multiple layers of transformer blocks. Each block includes:

Multi-Head Self-Attention Mechanism: Captures contextual relationships between words.

Given a sequence of input embeddings \(X= \left[{x}_{1}, {x}_{2}, {x}_{3}, \dots, {x}_{n}\right]\), the self-attention mechanism calculates the attention scores A as follows:

$$A = \text{Softmax}\left(\frac{X W_{q} \left(X W_{k}\right)^{T}}{\sqrt{d_{k}}}\right)$$

where \({W}_{q}\) and \({W}_{k}\) are the weight matrices for the query and key projections, and \({d}_{k}\) is the dimension of the key vectors.

The output of the self-attention mechanism, often denoted as Z, is calculated as a weighted sum of the values \(V = X W_{v}\), where \(W_{v}\) is the weight matrix for the value projection:

$$Z = A \left(X W_{v}\right)$$

The final output of the self-attention block is typically passed through a linear layer and layer normalization.
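The attention computation can be illustrated with a short NumPy sketch of a single attention head. The dimensions, the random weights, and the function names are assumptions chosen only for illustration; RoBERTa runs several such heads in parallel and learns the projection matrices during pre-training.

```python
import numpy as np

def softmax(scores):
    """Row-wise softmax with max-subtraction for numerical stability."""
    shifted = scores - scores.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def self_attention(X, W_q, W_k, W_v):
    """Single-head scaled dot-product self-attention: returns (A, Z)."""
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    d_k = K.shape[-1]
    A = softmax(Q @ K.T / np.sqrt(d_k))   # attention scores, one row per token
    Z = A @ V                             # weighted sum of the values
    return A, Z

# Hypothetical example: 4 tokens with 8-dimensional embeddings.
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
W_q, W_k, W_v = (rng.normal(size=(8, 8)) for _ in range(3))
A, Z = self_attention(X, W_q, W_k, W_v)
```

Each row of `A` sums to one, so every output vector in `Z` is a convex combination of the value vectors.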

Position-wise Feedforward Networks: Captures complex non-linear relationships within the sequence.

The output of the self-attention mechanism is processed by position-wise feedforward networks:

$$\text{FFN}\left(x\right) = \text{ReLU}\left(x W_{1} + b_{1}\right) W_{2} + b_{2}$$

where \({W}_{1}\), \({b}_{1}\), \({W}_{2}\), \({b}_{2}\) are learnable parameters and the ReLU activation introduces non-linearity.

Masked Language Model (MLM) Loss: The objective of predicting masked tokens during pre-training is defined by the cross-entropy loss:

$$L_{MLM} = -\sum_{i=1}^{n} \sum_{j=1}^{m} \delta_{ij} \log P\left(x_{ij}\right)$$

where n is the number of masked tokens, m is the size of the vocabulary, \(\delta_{ij}\) equals 1 if vocabulary token j is the correct token at masked position i and 0 otherwise, and \(P({x}_{ij})\) is the predicted probability of token j at position i.
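As a small worked example of this loss, the sketch below evaluates the cross-entropy over masked positions, assuming the model's predictions are already available as probability distributions over the vocabulary. The function name `mlm_loss` and the toy values are illustrative assumptions.

```python
import numpy as np

def mlm_loss(probs, targets):
    """Cross-entropy summed over masked positions.

    probs:   (n, m) array, predicted distribution over m vocabulary tokens
             at each of the n masked positions.
    targets: (n,) array with the index of the correct token at each position.
    """
    n = probs.shape[0]
    # Indexing with the target ids plays the role of the 0/1 indicator:
    # only the log-probability of the correct token contributes.
    return -np.sum(np.log(probs[np.arange(n), targets]))

# Toy example: 2 masked positions, vocabulary of 2 tokens.
probs = np.array([[0.5, 0.5],
                  [0.25, 0.75]])
targets = np.array([0, 1])
loss = mlm_loss(probs, targets)
```

Here the loss equals −(ln 0.5 + ln 0.75); the loss decreases as the model assigns higher probability to the correct tokens.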

Layer Normalization and Residual Connections:

Both the output of the self-attention mechanism and the output of the position-wise feedforward networks are passed through layer normalization. Additionally, residual connections are used around each sub-layer, adding the sub-layer’s input to its output before normalization:

H = LayerNorm (Self-Attention (Input) + Input), Output = LayerNorm (FFN (H) + H).

This helps with the flow of gradients during training and facilitates the learning process.
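A minimal sketch of one encoder block, assuming a residual connection around each sub-layer followed by layer normalization, is given below. The learnable scale and shift parameters of layer normalization are omitted for brevity, and the attention sub-layer is passed in as a function; both simplifications, as well as all names, are assumptions made for illustration.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    """Normalize each token vector to zero mean and unit variance (no gain/bias)."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def ffn(x, W1, b1, W2, b2):
    """Position-wise feedforward network: ReLU(x W1 + b1) W2 + b2."""
    return np.maximum(0.0, x @ W1 + b1) @ W2 + b2

def encoder_block(x, attention, W1, b1, W2, b2):
    """Residual connection and layer normalization around each sub-layer."""
    h = layer_norm(attention(x) + x)
    return layer_norm(ffn(h, W1, b1, W2, b2) + h)

# Hypothetical example: 4 tokens, model width 8, inner width 16.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
W1, b1 = rng.normal(size=(8, 16)), np.zeros(16)
W2, b2 = rng.normal(size=(16, 8)), np.zeros(8)
# An identity function stands in for the self-attention sub-layer here.
out = encoder_block(x, lambda t: t, W1, b1, W2, b2)
```

Because the input of each sub-layer is added back to its output, gradients have a direct path through the block even when the sub-layer itself contributes little.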

3.4.2 Bi-LSTM model

A Bi-LSTM model is a type of RNN architecture that is commonly used for sequence-based tasks such as sentiment classification. In sentiment classification, the goal is to determine the sentiment or emotion expressed in a piece of text. The key feature of Bi-LSTM is its ability to capture information from both past and future time steps in a sequence, enhancing the model's understanding of context.

Bi-LSTM Model Layers:

Input Layer: The input to the model consists of tokenized and embedded sequences. Each word or token is represented as a vector in an embedding space.

Bi-LSTM Layer: The Bi-LSTM layer consists of two LSTM components: one processing the input sequence in the forward direction, and the other processing it in the backward direction. This bidirectional processing allows the model to capture dependencies from both past and future information.

Optional Intermediate Layers: Depending on the complexity of the task, we may have additional intermediate layers, such as fully connected layers or dropout layers, to introduce non-linearity and prevent overfitting.

Output Layer: The output layer produces the final sentiment prediction.

\({x}_{t}\) represents the input at time step t.

\({h}_{t-1}\) and \({c}_{t-1}\) are the hidden state and cell state from the previous time step.

A Bi-LSTM network processes input data in both forward and backward directions. The forward LSTM computes:

$$\text{Forget gate}: f_{t} = \sigma\left(W_{f} \cdot [h_{t-1}, x_{t}] + b_{f}\right)$$
$$\text{Input gate}: i_{t} = \sigma\left(W_{i} \cdot [h_{t-1}, x_{t}] + b_{i}\right)$$
$$\text{Candidate cell state}: \tilde{c}_{t} = \tanh\left(W_{c} \cdot [h_{t-1}, x_{t}] + b_{c}\right)$$
$$\text{Cell state}: c_{t} = f_{t} \odot c_{t-1} + i_{t} \odot \tilde{c}_{t}$$
$$\text{Output gate}: o_{t} = \sigma\left(W_{o} \cdot [h_{t-1}, x_{t}] + b_{o}\right)$$
$$\text{Hidden state}: h_{t}^{(f)} = o_{t} \odot \tanh\left(c_{t}\right)$$

The backward LSTM is defined analogously, with its own parameters (marked by the superscript (b)). Because it reads the sequence in reverse order, its previous step is t + 1:

$$\text{Forget gate}: f_{t} = \sigma\left(W_{f}^{(b)} \cdot [h_{t+1}, x_{t}] + b_{f}^{(b)}\right)$$
$$\text{Input gate}: i_{t} = \sigma\left(W_{i}^{(b)} \cdot [h_{t+1}, x_{t}] + b_{i}^{(b)}\right)$$
$$\text{Candidate cell state}: \tilde{c}_{t} = \tanh\left(W_{c}^{(b)} \cdot [h_{t+1}, x_{t}] + b_{c}^{(b)}\right)$$
$$\text{Cell state}: c_{t} = f_{t} \odot c_{t+1} + i_{t} \odot \tilde{c}_{t}$$
$$\text{Output gate}: o_{t} = \sigma\left(W_{o}^{(b)} \cdot [h_{t+1}, x_{t}] + b_{o}^{(b)}\right)$$
$$\text{Hidden state}: h_{t}^{(b)} = o_{t} \odot \tanh\left(c_{t}\right)$$

To obtain the output for sentiment classification, the forward and backward hidden states \({h}_{t}^{(f)}\) and \({h}_{t}^{(b)}\) are concatenated to form the final hidden state \({h}_{t}^{(bi)} = [{h}_{t}^{(f)}, {h}_{t}^{(b)}]\).


The output of the Bi-LSTM can be fed into a softmax layer for sentiment classification. The softmax function converts the output scores into probabilities for each sentiment class.


Here \({W}_{f}, {W}_{i}, {W}_{c}, {W}_{o}, {b}_{f}, {b}_{i}, {b}_{c}\), and \({b}_{o}\) are the weight matrices and bias vectors for the forward LSTM, and \({W}_{f}^{(b)}, {W}_{i}^{(b)}, {W}_{c}^{(b)}, {W}_{o}^{(b)}, {b}_{f}^{(b)}, {b}_{i}^{(b)}, {b}_{c}^{(b)}\), and \({b}_{o}^{(b)}\) are the corresponding parameters for the backward LSTM. ‘σ’ represents the sigmoid activation function, ‘⊙’ represents element-wise multiplication, and ‘tanh’ is the hyperbolic tangent activation function.
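The gate equations and the concatenation of the two directions can be traced with the NumPy sketch below. The parameter dictionary layout, the zero initial states, the random initialization, and all function names are assumptions made for illustration; a trained model would learn these parameters, and a deep learning framework would normally supply the layer.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM step following the gate equations; p holds W_* and b_*."""
    z = np.concatenate([h_prev, x_t])             # [h_prev, x_t]
    f = sigmoid(p["W_f"] @ z + p["b_f"])          # forget gate
    i = sigmoid(p["W_i"] @ z + p["b_i"])          # input gate
    c_tilde = np.tanh(p["W_c"] @ z + p["b_c"])    # candidate cell state
    c = f * c_prev + i * c_tilde                  # cell state
    o = sigmoid(p["W_o"] @ z + p["b_o"])          # output gate
    h = o * np.tanh(c)                            # hidden state
    return h, c

def run_lstm(xs, p, hidden):
    """Run one direction over the sequence, starting from zero states."""
    h, c = np.zeros(hidden), np.zeros(hidden)
    outputs = []
    for x_t in xs:
        h, c = lstm_step(x_t, h, c, p)
        outputs.append(h)
    return outputs

def bilstm(xs, p_fwd, p_bwd, hidden):
    """Concatenate forward and backward hidden states at each time step."""
    h_f = run_lstm(xs, p_fwd, hidden)
    h_b = run_lstm(xs[::-1], p_bwd, hidden)[::-1]  # reverse, run, re-align
    return [np.concatenate([f, b]) for f, b in zip(h_f, h_b)]

def init_params(input_dim, hidden, rng):
    """Hypothetical random initialization of one direction's parameters."""
    p = {}
    for gate in "fico":
        p[f"W_{gate}"] = rng.normal(scale=0.1, size=(hidden, hidden + input_dim))
        p[f"b_{gate}"] = np.zeros(hidden)
    return p

rng = np.random.default_rng(0)
xs = rng.normal(size=(5, 3))                       # 5 time steps, 3 features
p_fwd, p_bwd = init_params(3, 4, rng), init_params(3, 4, rng)
states = bilstm(xs, p_fwd, p_bwd, hidden=4)
```

Each element of `states` has twice the hidden size, and it is this concatenated vector that would be passed to the softmax layer for sentiment classification.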

4 Experimental results and discussion

To evaluate and validate the effectiveness of the suggested approach, a series of experiments was conducted. These experiments aimed to assess the performance of the approach in extracting features from unbalanced data. The experimental datasets used in these experiments were curated from a compilation of drug reviews (DRs).

The construction of these datasets involved several steps. First, a large collection of drug reviews was obtained from various sources. These reviews were then preprocessed to remove any irrelevant information and to ensure consistency in the format of the data.

Next, a careful selection process was carried out to choose the most appropriate texts for the experimental datasets. The selected texts were those that provided valuable insights into the advantages of the drugs being reviewed. These texts were chosen based on their relevance and informativeness.

Once the selected texts were identified, they were further annotated to provide additional context and information. This annotation process involved labelling the texts with relevant attributes, such as the drug being reviewed, the specific advantage being discussed, and any other pertinent details.

After completing the annotation process, the final datasets were created by organising the annotated texts into appropriate categories. These categories were determined based on the specific advantages discussed in the texts.

Overall, the construction of the experimental datasets involved a meticulous and systematic approach to ensure the reliability and validity of the experiments. The curated datasets provided a solid foundation for assessing the effectiveness of the proposed approach in extracting advantages from unbalanced data.

4.1 Datasets

We utilized two benchmark datasets obtained from pharmaceutical websites, namely drugs.com and druglib.com, both of which provide comprehensive information about various medications to healthcare professionals and consumers. The drugs.com dataset comprises 215,063 drug reviews, each accompanied by a 10-star rating representing overall user satisfaction. For instance, a review might state, “I have been using Ibuprofen for my chronic back pain, and I must say it has been quite effective. The pain relief is noticeable, and I haven’t experienced any severe side effects.” This exemplifies a patient sharing their experience with Ibuprofen, with the 10-star rating indicating the level of satisfaction. On the other hand, the druglib.com dataset includes 4143 drug reviews, featuring a 5-star rating system for side effects (ranging from no side effects to severe side effects) and effectiveness (ranging from ineffective to highly effective). This provides a comprehensive evaluation of both the safety and efficacy aspects of the medications reviewed on the platform.

Thresholds for converting ratings to labels and the distribution of the three types of drug reviews.

4.1.1 Drug review dataset (Druglib.com)

Additional Variable Information.

  • 1. ‘urlDrugName (categorical): name of drug’.

  • 2. ‘condition (categorical): name of condition’.

  • 3. ‘benefitsReview (text): patient on benefits’.

  • 4. ‘sideEffectsReview (text): patient on side effects’.

  • 5. ‘commentsReview (text): overall patient comment’.

  • 6. ‘rating (numerical): 10 star patient rating’.

  • 7. ‘sideEffects (categorical): 5 step side effect rating’.

  • 8. ‘effectiveness (categorical): 5 step effectiveness rating’.

In this dataset, you’ll find comments from actual users about the side effects, effectiveness, and safety of various medications. Additionally, comments are separated into those that focus on benefits, those that focus on adverse effects, and those that provide an overall commentary.

4.1.2 Drug review dataset (Drugs.com)

  • ‘drugName (categorical): name of drug’

  • ‘condition (categorical): name of condition’

  • ‘review (text): patient review’

  • ‘rating (numerical): 10 star patient rating’

  • ‘date (date): date of review entry’

  • ‘usefulCount (numerical): number of users who found review useful’

A patient with a certain ID buys a medication that treats his illness, and then, at a later point, the patient provides feedback in the form of a review and star rating for that medication. If other people read that review and agree that it was informative, they will likely click the usefulCount button, which will increment the variable by 1.

The UCI Machine Learning Repository is where we retrieved the Drug Review Dataset. In this dataset, patients share their experiences with various medications, describing their symptoms and providing an overall satisfaction rating out of 10. The information was compiled by spidering drug review websites on the internet (Tables 1, 2).

Table 1 Converting ratings to labels
Table 2 Dataset Statistics

For each entry, we provide the following details: the drug’s name (‘drugName’), the patient’s condition (‘condition’), the patient’s review (‘review’), the patient’s rating (out of ten) for the drug (‘rating’), the entry’s creation date (‘date’), and the number of users who found the review helpful (‘usefulCount’).

Here, the desired outcome is a prediction of the sentiment that the reviewer expresses toward the drug. The datasets do not provide a sentiment label for each review; therefore, a sentiment label must first be derived from the rating before it can be used as the target variable.

Table 3 shows sample records of the drug review dataset from druglib.com.

Table 3 Dataset 1 details

Table 4 shows sample records of the drug review dataset from drugs.com.

Table 4 Dataset 2 records

4.2 Examples of positive and negative reviews from the datasets

Lisdexamfetamine—“I have realized after my child started taking Vyvanse, her grades improved dramatically! She is able to concentrate and pay attention to what is going on instead of being confused. This medication has been working great for her and we have not had to up the dosage. It was very scary at first, however, it turned out to be a great decision.”− 9–10.

Drug name- Nature-Throid- “do not take this medicine without the supervision of an endroconologist. It was NOT for me”− 1–18.

The top 10 medications in the dataset with a perfect 10/10 rating are displayed as bars in the accompanying graph. Levonorgestrel has the most 10/10 ratings, at around 1883.

A bar chart displays the top 10 medications in the dataset with a 1/10 rating. The medicine with the most 1/10 ratings is miconazole, with around 767 such reviews (Table 5).

Table 5 Explanation of variables in the dataset

The figure shows a distribution plot of the ratings from 1 to 10 in the dataset.

The top ten health problems facing the population are displayed as bars in the accompanying graph. In this sample, ‘Birth Control’ is significantly more common than depression or pain.

This Pie Chart represents the Sentiments of the Reviews.

The gradual increase in positive ratings signifies the improving health of patients through the prescribed drugs. Concurrently, the decreasing negative ratings are a positive indication that the drugs administered to patients are proving to be effective. To assess how many reviews are genuine with respect to their rating, the following definitions are used:

‘Good rating = positive + rating 10–6’.

‘Bad rating = negative + rating 4–1’
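The two definitions above can be expressed as a small helper for deriving a label from a rating and checking whether a review's sentiment agrees with it. The function names are hypothetical, and treating a rating of 5 as neutral is an assumption, since the definitions above only cover the ranges 10–6 and 4–1.

```python
def rating_to_label(rating):
    """Map a 1-10 star rating to a sentiment label.

    Ratings 6-10 are positive and 1-4 are negative, as in the definitions
    above; labelling a rating of 5 as neutral is an assumption.
    """
    if rating >= 6:
        return "positive"
    if rating <= 4:
        return "negative"
    return "neutral"

def is_consistent(sentiment, rating):
    """True when the review's sentiment agrees with the label implied by its rating."""
    return sentiment == rating_to_label(rating)

labels = [rating_to_label(r) for r in (10, 6, 5, 4, 1)]
```

A review classified as positive with a rating of 9 would count as a good rating under this check, whereas a positive review with a rating of 2 would be flagged as inconsistent.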

Fluoxetine, gabapentin, and bupropion are the drugs that are most useful for treating people.

Opioid analgesics, anti-anxiety drugs, and oral hypoglycemics are the top three drug classes used to treat people.

Anticoagulants, barbiturates, and pituitary hormones are the least used drug classes prescribed for people.

4.2.1 Hyperparameter setting

The accuracy was enhanced through optimization, employing an improved simple ant colony optimization approach for hyperparameter tuning. Table 6 presents the hyperparameters measured in the suggested approach.

Table 6 Hyper-parameters setting

4.3 Evaluation metrics

To judge how well a statistical or machine-learning model performs, evaluation metrics are employed. These metrics provide insights into how well the model is doing and aid in comparing alternative models or algorithms.

A model’s overall quality, as well as its predictive and generalizability abilities, must be evaluated. Evaluation metrics offer impartial standards by which to evaluate these facets. Selecting appropriate metrics for evaluation requires consideration of the nature of the problem, the available data, and the desired conclusion.

Accuracy: Accuracy is the proportion of correct predictions among the total number of predictions.

$$\text{Accuracy}=\frac{\text{Number of correct predictions}}{\text{Total number of predictions}}$$

Precision: Precision is the ratio of the number of true positive predictions to the total number of instances predicted as positive by the model.

$$Precision=\frac{True\, Positives}{True\, Positives+False\, Positives}$$

Recall: Recall is the ratio of the number of true positive predictions to the total number of actual positive samples.

$$Recall=\frac{True\, Positives}{True\, Positives+False\, Negatives}$$

F1-score: The F1 score is the harmonic mean of precision and recall. It provides a single score that balances both false positives and false negatives.

$$\text{F1-Score}=2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision}+\text{Recall}}$$
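The four metrics can be computed directly from the predicted and true labels, as in the following sketch. The function name, the toy labels, and the convention of returning 0 when a denominator is zero are assumptions made for illustration; precision, recall, and F1 are computed here for a single class treated as positive.

```python
def classification_metrics(y_true, y_pred, positive):
    """Accuracy, precision, recall, and F1 for one class treated as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Toy example with five reviews.
y_true = ["pos", "pos", "neg", "neg", "pos"]
y_pred = ["pos", "neg", "neg", "pos", "pos"]
metrics = classification_metrics(y_true, y_pred, positive="pos")
```

In this toy example there are two true positives, one false positive, and one false negative, so the accuracy is 0.6 and precision, recall, and F1 are each 2/3. A macro-averaged score repeats the per-class computation for every class and averages the results.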

4.4 Experimental analysis

In this section, we examine the experiments performed to assess and validate the effectiveness of the proposed approach and its potential in extracting advantages from unbalanced data. The experimental datasets were curated from a compilation of drug reviews, as described at the beginning of Section 4.

Figure 10 shows the accuracy of the proposed RoBERTa—BiLSTM approach and the results of the proposed model's classification are presented in Table 7 and Fig. 11.

Fig. 10 Accuracy of the proposed model

Table 7 Classification Results
Fig. 11 Categorization outcomes determined by macro F1, recall, and precision

Figure 12 illustrates the process of creating and summarizing the Bi-LSTM model. The schematic provides a visual representation of the steps involved in constructing the Bi-LSTM model.

Fig. 12 Summary of the BiLSTM model

Tables 8 and 9 present a comprehensive overview of the analysis conducted on the rating and condition aspects associated with various drugs. The detailed information encapsulated in these tables allows for a thorough examination of how different drugs perform in the context of specific medical conditions. Table 10 displays how well the suggested model sorts each feature class into its own category.

Table 8 Analysis of Rating Aspect
Table 9 Analysis of Condition Aspect
Table 10 Assess the proposed model’s effectiveness in categorizing individual aspect classes

Table 11 shows the performance of the proposed model on the two datasets used in this research, evaluated using various performance metrics. The results demonstrate notable achievements across key metrics, highlighting the efficacy of the proposed model, as also shown in Fig. 13.

Table 11 Suggested model performance in terms of performance metrics
Fig. 13 Proposed model efficacy for the two datasets

4.5 Comparative analysis

In this subsection, we conduct a comprehensive performance analysis of the suggested framework in comparison to various related methods for feature extraction. These methods include single-model approaches such as LDA, Word2Vec, and BERT. Additionally, we explore hybrid models, combining Word2Vec with LDA and LDA with BERT. Table 12 and Fig. 14 visually present the outcomes achieved by the proposed approach and the different comparative methods.

Table 12 Overall performance analysis of proposed and existing methods
Fig. 14 Overall performance comparison of the existing and proposed models

The results unequivocally demonstrate the superiority of RoBERTa + ACO + BiLSTM over other hybrid models, surpassing the performance of LDA + Word2Vec. The notable high accuracy of RoBERTa + ACO + BiLSTM signifies the enhanced capability of our proposed approach in identifying crucial features within the dataset. These findings underscore that our approach excels in feature extraction, outperforming all other models. This superiority can be attributed to its unique ability to capture significant word semantics and establish sentiment relations, thereby contributing significantly to the detection of Drugs.

Table 13 and Fig. 15 provide a thorough comparison of accuracy levels obtained through the utilization of deep learning models built on BERT and RoBERTa architectures. Through these visual aids, an intricate analysis of performance metrics is presented, illuminating the effectiveness of both models within the context of the specific deep-learning tasks being addressed. The information extracted from both the table and figure enhances our understanding of how BERT and RoBERTa models perform in terms of accuracy, offering valuable insights to guide decision-making when selecting the most appropriate model for a given application.

Table 13 Examine the results obtained from comparing BERT and RoBERTa
Fig. 15 Comparison of BERT models with the proposed method

Table 14 compares the proposed model with other established DL models on the same datasets, focusing on accuracy as the key performance metric.

Table 14 State-of-the-art comparison

5 Conclusion and future work

This paper proposes an innovative aspect-based sentiment classification model that combines the strengths of Bi-LSTM and RoBERTa architectures. The integration of these structures enhances the model’s ability to effectively classify text sentiment by capturing both local and global dependencies within sentence contexts. The model is trained and evaluated using datasets of medical reviews collected from the UCI ML repository (druglib.com and drugs.com). TF-IDF and Word2Vec are employed for word vectorization, representing a total of 298,188 and 13,169 common tokens with word vectors across the two datasets. This paper also introduces Ant Colony Optimization (ACO) in conjunction with RoBERTa to extract the most relevant texts from social media. However, the proposed method grapples with significant challenges concerning information utilization, the conversion of extracted data into actionable knowledge, and feature creation using ACO with the RoBERTa model. The primary limitation of our model lies in its heavy dependence on data, particularly labeled data, for effective training. Hybrid models such as ours may encounter difficulties in generalizing to unseen or underrepresented aspects or sentiments, especially in domains like healthcare where annotated datasets are limited. This reliance on extensive labeled data can result in biased or inaccurate predictions, undermining the model’s effectiveness in practical applications within healthcare and similar domains. Hyperparameter tuning is performed to optimize the model. Notably, our results indicate that the performance of the proposed approach is enhanced when ACO is employed for feature extraction alongside RoBERTa, complemented by a BiLSTM classifier, resulting in an accuracy of 96.78% on druglib.com and 95.02% on drugs.com.

The experimental results, relying on the F-measure, a composite metric encompassing precision and recall, revealed the superior efficiency of the proposed aspect extraction method compared to prior approaches. Moreover, fine-tuning BERT for aspect classification using the generated dataset exhibited superior performance when contrasted with alternative methods.

In future research, we plan to refine and augment the proposed approach by analyzing diverse content types and improving the feature extraction methodology. Furthermore, there is an opportunity to enhance word embedding approaches, given that the experimental results underscore the influence of word representation on the overall accuracy of the model. We also plan to explore the integration of textual and non-textual modalities, such as user ratings or images of drug packaging, to enrich the information available for aspect-based classification. Multi-modal fusion techniques could be employed to leverage complementary sources of information.