Introduction

Data generation and use have grown steadily across many sectors with the spread of information and communication technology. Healthcare is one of the sectors in which data volumes grow exponentially. These sectors employ technologies that compress enormous volumes of data per microsecond. As data volumes increase, data analysis techniques including feature extraction, rule generation, and data reduction become more important1. On this basis, many knowledge inference techniques have been developed, and computational intelligence is widely used for this purpose. The core objective of computational intelligence is to deal with ambiguity and imprecision in decision-making processes. Knowledge derived from information systems ought to be precise, intelligible, transparent, and visually conveyed. Machine learning is frequently used for feature selection, rule generation, classification, and grouping. Healthcare applications must retain the data that matter for a given condition by selecting features and creating decision rules. Accordingly, both feature selection and rule generation are addressed here in the context of healthcare applications.

The primary issue in knowledge inference is choosing the important attributes. In recent times, swarm algorithms have been used to select the key features for classification. These techniques produce approximate rather than exact outputs. A variety of swarm algorithm techniques are discussed in the literature. Engineering applications employ methods such as Particle Swarm Optimization (PSO)2, fish algorithm3, bat-inspired algorithm4, and whale optimization algorithm5 for feature selection. However, the use of these techniques in healthcare applications remains limited.

This research considers the Red Deer (RD) optimization technique6. The RD algorithm is a meta-heuristic that imitates the natural behavior of RD. It was originally presented in 2018 and has demonstrated promising outcomes in resolving a variety of optimization problems. RD optimization is a population-based algorithm that uses the collective intelligence of a herd of RD to discover the best answer to a problem. It deviates from PSO in some ways, including how the population moves and how the leader is chosen. However, it also incorporates a few PSO characteristics, including the steering of the search, velocity control to regulate population movement, and the application of a fitness function to assess the quality of potential solutions7.

For knowledge inference, on the other hand, advanced computational techniques continue to be developed. The fuzzy set8 was introduced first to deal with uncertainty. The soft set9, Rough Set (RS)10, and various other concepts were later introduced for knowledge inference under uncertainty. Among these comparable approaches, RS is used in a variety of technical and scientific domains11. Variations of RS and their applications are found in emerging areas of science and engineering12,13. The equivalence relation is the prime concept of RS. Further, binary relations, fuzzy equivalence relations, and intuitionistic fuzzy equivalence relations have been presented as alternatives to the equivalence relation14,15. Clinical information retrieval systems and knowledge inference both make use of these improvements16,17. At the same time, RS has been extended with numerous concepts. For example, RS has been combined with several algorithms, such as the neural network18,19, genetic algorithm20, fish swarm, cuckoo search, and shuffling frog leaping algorithm21. Even so, the literature shows that the fusion of RS with swarm optimization is quite rare.

In RS theory, an information system is described by an information function \(f:(Q\times P)\rightarrow V\), where Q is the set of objects, P is the set of attributes, and V is the set of attribute values. If \(P=(C\cup D)\), then the information system is referred to as a decision system. It is to be noted that C is the set of conditional attributes and D is the set of decision attributes. An equivalence relation, IND(R), as defined in Eq. (1), is the prime notion of RS, where \(R\subseteq (Q\times Q)\).

$$\begin{aligned} IND(R)=\{(q_i, q_j): f_p(q_i) = f_p(q_j)\hspace{0.2cm} \forall \hspace{0.2cm} p\in P\} \end{aligned}$$
(1)

The equivalence relation R divides the set Q into several classes, which may be expressed as Q/R. Consider a target set \(X\subseteq Q\) about which knowledge is to be inferred. The lower and upper approximations, represented by \({\underline{R}}X\) and \({\overline{R}}X\), respectively, approximate X. Equation (2) defines the lower approximation, while Eq. (3) defines the upper approximation.

$$\begin{aligned} {\underline{R}}X = \cup \{Y\in Q/R: Y\subseteq X\} \end{aligned}$$
(2)
$$\begin{aligned} {\overline{R}}X = \cup \{Y\in Q/R: Y\cap X\ne \phi \} \end{aligned}$$
(3)

Two situations result from the lower and upper approximations: \({\underline{R}}X = {\overline{R}}X\) or \({\underline{R}}X \ne {\overline{R}}X\). In the first case, X is a crisp set, whereas X is a rough set in the second. The boundary line objects in the latter scenario are designated as \(BN_R(X)= {\overline{R}}X - {\underline{R}}X\). Suppose \(A\subseteq P\) and \(B\subseteq P\) induce two equivalence relations on Q. The A-positive region of B is described in Eq. (4).

$$\begin{aligned} POS_A(B)=\cup _{X\in Q/B}\hspace{0.1cm}{\underline{A}}(X) \end{aligned}$$
(4)

The measure of the dependence of B on A, \(k=\gamma _A(B)\), is defined in Eq. (5). B is independent of A if \(k=0\). Likewise, if \(k=1\), then B completely depends on A. Otherwise, B depends partially on A if \(0<k<1\).

$$\begin{aligned} \gamma _A(B) = k = \frac{|POS_A(B)|}{|Q|} \end{aligned}$$
(5)
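The positive region of Eq. (4) and the dependency degree of Eq. (5) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the helper names (`partition`, `positive_region`, `dependency`) and the four-object toy decision table are assumptions introduced here and do not come from the paper.

```python
from collections import defaultdict


def partition(table, attributes):
    """Group object ids whose values agree on every attribute in `attributes`."""
    blocks = defaultdict(set)
    for obj, row in table.items():
        blocks[tuple(row[a] for a in attributes)].add(obj)
    return list(blocks.values())


def positive_region(table, cond, dec):
    """POS_A(B): union of the A-classes that lie fully inside some B-class (Eq. 4)."""
    dec_blocks = partition(table, dec)
    pos = set()
    for block in partition(table, cond):
        if any(block <= d for d in dec_blocks):
            pos |= block
    return pos


def dependency(table, cond, dec):
    """gamma_A(B) = |POS_A(B)| / |Q| (Eq. 5)."""
    return len(positive_region(table, cond, dec)) / len(table)


# Hypothetical toy decision table (illustrative values, not from the paper).
toy = {
    "q1": {"p1": 1, "p2": 0, "d": "yes"},
    "q2": {"p1": 1, "p2": 0, "d": "no"},
    "q3": {"p1": 0, "p2": 1, "d": "no"},
    "q4": {"p1": 0, "p2": 0, "d": "yes"},
}
k = dependency(toy, ["p1", "p2"], ["d"])
print(k)
```

In this toy table, q1 and q2 share the same condition values but differ in decision, so only q3 and q4 fall in the positive region and the decision depends on the conditions only partially.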

The notation \(\psi \rightarrow \phi \) is the common form of a decision rule. In the decision rule, the conditions are denoted by the symbol \(\psi \), and the decision is represented by the symbol \(\phi \). Support, strength, and accuracy are three essential characteristics of decision rules. The support of a decision rule is defined as \(S(\psi ,\phi ) = card(||\psi \wedge \phi ||)\). Likewise, the strength of the decision rule is defined as \(\sigma (\psi ,\phi ) = S(\psi ,\phi )/card(||\psi ||_{\phi })\). The accuracy of a decision rule is defined in Eq. (6), where \(NS(\psi , \phi )\) stands for the non-support of the rule.

$$\begin{aligned} Accy = \frac{|S(\psi , \phi )|}{|NS(\psi , \phi ) + S(\psi , \phi )|} \end{aligned}$$
(6)

A brief explanation of the rough set rule generation process is given here. The rule generation algorithm analyses a categorical information system to obtain candidate rules. The computational steps of the rough set rule generation procedure are presented below.

Algorithm
figure a

Rough set rule generation procedure.

An analytical interpretation

Consider a liver disease diagnosis system, as indicated in Table 1. It contains information on ten patients. The five symptoms of liver disease are ascites (\(p_1\)), spiders (\(p_2\)), edema (\(p_3\)), bilirubin (\(p_4\)), and albumin (\(p_5\)). The table shows that patient \(q_1\), who has ascites, spiders, no edema, very high bilirubin, and high albumin, is classified as having liver disease. The remaining patients in Table 1 are interpreted in a similar way.

Table 1 A sample decision table of liver disease.

We obtain \(Q/R=\{\{q_1, q_3\}, \{q_2, q_{10}\}, \{q_4, q_7\},\{q_5, q_8, q_9\}, \{q_6\}\}\) by applying the equivalence relation on the features \(P = \{p_1, p_2, p_3, p_4, p_5\}\). Taking \(X = \{q_1, q_3, q_5, q_6, q_8, q_{10}\}\) into account, we obtain \({\underline{R}}X = \{q_1, q_3, q_6\}\) and \({\overline{R}}X = \{q_1, q_2, q_3, q_5, q_6, q_8, q_9, q_{10}\}\). As a result, the boundary line objects are \(BN_R(X) = \{q_2, q_5, q_8, q_9, q_{10}\}\). The boundary region and the lower and upper approximations of the RS are outlined in Fig. 1.
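The worked example above can be checked with a short Python sketch. The partition Q/R and the set X are copied from the text; the function names are hypothetical and chosen only for this illustration.

```python
def lower_approximation(partition, target):
    """Union of the equivalence classes fully contained in the target set (Eq. 2)."""
    return set().union(*(block for block in partition if block <= target))


def upper_approximation(partition, target):
    """Union of the equivalence classes that intersect the target set (Eq. 3)."""
    return set().union(*(block for block in partition if block & target))


# Q/R and X exactly as stated in the text.
partition = [{"q1", "q3"}, {"q2", "q10"}, {"q4", "q7"}, {"q5", "q8", "q9"}, {"q6"}]
X = {"q1", "q3", "q5", "q6", "q8", "q10"}

lower = lower_approximation(partition, X)
upper = upper_approximation(partition, X)
boundary = upper - lower  # BN_R(X)

print(sorted(lower), sorted(upper), sorted(boundary))
```

Because the lower and upper approximations differ, X is a rough set with respect to R in this example.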

Figure 1
figure 1

Representation of different rough set concepts.

Convictions of red deer optimization

Since ancient times, Scotland has supported populations of Red Deer (RD). Male stags and female hinds are the two main categories of RD. The animal exhibits distinctive behavior during the breeding season. Stags roar loudly and often during this time of year to attract hinds, and hinds mostly prefer males that roar frequently. This mating behavior is the foundation of Red Deer Optimization (RDO). It is a population-based meta-heuristic algorithm in which the Male RDs (MRDs) are chosen first and the rest of the population is regarded as hinds. The MRDs begin by roaring and are then split into two groups known as commanders and stags. These two groups fight for control of the harems. The number of hinds in a harem is related to the roaring and fighting abilities of its commander. As a result, the commanders mate with numerous hinds in their harems. Further, each stag mates with its nearest hind6.

Exploration and exploitation are the two stages of the algorithm's operation. The roaring of the MRDs in the search space promotes local search (exploitation). Likewise, the fighting between stags and commanders acts as a local search that provides improved solutions. In the exploration stage, harems are created and assigned to the commanders. During this phase, the commanders mate with the hinds of their own harems and of other harems. The mating phase of the algorithm, which creates RD offspring, is another stage of this optimization35.

Let each RD in the population be represented as in Eq. (7). The fitness of each RD is obtained using Eq. (8), where m is the number of features.

$$\begin{aligned} q^{RD} = \{p_1, p_2, p_3, \cdots , p_m\} \end{aligned}$$
(7)
$$\begin{aligned} Fitness = f(q^{RD}) = f(p_1, p_2, p_3, \cdots , p_m) \end{aligned}$$
(8)

The procedure starts with an initial population of size n. The population is then divided into MRDs and Hind RDs (HRDs). The HRDs provide diversification, whereas the MRDs provide intensification in the population. The MRDs improve their positions using Eq. (9), where UB and LB refer to the upper and lower bounds of the search space. The values \(s_1\), \(s_2\), and \(s_3\) are random numbers between 0 and 1 associated with the roaring stage.

$$\begin{aligned} q_{new}^{MRD} = \left\{ \begin{array}{ccc} q_{old}^{MRD} + s_1\times ((UB - LB)\times s_2 + LB) &{} if &{} s_3\ge 0.5\\ &{} &{} \\ q_{old}^{MRD} - s_1\times ((UB - LB)\times s_2 + LB) &{} if &{} s_3 < 0.5 \end{array} \right. \end{aligned}$$
(9)
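The roaring update of Eq. (9) could be sketched in Python as follows. Several details are assumptions made for this sketch and are not stated in the text: the same random step is applied to every dimension, candidate positions are clipped to the interval [LB, UB], fitness is maximized, and a roared position is kept only when it improves fitness. The function names `roar` and `roaring_stage` are hypothetical.

```python
import random


def roar(position, lb, ub, rng=random):
    """Return a candidate position for one male red deer following Eq. (9)."""
    s1, s2, s3 = rng.random(), rng.random(), rng.random()
    step = s1 * ((ub - lb) * s2 + lb)
    sign = 1.0 if s3 >= 0.5 else -1.0
    # Clipping to [lb, ub] is an assumption; Eq. (9) itself does not clip.
    return [min(ub, max(lb, x + sign * step)) for x in position]


def roaring_stage(male, fitness, lb, ub, rng=random):
    """Keep the roared position only if it has higher fitness (assumption)."""
    candidate = roar(male, lb, ub, rng)
    return candidate if fitness(candidate) > fitness(male) else male


# Hypothetical usage with a toy fitness that peaks at 0.2 in every dimension.
toy_fitness = lambda q: -sum((x - 0.2) ** 2 for x in q)
male = [0.5, 0.5]
updated = roaring_stage(male, toy_fitness, 0.0, 1.0, random.Random(0))
print(updated)
```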

Further using Eq. (10), we calculate the number of commanders, where \(\alpha \in [0, 1]\) refers to a random number and \(N^{MRD}\) refers to the total number of MRDs. Similarly, \(N^{Com}\) refers to the number of commanders. Equivalently, the number of stags \(N^{Stag}\) is defined as \(N^{Stag} = N^{MRD} - N^{Com}\).

$$\begin{aligned} N^{Com} = round(\alpha . N^{MRD}) \end{aligned}$$
(10)
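As a small illustration of Eq. (10), the males can be split into commanders and stags as sketched below. Ranking the males by fitness and promoting the best ones to commanders is an assumption of this sketch, as is the function name `split_males`. Note also that Python's built-in `round` rounds halves to the nearest even integer, which may differ from the rounding used by the authors.

```python
def split_males(males, fitness, alpha):
    """Split the males into commanders and stags following Eq. (10)."""
    n_com = round(alpha * len(males))  # N^Com = round(alpha * N^MRD)
    ranked = sorted(males, key=fitness, reverse=True)  # best first (assumption)
    return ranked[:n_com], ranked[n_com:]  # N^Stag = N^MRD - N^Com


# Hypothetical one-dimensional males scored by their single coordinate.
males = [[0.1], [0.9], [0.5], [0.3]]
commanders, stags = split_males(males, lambda q: q[0], alpha=0.5)
print(commanders, stags)
```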

The fighting behavior between commanders and stags, which leads to two new solutions, is expressed analytically using Eqs. (11) and (12), respectively. Note that \(b_1\) and \(b_2\) are uniformly distributed random numbers between 0 and 1.

$$\begin{aligned} q_{new1} = \frac{(Com + Stag)}{2} + b_1\times ((UB-LB)\times b_2 + LB) \end{aligned}$$
(11)
$$\begin{aligned} q_{new2} = \frac{(Com + Stag)}{2} - b_1\times ((UB-LB)\times b_2 + LB) \end{aligned}$$
(12)
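Eqs. (11) and (12) can be sketched as below. The selection step, which keeps the fittest of the commander, the stag, and the two new solutions, is an assumption of this sketch, since the text does not state how the winner is chosen. The names `fight` and `fighting_stage` are hypothetical.

```python
import random


def fight(commander, stag, lb, ub, rng=random):
    """Return the two new solutions of Eqs. (11) and (12)."""
    b1, b2 = rng.random(), rng.random()
    step = b1 * ((ub - lb) * b2 + lb)
    mid = [(c + s) / 2 for c, s in zip(commander, stag)]
    new1 = [m + step for m in mid]  # Eq. (11)
    new2 = [m - step for m in mid]  # Eq. (12)
    return new1, new2


def fighting_stage(commander, stag, fitness, lb, ub, rng=random):
    """Keep the fittest of the four candidates (assumed selection rule)."""
    new1, new2 = fight(commander, stag, lb, ub, rng)
    return max([commander, stag, new1, new2], key=fitness)


# Hypothetical usage.
com, stg = [0.8, 0.2], [0.4, 0.6]
n1, n2 = fight(com, stg, 0.0, 1.0, random.Random(1))
toy_fitness = lambda q: -abs(q[0] - 0.6)
winner = fighting_stage(com, stg, toy_fitness, 0.0, 1.0, random.Random(1))
```

The two new solutions are symmetric about the midpoint of the commander and the stag, which is why their average recovers that midpoint.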

Next, harems are formed. A harem is a group of hinds captured by a male commander. The Objective Fitness (OF) of the male commander determines the number of hinds in a harem. Therefore, hinds are distributed among commanders using \(V_n = \nu _n - Max_{i}{\nu _i}\), where \(\nu _n\) is the power of the \(n^{th}\) commander and \(V_n\) is its normalized value. The normalized power of the commander is calculated using Eq. (13).

$$\begin{aligned} Pow_{n} = \left| \frac{V_n}{\sum _{i=1}^{N^{Com}}V_i}\right| \end{aligned}$$
(13)
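A literal Python reading of \(V_n\) and Eq. (13) is sketched below. The function name and the example values are hypothetical, and the equal-share fallback for identical commanders is an assumption added to avoid division by zero. Under this literal form, the commander with the largest \(\nu\) receives a share of zero, so \(\nu\) behaves like a cost in which smaller values are better.

```python
def normalized_power(nu):
    """Pow_n = |V_n / sum_i V_i| with V_n = nu_n - max_i nu_i (Eq. 13)."""
    top = max(nu)
    v = [x - top for x in nu]
    total = sum(v)
    if total == 0:  # all commanders equal: fall back to equal shares (assumption)
        return [1.0 / len(nu)] * len(nu)
    return [abs(x / total) for x in v]


# Hypothetical commander values.
powers = normalized_power([1.0, 3.0, 6.0])
print(powers)
```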

Let \(N^{Hind}\) be the total number of hinds. The number of hinds in the \(n^{th}\) harem is then \(N_{n}^{Harem} = round(Pow_{n}\times N^{Hind})\). Furthermore, a commander mates with \(N_{n}^{Harem_{mate}} = round(\delta _1 \cdot N_{n}^{Harem})\) hinds of its own harem, where \(\delta _1\in [0, 1]\) is an initial parameter giving the percentage of hinds in the same harem that become parents. The offspring produced by the mating process is defined in Eq. (14), where \(c\in [0, 1]\) refers to a uniformly distributed random number.

$$\begin{aligned} q_{off} = \frac{(Com + Hind)}{2} + c\times (UB - LB) \end{aligned}$$
(14)
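The harem size, the number of mating hinds, and the offspring of Eq. (14) could be sketched as follows. Choosing the mating hinds uniformly at random within the harem is an assumption of this sketch, and all function names are hypothetical.

```python
import random


def harem_size(power, n_hind):
    """N_n^Harem = round(Pow_n * N^Hind)."""
    return round(power * n_hind)


def mate_count(delta, harem_len):
    """Number of hinds that become parents: round(delta * N^Harem)."""
    return round(delta * harem_len)


def offspring(commander, hind, lb, ub, rng=random):
    """Offspring position following Eq. (14)."""
    c = rng.random()
    return [(m + h) / 2 + c * (ub - lb) for m, h in zip(commander, hind)]


def mate_in_harem(commander, harem, delta, lb, ub, rng=random):
    """Mate the commander with a randomly chosen delta-fraction of the harem."""
    chosen = rng.sample(harem, mate_count(delta, len(harem)))
    return [offspring(commander, hind, lb, ub, rng) for hind in chosen]


# Hypothetical harem of five two-dimensional hinds.
harem = [[0.1, 0.1], [0.2, 0.2], [0.3, 0.3], [0.4, 0.4], [0.5, 0.5]]
kids = mate_in_harem([0.9, 0.9], harem, 0.6, 0.0, 1.0, random.Random(2))
print(len(kids))
```

The same helper can serve for mating in another harem by passing \(\delta _2\) and the hinds of a randomly chosen harem k.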

Moreover, a commander mates with \(N_{k}^{Harem_{mate}} = round(\delta _2 \cdot N_{k}^{Harem})\) hinds of another harem, where \(\delta _2\in [0, 1]\) is an initial parameter giving the percentage of hinds in the other harem that become parents. Here, k refers to a randomly chosen harem. Finally, each remaining stag mates with its nearest hind. The distance function defined in Eq. (15) is used by a stag to find the nearest hind, where the distance between the \(i^{th}\) hind and the stag is denoted as \(d_i\) and J refers to the dimensional space. The flow diagram of RDO is presented in Fig. 2.

$$\begin{aligned} d_i = \left\{ \sum _{j\in J}\left( q_j^{Stag} - q_j^{Hind_i}\right) ^2\right\} ^{1/2} \end{aligned}$$
(15)
Figure 2
figure 2

Flow diagram of RDO algorithm.
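The nearest-hind search based on the Euclidean distance of Eq. (15) can be sketched with the standard library alone. The function name `nearest_hind` and the example points are hypothetical.

```python
import math


def nearest_hind(stag, hinds):
    """Index of the hind closest to the stag under the distance of Eq. (15)."""
    distances = [math.dist(stag, hind) for hind in hinds]
    return min(range(len(hinds)), key=distances.__getitem__)


# Hypothetical positions: the second hind is the closest to the stag.
idx = nearest_hind([0.0, 0.0], [[3.0, 4.0], [1.0, 1.0], [2.0, 0.0]])
print(idx)
```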

Overview of proposed research

This section outlines the proposed four-phase research design. In the first phase, a medicinal record system for hepatitis B is gathered. The medicinal record system shows that different patients can have identical conditional parameter values yet different decisions. Physicians' differing opinions are the main reason, and this ultimately results in uncertainty in data analysis. As clinical information systems involve uncertainty, uncovering hidden information can be difficult. As a result, when analyzing data, it is crucial to cope with ambiguous and partial information in classification. Hence, the main goal of this phase is to remove missing data and noise. In the second phase, the proposed Rough Set Red Deer Optimization (RSRDO) algorithm is used to examine the processed decision system. It aids in locating the prime features that influence the decision system. In the third phase, RS is used to develop the clinical information retrieval system. The decision rules are validated during the fourth phase of the research design. Figure 3 displays the block diagram of the planned research.

Figure 3
figure 3

Graphical view of proposed research.

In order to infer knowledge and develop a clinical information retrieval system, this study uses an integrated data analysis procedure that combines the RS and Red Deer Optimization (RDO) algorithms. The proposed RSRDO handles the uncertainties that occur in the clinical decision system. Another goal of this study is to achieve high classification accuracy with fewer conditional features. The medicinal record system for hepatitis B is used to build a clinical information retrieval system using the proposed integrated technique RSRDO. Each deer is encoded as a binary bit string of length m and is assumed to search for the best parameter subset. Here, m is the number of conditional parameters in the medicinal record system. If a component of the solution vector is 1, the related conditional attribute is chosen. Similarly, if the component value is 0, the conditional attribute is not chosen for developing the clinical information retrieval system. Moreover, the fitness of each solution vector is determined using the fitness function stated in Eq. (16).

$$\begin{aligned} Fitness\hspace{0.1cm} f(q) = \alpha \gamma (q) + \beta \left( 1 - \frac{m_{s}}{m_c}\right) \end{aligned}$$
(16)

In Eq. (16), \(\gamma (q)\) is the degree of dependency described in Eq. (5). The terms “\(m_s\)” and “\(m_c\)” stand for the number of selected parameters and the total number of parameters, respectively. The weight \(\alpha \) is applied to the degree of dependency, whereas \(\beta \) is the weight of the term that rewards selecting fewer parameters. It is to be noted that \(\alpha + \beta = 1\). Besides, the value of \(\alpha \) must be high. The fitness function identifies the most relevant attributes pertaining to hepatitis B. The procedure of the suggested RSRDO algorithm is defined in the earlier section.
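The fitness of Eq. (16) for a binary solution vector can be sketched as below. The value \(\alpha = 0.9\) is an assumption (the text only says that \(\alpha \) must be high), and the four-patient toy data and the function names are hypothetical. The dependency degree is computed as the share of objects whose equivalence class under the selected features carries a single decision, which matches Eq. (5).

```python
from collections import defaultdict


def dependency(rows, decisions, selected):
    """gamma: share of objects whose selected-feature class has one decision (Eq. 5)."""
    classes = defaultdict(set)
    for row, dec in zip(rows, decisions):
        classes[tuple(row[j] for j in selected)].add(dec)
    pure = sum(1 for row in rows
               if len(classes[tuple(row[j] for j in selected)]) == 1)
    return pure / len(rows)


def fitness(mask, rows, decisions, alpha=0.9):
    """Eq. (16): alpha * gamma(q) + beta * (1 - m_s / m_c), with beta = 1 - alpha."""
    selected = [j for j, bit in enumerate(mask) if bit == 1]
    m_s, m_c = len(selected), len(mask)
    beta = 1.0 - alpha
    return alpha * dependency(rows, decisions, selected) + beta * (1 - m_s / m_c)


# Hypothetical data: three binary features for four patients.
rows = [(1, 0, 1), (1, 0, 0), (0, 1, 1), (0, 0, 0)]
decisions = ["acute", "chronic", "chronic", "acute"]

full = fitness([1, 1, 1], rows, decisions)
reduced = fitness([1, 0, 1], rows, decisions)
print(full, reduced)
```

In this toy case, dropping the second feature keeps the dependency degree at 1 while selecting fewer features, so the reduced subset scores higher than the full feature set.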

The clinical information retrieval system for hepatitis B divides the condition into two groups: acute and chronic. This exploratory research uses the proposed RSRDO technique to find the important features, and RS to build the clinical information retrieval system from the values of those features. The decision rules produced by the integrated approach thus assist doctors in making the correct decision. At the same time, they can save time and money and may help save lives.

The primary conditional features in the hepatitis B decision system are established using the RSRDO technique. Irrelevant features are then eliminated from the decision table, and decision rules are generated. Domain experts also confirm that the reduced decision system is suitable for building a clinical information retrieval system. The reduced medicinal record system is partitioned into a training section and a testing section. A total of 70% of the data are in the training section, while 30% are in the testing section. RS decision rule generation is applied to the training section. Using Eq. (6), the accuracy of each decision rule is calculated. Moreover, the support, non-support, and strength of each decision rule are calculated. For the creation of the clinical information retrieval system, a threshold of 65% is considered. Any decision rule whose accuracy is less than 65% is discarded.
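The accuracy of Eq. (6) and the 65% threshold can be illustrated with a short sketch. The rule counts below are hypothetical and are not taken from the paper's tables; the dictionary layout and function names are likewise assumptions.

```python
def rule_accuracy(support, non_support):
    """Accuracy of a decision rule following Eq. (6)."""
    return support / (support + non_support)


def keep_rules(rules, threshold=0.65):
    """Discard the rules whose accuracy is below the threshold."""
    return [r for r in rules
            if rule_accuracy(r["support"], r["non_support"]) >= threshold]


# Hypothetical rules (illustrative counts only).
rules = [
    {"id": 1, "support": 13, "non_support": 2},  # accuracy about 0.87
    {"id": 2, "support": 6, "non_support": 4},   # accuracy 0.60, discarded
    {"id": 3, "support": 14, "non_support": 6},  # accuracy 0.70
]
kept = keep_rules(rules)
print([r["id"] for r in kept])
```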

The clinical information retrieval system is further examined in the validation phase using the 30% testing data. Measures such as recall (Recl.), precision (Precn.), accuracy (Acc.), and F-score are used to assess the model. The F-score is computed to balance precision and recall and to account for imbalanced classes. These measures are defined mathematically in Eqs. (17), (18), (19), and (20), respectively, where \(t^p\), \(f^p\), \(t^n\), and \(f^n\) denote true positive, false positive, true negative, and false negative, respectively.

$$\begin{aligned} Recall\hspace{0.1cm} (Recl.) = \frac{|t^p|}{|t^p + f^n|} \end{aligned}$$
(17)
$$\begin{aligned} Precision\hspace{0.1cm} (Precn.) = \frac{|t^p|}{|t^p + f^p|} \end{aligned}$$
(18)
$$\begin{aligned} F-score = 2 \times \left( \frac{Precn. \times Recl.}{Precn. + Recl.}\right) \end{aligned}$$
(19)
$$\begin{aligned} Accuracy\hspace{0.1cm} (Acc.) = \frac{|t^p + t^n|}{|t^p + f^p + t^n + f^n|} \end{aligned}$$
(20)
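Eqs. (17) to (20) translate directly into code. The confusion-matrix counts in the usage example are hypothetical and chosen only to make the arithmetic easy to follow; they are not the paper's results.

```python
def classification_metrics(tp, fp, tn, fn):
    """Recall, precision, F-score, and accuracy following Eqs. (17)-(20)."""
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    f_score = 2 * (precision * recall) / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return {"recall": recall, "precision": precision,
            "f_score": f_score, "accuracy": accuracy}


# Hypothetical counts for 100 test records.
metrics = classification_metrics(tp=40, fp=10, tn=45, fn=5)
print(metrics)
```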

Experimental research on hepatitis B

In this section, a clinical investigation of hepatitis B is described. Hepatitis B is a deadly liver illness caused by the hepatitis B virus (HBV). It significantly affects global health. It can lead to persistent illness and significantly increases the risk of death from cirrhosis and liver cancer. The infection affects the liver. In addition to managing blood sugar levels and detoxifying the body, the liver also manages digestion, energy production, glycogen storage, and other bodily functions. Hepatitis affects the cells of the liver tissue and compromises their functionality. There are several types of hepatitis, including hepatitis A, B, C, D, and E. Of these, hepatitis B is the most common liver infection in the world. It is mainly spread by using razors that have been used by an infected person, injecting drugs with an infected syringe, and contact with infectious bodily fluids. Common symptoms include jaundice, fever, skin itching, lack of appetite, weakness, ascites, abnormal blood clotting, dark urine, headache, pale stools, joint pain, and stomach bleeding. Figure 4 displays the signs and symptoms of hepatitis B.

Figure 4
figure 4

Symptoms of hepatitis B.

The features of hepatitis B and their categories are listed in Table 2 below. The data set was gathered from the UCI repository36. Additionally, medical records from some primary health centers in West Bengal, India, are taken into account for the analysis. The data set has one decision parameter \(a_d\) and 19 conditional features \(p_1, p_2, \ldots , p_{19}\). These conditional features are further categorized with assistance from expert physicians. For instance, alk phosphate is divided into four groups: 26–96; 96.1–147; 147.1–194; and 194.1–295. These groups are coded as 1, 2, 3, and 4, respectively, for analysis. The data analysis is unaffected by this representation.
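The categorization of alk phosphate described above can be sketched with the standard `bisect` module. The group boundaries come from the text; the function name is hypothetical, and assigning readings that fall between two listed ranges (for example 96.05) to the higher group is an assumption of this sketch.

```python
from bisect import bisect_left

# Upper edges of the four alk phosphate groups quoted in the text.
ALK_PHOSPHATE_EDGES = [96, 147, 194, 295]


def alk_phosphate_code(value):
    """Map a reading to its group code 1-4; readings outside 26-295 raise ValueError."""
    if not 26 <= value <= 295:
        raise ValueError(f"alk phosphate value {value} is outside 26-295")
    # bisect_left keeps a reading equal to an upper edge in the lower group.
    return bisect_left(ALK_PHOSPHATE_EDGES, value) + 1


codes = [alk_phosphate_code(v) for v in (26, 96, 96.1, 147, 147.1, 194.1, 295)]
print(codes)
```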

Table 2 Features of hepatitis B and its categorization.

The hepatitis B medicinal records are divided into two classes, chronic and acute, according to the decision. In this experimental study, an integrated RSRDO approach is used to create a clinical information retrieval system. A specific decision is assigned to each patient based on the values of the conditional features. Therefore, the RSRDO feature selection algorithm and the rough set are both crucial for producing decision rules. This helps the clinical information retrieval system make a preliminary diagnosis of the illness. A sample medicinal record system is illustrated in Table 3.

Table 3 Medicinal record system of hepatitis B disease.

Result analysis of proposed model

The investigation is conducted on a computer system with an Intel Core i5-4200U CPU running at 1.60 GHz and 2.30 GHz, Windows 10, and 32 GB of RAM. The evaluation is done using Python. The proposed RSRDO procedure is used to find the significant features in the data of 643 patients. A total of 1000 iterations are used in the investigation. We have considered 10 runs, and the significance of each feature is calculated in each run. Further, we have computed the average over all the runs to obtain the overall significance of each feature. The significance of each feature for all 10 runs is presented in Table 4. The primary features are those whose significance values lie above the trend line. Figure 5 shows the significance of the features under the proposed RSRDO algorithm. Nine features in all have been chosen for analysis. Gender (\(p_2\)), steroid (\(p_3\)), fatigue (\(p_5\)), anorexia (\(p_7\)), palpable spleen (\(p_{10}\)), histology (\(p_{14}\)), bilirubin (\(p_{15}\)), SGOT (\(p_{17}\)), and albumin (\(p_{18}\)) are the selected features. The remaining features, namely age (\(p_1\)), antivirals (\(p_4\)), malaise (\(p_6\)), liver big (\(p_8\)), liver firm (\(p_9\)), spiders (\(p_{11}\)), ascites (\(p_{12}\)), varices (\(p_{13}\)), alk phosphate (\(p_{16}\)), and protime (\(p_{19}\)), are removed from the information system.

Table 4 Significance of characteristics in each run.
Figure 5
figure 5

Significance of characteristics of hepatitis B.

The reduced medicinal record system is split into 70% (450) training data and 30% (193) testing data for building a clinical information retrieval system using the RS. The training set includes 137 acute instances and 313 chronic instances of hepatitis B. The RS technique is applied to these 450 training records to generate decision rules. The decision rules of the hepatitis B decision system, which were created from the training data, are shown in Tables 5 and 6. Decision rules that have an accuracy of less than 65% are not retained as candidate rules. Furthermore, the retained decision rules are validated using the 193 testing records.

Table 5 Training decision rules of hepatitis B confining proposed RSRDO.
Table 6 Training decision rules of hepatitis B confining proposed RSRDO (continued).

From the training data, 67 decision rules were generated. It is evident from Tables 5 and 6 that 5 rules have an accuracy below the specified threshold of 65%. Hence, these 5 rules are removed. Further, the remaining 62 decision rules are analyzed using 193 (30%) medical records. These include 84 records of acute cases and 109 records of chronic cases. The testing analysis is presented in Tables 7 and 8.

Table 7 Testing decision rules of hepatitis B confining proposed RSRDO.
Table 8 Testing decision rules of hepatitis B confining proposed RSRDO (continued).

Tables 7 and 8 show that rules 3, 9, 10, 19, 29, 32, 44, 60, and 62 are removed because their accuracy is lower than 65%. Removing these 9 of the 62 rules reduces the decision rules by 14.52% as a result of the testing study. The confusion matrix is also developed in order to assess the correctness of the suggested RSRDO procedure. The confusion matrix for the RSRDO procedure is shown in Table 9. The RSRDO procedure achieves an accuracy of 91.7%.

Table 9 Performance measure of RSRDO over hepatitis B disease.

Comparison analysis

This section conducts a comparison of the RSRDO procedure with the Decision Tree (DT) procedure, RS procedure, and Red Deer Optimization—Decision Tree (RDODT) procedure. The individual comparison analysis is shown in the subsection that follows.

Comparison analysis with RS

Using the 450 patient records of the training data, RS data analysis is performed on all hepatitis B features, and 34 decision rules are generated37. Since every rule has an accuracy of at least 65%, all of them are regarded as candidate rules. The decision rules generated using the RS are shown in Table 10. The RS model thus produces fewer rules (34) than the RSRDO model, which retains 62 rules after training.

Table 10 Training decision rules of hepatitis B confining RS.

Using the data of 193 patients from the testing dataset, the 34 decision rules are examined in further detail. All the decision rules have an accuracy of more than 65% and hence are retained. Moreover, the confusion matrix of the RS model, which takes into account all classes, is computed and shown in Table 11. Table 11 demonstrates that the RS model provides an accuracy of 88.9%, whereas the predictive accuracy of RSRDO is 91.7%. The proposed RSRDO approach is therefore 2.8 percentage points more accurate than the RS model. Figure 6 compares the different measures for the two models, RSRDO and RS.

Table 11 Performance measure of RS over hepatitis B disease.
Figure 6
figure 6

Measure of performance of RSRDO and RS.

Comparison analysis with DT

The outcomes of the proposed RSRDO approach and the Decision Tree (DT) approach are compared in this section38. The DT approach is used to produce the decision rules while taking into account all 19 features. The DT algorithm attains an accuracy of 82.9%. Figure 7 displays the decision rules that the DT procedure generated. It follows that the RSRDO model is 8.8 percentage points more accurate than the DT procedure. Similarly, the RS procedure is 6.0 percentage points more accurate than the DT procedure.

Figure 7
figure 7

Decision tree approach decision rules.

Comparison analysis with RDO-DT approach

The outcomes of the proposed RSRDO approach and the RDO-Decision Tree (RDODT) procedure are compared in this section. The DT approach is used to construct the decision rules from the features selected by the RDO approach. Using the RDODT technique, an accuracy of 88.6% is attained. The decision rules produced by the RDODT procedure are shown in Fig. 8. Consequently, the accuracy of the DT approach (82.9%) is 5.7 percentage points lower than that of the RDODT procedure. Similarly, the RDODT model is 0.3 percentage points less accurate than the RS approach. However, the proposed RSRDO procedure is 3.1 percentage points more accurate than the RDODT procedure.

Figure 8
figure 8

RDO—decision tree approach decision rules.

A comparative analysis of all models in terms of recall, precision, F-score, and accuracy is presented in Table 12. From the analysis, it is clear that the proposed RSRDO model performs better across all the measures.

Table 12 Comparative performance measure of all models over hepatitis B disease.

Research contributions and limitations of the study

This section highlights the research contributions and limitations of this work. In this study, the following contributions have been made.

  1. To help doctors diagnose hepatitis B illnesses, a novel clinical information retrieval system that combines the RS and RD algorithms has been outlined.

  2. Using a hepatitis B clinical information system, a novel feature selection approach that integrates RD optimization and the degree of dependency of the RS is given and examined.

  3. The advocated RSRDO model’s experimental effectiveness is assessed over the assessment of hepatitis B.

  4. In terms of accuracy, the suggested approach RSRDO is also contrasted with the RS, DT, and RDODT models. It is discovered that RSRDO outperforms other models despite having the fewest features in the decision system.

  5. In comparison to the RS model, the suggested model RSRDO produces 55.9% more decision rules while achieving a high accuracy of 91.7%.

Limitations of the experimental research

In the suggested research study, a clinical information retrieval system is developed by integrating the RS and the red deer algorithm. RS data analysis supports qualitative data. Thus, with the assistance of specialized professionals, an attempt has been made to convert the continuous data values of the information system into categorical values. A fuzzy RS might handle this more effectively without consulting specialized professionals. Similarly, the RSRDO algorithm does not balance local and global search in feature selection. This is because the roaring of the MRDs in the search space promotes local search (exploitation), and the fighting between stags and commanders also acts as a local search that provides improved solutions. These are the two main limitations of the research work, and they may be addressed in further studies.

The proposed RSRDO algorithm is a meta-heuristic algorithm. By employing a data partition approach, it can be scaled to larger datasets. Besides, RS also supports parallel processing. No meta-heuristic algorithm provides an optimal solution to all problems, and in general meta-heuristic algorithms are problem specific. So, the proposed RSRDO algorithm may not provide an optimal solution to every problem. At present, this can be considered a limitation, and it can be studied in future research.

Conclusions

Healthcare data are being gathered at an ever-increasing rate. Such data frequently exhibit uncertainties, and it takes a lot of work to analyze them and produce useful information. To this end, this work introduces the RSRDO clinical information retrieval system, which combines RS and RDO for disease diagnosis. The integrated RSRDO clinical information retrieval system is examined on a hepatitis B diagnosis system. Additionally, the RSRDO clinical information retrieval system is compared with the classic RS, DT, and RDODT models. The suggested RSRDO procedure outperforms all the mentioned procedures with an accuracy of 91.7%. For the RS, DT, and RDODT procedures, the obtained accuracy is 88.9%, 82.9%, and 88.6%, respectively. The investigation shows that the RSRDO procedure is 2.8 percentage points more accurate than the RS procedure. Likewise, the RSRDO procedure is 8.8 percentage points more accurate than the DT procedure and 3.1 percentage points more accurate than the RDODT procedure. Furthermore, 55.9% more rules are generated by the suggested RSRDO model than by the conventional RS model.

This research has several practical advantages. The proposed clinical information retrieval system uses a minimum number of features for the diagnosis of hepatitis B. This helps reduce the cost to patients of diagnosing hepatitis B. Besides, the clinical information system generates more rules with fewer features. These rules help the physician look deeper into the hepatitis B diagnosis. As a result, the disease can be detected at an early stage, which can save the life of a patient. It is anticipated that the findings of this research work will help doctors choose the best course of action.

In the information system, several features have continuous values. With the assistance of domain experts, these continuous values are categorized. Improved accuracy could therefore result from the hybridization of fuzzy RS and RDO, which can handle continuous values directly. Furthermore, the acquired decision rules can be subjected to formal concept analysis to identify the primary influencing elements. These are intended as future lines of research.