
1 Introduction

Information and communication technologies are increasingly being deployed in the medical domain to support a wide range of activities across the relevant processes. This trend is also related to the ever-increasing amount of data that needs to be processed and analyzed. For doctors, it is difficult to take such a large volume of data into account when making a diagnosis or determining the right treatment.

Evidence from the World Health Organization (WHO) from 2015 indicates that up to 5% of patients received an incorrect diagnosis. Diagnosis is a complicated process in which it is essential to have the right information available to the right people at the right time. Analytical support models can help doctors take into account all relevant contexts and important hidden patterns in the data, and they can also reduce the time needed to make a decision. The input and continuous participation of experts are an essential part of such analytical projects. In some cases, the experts' knowledge can be captured and stored in a suitable formal way.

Many researchers apply various intelligent techniques to create decision support systems or models that help doctors determine the correct diagnosis and design the best treatment for the patient's current health condition.

The case-based reasoning (CBR) approach was proposed by Schank in 1982 to address known problems of decision support systems, such as knowledge elicitation, adaptation, and maintenance [1]. Since medicine requires experts with a mixture of knowledge and experience, case-oriented methods should be very efficient, mainly because reasoning with cases corresponds to the typical decision-making process of physicians. Moreover, incorporating new cases automatically updates part of the changeable knowledge [2].

In our previous research on the diagnosis of various diseases, such as metabolic syndrome, mild cognitive impairment, and heart attacks or strokes, we typically extracted different models and hidden knowledge using suitable analytical methods. In most cases, these were relatively small samples of up to 500 records. Adapting the generated decision rules to new examples was therefore simple: we could just spend some time regenerating the rules. In this paper, however, we focus on a different situation, in which the knowledge base contains a large number of rules and it is important to decide efficiently about possible updates.

The paper is organized as follows: the first section introduces our motivation and the topic; the second presents case-based reasoning in the medical domain, the identified challenges, and a newly proposed approach for the adaptation phase. The conclusion summarizes the main points and outlines future work.

1.1 Case-Based Reasoning

The CBR methodology has attracted significant attention because its basic idea, reusing the experience gained from solving previous problems, is a powerful and frequently used way in which people address their issues. In CBR terminology, a case usually denotes a problem situation that needs to be resolved.

In 1998, Doyle et al. [3] described CBR as a problem-solving paradigm that differs significantly in many respects from other major artificial intelligence approaches. However, the situation has changed in recent years. Computational analogy-making and CBR are closely related areas. Analogy-making involves at least several subprocesses, such as building a representation, retrieving a base for the analogy, mapping onto the target, validation, and learning from experience [4]. Whereas other approaches rely only on general knowledge of the problem domain, CBR can use specific knowledge of previously experienced problem situations [5]. CBR is also incremental [6]: whenever a problem is solved, the new experience is retained and immediately available for solving future problems.

Solving a problem in the CBR cycle thus consists of four phases: retrieve, reuse, revise, and retain. We identified several models in the existing literature, such as the Hunt model, the Allen model, the model by Kolodner and Leake [7], and the R4 model proposed by Aamodt and Plaza [8]. The R4 model is one of the most widely used and defines the CBR cycle with the following four primary steps.

The RETRIEVE step finds and retrieves the cases most similar to the new problem. According to several authors [9, 10], this phase is one of the most important. It is a case-finding process based on similarity, for which we can typically use nearest neighbor search, inductive approaches, knowledge-based approaches, Bayesian networks, clustering, Euclidean distance, or other similarity measures. In many practical applications, it is difficult to distinguish the REUSE and REVISE steps, because many researchers merge them into a single phase called ADAPT (adaptation) [11]. In this phase, the retrieved case is reused and the proposed solution is checked. In the last step, RETAIN, the learned case is preserved (stored) for future use. There are several approaches to achieving this goal, such as retaining only the solution of the previous problem or the new one. In many cases, this retention process leads to uncontrolled growth of the case base, which in turn slows the system down [12].
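To make the four R4 steps concrete, the following Python sketch runs one pass of the cycle over a toy case base. It is an illustration under stated assumptions, not a reference implementation: cases are assumed to be numeric feature vectors with a diagnosis label, RETRIEVE is assumed to be 1-nearest-neighbor search with Euclidean distance, REUSE copies the retrieved solution unchanged, REVISE is an optional expert callback, and RETAIN skips exact duplicates as one simple guard against uncontrolled growth of the case base. All names (`Case`, `CaseBase`, `solve`) and values are hypothetical.

```python
import math
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Case:
    features: tuple[float, ...]  # problem description (illustrative numeric attributes)
    solution: str                # solution part, e.g. a diagnosis label


@dataclass
class CaseBase:
    cases: list[Case] = field(default_factory=list)

    def retrieve(self, query: tuple[float, ...]) -> Case:
        """RETRIEVE: 1-nearest neighbour by Euclidean distance (case base must be non-empty)."""
        return min(self.cases, key=lambda c: math.dist(c.features, query))

    def reuse(self, retrieved: Case) -> str:
        """REUSE: propose the retrieved solution unchanged (no adaptation)."""
        return retrieved.solution

    def revise(self, query, proposed: str, expert=None) -> str:
        """REVISE: an optional expert callback may confirm or correct the proposal."""
        return expert(query, proposed) if expert is not None else proposed

    def retain(self, query, solution: str) -> None:
        """RETAIN: store the solved case, skipping exact duplicates to limit growth."""
        new_case = Case(tuple(query), solution)
        if new_case not in self.cases:
            self.cases.append(new_case)

    def solve(self, query, expert=None) -> str:
        """Run one pass of the retrieve-reuse-revise-retain cycle."""
        proposed = self.reuse(self.retrieve(query))
        confirmed = self.revise(query, proposed, expert)
        self.retain(query, confirmed)
        return confirmed


if __name__ == "__main__":
    cb = CaseBase([Case((1.8, 3.5), "MCI"), Case((3.0, 1.2), "healthy")])
    print(cb.solve((2.0, 3.4)))  # → MCI
    print(len(cb.cases))         # → 3
```

Solving the same query a second time returns the same label without growing the case base, because RETAIN finds the identical case already stored.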

2 Case-Based Reasoning in the Medical Domain

The CBR reasoning process is accepted in medicine and is receiving increasing attention from the medical domain. Three CBR workshops organized by the U.S. Defense Advanced Research Projects Agency (DARPA) in 1988, 1989, and 1991 officially marked the beginning of CBR as a discipline.

The authors of [13] summarize 21 studies dealing with medical CBR systems. They list the methods used in each CBR step and the success rate achieved in system verification. In the RETRIEVE phase, the most frequently used methods are Euclidean distance, nearest neighbor, similarity functions, and weight sets ranked by a decision tree. In the REUSE phase, the authors used neural networks, fuzzy rules, stepwise regression, and manual reuse, but most systems do not use any technique. The REVISE phase is either performed manually or does not use any specific method. In the RETAIN phase, cases are stored manually or not at all. The most frequently used evaluation methods are k-fold cross-validation, the leave-one-out strategy, conditional probability, the area under the ROC curve (AUC), statistical frequency, and correlation.
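Leave-one-out evaluation, one of the strategies listed above, can be sketched for a simple nearest-neighbor retriever as follows. The sketch assumes that each case is a (feature vector, label) pair and that retrieval uses Euclidean distance via `math.dist` (Python 3.8+); the function names and data are hypothetical. Each case is held out once, retrieved against the remaining ones, and counted as correct when the retrieved label matches its own.

```python
import math


def nearest_label(train, query):
    """Label of the training case whose features are closest to query (Euclidean)."""
    _, label = min(train, key=lambda case: math.dist(case[0], query))
    return label


def leave_one_out_accuracy(cases):
    """Hold out each case once, retrieve from the rest, return the share of correct labels."""
    correct = 0
    for i, (features, label) in enumerate(cases):
        train = cases[:i] + cases[i + 1:]  # every case except the held-out one
        if nearest_label(train, features) == label:
            correct += 1
    return correct / len(cases)


# Toy data (hypothetical): the last two cases are close but carry different labels.
cases = [((0, 0), "A"), ((0, 1), "A"), ((10, 10), "B"), ((10, 11), "A")]
print(leave_one_out_accuracy(cases))  # → 0.5
```

Leave-one-out is the limiting case of k-fold cross-validation with k equal to the number of cases, which suits the small samples common in medical studies.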

Studies [11, 14] have investigated existing medical CBR systems developed since 1987. Most systems were developed for a specific disease, and most exist as prototypes rather than as final products. Another visible trend is the successful hybridization of CBR with various computational methods. According to [11], CBR was combined with other techniques in 32 of 76 systems. Moreover, 51 of the 76 systems avoid automatic adaptation completely, so they work only as retrieval systems.

The use of CBR in the medical field is currently reviving. Medical knowledge is continually changing; sometimes there is more than one solution, and doctors differ in their approaches and preferred medicines. The fact that the CBR methodology closely resembles a doctor's thinking process suggests that CBR can be used successfully in medicine [15]. The main advantage of CBR in this field is the possibility of adapting the knowledge base [16], which is a significant aspect of medical decision-making.

2.1 Existing Limitations

Although the CBR method appears to be successful, it has some limitations. In the medical domain, the number of similar cases is often extremely high, which makes generalization complex [14]. CBR systems that use large case bases have high memory and storage requirements, and finding similar cases in the case base can take significant processing time. CBR systems also have problems handling noisy data: if such noise is not recognized, the same problem may be stored unnecessarily many times, which in turn leads to inefficient storage and retrieval of cases. The number of systems using the full CBR cycle (retrieve, adaptation, retain) is still very low. However, the most critical problem in the successful implementation of CBR techniques in medical systems is the adaptation step. D’Aquin et al. [16] note that this step is relatively complex because it has to deal with missing relevant patient information, usability, the consequences of the decision, the proximity of decision thresholds, and the need to consider patients in different ways. Schmidt et al. indicate that introducing the adaptation step into CBR systems has been challenging in medicine [17]. Most CBR systems that do not apply the adaptation step cannot solve some new problems, so their accuracy is unconvincing in critical areas [5]. The adaptation phase is therefore limited to planning tasks [18].

2.2 Adaptation Step Problem

Study [13] notes that medical CBR systems handle the adaptation problem in two ways: most systems avoid it by applying only the RETRIEVE step of the CBR cycle, while others try to resolve it. CASEY [19], one of the first medical expert systems, attempted to solve the adaptation problem through a rule-based domain theory. Knowledge acquisition is a barrier to the development of rule-based systems; therefore, developing adaptation rules has never become a successful technique in medical CBR systems [17]. Some newer systems have successfully performed adaptation with computational techniques; e.g., eXiT*CBR.v2 [20] revises and reuses cases using genetic algorithms, and EquiVox, developed by Henriet et al. [21], performs adaptation using artificial neural networks.

Studies [5, 6] solved the adaptation problem by creating a hybrid CBR system that integrates case-based (CBR) and rule-based (RBR) reasoning. This system applies the adaptation process automatically, using adaptation rules.

In study [5], after each new case was solved, the knowledge base was extended and the adaptation and reasoning rules were updated. To achieve the integration, a new process called REASON was added to the REUSE step; it applies the reasoning rules to obtain a solution if the REUSE and ADAPT processes fail to find one. The authors first applied CBR and then RBR to the available data, and they used multiple cross-validations to evaluate accuracy. The developed prototype achieved an average accuracy of 99.53% in diagnosing thyroid disease and 99.33% in diagnosing breast cancer (other systems achieved accuracies ranging from 80% to 97%).
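The control flow described for [5], CBR first with a rule-based REASON step as a fallback, can be sketched as below. This is our hypothetical illustration, not the authors' implementation: we assume a retrieved case is accepted only when the nearest stored case lies within a distance threshold `max_distance`, and that rules are (condition, diagnosis) pairs tried in order. The attribute names (`p1`, `p2`), values, and labels are placeholders.

```python
import math
from typing import Callable, Optional

Features = dict[str, float]
Rule = tuple[Callable[[Features], bool], str]  # (condition, diagnosis)


def cbr_then_rbr(query: Features, cases: list[tuple[Features, str]],
                 rules: list[Rule], max_distance: float) -> tuple[Optional[str], str]:
    """Try CBR first; if no stored case is close enough, fall back to the rules (REASON)."""
    names = sorted(query)  # compare cases on the query's attributes only
    best_label, best_dist = None, math.inf
    for features, label in cases:
        d = math.dist([features[n] for n in names], [query[n] for n in names])
        if d < best_dist:
            best_label, best_dist = label, d
    if best_label is not None and best_dist <= max_distance:
        return best_label, "CBR"
    for condition, diagnosis in rules:  # REASON: first rule whose condition holds
        if condition(query):
            return diagnosis, "RBR"
    return None, "unsolved"  # neither step found a solution; refer the case to an expert


cases = [({"p1": 1.5, "p2": 12.0}, "negative"), ({"p1": 8.0, "p2": 4.0}, "positive")]
rules = [(lambda f: f["p1"] > 10, "positive")]
print(cbr_then_rbr({"p1": 2.0, "p2": 11.0}, cases, rules, 2.0))  # → ('negative', 'CBR')
print(cbr_then_rbr({"p1": 20.0, "p2": 1.0}, cases, rules, 2.0))  # → ('positive', 'RBR')
```

Returning which step produced the answer makes it easy to measure how often the rule-based fallback is actually needed.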

Study [6] proposed a hybrid system to help healthcare professionals diagnose cancer patients early. In the proposed approach, CBR was used as the primary reasoning process, and RBR was used to improve part of the process. For this research, the authors gathered real data on patients with gastrointestinal cancer and again used multiple cross-validations to evaluate accuracy. The results showed that diagnostic accuracy increased by 22.92% compared with a CBR-only method.

Salem and El Bagouras [22] proposed a hybrid adaptation model that combines transformational and hierarchical adaptation techniques with artificial neural networks and factors for the diagnosis of thyroid cancer. Zubi and Saad [23] combined data mining techniques with neural networks for the early diagnosis of lung cancer. For the diagnosis of breast cancer, Keles, Keles, and Yavuz [24] used neuro-fuzzy rules, while Sharaf-el-Deen et al. [25] introduced a hybrid approach that also combined CBR and RBR reasoning.

Authors have thus tried to solve the adaptation problem in three ways: by avoiding it and using CBR systems only for the RETRIEVE step; by using computational techniques such as genetic algorithms and artificial neural networks; and by creating hybrid systems that integrate CBR and RBR reasoning.

2.3 New Proposal for Supporting the Adaptation

Figure 1 shows our approach to supporting the adaptation phase of the CBR cycle. We assume that the case base stores a list of decision rules, generated by suitable machine learning algorithms, of the form IF conditions THEN consequence (the target value, i.e., the expected diagnosis).

Fig. 1. The new approach for adaptation

The CBR cycle starts with the RETRIEVE step in response to a new example without a target diagnosis. An inference mechanism compares the new case with the existing ones in the case base, calculating the distances between cases with metrics such as Euclidean, Manhattan, or Hamming distance. The result of the comparison is one of three alternatives:

  1. The mechanism finds a case identical to the new one. The target diagnosis is then the same as for the existing case.

  2. The mechanism finds no match; all stored cases are significantly different. In this situation, the current rules must be regenerated from the original set of records extended by the new case, classified by the expert.

  3. The mechanism finds partly similar cases with different target values. The CBR cycle then continues with the REUSE, REVISE, and RETAIN steps.
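The comparison step above can be sketched in Python for the rule-based case base assumed in this section. All of the following are our assumptions: a rule maps each parameter to a closed interval <low, high>; the per-parameter distance of a new case to a rule is how far the value lies outside its interval (0 inside); Euclidean and Manhattan distances aggregate these gaps, while Hamming distance counts the violated conditions. A distance of 0 is read as alternative 1 (the case satisfies the rule), a distance above the hypothetical threshold `max_distance` for every rule as alternative 2, and anything in between as alternative 3. The sketch does not check the target values of the similar rules.

```python
import math

# Hypothetical representation: parameter -> closed interval <low, high>, plus the target value.
Rule = tuple[dict[str, tuple[float, float]], int]


def gap(value, low, high):
    """How far value lies outside <low, high>; 0.0 if it lies inside."""
    return max(low - value, 0.0, value - high)


def distance(case, rule, metric="euclidean"):
    """Distance between a new case and a rule's conditions under the chosen metric."""
    conditions, _target = rule
    gaps = [gap(case[p], low, high) for p, (low, high) in conditions.items()]
    if metric == "euclidean":
        return math.sqrt(sum(g * g for g in gaps))
    if metric == "manhattan":
        return sum(gaps)
    if metric == "hamming":  # number of conditions the case violates
        return sum(1 for g in gaps if g > 0)
    raise ValueError(f"unknown metric: {metric}")


def compare(case, rules, max_distance, metric="euclidean"):
    """Return (alternative, rules) for the three alternatives listed above."""
    scored = [(distance(case, r, metric), r) for r in rules]
    satisfied = [r for d, r in scored if d == 0]
    if satisfied:
        return 1, satisfied  # case satisfies a stored rule: reuse its diagnosis
    similar = [r for d, r in scored if d <= max_distance]
    if not similar:
        return 2, []  # nothing close: expert labels the case, rules are regenerated
    return 3, similar  # partly similar: continue with REUSE, REVISE, RETAIN


rule = ({"LDL": (1.5, 3.1), "HDL": (3.1, 4.5)}, 1)  # decision rule from the specific scenario
print(compare({"LDL": 1.8, "HDL": 4.0}, [rule], 0.5)[0])  # → 1
print(compare({"LDL": 1.8, "HDL": 4.6}, [rule], 0.5)[0])  # → 3
print(distance({"LDL": 1.8, "HDL": 4.6}, rule, "hamming"))  # → 1
```

Measuring distance to an interval rather than to a single stored point lets the same metrics work directly on the IF-THEN rules kept in the case base.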

Before the cycle continues, we will investigate the differences:

  • If the cases differ in only one condition (parameter) on the left-hand side of the rules, the expert considers adjusting it. After several iterations, we will be able to build a separate knowledge base containing the experts' knowledge and perform this step semi-automatically. For example:

Formal scenario:

New case:

IF parameter1 = X AND parameter2 = Y

Decision rule:

IF parameter1 ∈ <Z, V> AND parameter2 ∈ <K, L>

THEN target value = 1

Comparison:

X ∈ <Z, V> AND (Y < K OR Y > L)

The cases differ in one parameter (parameter2). Therefore, the expert considers the following adaptation of the stored rule, and the new case is classified as a positive diagnosis (REUSE-REVISE-RETAIN).

Adapted rule 1 (if Y > L):

IF parameter1 ∈ <Z, V> AND parameter2 ∈ <K, Y>

THEN target value = 1

Adapted rule 2 (if Y < K):

IF parameter1 ∈ <Z, V> AND parameter2 ∈ <Y, L>

THEN target value = 1

Specific scenario:

New case:

IF LDL = 1.8 AND HDL = 4.6

Decision rule:

IF LDL ∈ <1.5, 3.1> AND HDL ∈ <3.1, 4.5> THEN MCI = 1

Adapted rule:

IF LDL ∈ <1.5, 3.1> AND HDL ∈ <3.1, 4.6> THEN MCI = 1

If the system finds several partially similar cases with different decision rules, the experts can assign weights expressing their suitability. This part of the concept is a subject of further research.

  • If the cases differ in multiple parameters:

    1. We identify the list of differing parameters.

Formal scenario:

New case:

IF parameter1 = X AND parameter2 = Y AND parameter3 = Z

THEN target value = 1

Decision rule:

IF parameter1 ∈ <A, B> AND parameter2 ∈ <C, D> AND parameter3 ∈ <E, F>

THEN target value = 1

Comparison:

X ∈ <A, B> AND (Y < C OR Y > D) AND (Z < E OR Z > F)

The cases differ in parameter2 and parameter3.

    2. For each of these parameters, we calculate its difference from the existing cases using a suitable similarity metric. The expert then helps us assign importance weights to the individual differences.

    3. The parameters with high weights are adapted to the most similar case, and the target class is determined (REUSE-REVISE-RETAIN).

Specific scenario:

New case:

IF LDL = 1.8 AND HDL = 4.6 AND BMI = 34 THEN MCI = 1

Decision rule:

IF LDL ∈ <1.5, 3.1> AND HDL ∈ <3.1, 4.5> AND BMI ∈ <25.1, 29.3> THEN MCI = 1

Adapted rule 1:

IF LDL ∈ <1.5, 3.1> AND HDL ∈ <3.1, 4.6> AND BMI ∈ <25.1, 34> THEN MCI = 1

etc.
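Both adaptation scenarios can be sketched as a single helper that widens the violated intervals so that the stored rule covers the new case, reproducing the adapted rules above (e.g., HDL ∈ <3.1, 4.5> becomes <3.1, 4.6>). The assumptions are ours: a rule is stored as a mapping from parameter names to intervals; expert weights are plain numbers; without weights, every differing parameter is widened (the single-difference scenario), otherwise only those weighted at least the hypothetical `min_weight`; and the most similar rule is the one with the smallest weighted sum of gaps. The result is a proposal for the expert to confirm in the REVISE step, not an automatic update. All function names are hypothetical.

```python
# Hypothetical helpers; parameter names, weights and thresholds are illustrative only.
Interval = tuple[float, float]


def gap(value: float, interval: Interval) -> float:
    """How far value lies outside the closed interval <low, high>; 0.0 if inside."""
    low, high = interval
    return max(low - value, 0.0, value - high)


def weighted_distance(conditions: dict[str, Interval], case: dict[str, float],
                      weights: dict[str, float]) -> float:
    """Sum of gaps weighted by expert importance (unweighted parameters count with 1.0)."""
    return sum(weights.get(p, 1.0) * gap(case[p], iv) for p, iv in conditions.items())


def most_similar(rules, case, weights):
    """Pick the (conditions, target) rule with the smallest weighted distance to case."""
    return min(rules, key=lambda rule: weighted_distance(rule[0], case, weights))


def adapt_rule(conditions, case, weights=None, min_weight=0.5):
    """Propose widened intervals so the rule covers case; return (proposal, differing params).

    weights=None widens every differing parameter (single-difference scenario);
    otherwise only parameters the expert weighted at least min_weight are widened.
    """
    differing = [p for p, iv in conditions.items() if gap(case[p], iv) > 0]
    proposal = dict(conditions)
    for p in differing:
        if weights is None or weights.get(p, 0.0) >= min_weight:
            low, high = conditions[p]
            proposal[p] = (min(low, case[p]), max(high, case[p]))  # <K, Y> or <Y, L>
    return proposal, differing


rule = {"LDL": (1.5, 3.1), "HDL": (3.1, 4.5), "BMI": (25.1, 29.3)}
case = {"LDL": 1.8, "HDL": 4.6, "BMI": 34}
proposal, differing = adapt_rule(rule, case)
print(differing)        # → ['HDL', 'BMI']
print(proposal["BMI"])  # → (25.1, 34)
```

Passing weights such as `{"HDL": 0.9, "BMI": 0.2}` widens only the HDL interval, which mirrors the step in which only highly weighted parameters are adapted.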

3 Conclusion

The CBR methodology has attracted significant attention because its basic idea, reusing the experience gained from solving previous problems, is very attractive. CBR can use specific knowledge about how past problem situations were solved. The number of medical systems using the full CBR cycle (retrieve, adaptation, retain) is still very low, and the most critical issue is a successful adaptation step. We propose a new concept to address this issue. We found inspiration in the research of Professor Holzinger's group on interactive machine learning (iML) with a human in the loop. This approach leads to algorithms that can interact with both computational and human agents and can optimize their learning behavior through these interactions [26, 27].

For this purpose, we combine data analysis methods and CBR, extended by communication with an expert who helps us determine the importance of the parameters, their settings, and the suitable adaptation.

In future work, we will focus on experimentally testing and verifying the proposed approach on the available medical data samples.