Introduction

Knowledge distillation has shown promising results for classifying skin lesions in dermatology images [1,2,3,4,5]. Knowledge distillation encompasses three types of knowledge: logits-based knowledge, intermediate feature-based knowledge, and relationship-based knowledge [6]. Logits-based knowledge distillation [7, 8] enables the student to mimic the teacher's soft output by adjusting the temperature parameter; knowledge is transferred through the logits by minimizing the Kullback–Leibler (KL) divergence between the student's and teacher's final outputs. Intermediate feature-based knowledge distillation [9] guides the student with representations drawn from the teacher's hidden layers. Zhou et al. [32] proposed a weighted soft-label distillation framework, WSLD, which assigns a dynamic weight to the distillation loss to determine how much of the teacher's soft-label information is utilized, based on the cross-entropy losses of both the student and the teacher. However, a student's learning is a sequential process that encounters varying difficulties at different stages. The WSLD framework mines only logits-based knowledge, neglecting the representational knowledge inherent in each stage and failing to utilize intermediate feature-based and relationship-based knowledge. Moreover, this method cannot avoid noise interference from the teacher.

Innovations: This research introduces AdaBoost and a variational difficulty mining strategy (VDMS) into knowledge distillation and proposes a distillation framework called Variational AdaBoost Knowledge Distillation (VAdaKD). The framework helps the student determine the “granularity” of mining the teacher's knowledge by considering the learning difficulties of skin lesion classification in dermatology images. Specifically, we apply AdaBoost to treat all stages of the student model as a sequential learning process and introduce an intermediate auxiliary classifier for each stage, where the weights of the input samples at each stage reflect the degree of learning difficulty. The paper proposes actively leveraging the teacher's knowledge according to the student's learning difficulty at each stage, facilitating targeted knowledge transfer. Finally, the paper performs linear weighting based on the weights of the base classifiers to obtain the final distillation loss. In this research, we incorporate three forms of knowledge into the distillation loss: logits-based, intermediate feature-based, and relationship-based knowledge. However, as the student's learning process progresses, it becomes increasingly challenging to identify the degree of learning difficulty in later stages due to noise interference from the teacher. Therefore, we first adopt the idea of graph convolutional networks (GCNs) to construct a nearest-neighbor relationship matrix \(A\). This matrix lets us aggregate each node's \(l\)-hop neighborhood information and relay it to the student, so that the student perceives nuanced classification difficulties by leveraging the multi-hop information among dermatology samples while maintaining the same nearest-neighbor relationships as the teacher. Next, we eliminate noise interference from these nuanced difficulties by maximizing the mutual information between the teacher and the student.

Contributions: The main contributions of this paper can be summarized as follows:

  (1)

    This paper proposes a Variational AdaBoost Knowledge Distillation framework, VAdaKD, to address the limitations of conventional knowledge distillation methods, where the student passively learns the teacher's knowledge. VAdaKD offers a more active paradigm for knowledge distillation, allowing the student to determine the “granularity” in mining the teacher's knowledge within this framework.

  (2)

    VAdaKD employs a two-step strategy to improve the efficiency of AdaBoost-based knowledge distillation for categorizing dermatology images. Initially, the student is empowered to actively mine the teacher's learning representation through AdaBoost. Subsequently, a variational difficulty mining strategy (VDMS) is introduced to reduce the influence of noise from the teacher by maximizing the mutual information shared between the teacher and student.

  (3)

    Finally, we formulate the weighted distillation loss with sample-level weights to effectively incorporate three types of knowledge. Our research involves extensive experiments on three well-known dermatological datasets, namely Dermnet, ISIC 2019, and HAM10000. The results of our experiments clearly show the efficacy of the proposed VAdaKD method. Additionally, the visualization results confirm that VAdaKD excels in identifying learning challenges and reducing interference from teacher noise at various stages, consequently enhancing the classification accuracy of dermatology images.

In contrast to previous work on knowledge distillation, our proposed VAdaKD introduces a novel paradigm for actively mining the teacher's knowledge, leading to a more comprehensive perception of the knowledge in the teacher while minimizing the presence of unnecessary information. The highlights of this paper are outlined below:

  (1)

    Propose a distillation framework, VAdaKD, for actively mining the teacher's learning representation for skin lesion classification.

  (2)

    Design a GCN-based difficulty mining strategy to perceive more nuanced classification difficulties.

  (3)

    Construct the weighted distillation loss with sample-level weights to effectively engage three forms of knowledge.

Section "Introduction" of this paper outlines the limitations of traditional knowledge distillation methods and proposes potential solutions. Section "Related work" reviews relevant literature, while Sect. "Methodology" introduces our proposed method. Experimental results and visualizations on three datasets are presented in Sect. "Experiments" to demonstrate the efficacy of our method. The results of the research are discussed in Sect. "Conclusion".

Related work

Knowledge distillation

Knowledge distillation, renowned for its ability to condense models, enables knowledge transfer from a cumbersome teacher model to a compact student model. This allows the developed skin lesion classification and diagnostic models to be deployed efficiently on lightweight mobile devices. Hinton et al. [7] proposed enabling a student to learn the hidden knowledge of the teacher by reducing the KL divergence between the temperature-softened outputs of the last layer. Zhao et al. [8] developed a framework called Decoupled Knowledge Distillation (DKD) that separates the information of target and non-target classes present in the logits, concluding that the effectiveness of logits-based knowledge distillation is largely due to the information provided by the non-target classes. Hossain et al. [52,53,54] showed that deploying lightweight dermatologic diagnostic models on mobile devices will aid in early screening and improve diagnosis and treatment in rural areas.

Methodology

Framework design

The framework of Variational AdaBoost Knowledge Distillation (VAdaKD) proposed in this paper is illustrated in Fig. 1. The figure illustrates the evolution of sample weights throughout the training process and demonstrates how the student model leverages these weights to actively mine and learn knowledge from the teacher model. Both the teacher and student models are assumed to have \(L\) stages. We consider the stages in the backbone as an ordered learning process and incorporate an intermediate auxiliary classifier for each stage, resulting in a total of \(L\) base classifiers. The base classifier for the \(l\) th stage is denoted as \(p_{l} \left( \cdot \right)\) and comprises a convolutional layer, a global average pooling layer, and a fully-connected layer. AdaBoost learns base classifiers sequentially, and the weights of the input dermatology samples for the \(l\) th base classifier are denoted as \(\left\{ {w_{l - 1}^{i} } \right\}_{i = 1}^{B}\), where \(B\) represents the number of samples in the input batch; the weight of the \(l\) th base classifier is denoted as \(\alpha_{l}\). The weights of the input dermatology samples at each stage measure the learning difficulty in skin lesion classification. These sample-level weights, represented by \(\left\{ {w_{l - 1}^{i} } \right\}_{i = 1}^{B}\), are utilized in the distillation module to identify which dermatology samples the student finds difficult to learn at that stage, so that the student can actively mine knowledge from the teacher regarding the learning difficulties in skin lesion classification. However, as the student's learning process progresses, it becomes increasingly challenging to identify learning difficulties due to noise interference from the teacher. We therefore introduce a variational difficulty mining strategy (VDMS) that minimizes the impact of noise by maximizing the mutual information between the teacher and student. The final distillation loss is then obtained by linearly weighting the individual base classifiers using the base classifier weights \(\alpha_{l}\). The losses in VAdaKD encompass the cross-entropy loss for the student task, the AdaBoost training loss for the student task, and the Variational AdaBoost-based distillation loss.
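For concreteness, the following is a minimal PyTorch sketch of one intermediate auxiliary (base) classifier \(p_{l} \left( \cdot \right)\) as described above (one convolutional layer, global average pooling, and a fully-connected layer). The channel sizes, kernel size, and class count are illustrative assumptions, not values prescribed by the paper.

```python
import torch
import torch.nn as nn

class AuxiliaryClassifier(nn.Module):
    """Base classifier p_l(.) attached to the l-th stage: conv -> GAP -> FC."""
    def __init__(self, in_channels: int, num_classes: int):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, in_channels, kernel_size=3, padding=1)
        self.gap = nn.AdaptiveAvgPool2d(1)           # global average pooling
        self.fc = nn.Linear(in_channels, num_classes)

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        x = torch.relu(self.conv(feat))              # B x C x H x W
        x = self.gap(x).flatten(1)                   # B x C
        return self.fc(x)                            # B x K logits

# Illustrative usage: stage feature map with 256 channels, batch of 8, 7 lesion classes
logits = AuxiliaryClassifier(256, 7)(torch.randn(8, 256, 14, 14))
```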

Fig. 1

The proposed Variational AdaBoost Knowledge Distillation (VAdaKD) framework. VAdaKD empowers the student to actively mine the teacher's learning representation in skin lesion classification using AdaBoost. The weights of the dermatology input samples, denoted as \(\left\{ {w_{l}^{i} } \right\}_{i = 1}^{B}\) for each stage, represent the learning difficulty of each sample \(x_{i}\) at the \(l\)-th stage. Based on these weights, the three forms of knowledge are selectively transferred to the student. Among them, \(F_{l}^{S}\) and \(F_{l}^{T}\) represent the intermediate features of the student and teacher at the \(l\)-th stage, and \(f_{i}^{s}\) and \(f_{i}^{t}\) represent the outputs of the \(i\)-th dermatology sample for the student and teacher, respectively. \(C\) represents the correlation between two specific dermatology samples. The final distillation loss is obtained by linearly weighting each form of knowledge according to the base classifier weights \(\alpha_{l}\). We introduce a VDMS that eliminates noise interference from nuanced difficulties by maximizing the mutual information between the teacher and student

The distillation loss comprises three types of knowledge: logits-based knowledge, intermediate feature-based knowledge, and relationship-based knowledge. However, it becomes increasingly challenging to identify learning difficulties due to noise interference from the teacher. VDMS therefore first adopts a GCN-style construction to form a nearest-neighbor relationship matrix \(A\). This matrix is then used to calculate the information of the current node's \(l\) th hop and relay it to the student. Next, VDMS eliminates noise interference from these nuanced difficulties by maximizing the mutual information between the teacher and student. The general framework of VAdaKD proposed in this paper is depicted in Fig. 1.
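The construction of the nearest-neighbor relationship matrix \(A\) and the multi-hop aggregation can be sketched as below. The use of cosine similarity, the value of \(k\), and the row normalization are illustrative assumptions; the paper only states that \(A\) is built in the spirit of GCN from the teacher's representation and used to relay multi-hop neighborhood information to the student.

```python
import torch
import torch.nn.functional as F

def knn_adjacency(teacher_feats: torch.Tensor, k: int = 5) -> torch.Tensor:
    """Build a row-normalized k-nearest-neighbor matrix A (B x B) from teacher features."""
    z = F.normalize(teacher_feats.flatten(1), dim=1)        # B x D, unit-length vectors
    sim = z @ z.t()                                          # cosine similarity, B x B
    topk = sim.topk(k + 1, dim=1).indices                    # keep self plus k neighbors
    A = torch.zeros_like(sim).scatter_(1, topk, 1.0)
    return A / A.sum(dim=1, keepdim=True)                    # row-normalize

def multi_hop(A: torch.Tensor, student_feats: torch.Tensor, hops: int) -> torch.Tensor:
    """Aggregate l-hop neighborhood information by applying A repeatedly to student features."""
    B = student_feats.shape[0]
    x = student_feats.reshape(B, -1)
    for _ in range(hops):
        x = A @ x
    return x.reshape_as(student_feats)
```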

Pre-training the teacher model

The teacher model for skin lesion classification consists of a backbone network, denoted as \(f^{t} \left( \cdot \right)\), with \(L\) stages. The auxiliary (i.e., base) classifiers, represented by \(\left\{ {p_{l}^{t} \left( \cdot \right)} \right\}_{l = 1}^{L}\), are contained within the backbone. The pre-training process of the teacher model is divided into two phases. In the first phase, the teacher model with \(L\) stages is trained, resulting in the trained backbone \(f^{t} \left( \cdot \right)\). In the second phase, the weights of the backbone \(f^{t} \left( \cdot \right)\) are frozen, and the parameters of the auxiliary classifiers \(\left\{ {p_{l}^{t} \left( \cdot \right)} \right\}_{l = 1}^{L}\) are updated. According to the AdaBoost boosting theory, the weights of the input dermatology samples for the \(l\) th base classifier are represented as \(\left\{ {w_{l - 1}^{i} } \right\}_{i = 1}^{B}\), where \(B\) represents the number of samples in the input batch. Then, based on the error rate of each base classifier, we calculate the weight of the base classifier \(\alpha_{l}\). Finally, we obtain the AdaBoost loss by linearly weighting all base classifiers by weights \(\left\{ {\alpha_{l} } \right\}_{l = 1}^{L}\). Note that we use the cross-entropy loss in both training phases to update the parameters for true-label supervision. At the \(l\) th stage, the error rate of the auxiliary classifier is given by Eq. (1),

$$ err_{l} = \mathop \sum \limits_{i = 1}^{B} w_{l - 1}^{i} {\mathbb{I}}\left( {y_{i} \ne p_{l}^{t} \left( {x_{i} } \right)} \right)/\mathop \sum \limits_{i = 1}^{B} w_{l - 1}^{i} , $$
(1)

where \(y_{i}\) represents the true label (i.e., Ground Truth) of the dermatology input samples \(x_{i}\), and the batch size is denoted as \(B\). The weight \(\alpha_{l}\) of the auxiliary classifier for the \(l\)-\({\text{th}}\) stage is shown in Eq. (2),

$$ \alpha_{l} = \log \frac{{1 - err_{l} }}{{err_{l} }} + \log \left( {K - 1} \right), $$
(2)

where \(K\) represents the number of categories. In order to ensure that \(\alpha_{l}\) is positive, the condition \(\left( {1 - err_{l} } \right) > 1/K\) needs to be satisfied. We then update the weights of the input samples \(x_{i}\), giving higher weights to misclassified samples. Finally, \(w_{l}^{i}\) represents the weight of the dermatology input sample for the (\(l\)+1) th base classifier, as shown in Eq. (3). This process is iterative, and the initial weight of the sample \(x_{i}\) is set to \(w_{0}^{i} = 1/B\).

$$ w_{l}^{i} \leftarrow w_{l - 1}^{i} \cdot \exp \left( {\alpha_{l} \cdot {\mathbb{I}}\left( {y_{i} \ne p_{l}^{t} \left( {x_{i} } \right)} \right)} \right). $$
(3)
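A minimal sketch of the sample-weight bookkeeping in Eqs. (1)–(3), following the multi-class (SAMME-style) AdaBoost rule the paper adopts; the function and variable names are ours, and no renormalization is applied beyond what the equations state.

```python
import torch

def adaboost_step(w_prev, preds, labels, num_classes):
    """One boosting step: weighted error (Eq. 1), classifier weight (Eq. 2), weight update (Eq. 3)."""
    miss = (preds != labels).float()                          # indicator I(y_i != p_l(x_i))
    err = (w_prev * miss).sum() / w_prev.sum()                # Eq. (1)
    alpha = torch.log((1 - err) / err) + torch.log(torch.tensor(num_classes - 1.0))  # Eq. (2)
    w_next = w_prev * torch.exp(alpha * miss)                 # Eq. (3)
    return err, alpha, w_next

# Illustrative usage with batch size B = 4, K = 7 classes, and initial weights w_0 = 1/B
w = torch.full((4,), 0.25)
err, alpha, w = adaboost_step(w, torch.tensor([1, 2, 2, 0]), torch.tensor([1, 2, 3, 0]), 7)
```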

As depicted in Fig. 2, the initial training phase uses the true labels of skin lesion classification to supervise the training of the teacher model's backbone. In the subsequent training phase, the true labels are still used for supervision, but the weights of the backbone are frozen, and AdaBoost is employed to train the auxiliary classifiers introduced at all stages. Finally, the outputs of the classifiers are combined into a final prediction using the corresponding weights.

Fig. 2

Two-phase pre-training of a teacher model with \(L\) stages for skin lesion classification. In the first phase, the backbone \(f^{t} \left( \cdot \right)\) is trained. In the second phase, the weights of the backbone \(f^{t} \left( \cdot \right)\) are frozen, and the parameters of the auxiliary classifiers \(\left\{ {p_{l}^{t} \left( \cdot \right)} \right\}_{l = 1}^{L}\) are updated using the AdaBoost boosting theory, where the input sample weights of the \(l\) th base classifier are denoted as \(\left\{ {w_{l - 1}^{i} } \right\}_{i = 1}^{B}\). The weight \(\alpha_{l}\) of the base classifier is then calculated based on its error rate, and the dermatology input sample weights \(\left\{ {w_{l}^{i} } \right\}_{i = 1}^{B}\) for the (\(l\)+1) th auxiliary classifier are updated and adjusted by the \(l\) th auxiliary classifier

Training student model

We refer to the backbone of the student model for skin lesion classification as \(f^{s} \left( \cdot \right)\). Additionally, we have auxiliary classifiers within this backbone, denoted as \(\left\{ {p_{l}^{S} \left( \cdot \right)} \right\}_{l = 1}^{L}\), where \(p_{l}^{s} \left( \cdot \right)\) represents the auxiliary classifier of the \(l\) th stage in the student. The student is trained under the guidance of the pre-trained teacher. The overall loss function includes the task loss of true-label supervision, the AdaBoost loss of true-label supervision, and the weighted distillation loss with sample-level weights between the student and teacher.


Task loss of true-label supervision in skin lesion classification \( {\varvec{L}}_{{{\varvec{ce}}}}\): The task loss in the student is determined by the cross-entropy loss of true-label supervision. The objective is to enable \(f^{s} \left( \cdot \right)\) to accurately classify one-hot labeled data. The task loss \(L_{ce}\) is obtained by computing the cross-entropy of the final outputs against the true labels.


AdaBoost loss of true-label supervision in skin lesion classification \({\varvec{L}}_{{{\varvec{ada}}\_{\varvec{ce}}}}\): AdaBoost employs a sequential training approach, which involves training the student model using \(L\) auxiliary classifiers with prediction functions \(\left\{ {p_{l}^{s} \left( \cdot \right)} \right\}_{l = 1}^{L}\). The input samples' weights of the \(l\) th base classifier are denoted as \(\left\{ {w_{l - 1}^{i} } \right\}_{i = 1}^{B}\). Similarly, the input samples' weights \(\left\{ {w_{l}^{i} } \right\}_{i = 1}^{B}\) of the (\(l\)+1) th base classifier are updated and adjusted by the \(l\) th base classifier. The weight of the \(l\) th base classifier \(\alpha_{l}\) is then computed based on its error rate. Finally, the AdaBoost loss is obtained by linearly combining the individual base classifiers by weights \(\left\{ {\alpha_{l} } \right\}_{l = 1}^{L}\). In addition, as the learning process of skin lesion classification progresses, it becomes increasingly challenging for the subsequent learning stages to mine nuanced classification difficulties. To address this problem, the information from the current node's \(l\) th hop in the teacher is relayed to the student. Specifically, the correlation matrix \(\left\{ {A_{l}^{t} \left( \cdot \right)} \right\}_{l = 1}^{L} \in R^{B \times B}\) of each stage in the teacher model is copied to the student model. Here, \(B\) represents the batch size, and the features of the \(l\) th stage in the student are transformed into \(F_{l}^{S} = A_{l}^{t} F_{l}^{S} \in R^{B \times C \times H \times W}\). The final AdaBoost loss of true-label supervision is shown in Eq. (4),

$$ L_{ada\_ce} = - \mathop \sum \limits_{i = 1}^{B} \mathop \sum \limits_{l = 1}^{L} y_{i} \log \alpha_{l} p_{l}^{s} \left( {x_{i} } \right) , $$
(4)

where \(y_{i}\) represents the true label of the dermatology input sample \(x_{i}\), and \(p_{l}^{s} \left( {x_{i} } \right)\) is formulated as \(p_{l}^{s} \left( {x_{i} } \right) = C_{l} \left( {w_{l - 1}^{i} \cdot F_{l}^{S} \left( {x_{i} } \right)} \right)\): the intermediate features of the input sample \(x_{i}\) are extracted by \(F_{l}^{S} \left( {x_{i} } \right)\) and classified by the \(l\) th auxiliary classifier module \(C_{l} \left( \cdot \right)\) with the weight \(w_{l - 1}^{i}\). The value of \(w_{l - 1}^{i}\) represents the difficulty of classification learning for \(x_{i}\) in the \(l\) th base classifier, with a larger value indicating higher difficulty.
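The stage-wise predictions and the AdaBoost cross-entropy loss of Eq. (4) could be assembled as sketched below. The module interfaces are assumptions for illustration, and we place \(\alpha_{l}\) as a multiplicative weight on the per-stage term rather than inside the logarithm, which is one practical reading of Eq. (4).

```python
import torch
import torch.nn.functional as F

def ada_ce_loss(stage_feats, aux_classifiers, sample_weights, alphas, labels):
    """L_ada_ce: cross-entropy over all L stages, with sample-level weights w_{l-1}^i
    applied to the stage features and stage weights alpha_l scaling each stage's term."""
    loss = 0.0
    for F_l, C_l, w_l, a_l in zip(stage_feats, aux_classifiers, sample_weights, alphas):
        weighted = w_l.view(-1, 1, 1, 1) * F_l                # w_{l-1}^i . F_l^S(x_i)
        log_p = F.log_softmax(C_l(weighted), dim=1)           # log of p_l^s(x_i)
        loss = loss - (a_l * log_p[torch.arange(len(labels)), labels]).sum()
    return loss
```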


Weighted distillation loss with sample-level weights \(L_{mimic}\): The weights of the input samples obtained by AdaBoost are represented as \(\left\{ {w_{l - 1}^{i} } \right\}_{i = 1}^{B}\). We can effectively engage three forms of knowledge by applying the sample-level weights to the distillation module. The distillation loss encompasses logits-based, intermediate feature-based, and relationship-based distillation loss. The logits-based distillation loss consists of two forms, the first being the \(KL\) loss of the soft outputs with temperature for the last layer of the student and teacher (as shown in Eq. (5)),

$$ L_{kd} = - \tau^{2} \mathop \sum \limits_{i = 1}^{B} p^{t} \left( {x_{i} ;\tau } \right)\log \left( {p^{s} \left( {x_{i} ;\tau } \right)} \right), $$
(5)

where \(\tau\) represents the temperature (set to 3 in this paper) and \(B\) denotes the size of the input batch. The second form of logits-based distillation loss is the distillation loss of the soft outputs with temperature for the corresponding auxiliary classifiers of the student and teacher, as shown in Eq. (6),

$$ L_{ada\_kd} = - \tau^{2} \mathop \sum \limits_{i = 1}^{B} \mathop \sum \limits_{l = 1}^{L} w_{l - 1}^{i} \cdot p_{l}^{t} \left( {x_{i} ;\tau } \right)\log \left( {p_{l}^{s} \left( {x_{i} ;\tau } \right)} \right). $$
(6)
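A sketch of the two logits-based terms in Eqs. (5) and (6), assuming that the final-layer and per-stage logits of both models are available; function names are ours.

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, tau=3.0):
    """L_kd (Eq. 5): softened teacher probabilities against student log-probabilities, scaled by tau^2."""
    p_t = F.softmax(teacher_logits / tau, dim=1)
    log_p_s = F.log_softmax(student_logits / tau, dim=1)
    return -(tau ** 2) * (p_t * log_p_s).sum()

def ada_kd_loss(student_stage_logits, teacher_stage_logits, sample_weights, tau=3.0):
    """L_ada_kd (Eq. 6): the same softened loss per auxiliary classifier, weighted per sample by w_{l-1}^i."""
    loss = 0.0
    for s_l, t_l, w_l in zip(student_stage_logits, teacher_stage_logits, sample_weights):
        p_t = F.softmax(t_l / tau, dim=1)
        log_p_s = F.log_softmax(s_l / tau, dim=1)
        loss = loss - (tau ** 2) * (w_l * (p_t * log_p_s).sum(dim=1)).sum()
    return loss
```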

In order to effectively extract the representation information from the intermediate features \(\left\{ {F_{l}^{T} } \right\}_{l = 1}^{L}\), this paper constructs the distillation loss using Attention Transfer [10] as shown in Eq. (7),

$$ L_{att} = \mathop \sum \limits_{i = 1}^{B} \mathop \sum \limits_{l = 1}^{L} w_{l - 1}^{i} \cdot L_{AT} \left( {{\text{F}}_{l}^{T} \left( {x_{i} } \right),{\text{ F}}_{l}^{S} \left( {x_{i} } \right)} \right), $$
(7)

however, it becomes increasingly challenging to identify learning difficulties due to noise interference from the teacher as the student's learning process progresses. Therefore, we introduce the VDMS to minimize the impact of noise, and the AT-based distillation loss with VDMS is given in Eq. (8),

$$L_{att\_vdms} = \mathop \sum \limits_{i = 1}^{B} \mathop \sum \limits_{l = 1}^{L} w_{l - 1}^{i} \cdot \left( {\frac{{\left( {t_{l}^{T} \left( {x_{i} } \right) - \mu_{l}^{S} \left( {x_{i} } \right)} \right)^{2} }}{{2\sigma_{l}^{2} \left( {x_{i} } \right)}} + \log \sigma_{l}^{2} \left( {x_{i} } \right)} \right), $$
(8)

where \(t_{l}^{T} \left( \cdot \right)\) represents the teacher's attention map obtained by applying the attention mapping of Attention Transfer [10] to the intermediate feature \({\text{F}}_{l}^{T} \left( \cdot \right)\), \(\mu_{l}^{S} \left( \cdot \right)\) represents the mean of the student's attention map obtained from \({\text{F}}_{l}^{S} \left( \cdot \right)\), and \(\sigma_{l}^{2} \left( \cdot \right)\) represents the variance of the attention map.
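Equation (8) has the form of a Gaussian negative log-likelihood in which the student predicts a mean and a variance for the teacher's attention values. A minimal sketch under that reading is given below; the AT-style attention mapping (channel-wise mean of squared activations) and the per-sample log-variance head of the student are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def attention_map(feat: torch.Tensor) -> torch.Tensor:
    """AT-style mapping: channel-wise mean of squared activations, flattened and L2-normalized."""
    a = feat.pow(2).mean(dim=1).flatten(1)       # B x (H*W)
    return F.normalize(a, dim=1)

def att_vdms_loss(teacher_feat, student_mu_feat, student_log_var, sample_w):
    """L_att_vdms (Eq. 8): per-sample weighted Gaussian NLL between the teacher's attention map
    and the student's predicted mean/variance. student_log_var: B x 1 (assumed extra head)."""
    t = attention_map(teacher_feat)              # t_l^T(x_i)
    mu = attention_map(student_mu_feat)          # mu_l^S(x_i)
    var = student_log_var.exp()                  # sigma_l^2(x_i)
    nll = ((t - mu) ** 2 / (2 * var) + student_log_var).mean(dim=1)   # average over spatial positions
    return (sample_w * nll).sum()
```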

The above two kinds of distillation losses focus solely on individual-instance knowledge distillation. However, to explore the relationship between different individuals in classifying dermatology images, we present a relationship-based distillation loss. Firstly, we extract the logits features of \(B\) dermatology samples from each auxiliary classifier of the teacher model and construct the correlation matrix \(\left\{ {R_{l}^{t} \left( \cdot \right)} \right\}_{l = 1}^{L}\). Similarly, we construct the correlation matrix \(\left\{ {R_{l}^{s} \left( \cdot \right)} \right\}_{l = 1}^{L}\) for the logits features of \(B\) dermatology samples from each auxiliary classifier of the student model. Then, we calculate the Mean Squared Error (MSE) loss between the correlation matrices of the student and teacher to obtain the relationship-based distillation loss, as shown in Eq. (9),

$$ L_{rkd} = \mathop \sum \limits_{l = 1}^{L} \alpha_{l} \cdot L_{mse} \left( {R_{l}^{t} ,{ }R_{l}^{s} } \right) = \mathop \sum \limits_{l = 1}^{L} \alpha_{l} \cdot \left( {R_{l}^{t} - R_{l}^{s} } \right)^{2} , $$
(9)

the relationship-based distillation loss with VDMS is expressed in Eq. (10), where \(\sigma_{l}^{2}\) represents the variance of the correlation matrix,

$$ L_{rkd\_vdms} = \mathop \sum \limits_{l = 1}^{L} \alpha_{l} \cdot \left( {\frac{{\left( {R_{l}^{t} - R_{l}^{s} } \right)^{2} }}{{2\sigma_{l}^{2} }} + \log \sigma_{l}^{2} } \right). $$
(10)
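The correlation matrices and the relation-based terms of Eqs. (9) and (10) can be sketched as follows. Building \(R\) as a cosine-similarity Gram matrix of the per-stage logits, and using one learnable log-variance per stage, are assumptions; the paper only states that \(R\) captures pairwise correlations among the \(B\) samples.

```python
import torch
import torch.nn.functional as F

def correlation_matrix(logits: torch.Tensor) -> torch.Tensor:
    """R_l: B x B pairwise correlations of the B samples' logits at one stage (cosine Gram matrix)."""
    z = F.normalize(logits, dim=1)
    return z @ z.t()

def rkd_vdms_loss(teacher_stage_logits, student_stage_logits, alphas, log_vars):
    """L_rkd_vdms (Eq. 10): stage-weighted Gaussian-NLL version of the correlation-matrix MSE (Eq. 9)."""
    loss = 0.0
    for t_l, s_l, a_l, lv_l in zip(teacher_stage_logits, student_stage_logits, alphas, log_vars):
        R_t, R_s = correlation_matrix(t_l), correlation_matrix(s_l)
        var = lv_l.exp()                                     # sigma_l^2 (assumed learnable scalar per stage)
        loss = loss + a_l * ((R_t - R_s).pow(2) / (2 * var) + lv_l).mean()
    return loss
```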

Total Loss: The total loss of the student model includes the task loss for true-label supervision, the AdaBoost loss for true-label supervision, and the weighted distillation loss with sample-level weights between the student and the teacher, as shown in Eq. (11),

$$ L_{total} = L_{ce} + \lambda_{1} L_{ada\_ce} + \lambda_{2} L_{mimic}, $$
(11)

where the weighted distillation loss with sample-level weights in skin lesion classification, \(L_{mimic}\), is given by Eq. (12), in which \(\lambda_{1}\), \(\lambda_{2}\), \(\eta\), \(\beta\), and \(\gamma\) are balance factors. In the distillation loss, we set the temperature hyperparameter \(\tau\) to 3.

$$ L_{mimic} = L_{kd} + \eta L_{ada\_kd} + \gamma L_{att\_vdms} + \beta L_{rkd\_vdms}. $$
(12)
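Putting Eqs. (11) and (12) together, the overall objective can be assembled as in the short sketch below; the individual loss terms are the sketches given above, \(\lambda_{1}=0.5\), \(\lambda_{2}=0.3\), and \(\eta=0.25\) follow the values reported in the experiments, and the defaults for \(\gamma\) and \(\beta\) are placeholders.

```python
def total_loss(L_ce, L_ada_ce, L_kd, L_ada_kd, L_att_vdms, L_rkd_vdms,
               lam1=0.5, lam2=0.3, eta=0.25, gamma=1.0, beta=1.0):
    """L_total (Eq. 11) with L_mimic expanded as in Eq. (12)."""
    L_mimic = L_kd + eta * L_ada_kd + gamma * L_att_vdms + beta * L_rkd_vdms
    return L_ce + lam1 * L_ada_ce + lam2 * L_mimic
```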

Experiments

The experimental setup of this paper consists of three parts. Firstly, in Sect. "Comparison experiment", we compare VAdaKD with related knowledge distillation frameworks to validate its effectiveness. It should be noted that AdaKD denotes our proposed AdaBoost-based KD with GCN, and VAdaKD denotes our proposed AdaBoost-based KD with VDMS. Secondly, in Sect. "Ablation study", ablation experiments are designed to observe the impact of the number of auxiliary classifiers and to explore the validity of each component in the distillation loss. Lastly, in Sect. "Visualization", we explore the performance of VAdaKD in mining the learning difficulties at each stage by visualizing the correlation matrix for each stage of the student model, and validate the multi-classification performance achieved by VAdaKD through t-distributed stochastic neighbor embedding (t-SNE) visualization.

Our experiments assess the model's performance using four evaluation indicators: Accuracy (Acc), Recall, F1-score, and AUC. Accuracy is a widely used metric in classification tasks that measures the overall classification performance of the model; however, a high accuracy on highly imbalanced data may not be meaningful. AUC compensates for this shortcoming and better reflects the classifier's performance. Recall is crucial in the medical field because it indicates the correct classification rate among all diseased samples, which is essential for timely patient treatment. F1-score is a comprehensive metric that takes both Recall and Precision into account.
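As a reference for how these four indicators are typically computed in a multi-class setting, a short scikit-learn sketch follows; macro averaging and one-vs-rest AUC are our assumptions, since the paper does not state the averaging mode.

```python
import numpy as np
from sklearn.metrics import accuracy_score, recall_score, f1_score, roc_auc_score

def evaluate(y_true: np.ndarray, y_prob: np.ndarray) -> dict:
    """y_true: (N,) integer labels; y_prob: (N, K) predicted class probabilities."""
    y_pred = y_prob.argmax(axis=1)
    return {
        "Acc": accuracy_score(y_true, y_pred),
        "Recall": recall_score(y_true, y_pred, average="macro"),
        "F1": f1_score(y_true, y_pred, average="macro"),
        "AUC": roc_auc_score(y_true, y_prob, multi_class="ovr", average="macro"),
    }
```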

Comparison experiment

Extensive experiments are conducted on three benchmark datasets, including the Dermnet dataset, ISIC 2019 dataset, and HAM10000 dataset. The values of \(\lambda_{1}\) and \(\lambda_{2}\) in Eq. (11) are set to 0.5 and 0.3, respectively. In Eq. (12), the value of \(\eta\) is set to 0.25. Furthermore, based on the experimental setup of AT [

Data availability

The data that support the findings of this study are available from the corresponding author upon reasonable request.

References

  1. Wang Y, Wang Y, Cai J, Lee TK, Miao C, Wang ZJ (2023) Ssd-kd: A self-supervised diverse knowledge distillation method for lightweight skin lesion classification using dermoscopic images. Med Image Anal 84:102693

  2. Khan MS, Alam KN, Dhruba AR, Zunair H, Mohammed N (2022) Knowledge distillation approach towards melanoma detection. Comput Biol Med 146:105581

  3. Elbatel M, Martí R, Li X (2024) FoPro-KD: Fourier prompted effective knowledge distillation for long-tailed medical image recognition. IEEE Trans Med Imaging 43(3):954–965

  4. Adepu AK, Sahayam S, Jayaraman U, Arramraju R (2023) Melanoma classification from dermatoscopy images using knowledge distillation for highly imbalanced data. Comput Biol Med 154:106571

  5. Liu Q, Yu L, Luo L, Dou Q, Heng PA (2020) Semi-supervised medical image classification with relation-driven self-ensembling model. IEEE Trans Med Imaging 39(11):3429–3440

  6. Gou J, Yu B, Maybank SJ, Tao D (2021) Knowledge distillation: a survey. Int J Comput Vis 129(6):1789–1819

  7. Hinton G, Vinyals O, Dean J (2015) Distilling the knowledge in a neural network. Comput Sci 14(7):38–39

  8. Zhao B, Cui Q, Song R, Qiu Y, Liang J (2022) Decoupled knowledge distillation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 11953–11962

  9. Romero A, Ballas N, Kahou SE, Chassang A, Gatta C, Bengio Y (2015) FitNets: hints for thin deep nets. Proc Int Conf Learn Represent 2(3):1

  10. Zagoruyko S, Komodakis N (2017) Paying more attention to attention: improving the performance of convolutional neural networks via attention transfer. In: Proceedings of International Conference on Learning Representations. https://doi.org/10.48550/arXiv.1612.03928

  11. Kim J, Park S, Kwak N (2018) Paraphrasing complex network: network compression via factor transfer. Adv Neural Inf Process Syst. https://doi.org/10.48550/arXiv.1802.04977

  12. Yang C, An Z, Cai L, Xu Y (2021) Hierarchical self-supervised augmented knowledge distillation. In: Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence. https://doi.org/10.24963/ijcai.2021/168

  13. Jin X, Peng B, Wu Y, Liu Y, Liu J, Liang D, Yan J, Hu X (2019) Knowledge distillation via route constrained optimization. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 1345–1354

  14. Passalis N, Tefas A (2018) Learning deep representations with probabilistic knowledge transfer. In: Proceedings of the European Conference on Computer Vision, pp. 268–284

  15. Park W, Kim D, Lu Y, Cho M (2019) Relational knowledge distillation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 3967–3976

  16. Tung F, Mori G (2019) Similarity-preserving knowledge distillation. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 1365–1374

  17. Yim J, Joo D, Bae J, Kim J (2017) A gift from knowledge distillation: fast optimization, network minimization and transfer learning. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4133–4141

  18. Lee SH, Kim DH, Song BC (2018) Self-supervised knowledge distillation using singular value decomposition. In: Proceedings of the European Conference on Computer Vision, pp. 335–350

  19. Liu Y, Cao J, Li B, Yuan C, Hu W, Li Y, Duan Y (2019) Knowledge distillation via instance relationship graph. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 7096–7104

  20. Passalis N, Tzelepi M, Tefas A (2020) Heterogeneous knowledge distillation using information flow modeling. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2339–2348

  21. Peng B, Jin X, Liu J, Li D, Wu Y, Liu Y, Zhang Z (2019) Correlation congruence for knowledge distillation. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 5007–5016

  22. Zhang R, Yu Y, Shen J, Cui X, Zhang C (2023) Local boosting for weakly-supervised learning. In: Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pp. 3364–3375

  23. Zhang R, Yu Y, Shetty P, Song L, Zhang C (2022) PRBoost: prompt-based rule discovery and boosting for interactive weakly-supervised learning. https://doi.org/10.18653/v1/2022.acl-long.55. arXiv preprint arXiv:2203.09735

  24. Freund Y, Schapire RE (1996) Experiments with a new boosting algorithm. Int Conf Mach Learn 96:148–156

  25. Baig MM, Awais MM, El-Alfy ESM (2017) AdaBoost-based artificial neural network learning. Neurocomputing 248:120–126

  26. Gao Y, Rong W, Shen Y, Xiong Z (2016) Convolutional neural network based sentiment analysis using Adaboost combination. In: International Joint Conference on Neural Networks, pp. 1333–1338

  27. Taherkhani A, Cosma G, McGinnity TM (2020) AdaBoost-CNN: an adaptive boosting algorithm for convolutional neural networks to classify multi-class imbalanced datasets using transfer learning. Neurocomputing 404:351–366

  28. Yang S, Chen LF, Yan T, Zhao YH, Fan YJ (2017) An ensemble classification algorithm for convolutional neural network based on AdaBoost. In: International Conference on Computer and Information Science, pp. 401–406

  29. Shakeel PM, Tolba A, Al-Makhadmeh Z, Jaber MM (2020) Automatic detection of lung cancer from biomedical data set using discrete AdaBoost optimized ensemble learning generalized neural networks. Neural Comput Appl 32(3):777–790

  30. Sun K, Zhu Z, Lin Z (2019) AdaGCN: Adaboosting graph convolutional networks into deep models. In: Proceedings of International Conference on Learning Representations. https://doi.org/10.48550/arXiv.1908.05081

  31. Kipf TN, Welling M (2016) Semi-supervised classification with graph convolutional networks. In: Proceedings of International Conference on Learning Representations. https://doi.org/10.48550/arXiv.1609.02907

  32. Zhou H, Song L, Chen J, Zhou Y, Wang G, Yuan J, Zhang Q (2021) Rethinking soft labels for knowledge distillation: a bias-variance tradeoff perspective. Proc Int Conf Learn Represent. https://doi.org/10.48550/arXiv.2102.00650

  33. Li Z, Li X, Yang L, Zhao B, Song R, Luo L, Yang J (2023) Curriculum temperature for knowledge distillation. Proc AAAI Conf Artif Intell 37(2):1504–1512

  34. Huang F, Ash J, Langford J, Schapire R (2018) Learning deep ResNet blocks sequentially using boosting theory. In: International Conference on Machine Learning, pp. 2058–2067

  35. Guo Z, Yan H, Li H, Lin X (2023) Class attention transfer based knowledge distillation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 11868–11877

  36. Oakley A. DermNet New Zealand. Topical formulations. Updated February.

  37. Tschandl P, Rosendahl C, Kittler H (2018) The HAM10000 dataset, a large collection of multi-source dermatoscopic images of common pigmented skin lesions. Sci Data 5(1):1–9

  38. Codella NC, Gutman D, Celebi ME, Helba B, Marchetti MA, Dusza SW, Halpern A (2018) Skin lesion analysis toward melanoma detection: a challenge at the 2017 International Symposium on Biomedical Imaging (ISBI), hosted by the International Skin Imaging Collaboration (ISIC). In: International Symposium on Biomedical Imaging, pp. 168–172

  39. Combalia M, Codella NC, Rotemberg V, Helba B, Vilaplana V, Reiter O, Malvehy J (2019) BCN20000: dermoscopic lesions in the wild. https://doi.org/10.48550/arXiv.1908.02288. arXiv preprint arXiv:1908.02288

  40. Hossain MI, Elahi MM, Ramasinghe S, Cheraghian A, Rahman F, Mohammed N, Rahman S (2023) LumiNet: the bright side of perceptual knowledge distillation. https://doi.org/10.48550/arXiv.2310.03669. arXiv preprint arXiv:2310.03669

  41. Huang T, You S, Wang F, Qian C, Xu C (2022) Knowledge distillation from a stronger teacher. Adv Neural Inf Process Syst 35:33716–33727

  42. Shu C, Liu Y, Gao J, Yan Z, Shen C (2021) Channel-wise knowledge distillation for dense prediction. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 5311–5320

  43. Haberman HF, Norwich KH, Diehl DL, Evans SJ, Harvey B, Landau J, Zingg W (1985) DIAG: a computer-assisted dermatologic diagnostic system—clinical experience and insight. J Am Acad Dermatol 12(1):132–143

  44. Brooks GJ, Ashton RE, Pethybridge RJ (1992) DERMIS: a computer system for assisting primary-care physicians with dermatological diagnosis. Br J Dermatol 127(6):614–619

  45. Liu Y, Jain A, Eng C, Way DH, Lee K, Bui P, Coz D (2020) A deep learning system for differential diagnosis of skin diseases. Nat Med 26(6):900–908

  46. Hameed SA, Haddad A, Nirabi A (2019) Dermatological diagnosis by mobile application. Bull Elect Eng Inform 8(3):847–854

  47. Joshi SK (2023) Chaos embedded opposition based learning for gravitational search algorithm. Appl Intell 53(5):5567–5586

  48. Wang Y, Yu Y, Gao S, Pan H, Yang G (2019) A hierarchical gravitational search algorithm with an effective gravitational constant. Swarm Evolut Comput 46:118–139

  49. Wang Y, Gao S, Yu Y, Cai Z, Wang Z (2021) A gravitational search algorithm with hierarchy and distributed framework. Knowl Based Syst 218:106877

  50. Mohammadi A, Sheikholeslam F, Mirjalili S (2023) Nature-inspired metaheuristic search algorithms for optimizing benchmark problems: inclined planes system optimization to state-of-the-art methods. Arch Comput Methods Eng 30(1):331–389

  51. Bolotnik N, Figurina T (2023) Controllability of a two-body crawling system on an inclined plane. Meccanica 58(2):321–336

  52. Song X, Song Y, Stojanovic V, Song S (2023) Improved dynamic event-triggered security control for T-S fuzzy LPV-PDE systems via pointwise measurements and point control. Int J Fuzzy Syst 25(8):3177–3192

  53. Zhang Z, Song X, Sun X, Stojanovic V (2023) Hybrid-driven-based fuzzy secure filtering for nonlinear parabolic partial differential equation systems with cyber attacks. Int J Adapt Control Signal Process 37(2):380–398

  54. Zhang X, He S, Stojanovic V, Luan X, Liu F (2021) Finite-time asynchronous dissipative filtering of conic-type nonlinear Markov jump systems. Sci China Inform Sci 64(5):152206

Acknowledgements

We express our gratitude to the High-Performance Laboratory of the School of Information Engineering at Jiangxi University of Science and Technology for providing the computational resources. We would also like to thank all the teachers and students in this academic group for their valuable suggestions. Additionally, we appreciate the contributions of other researchers who helped adjust the ideas, designs, and experiments in this paper.

Funding

This research is supported in part by the Jiangxi Provincial Natural Science Foundation under grants 20224BAB212013, 20224BAB212008, and 20224BAB202002, in part by the National Natural Science Foundation of China under grants 62266020 and 62261027, and in part by the Science and Technology Research Project of Jiangxi Provincial Department of Education under grants GJJ2200830 and GJJ190467.

Author information

Corresponding author

Correspondence to Jianqing Wu.

Ethics declarations

Conflict of interest

On behalf of all authors, the corresponding author states that there is no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Yu, X., Xiong, G., Wu, J. et al. Variational AdaBoost knowledge distillation for skin lesion classification in dermatology images. Complex Intell. Syst. (2024). https://doi.org/10.1007/s40747-024-01501-4

Keywords