Introduction

Knowledge distillation has shown promising results for classifying skin lesions in dermatology images [1,2,3,4,5]. Knowledge distillation encompasses three types of knowledge: logits-based knowledge, intermediate feature-based knowledge, and relationship-based knowledge [6]. Logits-based knowledge distillation [7, 8] enables the student to mimic the teacher's soft output by adjusting the temperature parameter; knowledge is transferred through the logits by minimizing the Kullback–Leibler (KL) divergence between the student's and teacher's final outputs. Intermediate feature-based knowledge distillation [9] guides the student with representations drawn from the teacher's hidden layers. Zhou et al. [32] proposed a weighted soft-label distillation framework, WSLD, which assigns a dynamic weight to the distillation loss to determine how much of the teacher's soft-label information is utilized, based on the cross-entropy losses of both the student and the teacher. However, a student's learning is a sequential process that encounters varying difficulties at different stages. The WSLD framework mines only logits-based knowledge, neglecting the representational knowledge inherent in each stage and failing to utilize intermediate feature-based and relationship-based knowledge. Moreover, this method cannot avoid noise interference from the teacher.

Innovations: This research introduces AdaBoost and a variational difficulty mining strategy (VDMS) into knowledge distillation and proposes a distillation framework called Variational AdaBoost Knowledge Distillation (VAdaKD). The framework helps the student determine the “granularity” of mining the teacher's knowledge by considering the learning difficulties of skin lesion classification in dermatology images. Specifically, we apply AdaBoost to treat all stages of the student model as a sequential learning process and introduce an intermediate auxiliary classifier for each stage, where the weights of the input samples at each stage reflect the degree of learning difficulty. The paper proposes actively leveraging the teacher's knowledge according to the student's learning difficulty at each stage, facilitating targeted knowledge transfer. Finally, the paper performs linear weighting based on the weights of the base classifiers to obtain the final distillation loss. In this research, we incorporate three forms of knowledge into the distillation loss: logits-based, intermediate feature-based, and relationship-based knowledge. However, as the student's learning process progresses, it becomes increasingly challenging to identify the degree of learning difficulty in later stages due to noise interference from the teacher. Therefore, we first adopt the idea of graph convolutional networks (GCNs) to construct a nearest-neighbor relationship matrix \(A\). This matrix lets us aggregate each node's \(l\)-hop neighborhood information and relay it to the student, so that the student perceives nuanced classification difficulties by leveraging the multi-hop information among dermatology samples while maintaining the same nearest-neighbor relationships as the teacher. Next, we eliminate noise interference from these nuanced difficulties by maximizing the mutual information between the teacher and the student.

Contributions: The main contributions of this paper can be summarized as follows:

  (1)

    This paper proposes a Variational AdaBoost Knowledge Distillation framework, VAdaKD, to address the limitations of conventional knowledge distillation methods, where the student passively learns the teacher's knowledge. VAdaKD offers a more active paradigm for knowledge distillation, allowing the student to determine the “granularity” in mining the teacher's knowledge within this framework.

  (2)

    VAdaKD employs a two-step strategy to improve the efficiency of AdaBoost-based knowledge distillation for categorizing dermatology images. Initially, the student is empowered to actively mine the teacher's learning representation through AdaBoost. Subsequently, a variational difficulty mining strategy (VDMS) is introduced to reduce the influence of noise from the teacher by maximizing the mutual information shared between the teacher and student.

  (3)

    Finally, we formulate the weighted distillation loss with sample-level weights to effectively incorporate three types of knowledge. Our research involves extensive experiments on three well-known dermatological datasets, namely Dermnet, ISIC 2019, and HAM10000. The results of our experiments clearly show the efficacy of the proposed VAdaKD method. Additionally, the visualization results confirm that VAdaKD excels in identifying learning challenges and reducing interference from teacher noise at various stages, consequently enhancing the classification accuracy of dermatology images.

In contrast to previous work on knowledge distillation, our proposed VAdaKD introduces a novel paradigm for actively mining the teacher's knowledge, leading to a more comprehensive perception of the knowledge in the teacher while minimizing the presence of unnecessary information. The highlights of this paper are outlined below:

  (1)

    Propose a distillation framework, VAdaKD, for actively mining the teacher's learning representation for skin lesion classification.

  (2)

    Design a GCN-based difficulty mining strategy to perceive more nuanced classification difficulties.

  (3)

    Construct the weighted distillation loss with sample-level weights to effectively engage three forms of knowledge.

Section "Introduction" of this paper outlines the limitations of traditional knowledge distillation methods and proposes potential solutions. Section "Related work" reviews relevant literature, while Sect. "Methodology" introduces our proposed method. Experimental results and visualizations on three datasets are presented in Sect. "Experiments" to demonstrate the efficacy of our method. The results of the research are discussed in Sect. "Conclusion".

Related work

Knowledge distillation

Knowledge distillation, renowned for its ability to condense models, enables knowledge transfer from a cumbersome teacher model to a compact student model. This allows the developed skin lesion classification and diagnostic models to be deployed efficiently on lightweight mobile devices. Hinton et al. [7] proposed enabling a student to learn the hidden knowledge of the teacher by reducing the KL divergence between the temperature-softened outputs of the last layer. Zhao et al. [8] developed a framework called Decoupled Knowledge Distillation (DKD) that separates the information of target and non-target classes present in the logits, concluding that the effectiveness of logits-based knowledge distillation is largely due to the information provided by the non-target classes. Hossain et al. [52,53,54] showed that deploying lightweight dermatologic diagnostic models on mobile devices will aid in early screening and improve diagnosis and treatment in rural areas.

Methodology

Framework design

The framework of Variational AdaBoost Knowledge Distillation (VAdaKD) proposed in this paper is illustrated in Fig. 1. The figure illustrates the evolution of sample weights throughout the training process and demonstrates how the student model leverages these weights to actively mine and learn knowledge from the teacher model. Both the teacher and student models are assumed to have \(L\) stages. We consider the stages in the backbone as an ordered learning process and incorporate an intermediate auxiliary classifier for each stage, resulting in a total of \(L\) base classifiers. The base classifier for the \(l\) th stage is denoted as \(p_{l} \left( \cdot \right)\) and comprises a convolutional layer, a global average pooling layer, and a fully-connected layer. AdaBoost learns base classifiers sequentially, and the weights of the input dermatology samples for the \(l\) th base classifier are denoted as \(\left\{ {w_{l - 1}^{i} } \right\}_{i = 1}^{B}\), where \(B\) represents the number of samples in the input batch; the weight of the \(l\) th base classifier is denoted as \(\alpha_{l}\). The weights of the input dermatology samples at each stage measure the learning difficulty in skin lesion classification. These sample-level weights, represented by \(\left\{ {w_{l - 1}^{i} } \right\}_{i = 1}^{B}\), are utilized in the distillation module to identify which dermatology samples the student finds difficult to learn at that stage, so that the student can actively mine knowledge from the teacher regarding the learning difficulties in skin lesion classification. However, as the student's learning process progresses, it becomes increasingly challenging to identify learning difficulties due to noise interference from the teacher. We therefore introduce a variational difficulty mining strategy (VDMS) that minimizes the impact of noise by maximizing the mutual information between the teacher and student. The final distillation loss is then obtained by linearly weighting the individual base classifiers using the base classifier weights \(\alpha_{l}\). The losses in VAdaKD encompass the cross-entropy loss for the student task, the AdaBoost training loss for the student task, and the Variational AdaBoost-based distillation loss.
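For concreteness, the following is a minimal PyTorch sketch of one intermediate auxiliary (base) classifier \(p_{l} \left( \cdot \right)\) as described above (one convolutional layer, global average pooling, and a fully-connected layer). The channel sizes, kernel size, and class count are illustrative assumptions, not values prescribed by the paper.

```python
import torch
import torch.nn as nn

class AuxiliaryClassifier(nn.Module):
    """Base classifier p_l(.) attached to the l-th stage: conv -> GAP -> FC."""
    def __init__(self, in_channels: int, num_classes: int):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, in_channels, kernel_size=3, padding=1)
        self.gap = nn.AdaptiveAvgPool2d(1)           # global average pooling
        self.fc = nn.Linear(in_channels, num_classes)

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        x = torch.relu(self.conv(feat))              # B x C x H x W
        x = self.gap(x).flatten(1)                   # B x C
        return self.fc(x)                            # B x K logits

# Illustrative usage: stage feature map with 256 channels, batch of 8, 7 lesion classes
logits = AuxiliaryClassifier(256, 7)(torch.randn(8, 256, 14, 14))
```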

Fig. 1

The proposed Variational AdaBoost Knowledge Distillation (VAdaKD) framework. VAdaKD empowers the student to actively mine the teacher's learning representation in skin lesion classification using AdaBoost. The weights of the dermatology input samples, denoted as \(\left\{ {w_{l}^{i} } \right\}_{i = 1}^{B}\) for each stage, represent the learning difficulty of each sample \(x_{i}\) at the \(l\)-th stage. Based on these weights, the three forms of knowledge are selectively transferred to the student. Among them, \(F_{l}^{S}\) and \(F_{l}^{T}\) represent the intermediate features of the student and teacher at the \(l\)-th stage, and \(f_{i}^{s}\) and \(f_{i}^{t}\) represent the outputs of the \(i\)-th dermatology sample for the student and teacher, respectively. \(C\) represents the correlation between two specific dermatology samples. The final distillation loss is obtained by linearly weighting each form of knowledge according to the base classifier weights \(\alpha_{l}\). We introduce a VDMS that eliminates noise interference from nuanced difficulties by maximizing the mutual information between the teacher and student

The distillation loss comprises three types of knowledge: logits-based knowledge, intermediate feature-based knowledge, and relationship-based knowledge. However, it becomes increasingly challenging to identify learning difficulties due to noise interference from the teacher. VDMS therefore first adopts a GCN-style construction to form a nearest-neighbor relationship matrix \(A\). This matrix is then used to calculate the information of the current node's \(l\) th hop and relay it to the student. Next, VDMS eliminates noise interference from these nuanced difficulties by maximizing the mutual information between the teacher and student. The general framework of VAdaKD proposed in this paper is depicted in Fig. 1.
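The construction of the nearest-neighbor relationship matrix \(A\) and the multi-hop aggregation can be sketched as below. The use of cosine similarity, the value of \(k\), and the row normalization are illustrative assumptions; the paper only states that \(A\) is built in the spirit of GCN from the teacher's representation and used to relay multi-hop neighborhood information to the student.

```python
import torch
import torch.nn.functional as F

def knn_adjacency(teacher_feats: torch.Tensor, k: int = 5) -> torch.Tensor:
    """Build a row-normalized k-nearest-neighbor matrix A (B x B) from teacher features."""
    z = F.normalize(teacher_feats.flatten(1), dim=1)        # B x D, unit-length vectors
    sim = z @ z.t()                                          # cosine similarity, B x B
    topk = sim.topk(k + 1, dim=1).indices                    # keep self plus k neighbors
    A = torch.zeros_like(sim).scatter_(1, topk, 1.0)
    return A / A.sum(dim=1, keepdim=True)                    # row-normalize

def multi_hop(A: torch.Tensor, student_feats: torch.Tensor, hops: int) -> torch.Tensor:
    """Aggregate l-hop neighborhood information by applying A repeatedly to student features."""
    B = student_feats.shape[0]
    x = student_feats.reshape(B, -1)
    for _ in range(hops):
        x = A @ x
    return x.reshape_as(student_feats)
```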

Pre-training the teacher model

The teacher model for skin lesion classification consists of a backbone network, denoted as \(f^{t} \left( \cdot \right)\), with \(L\) stages. The auxiliary (i.e., base) classifiers, represented by \(\left\{ {p_{l}^{t} \left( \cdot \right)} \right\}_{l = 1}^{L}\), are contained within the backbone. The pre-training process of the teacher model is divided into two phases. In the first phase, the teacher model with \(L\) stages is trained, resulting in the trained backbone \(f^{t} \left( \cdot \right)\). In the second phase, the weights of the backbone \(f^{t} \left( \cdot \right)\) are frozen, and the parameters of the auxiliary classifiers \(\left\{ {p_{l}^{t} \left( \cdot \right)} \right\}_{l = 1}^{L}\) are updated. According to the AdaBoost boosting theory, the weights of the input dermatology samples for the \(l\) th base classifier are represented as \(\left\{ {w_{l - 1}^{i} } \right\}_{i = 1}^{B}\), where \(B\) represents the number of samples in the input batch. Then, based on the error rate of each base classifier, we calculate the weight of the base classifier \(\alpha_{l}\). Finally, we obtain the AdaBoost loss by linearly weighting all base classifiers by weights \(\left\{ {\alpha_{l} } \right\}_{l = 1}^{L}\). Note that we use the cross-entropy loss in both training phases to update the parameters for true-label supervision. At the \(l\) th stage, the error rate of the auxiliary classifier is given by Eq. (1),

$$ err_{l} = \mathop \sum \limits_{i = 1}^{B} w_{l - 1}^{i} {\mathbb{I}}\left( {y_{i} \ne p_{l}^{t} \left( {x_{i} } \right)} \right)/\mathop \sum \limits_{i = 1}^{B} w_{l - 1}^{i} , $$
(1)

where \(y_{i}\) represents the true label (i.e., Ground Truth) of the dermatology input samples \(x_{i}\), and the batch size is denoted as \(B\). The weight \(\alpha_{l}\) of the auxiliary classifier for the \(l\)-\({\text{th}}\) stage is shown in Eq. (2),

$$ \alpha_{l} = \log \frac{{1 - err_{l} }}{{err_{l} }} + \log \left( {K - 1} \right), $$
(2)

where \(K\) represents the number of categories. In order to ensure that \(\alpha_{l}\) is positive, the condition \(\left( {1 - err_{l} } \right) > 1/K\) needs to be satisfied. We then update the weights of the input samples \(x_{i}\), giving higher weights to misclassified samples. Finally, \(w_{l}^{i}\) represents the weight of the dermatology input sample for the (\(l\)+1) th base classifier, as shown in Eq. (3). This process is iterative, and the initial weight of the sample \(x_{i}\) is set to \(w_{0}^{i} = 1/B\).

$$ w_{l}^{i} \leftarrow w_{l - 1}^{i} \cdot \exp \left( {\alpha_{l} \cdot {\mathbb{I}}\left( {y_{i} \ne p_{l}^{t} \left( {x_{i} } \right)} \right)} \right). $$
(3)
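A minimal sketch of the sample-weight bookkeeping in Eqs. (1)–(3), following the multi-class (SAMME-style) AdaBoost rule the paper adopts; the function and variable names are ours, and no renormalization is applied beyond what the equations state.

```python
import torch

def adaboost_step(w_prev, preds, labels, num_classes):
    """One boosting step: weighted error (Eq. 1), classifier weight (Eq. 2), weight update (Eq. 3)."""
    miss = (preds != labels).float()                          # indicator I(y_i != p_l(x_i))
    err = (w_prev * miss).sum() / w_prev.sum()                # Eq. (1)
    alpha = torch.log((1 - err) / err) + torch.log(torch.tensor(num_classes - 1.0))  # Eq. (2)
    w_next = w_prev * torch.exp(alpha * miss)                 # Eq. (3)
    return err, alpha, w_next

# Illustrative usage with batch size B = 4, K = 7 classes, and initial weights w_0 = 1/B
w = torch.full((4,), 0.25)
err, alpha, w = adaboost_step(w, torch.tensor([1, 2, 2, 0]), torch.tensor([1, 2, 3, 0]), 7)
```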

As depicted in Fig. 2, the initial training phase uses the true labels of skin lesion classification to supervise the training of the teacher model's backbone. In the subsequent training phase, the true labels are still used for supervision, but the weights of the backbone are frozen, and AdaBoost is employed to train the auxiliary classifiers introduced at all stages. Finally, the outputs of the classifiers are combined into a final prediction using the corresponding weights.

Fig. 2

Two-phase pre-training of a teacher model with \(L\) stages for skin lesion classification. In the first phase, the backbone \(f^{t} \left( \cdot \right)\) is trained. In the second phase, the weights of the backbone \(f^{t} \left( \cdot \right)\) are frozen, and the parameters of the auxiliary classifiers \(\left\{ {p_{l}^{t} \left( \cdot \right)} \right\}_{l = 1}^{L}\) are updated using the AdaBoost boosting theory, where the input sample weights of the \(l\) th base classifier are denoted as \(\left\{ {w_{l - 1}^{i} } \right\}_{i = 1}^{B}\). The weight \(\alpha_{l}\) of the base classifier is then calculated based on its error rate, and the dermatology input sample weights \(\left\{ {w_{l}^{i} } \right\}_{i = 1}^{B}\) for the (\(l\)+1) th auxiliary classifier are updated and adjusted by the \(l\) th auxiliary classifier

Training student model

We refer to the backbone of the student model for skin lesion classification as \(f^{s} \left( \cdot \right)\). Additionally, we have auxiliary classifiers within this backbone, denoted as \(\left\{ {p_{l}^{S} \left( \cdot \right)} \right\}_{l = 1}^{L}\), where \(p_{l}^{s} \left( \cdot \right)\) represents the auxiliary classifier of the \(l\) th stage in the student. The student is trained under the guidance of the pre-trained teacher. The overall loss function includes the task loss of true-label supervision, the AdaBoost loss of true-label supervision, and the weighted distillation loss with sample-level weights between the student and teacher.


Task loss of true-label supervision in skin lesion classification \( {\varvec{L}}_{{{\varvec{ce}}}}\): The task loss in the student is determined by the cross-entropy loss of true-label supervision. The objective is to enable \(f^{s} \left( \cdot \right)\) to accurately classify one-hot labeled data. The task loss \(L_{ce}\) is obtained by computing the cross-entropy of the final outputs against the true labels.


AdaBoost loss of true-label supervision in skin lesion classification \({\varvec{L}}_{{{\varvec{ada}}\_{\varvec{ce}}}}\): AdaBoost employs a sequential training approach, which involves training the student model using \(L\) auxiliary classifiers with prediction functions \(\left\{ {p_{l}^{s} \left( \cdot \right)} \right\}_{l = 1}^{L}\). The input samples' weights of the \(l\) th base classifier are denoted as \(\left\{ {w_{l - 1}^{i} } \right\}_{i = 1}^{B}\). Similarly, the input samples' weights \(\left\{ {w_{l}^{i} } \right\}_{i = 1}^{B}\) of the (\(l\)+1) th base classifier are updated and adjusted by the \(l\) th base classifier. The weight of the \(l\) th base classifier \(\alpha_{l}\) is then computed based on its error rate. Finally, the AdaBoost loss is obtained by linearly combining the individual base classifiers by weights \(\left\{ {\alpha_{l} } \right\}_{l = 1}^{L}\). In addition, as the learning process of skin lesion classification progresses, it becomes increasingly challenging for the subsequent learning stages to mine nuanced classification difficulties. To address this problem, the information from the current node's \(l\) th hop in the teacher is relayed to the student. Specifically, the correlation matrix \(\left\{ {A_{l}^{t} \left( \cdot \right)} \right\}_{l = 1}^{L} \in R^{B \times B}\) of each stage in the teacher model is copied to the student model. Here, \(B\) represents the batch size, and the features of the \(l\) th stage in the student are transformed into \(F_{l}^{S} = A_{l}^{t} F_{l}^{S} \in R^{B \times C \times H \times W}\). The final AdaBoost loss of true-label supervision is shown in Eq. (4),

$$ L_{ada\_ce} = - \mathop \sum \limits_{i = 1}^{B} \mathop \sum \limits_{l = 1}^{L} y_{i} \log \alpha_{l} p_{l}^{s} \left( {x_{i} } \right) , $$
(4)

where \(y_{i}\) represents the true label of the dermatology input sample \(x_{i}\), and \(p_{l}^{s} \left( {x_{i} } \right)\) is formulated as \(p_{l}^{s} \left( {x_{i} } \right) = C_{l} \left( {w_{l - 1}^{i} \cdot F_{l}^{S} \left( {x_{i} } \right)} \right)\): the intermediate features of the input sample \(x_{i}\) are extracted by \(F_{l}^{S} \left( {x_{i} } \right)\) and classified by the \(l\) th auxiliary classifier module \(C_{l} \left( \cdot \right)\) with the weight \(w_{l - 1}^{i}\). The value of \(w_{l - 1}^{i}\) represents the difficulty of classification learning for \(x_{i}\) in the \(l\) th base classifier, with a larger value indicating higher difficulty.
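The stage-wise predictions and the AdaBoost cross-entropy loss of Eq. (4) could be assembled as sketched below. The module interfaces are assumptions for illustration, and we place \(\alpha_{l}\) as a multiplicative weight on the per-stage term rather than inside the logarithm, which is one practical reading of Eq. (4).

```python
import torch
import torch.nn.functional as F

def ada_ce_loss(stage_feats, aux_classifiers, sample_weights, alphas, labels):
    """L_ada_ce: cross-entropy over all L stages, with sample-level weights w_{l-1}^i
    applied to the stage features and stage weights alpha_l scaling each stage's term."""
    loss = 0.0
    for F_l, C_l, w_l, a_l in zip(stage_feats, aux_classifiers, sample_weights, alphas):
        weighted = w_l.view(-1, 1, 1, 1) * F_l                # w_{l-1}^i . F_l^S(x_i)
        log_p = F.log_softmax(C_l(weighted), dim=1)           # log of p_l^s(x_i)
        loss = loss - (a_l * log_p[torch.arange(len(labels)), labels]).sum()
    return loss
```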


Weighted distillation loss with sample-level weights \(L_{mimic}\): The weights of the input samples obtained by AdaBoost are represented as \(\left\{ {w_{l - 1}^{i} } \right\}_{i = 1}^{B}\). We can effectively engage three forms of knowledge by applying the sample-level weights to the distillation module. The distillation loss encompasses logits-based, intermediate feature-based, and relationship-based distillation loss. The logits-based distillation loss consists of two forms, the first being the \(KL\) loss of the soft outputs with temperature for the last layer of the student and teacher (as shown in Eq. (5)),

$$ L_{kd} = - \tau^{2} \mathop \sum \limits_{i = 1}^{B} p^{t} \left( {x_{i} ;\tau } \right)\log \left( {p^{s} \left( {x_{i} ;\tau } \right)} \right), $$
(5)

where \(\tau\) represents the temperature (set to 3 in this paper) and \(B\) denotes the size of the input batch. The second form of logits-based distillation loss is the distillation loss of the soft outputs with temperature for the corresponding auxiliary classifiers of the student and teacher, as shown in Eq. (6),

$$ L_{ada\_kd} = - \tau^{2} \mathop \sum \limits_{i = 1}^{B} \mathop \sum \limits_{l = 1}^{L} w_{l - 1}^{i} \cdot p_{l}^{t} \left( {x_{i} ;\tau } \right)\log \left( {p_{l}^{s} \left( {x_{i} ;\tau } \right)} \right). $$
(6)
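A sketch of the two logits-based terms in Eqs. (5) and (6), assuming that the final-layer and per-stage logits of both models are available; function names are ours.

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, tau=3.0):
    """L_kd (Eq. 5): softened teacher probabilities against student log-probabilities, scaled by tau^2."""
    p_t = F.softmax(teacher_logits / tau, dim=1)
    log_p_s = F.log_softmax(student_logits / tau, dim=1)
    return -(tau ** 2) * (p_t * log_p_s).sum()

def ada_kd_loss(student_stage_logits, teacher_stage_logits, sample_weights, tau=3.0):
    """L_ada_kd (Eq. 6): the same softened loss per auxiliary classifier, weighted per sample by w_{l-1}^i."""
    loss = 0.0
    for s_l, t_l, w_l in zip(student_stage_logits, teacher_stage_logits, sample_weights):
        p_t = F.softmax(t_l / tau, dim=1)
        log_p_s = F.log_softmax(s_l / tau, dim=1)
        loss = loss - (tau ** 2) * (w_l * (p_t * log_p_s).sum(dim=1)).sum()
    return loss
```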

In order to effectively extract the representation information from the intermediate features \(\left\{ {F_{l}^{T} } \right\}_{l = 1}^{L}\), this paper constructs the distillation loss using Attention Transfer [10] as shown in Eq. (7),

$$ L_{att} = \mathop \sum \limits_{i = 1}^{B} \mathop \sum \limits_{l = 1}^{L} w_{l - 1}^{i} \cdot L_{AT} \left( {{\text{F}}_{l}^{T} \left( {x_{i} } \right),{\text{ F}}_{l}^{S} \left( {x_{i} } \right)} \right), $$
(7)

however, it becomes increasingly challenging to identify learning difficulties due to noise interference from the teacher as the student's learning process progresses. Therefore, we introduce the VDMS to minimize the impact of noise, and the AT-based distillation loss with VDMS is given in Eq. (8),

$$L_{att\_vdms} = \mathop \sum \limits_{i = 1}^{B} \mathop \sum \limits_{l = 1}^{L} w_{l - 1}^{i} \cdot \left( {\frac{{\left( {t_{l}^{T} \left( {x_{i} } \right) - \mu_{l}^{S} \left( {x_{i} } \right)} \right)^{2} }}{{2\sigma_{l}^{2} \left( {x_{i} } \right)}} + \log \sigma_{l}^{2} \left( {x_{i} } \right)} \right), $$
(8)

where \(t_{l}^{T} \left( \cdot \right)\) represents the teacher's attention map obtained by applying the attention mapping of Attention Transfer [10] to the intermediate feature \({\text{F}}_{l}^{T} \left( \cdot \right)\), \(\mu_{l}^{S} \left( \cdot \right)\) represents the mean of the student's attention map obtained from \({\text{F}}_{l}^{S} \left( \cdot \right)\), and \(\sigma_{l}^{2} \left( \cdot \right)\) represents the variance of the attention map.
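Equation (8) has the form of a Gaussian negative log-likelihood in which the student predicts a mean and a variance for the teacher's attention values. A minimal sketch under that reading is given below; the AT-style attention mapping (channel-wise mean of squared activations) and the per-sample log-variance head of the student are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def attention_map(feat: torch.Tensor) -> torch.Tensor:
    """AT-style mapping: channel-wise mean of squared activations, flattened and L2-normalized."""
    a = feat.pow(2).mean(dim=1).flatten(1)       # B x (H*W)
    return F.normalize(a, dim=1)

def att_vdms_loss(teacher_feat, student_mu_feat, student_log_var, sample_w):
    """L_att_vdms (Eq. 8): per-sample weighted Gaussian NLL between the teacher's attention map
    and the student's predicted mean/variance. student_log_var: B x 1 (assumed extra head)."""
    t = attention_map(teacher_feat)              # t_l^T(x_i)
    mu = attention_map(student_mu_feat)          # mu_l^S(x_i)
    var = student_log_var.exp()                  # sigma_l^2(x_i)
    nll = ((t - mu) ** 2 / (2 * var) + student_log_var).mean(dim=1)   # average over spatial positions
    return (sample_w * nll).sum()
```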

The above two kinds of distillation losses focus solely on individual-instance knowledge distillation. However, to explore the relationship between different individuals in classifying dermatology images, we present a relationship-based distillation loss. Firstly, we extract the logits features of \(B\) dermatology samples from each auxiliary classifier of the teacher model and construct the correlation matrix \(\left\{ {R_{l}^{t} \left( \cdot \right)} \right\}_{l = 1}^{L}\). Similarly, we construct the correlation matrix \(\left\{ {R_{l}^{s} \left( \cdot \right)} \right\}_{l = 1}^{L}\) for the logits features of \(B\) dermatology samples from each auxiliary classifier of the student model. Then, we calculate the Mean Squared Error (MSE) loss between the correlation matrices of the student and teacher to obtain the relationship-based distillation loss, as shown in Eq. (9),

$$ L_{rkd} = \mathop \sum \limits_{l = 1}^{L} \alpha_{l} \cdot L_{mse} \left( {R_{l}^{t} ,{ }R_{l}^{s} } \right) = \mathop \sum \limits_{l = 1}^{L} \alpha_{l} \cdot \left( {R_{l}^{t} - R_{l}^{s} } \right)^{2} , $$
(9)

the relationship-based distillation loss with VDMS is expressed in Eq. (10), where \(\sigma_{l}^{2}\) represents the variance of the correlation matrix,

$$ L_{rkd\_vdms} = \mathop \sum \limits_{l = 1}^{L} \alpha_{l} \cdot \left( {\frac{{\left( {R_{l}^{t} - R_{l}^{s} } \right)^{2} }}{{2\sigma_{l}^{2} }} + \log \sigma_{l}^{2} } \right). $$
(10)
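The correlation matrices and the relation-based terms of Eqs. (9) and (10) can be sketched as follows. Building \(R\) as a cosine-similarity Gram matrix of the per-stage logits, and using one learnable log-variance per stage, are assumptions; the paper only states that \(R\) captures pairwise correlations among the \(B\) samples.

```python
import torch
import torch.nn.functional as F

def correlation_matrix(logits: torch.Tensor) -> torch.Tensor:
    """R_l: B x B pairwise correlations of the B samples' logits at one stage (cosine Gram matrix)."""
    z = F.normalize(logits, dim=1)
    return z @ z.t()

def rkd_vdms_loss(teacher_stage_logits, student_stage_logits, alphas, log_vars):
    """L_rkd_vdms (Eq. 10): stage-weighted Gaussian-NLL version of the correlation-matrix MSE (Eq. 9)."""
    loss = 0.0
    for t_l, s_l, a_l, lv_l in zip(teacher_stage_logits, student_stage_logits, alphas, log_vars):
        R_t, R_s = correlation_matrix(t_l), correlation_matrix(s_l)
        var = lv_l.exp()                                     # sigma_l^2 (assumed learnable scalar per stage)
        loss = loss + a_l * ((R_t - R_s).pow(2) / (2 * var) + lv_l).mean()
    return loss
```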

Total Loss: The total loss of the student model includes the task loss for true-label supervision, the AdaBoost loss for true-label supervision, and the weighted distillation loss with sample-level weights between the student and the teacher, as shown in Eq. (11),

$$ L_{total} = L_{ce} + \lambda_{1} L_{ada\_ce} + \lambda_{2} L_{mimic}, $$
(11)

where the weighted distillation loss with sample-level weights in skin lesion classification, \(L_{mimic}\), is given by Eq. (12), in which \(\lambda_{1}\), \(\lambda_{2}\), \(\eta\), \(\beta\), and \(\gamma\) are balance factors. In the distillation loss, we set the temperature hyperparameter \(\tau\) to 3.

$$ L_{mimic} = L_{kd} + \eta L_{ada\_kd} + \gamma L_{att\_vdms} + \beta L_{rkd\_vdms}. $$
(12)
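Putting Eqs. (11) and (12) together, the overall objective can be assembled as in the short sketch below; the individual loss terms are the sketches given above, \(\lambda_{1}=0.5\), \(\lambda_{2}=0.3\), and \(\eta=0.25\) follow the values reported in the experiments, and the defaults for \(\gamma\) and \(\beta\) are placeholders.

```python
def total_loss(L_ce, L_ada_ce, L_kd, L_ada_kd, L_att_vdms, L_rkd_vdms,
               lam1=0.5, lam2=0.3, eta=0.25, gamma=1.0, beta=1.0):
    """L_total (Eq. 11) with L_mimic expanded as in Eq. (12)."""
    L_mimic = L_kd + eta * L_ada_kd + gamma * L_att_vdms + beta * L_rkd_vdms
    return L_ce + lam1 * L_ada_ce + lam2 * L_mimic
```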

Experiments

The experimental setup of this paper consists of three parts. Firstly, in Sect. "Comparison experiment", we compare VAdaKD with related knowledge distillation frameworks to validate its effectiveness. It should be noted that AdaKD denotes our proposed AdaBoost-based KD with GCN, and VAdaKD denotes our proposed AdaBoost-based KD with VDMS. Secondly, in Sect. "Ablation study", ablation experiments are designed to observe the impact of the number of auxiliary classifiers and to explore the validity of each component in the distillation loss. Lastly, in Sect. "Visualization", we explore the performance of VAdaKD in mining the learning difficulties at each stage by visualizing the correlation matrix for each stage of the student model, and validate the multi-classification performance achieved by VAdaKD through t-distributed stochastic neighbor embedding (t-SNE) visualization.

Our experiments assess the model's performance using four evaluation indicators: Accuracy (Acc), Recall, F1-score, and AUC. Accuracy is a widely used metric in classification tasks that measures the overall classification performance of the model; however, a high accuracy on highly imbalanced data may not be meaningful. AUC compensates for this shortcoming and better reflects the classifier's performance. Recall is crucial in the medical field because it indicates the correct classification rate among all diseased samples, which is essential for timely patient treatment. F1-score is a comprehensive metric that takes both Recall and Precision into account.
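As a reference for how these four indicators are typically computed in a multi-class setting, a short scikit-learn sketch follows; macro averaging and one-vs-rest AUC are our assumptions, since the paper does not state the averaging mode.

```python
import numpy as np
from sklearn.metrics import accuracy_score, recall_score, f1_score, roc_auc_score

def evaluate(y_true: np.ndarray, y_prob: np.ndarray) -> dict:
    """y_true: (N,) integer labels; y_prob: (N, K) predicted class probabilities."""
    y_pred = y_prob.argmax(axis=1)
    return {
        "Acc": accuracy_score(y_true, y_pred),
        "Recall": recall_score(y_true, y_pred, average="macro"),
        "F1": f1_score(y_true, y_pred, average="macro"),
        "AUC": roc_auc_score(y_true, y_prob, multi_class="ovr", average="macro"),
    }
```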

Comparison experiment

Extensive experiments are conducted on three benchmark datasets, including the Dermnet dataset, ISIC 2019 dataset, and HAM10000 dataset. The values of \(\lambda_{1}\) and \(\lambda_{2}\) in Eq. (11) are set to 0.5 and 0.3, respectively. In Eq. (12), the value of \(\eta\) is set to 0.25. Furthermore, based on the experimental setup of AT [

Data availability

The data that support the findings of this study are available from the corresponding author upon reasonable request.

References

  1. Wang Y, Wang Y, Cai J, Lee TK, Miao C, Wang ZJ (2023) Ssd-kd: A self-supervised diverse knowledge distillation method for lightweight skin lesion classification using dermoscopic images. Med Image Anal 84:102693

  2. Khan MS, Alam KN, Dhruba AR, Zunair H, Mohammed N (2022) Knowledge distillation approach towards melanoma detection. Comput Biol Med 146:105581

  3. Elbatel M, Martí R, Li X (2024) FoPro-KD: Fourier prompted effective knowledge distillation for long-tailed medical image recognition. IEEE Trans Med Imaging 43(3):954–965

  4. Adepu AK, Sahayam S, Jayaraman U, Arramraju R (2023) Melanoma classification from dermatoscopy images using knowledge distillation for highly imbalanced data. Comput Biol Med 154:106571

  5. Liu Q, Yu L, Luo L, Dou Q, Heng PA (2020) Semi-supervised medical image classification with relation-driven self-ensembling model. IEEE Trans Med Imaging 39(11):3429–3440

  6. Gou J, Yu B, Maybank SJ, Tao D (2021) Knowledge distillation: a survey. Int J Comput Vis 129(6):1789–1819

  7. Hinton G, Vinyals O, Dean J (2015) Distilling the knowledge in a neural network. Comput Sci 14(7):38–39

  8. Zhao B, Cui Q, Song R, Qiu Y, Liang J (2022) Decoupled knowledge distillation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 11953–11962

  9. Romero A, Ballas N, Kahou SE, Chassang A, Gatta C, Bengio Y (2015) FitNets: hints for thin deep nets. Proc Int Conf Learn Represent 2(3):1

  10. Zagoruyko S, Komodakis N (2017) Paying more attention to attention: improving the performance of convolutional neural networks via attention transfer. In: Proceedings of International Conference on Learning Representations. https://doi.org/10.48550/arXiv.1612.03928

  11. Kim J, Park S, Kwak N (2018) Paraphrasing complex network: network compression via factor transfer. Adv Neural Inf Process Syst. https://doi.org/10.48550/arXiv.1802.04977

  12. Yang C, An Z, Cai L, Xu Y (2021) Hierarchical self-supervised augmented knowledge distillation. In: Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence. https://doi.org/10.24963/ijcai.2021/168

  13. Jin X, Peng B, Wu Y, Liu Y, Liu J, Liang D, Yan J, Hu X (2019) Knowledge distillation via route constrained optimization. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 1345–1354

  14. Passalis N, Tefas A (2018) Learning deep representations with probabilistic knowledge transfer. In: Proceedings of the European Conference on Computer Vision, pp. 268–284

  15. Park W, Kim D, Lu Y, Cho M (2019) Relational knowledge distillation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 3967–3976

  16. Tung F, Mori G (2019) Similarity-preserving knowledge distillation. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 1365–1374

  17. Yim J, Joo D, Bae J, Kim J (2017) A gift from knowledge distillation: fast optimization, network minimization and transfer learning. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4133–4141

  18. Lee SH, Kim DH, Song BC (2018) Self-supervised knowledge distillation using singular value decomposition. In: Proceedings of the European Conference on Computer Vision, pp. 335–350

  19. Liu Y, Cao J, Li B, Yuan C, Hu W, Li Y, Duan Y (2019) Knowledge distillation via instance relationship graph. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 7096–7104

  20. Passalis N, Tzelepi M, Tefas A (2020) Heterogeneous knowledge distillation using information flow modeling. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2339–2348

  21. Peng B, Jin X, Liu J, Li D, Wu Y, Liu Y, Zhang Z (2019) Correlation congruence for knowledge distillation. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 5007–5016

  22. Zhang R, Yu Y, Shen J, Cui X, Zhang C (2023) Local boosting for weakly-supervised learning. In: Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pp. 3364–3375

  23. Zhang R, Yu Y, Shetty P, Song L, Zhang C (2022) PRBoost: prompt-based rule discovery and boosting for interactive weakly-supervised learning. https://doi.org/10.18653/v1/2022.acl-long.55. arXiv preprint arXiv:2203.09735

  24. Freund Y, Schapire RE (1996) Experiments with a new boosting algorithm. Int Conf Mach Learn 96:148–156

  25. Baig MM, Awais MM, El-Alfy ESM (2017) AdaBoost-based artificial neural network learning. Neurocomputing 248:120–126

  26. Gao Y, Rong W, Shen Y, Xiong Z (2016) Convolutional neural network based sentiment analysis using Adaboost combination. In: International Joint Conference on Neural Networks, pp. 1333–1338

  27. Taherkhani A, Cosma G, McGinnity TM (2020) AdaBoost-CNN: an adaptive boosting algorithm for convolutional neural networks to classify multi-class imbalanced datasets using transfer learning. Neurocomputing 404:351–366

  28. Yang S, Chen LF, Yan T, Zhao YH, Fan YJ (2017) An ensemble classification algorithm for convolutional neural network based on AdaBoost. In: International Conference on Computer and Information Science, pp. 401–406

  29. Shakeel PM, Tolba A, Al-Makhadmeh Z, Jaber MM (2020) Automatic detection of lung cancer from biomedical data set using discrete AdaBoost optimized ensemble learning generalized neural networks. Neural Comput Appl 32(3):777–790

  30. Sun K, Zhu Z, Lin Z (2019) AdaGCN: Adaboosting graph convolutional networks into deep models. In: Proceedings of International Conference on Learning Representations. https://doi.org/10.48550/arXiv.1908.05081

  31. Kipf TN, Welling M (2016) Semi-supervised classification with graph convolutional networks. In: Proceedings of International Conference on Learning Representations. https://doi.org/10.48550/arXiv.1609.02907

  32. Zhou H, Song L, Chen J, Zhou Y, Wang G, Yuan J, Zhang Q (2021) Rethinking soft labels for knowledge distillation: a bias-variance tradeoff perspective. Proc Int Conf Learn Represent. https://doi.org/10.48550/arXiv.2102.00650

  33. Li Z, Li X, Yang L, Zhao B, Song R, Luo L, Yang J (2023) Curriculum temperature for knowledge distillation. Proc AAAI Conf Artif Intell 37(2):1504–1512

  34. Huang F, Ash J, Langford J, Schapire R (2018) Learning deep ResNet blocks sequentially using boosting theory. In: International Conference on Machine Learning, pp. 2058–2067

  35. Guo Z, Yan H, Li H, Lin X (2023) Class attention transfer based knowledge distillation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 11868–11877

  36. Oakley A. DermNet New Zealand. Topical formulations. Updated February.

  37. Tschandl P, Rosendahl C, Kittler H (2018) The HAM10000 dataset, a large collection of multi-source dermatoscopic images of common pigmented skin lesions. Sci Data 5(1):1–9

  38. Codella NC, Gutman D, Celebi ME, Helba B, Marchetti MA, Dusza SW, Halpern A (2018) Skin lesion analysis toward melanoma detection: a challenge at the 2017 International Symposium on Biomedical Imaging (ISBI), hosted by the International Skin Imaging Collaboration (ISIC). In: International Symposium on Biomedical Imaging, pp. 168–172

  39. Combalia M, Codella NC, Rotemberg V, Helba B, Vilaplana V, Reiter O, Malvehy J (2019) BCN20000: dermoscopic lesions in the wild. https://doi.org/10.48550/arXiv.1908.02288. arXiv preprint arXiv:1908.02288

  40. Hossain MI, Elahi MM, Ramasinghe S, Cheraghian A, Rahman F, Mohammed N, Rahman S (2023) LumiNet: the bright side of perceptual knowledge distillation. https://doi.org/10.48550/arXiv.2310.03669. arXiv preprint arXiv:2310.03669

  41. Huang T, You S, Wang F, Qian C, Xu C (2022) Knowledge distillation from a stronger teacher. Adv Neural Inf Process Syst 35:33716–33727

  42. Shu C, Liu Y, Gao J, Yan Z, Shen C (2021) Channel-wise knowledge distillation for dense prediction. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 5311–5320

  43. Haberman HF, Norwich KH, Diehl DL, Evans SJ, Harvey B, Landau J, Zingg W (1985) DIAG: a computer-assisted dermatologic diagnostic system—clinical experience and insight. J Am Acad Dermatol 12(1):132–143

  44. Brooks GJ, Ashton RE, Pethybridge RJ (1992) DERMIS: a computer system for assisting primary-care physicians with dermatological diagnosis. Br J Dermatol 127(6):614–619

  45. Liu Y, Jain A, Eng C, Way DH, Lee K, Bui P, Coz D (2020) A deep learning system for differential diagnosis of skin diseases. Nat Med 26(6):900–908

  46. Hameed SA, Haddad A, Nirabi A (2019) Dermatological diagnosis by mobile application. Bull Elect Eng Inform 8(3):847–854

  47. Joshi SK (2023) Chaos embedded opposition based learning for gravitational search algorithm. Appl Intell 53(5):5567–5586

  48. Wang Y, Yu Y, Gao S, Pan H, Yang G (2019) A hierarchical gravitational search algorithm with an effective gravitational constant. Swarm Evolut Comput 46:118–139

  49. Wang Y, Gao S, Yu Y, Cai Z, Wang Z (2021) A gravitational search algorithm with hierarchy and distributed framework. Knowl Based Syst 218:106877

  50. Mohammadi A, Sheikholeslam F, Mirjalili S (2023) Nature-inspired metaheuristic search algorithms for optimizing benchmark problems: inclined planes system optimization to state-of-the-art methods. Arch Comput Methods Eng 30(1):331–389

  51. Bolotnik N, Figurina T (2023) Controllability of a two-body crawling system on an inclined plane. Meccanica 58(2):321–336

  52. Song X, Song Y, Stojanovic V, Song S (2023) Improved dynamic event-triggered security control for T-S fuzzy LPV-PDE systems via pointwise measurements and point control. Int J Fuzzy Syst 25(8):3177–3192

  53. Zhang Z, Song X, Sun X, Stojanovic V (2023) Hybrid-driven-based fuzzy secure filtering for nonlinear parabolic partial differential equation systems with cyber attacks. Int J Adapt Control Signal Process 37(2):380–398

  54. Zhang X, He S, Stojanovic V, Luan X, Liu F (2021) Finite-time asynchronous dissipative filtering of conic-type nonlinear Markov jump systems. Sci China Inform Sci 64(5):152206

Acknowledgements

We express our gratitude to the High-Performance Laboratory of the School of Information Engineering at Jiangxi University of Science and Technology for providing the computational resources. We would also like to thank all the teachers and students in this academic group for their valuable suggestions. Additionally, we appreciate the contributions of other researchers who helped adjust the ideas, designs, and experiments in this paper.

Funding

This research is supported in part by the Jiangxi Provincial Natural Science Foundation under grants 20224BAB212013, 20224BAB212008, and 20224BAB202002, in part by the National Natural Science Foundation of China under grants 62266020 and 62261027, and in part by the Science and Technology Research Project of Jiangxi Provincial Department of Education under grants GJJ2200830 and GJJ190467.

Author information

Corresponding author

Correspondence to Jianqing Wu.

Ethics declarations

Conflict of interest

On behalf of all authors, the corresponding author states that there is no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Yu, X., Xiong, G., Wu, J. et al. Variational AdaBoost knowledge distillation for skin lesion classification in dermatology images. Complex Intell. Syst. (2024). https://doi.org/10.1007/s40747-024-01501-4

Keywords