TLCE: Transfer-Learning Based Classifier Ensembles for Few-Shot Class-Incremental Learning

Wang, Shuangmei; Cao, Yang; Wu, Tieru

doi:10.1007/s11063-024-11605-0

Open access
Published: 08 May 2024

Volume 56, article number 167, (2024)

Neural Processing Letters


Abstract

Few-shot class-incremental learning (FSCIL) must incrementally recognize novel classes from few examples while avoiding catastrophic forgetting of old classes and overfitting to new classes. We propose TLCE, which ensembles multiple pre-trained models to improve the separation of novel and old classes. Specifically, we use episodic training to map images from old classes to quasi-orthogonal prototypes, which minimizes interference between old and new classes. Then, we ensemble diverse pre-trained models to further tackle the challenge of data imbalance and enhance adaptation to novel classes. Extensive experiments on various datasets demonstrate that our transfer learning ensemble approach outperforms state-of-the-art FSCIL methods.


1 Introduction

Deep learning has driven substantial advances in various computer vision tasks, mainly thanks to large-scale datasets and powerful GPU computing devices. However, deep learning-based methods are limited in recognizing classes that were not part of their training. To address this, significant research has been conducted on Class-Incremental Learning (CIL), which focuses on dynamically updating the model using only new samples from each additional task while preserving knowledge about previously learned classes. At the same time, obtaining and annotating a sufficient quantity of data samples is both complex and expensive, so some studies investigate CIL when data availability is limited. Specifically, researchers have explored few-shot class-incremental learning (FSCIL), which aims to learn new classes continuously from only a limited number of target samples.

As a consequence, two issues arise: catastrophic forgetting of previously learned classes and overfitting to new concepts. Furthermore, Constrained Few-Shot Class-Incremental Learning (C-FSCIL) [1] introduces explicit constraints on memory and computational capacity for this learning setting. These constraints require that the computational cost of learning a new class stays constant and that the model’s memory usage grows at most linearly as additional classes are introduced.

To solve the problems mentioned above, recent studies [2,3,4] have focused on acquiring transferable features. They use cross-entropy (CE) loss during base session training and freeze the backbone afterwards to help with adapting to new classes. C-FSCIL [1] employs meta-learning to map input images to quasi-orthogonal prototypes in a way that minimizes interference between the prototypes of different classes. Although C-FSCIL has demonstrated superior performance, we find a prediction bias arising from class imbalance and data imbalance. We also observe that assigning hyperdimensional quasi-orthogonal vectors to each class demands a substantial number of samples and iterations. This makes it challenging to allocate prototypes to novel classes that have only a limited number of samples.

In this paper, we propose TLCE, a transfer-learning based few-shot class-incremental learning method that ensembles various classifiers to memorize different knowledge. One of the main ideas in few-shot classification is to pre-train a deep network on the base dataset and transfer the knowledge gained to the novel classes [5, 6]. This approach has been demonstrated to be a strong baseline for improving few-shot classification accuracy. On the other hand, keeping the interference between the new classes and the old classes small is key. Hence, we leverage the advantages of both kinds of classifiers through ensemble learning. Firstly, we employ meta-learning to train a robust hyperdimensional network (RHD) following C-FSCIL. This allows us to effectively map input images to quasi-orthogonal prototypes for base classes. Secondly, we integrate cosine similarity and cross-entropy loss to train a transferable knowledge network (TKN). Finally, we compute the prototype, i.e., the average of features, for each class. A test sample is then classified by finding its nearest prototype, where nearness is measured by a weighted combination of the similarities produced by the two networks.

Compared to C-FSCIL, our TLCE adopts the similar idea of assigning quasi-orthogonal prototypes to base classes to minimize interference. The key difference is that, through classifier ensembles, TLCE attempts to perform equally well on all classes regardless of the order in which they were learned. We conduct extensive comparisons with state-of-the-art few-shot class-incremental classification methods on miniImageNet [7] and CIFAR100 [8], and the results demonstrate the superiority of our TLCE. Ablation studies on different ensembles, i.e., different weights between the robust hyperdimensional network and the transferable knowledge network, also show the necessity of ensembling two classifiers for better results.

In summary, our contributions are as follows:

1. We propose TLCE, transfer-learning based classifier ensembles that improve the separation of the novel class set while maintaining the separation of the base class set.
2. The proposed method can efficiently explore the relationship between prototypes and test features without additional training or expensive computation. It not only achieves a higher average accuracy but also significantly improves the accuracy on new classes.
3. We conduct extensive experiments on various datasets, and the results show that our efficient method can outperform SOTA few-shot class-incremental classification methods.

2 Related Work

2.1 Few-Shot Learning

Few-shot learning (FSL) seeks to develop neural models for new categories using only a small number of labeled samples. Meta-learning [9] is extensively utilized to accomplish few-shot classification. The core idea is to use the episodic training paradigm to learn generalizable classifiers or feature extractors for the data of the base classes in an optimization-based framework [10,11,12], as well as to learn a distance function that measures the similarity among feature embeddings through metric-learning [13,14,15,27,28,29]. On the other hand, several methods aim to find biases and rectify them, as in the oracle model [19, 30, 31]. FSCIL can be seen as a particular case of CIL. Therefore, we can learn from some of the methods mentioned above.

2.3 Few-Shot Class-Incremental Learning

FSCIL introduces few-shot scenarios, where only a few labeled samples are available, into the task of class-incremental learning. To tackle the challenges of catastrophic forgetting and severe overfitting in FSCIL, researchers have proposed various approaches from different perspectives. TOPIC [32] employs a neural gas network to preserve the topology of the feature manifold from a cognitive-inspired perspective. SKD [33] and ERDIL [34] use knowledge distillation to balance the preservation of old knowledge and the adaptation to new knowledge. Feature-space based methods focus on obtaining compact clustered features and maintaining generalization for future incremental classes [35,36,37]. From the perspective of parameter space, WaRP [38] combines the advantages of F2M [4], which finds flat minima of the loss function, and FSLL [39], which performs parameter fine-tuning. It pushes most of the previous knowledge compactly into only a few important parameters so that more parameters can be fine-tuned during incremental sessions. From the perspective of hybrid approaches, some works combine episodic training [1, 3, 40], ensemble learning [41, 42], and so on. C-FSCIL [1] maps input images to quasi-orthogonal prototypes through episodic training, such that the prototypes of different classes interfere little with each other. However, achieving quasi-orthogonality among all class prototypes is difficult for novel classes that have only a limited number of labeled samples. MCNet [41] trains multiple embedding networks using diverse network architectures to enhance the diversity of models and enable them to memorize different knowledge effectively. Like MCNet, our method is based on ensemble learning, but we train two networks that share the same architecture using different loss functions and training methods. The method of [42] enhances the expressive ability of extracted features through multistage pre-training and uses a meta-learning process to extract meta-features as complementary features. Please note that a novel generalization model is one with no overlapping among novel class sets and no interference with base classes. In contrast to these methods, we ensemble a robust hyperdimensional (HD) network for base classes and a transferable knowledge network for novel classes, which is a new perspective.

3 Method

In this section, we present our FSCIL method based on model ensembling.

An ideal FSCIL model should ensure that newly added categories do not interfere with the old ones and that the two remain distinctly separated. This motivates us to combine a robust hyperdimensional memory-augmented neural network and a transferable knowledge model through ensembling. Firstly, we draw inspiration from [1, 43] and employ episodic training to map the base datasets to quasi-orthogonal prototypes, thereby minimizing interference with base classes during incremental sessions. Secondly, we pretrain a model from scratch in a standard supervised way to obtain a transferable knowledge space. Finally, we integrate an explicit memory (EM) into the embedding networks mentioned above, so that the EM stores the embeddings of labeled data samples as class prototypes. During testing, we use nearest-prototype classification based on similarity, thereby meeting the classification requirements for all seen classes. Note that, because training only takes place within the base session, in each incremental session we only need to compute the new class prototypes using the aforementioned models and update the EM. Figure 1 illustrates the framework of our method. In the following, we provide technical details of the proposed method for few-shot class-incremental classification.

3.1 Problem Statement

FSCIL is a challenging machine learning problem in which we continuously learn from a series of tasks. Assume that we have a continuous flow of labeled training data $D^1, D^2, \dots , D^T$. Each $D^t$ represents a specific task and contains pairs of input data and corresponding labels, denoted as $D^t= \{(x_i, y_i)\}_{i=1}^{|D^t|}$. The first task, $D^1$, consists of a large amount of data and is known as the base session. The following tasks, $D^t$ with $t>1$, involve smaller training sets that focus on new classes; we refer to them as incremental sessions. In each task, the set of labels is denoted as $C^t$, and the label sets of any two different tasks do not overlap ($ C^i \cap C^j= \emptyset $ when $i \ne j$). The number of classes in task t is $|C^t |$. We follow the conventional few-shot class-incremental learning setting, in which each incremental dataset contains N-way K-shot samples: for $t>1$, $D^t$ consists of N novel classes, and each class has K labeled samples. During each session t, the model only has access to the dataset $D^t$ for training. After training with $D^t$, the model needs to recognize all the classes encountered so far, i.e., the union of the label sets of the current and all previous tasks ($\cup _{s\le t}C^s$).

3.2 TLCE

Our transfer-learning based classifier ensemble combines a robust hyperdimensional network and a transferable knowledge network. With this combination, we can leverage the diverse priors acquired from the base classes to explore the comprehensive relation between prototypes and test features.

3.2.1 Robust Hyperdimensional Network (RHD)

Due to the “curse” of dimensionality, a randomly selected vector has a high probability of being quasi-orthogonal to other random vectors. As a result, when a new random vector is used to represent a novel class, it adds to what has been learned previously while causing minimal interference. Hence, we follow C-FSCIL [1] to build an RHD network during the base session.
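The quasi-orthogonality claim can be checked numerically. The following is a minimal sketch in plain Python, not part of the original method: the Gaussian sampling, the number of vectors, and the helper names `cosine` and `mean_abs_cosine` are assumptions made for illustration. It compares the average absolute cosine similarity between random vectors in a low-dimensional and a 512-dimensional space.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mean_abs_cosine(dim, n_vectors=20, seed=0):
    """Average |cosine| over all pairs of random Gaussian vectors of size dim."""
    rng = random.Random(seed)
    vecs = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_vectors)]
    sims = [
        abs(cosine(vecs[i], vecs[j]))
        for i in range(n_vectors)
        for j in range(i + 1, n_vectors)
    ]
    return sum(sims) / len(sims)


low_dim = mean_abs_cosine(dim=4)      # random vectors overlap noticeably
high_dim = mean_abs_cosine(dim=512)   # random vectors are close to orthogonal
print(f"d=4: {low_dim:.3f}, d=512: {high_dim:.3f}")
```

In the high-dimensional case the pairwise similarities concentrate near zero, which is why a fresh random direction for a novel class interferes little with the stored ones.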

The RHD network comprises three primary components: a backbone network, an extra projection layer, and a fully connected layer. The backbone network maps the samples from the input domain $\mathcal {X}$ to a feature space. To construct an embedding network that employs a high-dimensional distributed representation, we combine the backbone network with the projection layer. Then we have

$$\begin{aligned} \mu _1 = F_{\theta _1} (x ), \quad \mu _2 = G_{\theta _2}(\mu _1), \end{aligned}$$

(1)

where $\mu _1 \in \mathbb {R}^{d_f}$ is the intermediate feature of input x, $d_f $ is the dimension of the feature space, $\mu _2 \in \mathbb {R}^{d}$ is the projected feature computed from $\mu _1$, d is the dimension of the projection space, and $\theta _1,\theta _2$ are the learnable parameters of the backbone network and the projection layer, respectively.

Firstly, we jointly train both $F_{\theta _1}$ and $G_{\theta _2}$ from scratch with standard supervised classification on the base session data to derive powerful embeddings for the downstream base learner. The empirical risk to minimize can be formulated as:

$$\begin{aligned} \min _{\theta _1, \theta _2} \quad L_{ce} ((W^T\mu _2), y), \end{aligned}$$

(2)

where $L_{ce} ( \cdot , \cdot )$ is the cross-entropy (CE) loss and $W$ denotes the learnable parameters of the fully connected layer.

Lastly, we build on the meta-learning setup to allocate quasi-orthogonal vectors to the various image classes, so that these vectors are positioned far away from each other in the hyperdimensional space. We replace the fully connected layer with the EM and build a series of $|D^1 |$-way K-shot tasks, where $|D^1 |$ here denotes the number of base classes (i.e., $|C^1 |$ in the notation of Sect. 3.1) and K is the number of support samples per class in each task. In each task, the projection layer generates a support vector for every training input. To represent each class, we calculate the average of all support vectors that belong to that class, which yields a single prototype vector per class. These prototypes are saved in the EM. Specifically, the prototype for a given class i is determined in the following manner:

$$\begin{aligned} p_i^R = \frac{1}{ |\textbf{S}_i |}\sum _{x \in \textbf{S}_i} G_{\theta _2}(F_{\theta _1}(x)), \end{aligned}$$

(3)

where $\textbf{S}_i$ is the set of all samples from class i and $ |\textbf{S}_i |$ is the number of samples. Given a query sample q and a prototype $p_i^R$, we compute the cosine similarity for class i as follows:

$$\begin{aligned} S_i^R = \cos (\tanh (G_{\theta _2}(F_{\theta _1}(q))), \tanh (p_i^R)), \end{aligned}$$

(4)

where $\tanh (\cdot )$ is the hyperbolic tangent function and $\cos (\cdot , \cdot )$ is the cosine similarity. In hyperdimensional memory-augmented neural networks [43], the hyperbolic tangent has proven useful as a non-linear function because it regulates the norms of the activated prototypes and embedding outputs. Additionally, cosine similarity tackles the norm and bias problems commonly encountered in FSCIL by emphasizing the angle between activated prototypes and embedding outputs while disregarding their norms [44]. Given the cosine similarity score $S_i^R$ for every class i, we utilize a soft absolute sharpening function to enhance this attention vector, resulting in quasi-orthogonal vectors [43].
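Eqs. 3 and 4 can be illustrated with a small worked example. This is a minimal sketch rather than the authors' implementation: the 3-dimensional vectors are hypothetical stand-ins for the embeddings $G_{\theta _2}(F_{\theta _1}(x))$, and the class names and helper functions are invented for illustration.

```python
import math


def mean_vector(vectors):
    """Element-wise average of equal-length vectors (the prototype of Eq. 3)."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def tanh_cosine(query, prototype):
    """Cosine similarity between tanh-activated vectors (Eq. 4)."""
    return cosine([math.tanh(a) for a in query], [math.tanh(b) for b in prototype])


# Hypothetical embeddings: two classes with two support samples each.
support = {
    "cat": [[2.0, 0.1, -0.2], [1.8, -0.1, 0.0]],
    "dog": [[-0.1, 1.9, 0.2], [0.1, 2.1, -0.2]],
}
prototypes = {label: mean_vector(vecs) for label, vecs in support.items()}

query = [1.5, 0.3, 0.0]  # hypothetical query embedding
scores = {label: tanh_cosine(query, proto) for label, proto in prototypes.items()}
predicted = max(scores, key=scores.get)  # nearest prototype by similarity
print(predicted, scores)
```

Because only the angle between the activated vectors matters, rescaling the query embedding leaves the ranking of the classes largely unchanged once the tanh saturates.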

Softabs attention. The softabs attention function is defined as

$$\begin{aligned} h(S_i^R) = \frac{\epsilon (S_i^R)}{\sum _{j=1}^{|D^1|} \epsilon (S_j^R)}, \end{aligned}$$

(5)

where $\epsilon (\cdot )$ is the sharpening function:

$$\begin{aligned} \epsilon (c) = \frac{1}{1+e^{-(\beta (c-0.5))}} + \frac{1}{1+e^{-(\beta (-c-0.5))}}. \end{aligned}$$

(6)

The sharpening function includes a stiffness parameter $\beta $, which is set to 10 as in [43].
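To make Eqs. 5 and 6 concrete, the following minimal sketch evaluates the softabs attention on a handful of similarity scores. The input scores are hypothetical; only the two formulas and the stiffness $\beta = 10$ come from the text.

```python
import math


def sharpen(c, beta=10.0):
    """Soft absolute sharpening function epsilon(c) of Eq. 6."""
    return (1.0 / (1.0 + math.exp(-beta * (c - 0.5)))
            + 1.0 / (1.0 + math.exp(-beta * (-c - 0.5))))


def softabs_attention(similarities, beta=10.0):
    """Normalize the sharpened similarities so that they sum to one (Eq. 5)."""
    sharpened = [sharpen(s, beta) for s in similarities]
    total = sum(sharpened)
    return [e / total for e in sharpened]


# Hypothetical cosine similarities of one query against four prototypes.
sims = [0.9, 0.1, -0.05, 0.0]
attention = softabs_attention(sims)
print([round(a, 3) for a in attention])
```

The function is symmetric in the sign of the similarity and pushes values with magnitude below 0.5 towards zero, so near-orthogonal prototypes receive almost no attention while the strongly aligned one takes most of the mass.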

3.2.2 Transferable Knowledge Network (TKN)

It is difficult to ensure quasi-orthogonality among all class prototypes because novel classes have only a small number of labeled samples. To address this issue, we draw inspiration from transfer learning-based few-shot methods and explore various transferable models. The most straightforward approach is to use a model that has been pre-trained from scratch with standard supervised classification. We employ this model as a baseline for our analysis.

SimpleShot [45] demonstrates that nearest-neighbor classification, where features are simply L2-normalized and compared by Euclidean distance, can obtain competitive results in few-shot classification tasks. After L2 normalization, the squared Euclidean distance is a decreasing function of cosine similarity, since $\Vert a - b\Vert ^2 = 2 - 2\cos (a, b)$ for unit vectors a and b, so both measures select the same nearest neighbor. Utilizing cosine similarity as the metric for quantifying data similarity has two implications: 1) during training, it focuses on the angles between normalized features rather than the absolute distances within the latent feature space; 2) the normalized weight parameters of the fully connected layer can be interpreted as the centroids or centers of each category [36]. We therefore combine cosine similarity with cross-entropy loss to train a more transferable network. To simplify the calculation of cosine similarity in the final fully connected layer, we set the bias to zero. The data prediction procedure can then be written as:

$$\begin{aligned} \mu _2 = G_{\theta _2}(\mu _1) = G_{\theta _2}(F_{\theta _1}(x)), \end{aligned}$$

(7)

$$\begin{aligned} w_i = W_i^T \mu _2 = \Vert W_i\Vert \Vert \mu _2\Vert \cos (\theta _i) = \cos (\theta _i), \quad \Vert W_i\Vert = \Vert \mu _2\Vert = 1. \end{aligned}$$

(8)

The quantity $w_i$ is the calculated cosine similarity between the feature $\mu _2$ and the weight parameter $W_i$ for class i. The loss function is given by:

$$\begin{aligned} \begin{aligned} L&= -\frac{1}{T}\sum _{j=1}^{T}\log (\frac{e^{y_{j}}}{\sum ^{|C^1 |}_{i=1}e^{w_i}})\\&= -\frac{1}{T}\sum _{j=1}^{T}\log (\frac{e^{\Vert W_{j}\Vert \Vert \mu _2\Vert \cos (\theta _{j})}}{\sum _{i=1}^{|C^1 |}e^{\Vert W_i\Vert \Vert \mu _2\Vert \cos (\theta _i)}})\\&= -\frac{1}{T}\sum _{j=1}^{T}\log (\frac{e^{\cos (\theta _{j})}}{\sum _{i=1}^{|C^1 |}e^{\cos (\theta _i)}}),\\ \end{aligned} \end{aligned}$$

(9)

where T is the number of training images and the quantity $y_j$ describes the cosine similarity towards its ground truth class for image j.
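The cosine-based loss of Eqs. 8 and 9 can be sketched as follows. This is an illustrative computation in plain Python under stated assumptions, not the training code of the paper: the 2-dimensional features and class weights are hypothetical, and, as in Eq. 9, no temperature or scale factor is applied to the cosine logits.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v]


def cosine_logits(feature, class_weights):
    """Cosine similarity between a feature and every class weight (Eq. 8)."""
    f = l2_normalize(feature)
    return [sum(a * b for a, b in zip(f, l2_normalize(w))) for w in class_weights]


def cosine_cross_entropy(features, labels, class_weights):
    """Mean cross-entropy over cosine logits, with zero bias (Eq. 9)."""
    total = 0.0
    for feature, label in zip(features, labels):
        logits = cosine_logits(feature, class_weights)
        log_sum_exp = math.log(sum(math.exp(z) for z in logits))
        total += log_sum_exp - logits[label]
    return total / len(features)


# Hypothetical toy data: three classes in a 2-d feature space.
weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
features = [[3.0, 0.2], [0.1, 5.0]]

loss_correct = cosine_cross_entropy(features, [0, 1], weights)
loss_wrong = cosine_cross_entropy(features, [2, 2], weights)
print(loss_correct, loss_wrong)
```

Since both the features and the weights are normalized, rescaling a feature does not change its logits, and each normalized weight vector acts as the center of its class.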

3.3 Incremental Test

By employing the incremental-frozen framework, we can reduce the storage requirements by only preserving the prototypes of all the encountered classes and updating the explicit memory (EM) when new classes are introduced. This way, we can effectively respect the limits imposed by memory and computational capacity. Firstly, we utilize the robust hyperdimensional network and the transferable knowledge network to calculate the prototypes $P^R$ and $P^T$. Once we acquire the prototypes for the novel classes, we can promptly update the EM. It is important to note that the EM does not update the prototypes of the old classes, as RHD and TKN remain fixed in the subsequent sessions. Then, we save the prototypes of all classes that have appeared so far within the EM. Finally, we derive the classification outcome by evaluating the similarity between the test sample and each prototype. Suppose we have a test sample q. According to Eq. 4, we can calculate separate similarities, denoted as $S^R$ and $S^T$, for the two classifiers RHD and TKN. Then, we combine the classifiers through weighted integration of both scores to obtain the final score S as:

$$\begin{aligned} S = (1-\lambda ) S^R + \lambda S^T, \end{aligned}$$

(10)

where $\lambda \in [0, 1]$ is a hyperparameter that balances the two classifiers: $\lambda = 0$ uses only RHD and $\lambda = 1$ uses only TKN. This approach allows us to leverage the strengths of multiple classifiers and merge their outputs, leading to a more accurate final result.
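The incremental test procedure can be sketched as a small prototype memory. This is a hedged illustration, not the authors' code: the class `PrototypeMemory`, its method names, and all 2-dimensional embeddings are hypothetical, and both similarities are computed with the tanh-cosine of Eq. 4 as the text suggests (using a plain cosine for the TKN would be an equally plausible reading).

```python
import math


def mean_vector(vectors):
    """Element-wise average of equal-length vectors (the class prototype)."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def tanh_cosine(u, v):
    """Cosine similarity between tanh-activated vectors, as in Eq. 4."""
    a = [math.tanh(x) for x in u]
    b = [math.tanh(x) for x in v]
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


class PrototypeMemory:
    """Toy explicit memory: one RHD and one TKN prototype per seen class."""

    def __init__(self):
        self.rhd = {}
        self.tkn = {}

    def add_class(self, label, rhd_features, tkn_features):
        # Prototypes of old classes stay fixed, so overwriting is rejected.
        if label in self.rhd:
            raise ValueError(f"class {label!r} is already stored")
        self.rhd[label] = mean_vector(rhd_features)
        self.tkn[label] = mean_vector(tkn_features)

    def classify(self, q_rhd, q_tkn, lam=0.8):
        """Return the label with the highest fused score S of Eq. 10."""
        scores = {
            label: (1 - lam) * tanh_cosine(q_rhd, self.rhd[label])
            + lam * tanh_cosine(q_tkn, self.tkn[label])
            for label in self.rhd
        }
        return max(scores, key=scores.get)


memory = PrototypeMemory()
# Base session: hypothetical embeddings from each frozen network.
memory.add_class("base_0", [[1.1, 0.0], [0.9, 0.0]], [[1.0, 0.1], [1.0, -0.1]])
# Incremental session: only the prototypes of the new class are added.
memory.add_class("novel_0", [[0.8, 0.6]], [[0.0, 1.0]])

# Hypothetical query embeddings of one test image under RHD and TKN.
q_rhd, q_tkn = [1.0, 0.1], [0.2, 1.0]
ensemble_label = memory.classify(q_rhd, q_tkn, lam=0.8)
rhd_only_label = memory.classify(q_rhd, q_tkn, lam=0.0)
print(ensemble_label, rhd_only_label)
```

In this toy setup the RHD prototypes of the novel class are poorly separated from the base class, so RHD alone picks the base class, while weighting the TKN similarity more heavily recovers the novel class. No parameter is trained when a class is added; only the memory grows by one prototype pair per class.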

4 Experiments

In this section, we conduct quantitative comparisons between our TLCE and state-of-the-art few-shot class-incremental learning methods on two representative datasets. We also perform ablation studies to evaluate design choices and different hyperparameters of our method.

Table 1 Quantitative comparison on the test set of miniImageNet in the 5-way 5-shot FSCIL setting


Table 2 Quantitative comparison on the test set of CIFAR100 in the 5-way 5-shot FSCIL setting


4.1 Datasets

We evaluate our proposed method on two datasets for benchmarking few-shot class-incremental learning: miniImageNet [7] and CIFAR100 [8].

The miniImageNet [7] dataset contains 100 classes, each with 500 training images and 100 testing images. CIFAR100 [8] is a challenging dataset with 60,000 images of size 32×32, divided into 100 classes, each with 500 training images and 100 testing images. Following the split used in [32], we select 60 base classes and 40 novel classes from each of CIFAR100 and miniImageNet. The 40 novel classes are further divided into eight incremental sessions. In each session, we learn in a 5-way 5-shot manner, which means training on 5 classes with 5 images per class.
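The session layout described above can be written down directly. The sketch below is an illustration only: it assigns classes to sessions by integer id, whereas the actual split of [32] uses fixed class lists, and the function name `build_sessions` is hypothetical.

```python
def build_sessions(num_classes=100, num_base=60, way=5):
    """Split class ids into one base session plus `way`-class incremental sessions."""
    class_ids = list(range(num_classes))
    sessions = [class_ids[:num_base]]
    for start in range(num_base, num_classes, way):
        sessions.append(class_ids[start:start + way])
    return sessions


sessions = build_sessions()

# Number of classes the model must recognize after each session,
# i.e. the size of the union of all label sets seen so far.
seen_after = []
seen = 0
for labels in sessions:
    seen += len(labels)
    seen_after.append(seen)

shots_per_class = 5
training_images_per_incremental_session = 5 * shots_per_class
print(len(sessions), seen_after, training_images_per_incremental_session)
```

This gives one base session followed by eight incremental sessions, each of which contributes only 25 training images while the evaluation label set keeps growing.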

4.2 Implementation Details

For miniImageNet and CIFAR100, we use ResNet-12 following C-FSCIL [1]. We train the TKN with the SGD optimizer, with a learning rate of 0.01, a batch size of 128, and 120 epochs. For the RHD network, we use the model pretrained by C-FSCIL [1]. Each image in the dataset is represented as a 512-dimensional feature vector. The hyperparameter $\lambda $ is set to 0.8 for both the miniImageNet and the CIFAR100 dataset.

4.3 Comparison and Evaluation

To evaluate the effectiveness of our TLCE, we first conduct quantitative comparisons with several representative and state-of-the-art few-shot class-incremental learning methods. However, an improvement in overall accuracy does not necessarily imply an improvement in both base and novel performance individually. We therefore further analyze model performance separately on base and novel classes to examine where the improvement comes from. Furthermore, our method requires no additional training and consumes minimal storage space.

Quantitative comparisons. As numerous efforts have been devoted to few-shot class-incremental learning, we mainly compare our TLCE with representative and SOTA works. The compared methods include CIL methods [27, 28, 46] and FSCIL methods [1,2,3,4, 32, 33]. For C-FSCIL [1], we only compare with its basic version and do not consider its variants that require additional training during incremental sessions.

For our method, we report our best results with the value of $\lambda $ set to 0.8. Tables 1 and 2 show the quantitative comparison results on the two datasets. It can be seen that our best results outperform the other methods. In particular, we consider different transferable knowledge models. For the baseline, we train the model with standard supervised classification. For TLCE, we integrate the cosine metric with cross-entropy to train the model. It can be seen that the latter significantly enhances the performance of the ensemble classifiers. It is worth mentioning that neither the Baseline nor TLCE incurs any additional training cost when recognizing new classes. The only time cost involved is feature extraction. These methods are therefore efficient and do not require extensive retraining when dealing with novel classes, which makes them practical and convenient for real-world applications.

From Tables 1 and 2, it can be observed that C-FSCIL performs more effectively in the first five incremental sessions, while its effectiveness is slight in the last four incremental sessions. We make a further analysis from the perspective of the accuracy on base and novel classes, respectively. We use the term “weighted performance” to describe the commonly used average accuracy across all classes. This is because the base classes, which make up a significant portion of all the classes, have a greater influence on the overall performance. According to the data shown in Fig. 2, the base performance of C-FSCIL decreases only slightly, which indicates that C-FSCIL can resist knowledge forgetting. However, its novel performance in the subsequent incremental sessions is poor. In contrast, an ideal FSCIL classifier would have equally high performance on both novel and base classes. For our method TLCE, while there is a decrease on the base classes, there is a significant improvement in the novel and weighted performance. Based on these analyses, we conclude that our method not only achieves a higher average accuracy but also shows notable improvements in the accuracy on new classes. In the ablation study, we perform more experiments and analysis with different $\lambda $ values to reveal which balance between RHD and TKN is more suitable for each dataset.
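The distinction between base, novel, and weighted performance can be made explicit with a short computation. The sketch below uses hypothetical predictions and labels, not results from the paper, and the function name `session_accuracy` is invented for illustration.

```python
def session_accuracy(predictions, targets, base_classes):
    """Return (base, novel, overall) accuracy for one session's test set."""
    base_hits = base_total = novel_hits = novel_total = 0
    for pred, target in zip(predictions, targets):
        if target in base_classes:
            base_total += 1
            base_hits += int(pred == target)
        else:
            novel_total += 1
            novel_hits += int(pred == target)
    base_acc = base_hits / base_total if base_total else float("nan")
    novel_acc = novel_hits / novel_total if novel_total else float("nan")
    overall = (base_hits + novel_hits) / (base_total + novel_total)
    return base_acc, novel_acc, overall


# Hypothetical test set: classes 0 and 1 are base classes, class 2 is novel.
targets = [0, 0, 0, 1, 1, 1, 2, 2]
predictions = [0, 0, 0, 1, 1, 0, 0, 2]

base_acc, novel_acc, overall = session_accuracy(predictions, targets, {0, 1})
print(base_acc, novel_acc, overall)
```

Because base-class test samples outnumber novel ones, the overall accuracy is a sample-weighted average that sits much closer to the base accuracy, so a model can score well overall while recognizing novel classes poorly.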

4.4 Ablation Study

In this section, we perform ablation studies to verify the design choices of our method and the effectiveness of different modules. First, we conduct experiments with different values of the hyperparameter $\lambda $ to observe how the RHD and TKN affect the final results. Then, we study the effectiveness of different ensemble classifiers.

Table 3 The ablation study on value selection of hyperparameter $\lambda $


Effect of the hyperparameter $\lambda $. Different $\lambda $ values correspond to different weights of RHD and TKN in the final score. From the results in Table 3, it can be seen that when the TKN is not used ($\lambda = 0.0$), the result is lower. With the TKN included in the ensemble, the result follows a convex curve as $\lambda $ varies. This indicates the importance of the TKN.

Table 4 The effect of various components of ensemble classifiers


Effect of different ensemble classifiers. We conduct experiments on miniImageNet to verify the effectiveness of the ensemble classifiers. First, we pre-train the feature extractor with standard supervised classification on the base dataset, which we refer to as base-TKN. Then, we enhance the network’s transferability during training by combining cosine similarity with cross-entropy loss, and we refer to this approach as cosine-TKN. We use these two feature extractors as two different Transferable Knowledge Networks (TKN). The ensemble of base-TKN and RHD serves as the baseline, while the ensemble of cosine-TKN and RHD forms TLCE. Since the performance of the final session in FSCIL is crucial, the subsequent analysis primarily focuses on it. The results are shown in Table 4. Notably, our method, incorporating all components, achieves the highest performance on the miniImageNet dataset. Compared with using base-TKN or RHD alone, the baseline, which ensembles the two, brings significant gains. This demonstrates the importance and effectiveness of classifier ensembles. Furthermore, we find that integrating cosine-TKN with RHD further enhances model performance, which suggests that cosine-TKN and RHD are more complementary. As the ablation results show, it is beneficial to combine classifiers during the incremental sessions to effectively address the FSCIL problem.

5 Conclusion

In this paper, we propose a simple yet effective framework, named TLCE, for few-shot class-incremental learning. Without any retraining or expensive computation during incremental sessions, our transfer-learning based classifier ensemble can efficiently alleviate the issues of catastrophic forgetting and overfitting. Extensive experiments show that our method can outperform SOTA methods. Investigating a more transferable network is worth exploring in the future. Exploring a more general way to combine the classifiers is also interesting future work.

References

Hersche M, Karunaratne G, Cherubini G, Benini L, Sebastian A, Rahimi A (2022)Constrained few-shot class-incremental learning. In: IEEE/CVF conference on computer vision and pattern recognition, pp 9057–9067
Zhu K, Cao Y, Zhai W, Cheng J, Zha Z-J (2021) Self-promoted prototype refinement for few-shot class-incremental learning. In: IEEE/CVF conference on computer vision and pattern recognition, pp 6801–6810
Zhang C, Song N, Lin G, Zheng Y, Pan P, Xu Y (2021) Few-shot incremental learning with continually evolved classifiers. In: IEEE/CVF conference on computer vision and pattern recognition, pp 12455–12464
Shi G, Chen J, Zhang W, Zhan L-M, Wu X-M (2021) Overcoming catastrophic forgetting in incremental few-shot learning by finding flat minima. Adv Neural Inf Process Syst 34:6747–6761
Google Scholar
Chen W-Y, Liu Y-C, Kira Z, Wang Y-CF, Huang J-B (2019) A Closer Look at Few-shot Classification. In: The international conference on learning representations
Tian Y, Wang Y, Krishnan D, Tenenbaum JB, Isola P (2020) Rethinking few-shot image classification: a good embedding is all you need? In: The European conference on computer vision, Springer, pp 266–282
Russakovsky O, Deng J, Su H, Krause J, Satheesh S, Ma S, Huang Z, Karpathy A, Khosla A, Bernstein M (2015) Imagenet large scale visual recognition challenge. Int J Comput Vis 115(3):211–252
Article MathSciNet Google Scholar
Krizhevsky A, Hinton G et al (2009) Learning multiple layers of features from tiny images. Handbook of systemic autoimmune diseases, vol 1. University of Toronto, Toronto
Google Scholar
Hospedales T, Antoniou A, Micaelli P, Storkey A (2021) Meta-learning in neural networks: a survey. IEEE Trans Pattern Anal Mach Intell 44(9):5149–5169
Google Scholar
Finn C, Abbeel P, Levine S (2017) Model-agnostic meta-learning for fast adaptation of deep networks. In: International conference on machine learning, PMLR, pp 1126–1135
Jamal MA, Qi G-J (2019) Task agnostic meta-learning for few-shot learning. In: IEEE conference on computer vision and pattern recognition
Rusu AA, Rao D, Sygnowski J, Vinyals O, Pascanu R, Osindero S, Hadsell R (2019) Meta-learning with latent embedding optimization. In: The international conference on learning representations
Koch G, Zemel R, Salakhutdinov R (2015) Siamese neural networks for one-shot image recognition. In: The international conference on machine learning. Deep learning workshop
Vinyals O, Blundell C, Lillicrap T, Wierstra D et al (2016) Matching networks for one shot learning. In: Advances in neural information processing systems, vol 29
Snell J, Swersky K, Zemel R (2017) Prototypical networks for few-shot learning. In: Advances in neural information processing systems, vol 30
Sung F, Yang Y, Zhang L, Xiang T, Torr PH, Hospedales TM (2018) Learning to compare: relation network for few-shot learning. In: IEEE conference on computer vision and pattern recognition, pp 1199–1208
Yang S, Liu L, Xu M (2021) Free lunch for few-shot learning: distribution calibration. In: The international conference on learning representations
Guo Y, Du R, Li X, Xie J, Ma Z, Dong Y (2022) Learning calibrated class centers for few-shot classification by pair-wise similarity. IEEE Trans Image Process 31:4543–4555
Xu J, Luo X, Pan X, Pei W, Li Y, Xu Z (2022) Alleviating the sample selection bias in few-shot learning by removing projection to the centroid. In: Advances in neural information processing systems
Wang S, Ma R, Wu T, Cao Y (2023) P3dc-shot: prior-driven discrete data calibration for nearest-neighbor few-shot classification. Image Vis Comput 136:104736
Iscen A, Zhang J, Lazebnik S, Schmid C (2020) Memory-efficient incremental learning through feature adaptation. In: The European conference on computer vision, Springer, pp 699–715
Zhu F, Zhang X-Y, Wang C, Yin F, Liu C-L (2021) Prototype augmentation and self-supervision for incremental learning. In: IEEE conference on computer vision and pattern recognition, pp 5871–5880
Petit G, Popescu A, Schindler H, Picard D, Delezoide B (2023) Fetril: feature translation for exemplar-free class-incremental learning. In: IEEE Winter Conference on Applications of Computer Vision, pp 3911–3920
Kirkpatrick J, Pascanu R, Rabinowitz N, Veness J, Desjardins G, Rusu AA, Milan K, Quan J, Ramalho T, Grabska-Barwinska A (2017) Overcoming catastrophic forgetting in neural networks. Proc Natl Acad Sci 114(13):3521–3526
Chaudhry A, Dokania PK, Ajanthan T, Torr PH (2018) Riemannian walk for incremental learning: understanding forgetting and intransigence. In: The European conference on computer vision, pp 532–547
Lee J, Hong HG, Joo D, Kim J (2020) Continual learning with extended kronecker-factored approximate curvature. In: IEEE conference on computer vision and pattern recognition, pp 9001–9010
Rebuffi S-A, Kolesnikov A, Sperl G, Lampert CH (2017) icarl: incremental classifier and representation learning. In: IEEE conference on computer vision and pattern recognition, pp 2001–2010
Castro FM, Marin-Jimenez MJ, Guil N, Schmid C, Alahari K (2018) End-to-end incremental learning. In: The European conference on computer vision
Gao Q, Zhao C, Ghanem B, Zhang J (2022) R-dfcil: relation-guided representation learning for data-free class incremental learning. In: The European conference on computer vision, Springer, pp 423–439
Yu L, Twardowski B, Liu X, Herranz L, Wang K, Cheng Y, Jui S, Weijer JVD (2020) Semantic drift compensation for class-incremental learning. In: IEEE conference on computer vision and pattern recognition, pp 6982–6991
Liu Y, Schiele B, Sun Q (2021) Adaptive aggregation networks for class-incremental learning. In: IEEE conference on computer vision and pattern recognition, pp 2544–2553
Tao X, Hong X, Chang X, Dong S, Wei X, Gong Y (2020) Few-shot class-incremental learning. In: IEEE conference on computer vision and pattern recognition, pp 12180–12189. https://doi.org/10.1109/CVPR42600.2020.01220
Cheraghian A, Rahman S, Fang P, Roy SK, Petersson L, Harandi M (2021) Semantic-aware knowledge distillation for few-shot class-incremental learning. In: IEEE conference on computer vision and pattern recognition, pp 2534–2543
Dong S, Hong X, Tao X, Chang X, Wei X, Gong Y (2021) Few-shot class-incremental learning via relation knowledge distillation. In: AAAI conference on artificial intelligence, vol 35, pp 1255–1263
Zhou D-W, Wang F-Y, Ye H-J, Ma L, Pu S, Zhan D-C (2022) Forward compatible few-shot class-incremental learning. In: IEEE conference on computer vision and pattern recognition, pp 9046–9056
Peng C, Zhao K, Wang T, Li M, Lovell BC (2022) Few-shot class-incremental learning from an open-set perspective. In: The European conference on computer vision, Springer, pp 382–397
Song Z, Zhao Y, Shi Y, Peng P, Yuan L, Tian Y (2023) Learning with fantasy: semantic-aware virtual contrastive constraint for few-shot class-incremental learning. In: IEEE conference on computer vision and pattern recognition
Kim D-Y, Han D-J, Seo J, Moon J (2023) Warping the space: weight space rotation for class-incremental few-shot learning. In: The international conference on learning representations
Mazumder P, Singh P, Rai P (2021) Few-shot lifelong learning. In: AAAI conference on artificial intelligence, vol 35, pp 2337–2345
Chi Z, Gu L, Liu H, Wang Y, Yu Y, Tang J (2022) Metafscil: a meta-learning approach for few-shot class incremental learning. In: IEEE conference on computer vision and pattern recognition, pp 14166–14175
Ji Z, Hou Z, Liu X, Pang Y, Li X (2023) Memorizing complementation network for few-shot class-incremental learning. IEEE Trans Image Process 32:937–948
Xu X, Wang Z, Fu Z, Guo W, Chi Z, Li D (2023) Flexible few-shot class-incremental learning with prototype container. Neural Comput Appl 35(15):10875–10889
Karunaratne G, Schmuck M, Le Gallo M, Cherubini G, Benini L, Sebastian A, Rahimi A (2021) Robust high-dimensional memory-augmented neural networks. Nat Commun 12(1):2468
Lesort T, George T, Rish I (2021) Continual learning in deep networks: an analysis of the last layer. arXiv preprint arXiv:2106.01834
Wang Y, Chao W-L, Weinberger KQ, Maaten L (2019) Simpleshot: revisiting nearest-neighbor classification for few-shot learning. arXiv preprint arXiv:1911.04623
Hou S, Pan X, Loy CC, Wang Z, Lin D (2019) Learning a unified classifier incrementally via rebalancing. In: IEEE conference on computer vision and pattern recognition, pp 831–839


Author information

Authors and Affiliations

Jilin University, No. 2699 Qianjin Street, Changchun, 130012, China
Shuangmei Wang, Yang Cao & Tieru Wu
Engineering Research Center of Knowledge-Driven Human-Machine Intelligence, MOE, Changchun, China
Tieru Wu

Authors

Shuangmei Wang
Yang Cao
Tieru Wu

Contributions

SW: Conceptualization, Methodology, Software, Original draft. YC: Review & Editing, Supervision. TW: Review & Editing, Supervision.

Corresponding author

Correspondence to Tieru Wu.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.


About this article

Cite this article

Wang, S., Cao, Y. & Wu, T. TLCE: Transfer-Learning Based Classifier Ensembles for Few-Shot Class-Incremental Learning. Neural Process Lett 56, 167 (2024). https://doi.org/10.1007/s11063-024-11605-0


Accepted: 18 March 2024
Published: 08 May 2024
DOI: https://doi.org/10.1007/s11063-024-11605-0
