
1 Introduction

In classification, the goal is to create a classifier that predicts the true labels of unlabeled instances. To this end, the classifier needs a set of instance-label pairs (i.e., the training set), which is often not directly available. Fortunately, unlabeled data is usually available at low cost; labeling data, however, is often expensive. Thus, active learning may reduce the annotation cost by selecting for labeling those instances that help the classifier's training the most [24].

In this article, we propose to distinguish between inductive and transductive active learning. To illustrate the difference between the two scenarios, we give the following examples: (1) We aim to train a general model that identifies protected animals on high-resolution satellite images to monitor their population. In this inductive learning example, we aim to build a general classifier as we want to use it periodically and not only on the images of the initial set (i.e., the test data is unknown). (2) After a natural disaster destroyed some buildings, we search for survivors. Hence, we take satellite images to find collapsed buildings across the affected regions. In that transductive context, it is important to classify the collected images correctly as their evaluation decides between life and death. In such a transductive scenario, the performance on the collected data is what matters. Hence, it might be beneficial to use the classifier mainly for simple cases and to annotate difficult cases manually, even if they do not improve the classifier's performance much. Mixed inductive-transductive scenarios are also possible, where the generalization of the performance beyond the collected data might be relevant. However, to highlight the characteristics and consequences of each scenario, and due to space limitations, this paper focuses on the disjoint scenarios.

Up to now, almost all literature refers to inductive active learning, and only a few works mention the transductive scenario. Tong [26, p. 15] even argued that the transductive scenario is a special case of inductive active learning and that, therefore, solving the inductive case is sufficient. Recently, some articles [16, 23] have considered transductive active learning, but they do not discuss in detail how it differs from the standard inductive setting.

When deploying classifiers that have been trained with active learning, it is crucial to decide when to stop acquiring more labels [11]. Therefore, cost-sensitive stopping criteria balance misclassification and annotation costs [6, 19]. In summary, this article makes the following contributions:

1. We formally define and describe transductive active learning and show that it is beneficial to develop transductive selection strategies (Hypothesis A).

2. We propose a new transductive selection strategy and show its superiority (Hypothesis B). To this end, we additionally introduce the minimum aggregated cost score, a new transductive, cost-based evaluation measure that considers annotation and misclassification costs.

3. We propose a new cost-based stopping criterion for transductive active learning which outperforms its competitors (Hypothesis C).

Next, we discuss the related work, followed by the problem definition, the probabilistic active learning framework, the extension to the transductive case, and our new stopping criterion. Our evaluation is based on the three hypotheses above.

    2 Background and Related Work

In the early 1970s, Vapnik introduced the concept of transductive inference, which he discussed in more detail in his later publications, e.g., [29, pp. 339ff.]. The two concepts mainly differ in the availability of an evaluation set. In inductive inference, the evaluation set is unknown, whereas it is known for transductive inference. The concept of transduction became especially relevant in the area of semi-supervised learning [4, pp. 453ff.]. Here, labels are only partially available, and the assumption is that incorporating the unlabeled instances can improve the classifier's performance. One approach is to successively label the most certain unlabeled instances based on the current classification results. Thereby, the approaches incorporate the structure of the data to build more realistic classification hypotheses [25, 27]. In this paper, we extend this idea to active learning.

The main idea of active learning is to actively ask for the information that helps best to improve the classifier's predictions [24]. In general, the active learning cycle starts with an initially unlabeled set of instances. A selection strategy successively selects some of these instances, and an oracle then provides the corresponding class labels. After updating the classifier, the cycle restarts. The main focus of active learning research is on finding an appropriate selection strategy. The most commonly used is uncertainty sampling [14], which selects instances where the classifier is most uncertain. These uncertainty scores are mainly based on probabilistic predictions. Query-by-committee [15] builds a classifier ensemble and selects instances where its members disagree the most. Expected error reduction [22] optimizes the generalization error by simulating potential label acquisitions and thereby provides a decision-theoretic score. Chapelle [3] observed that the probabilities used can be unreliable when only a few labels are available. Hence, he introduced a prior on the classes for regularization. Value of information [9] differs from expected error reduction in that it evaluates the generalization error only on the unlabeled instances and assumes that an unlabeled instance is classified correctly after labeling. In probabilistic active learning [12], the generalization error for both the current and the simulated classifier (with the additional label) is evaluated on the same probability distribution.

The term transduction also appears in different contexts in active learning literature. Deviating from our definition of transductive active learning, the authors of [7, 20] use the term transduction for a technique that propagates labels to the remaining unlabeled data using the predictions of the classifier. This self-labeling approach is used to create a more robust classifier, as known from semi-supervised learning. Yu et al. [31] propose a transductive experimental design. Instead of using discrete classes as in classification tasks, they train a model for noisy, continuous targets. Balasubramanian et al. [1] present a selection strategy in the online setting. New instances are labeled if the current estimated performance of the classifier is insufficient. As they know this new instance when evaluating it, they use the term transductive learning.

Stopping criteria decide when to stop acquiring further labels and can be grouped into three categories: (1) Performance-based approaches (e.g., Ishibashi and Hino [13]) evaluate the predictive error of the classifier on unlabeled data or already queried data. (2) Confidence-based approaches (e.g., [30]) use the uncertainty of the model on the remaining unlabeled data to determine the stopping point. (3) Stability-based approaches stop when the classifier's outputs stabilize across iterations. The authors of [6] introduce a cost-sensitive scenario with a parameter balancing the annotation and misclassification cost. They propose two stopping criteria that compare the expected performance gain and the annotation cost caused by querying an instance. The first one uses convergence properties to estimate the performance gain, while the second one builds on a probabilistic classifier serving this purpose. This idea uses the generalization error of expected error reduction [22], which has been extended in [9, 10]. The stopping criteria proposed in [6, 19] depend on this generalization error, which is hard to estimate. In transduction, the evaluation set is given, which allows us to define a more intuitive and general cost function. To our knowledge, the transductive setting has not been investigated in a cost-sensitive scenario.

    3 Problem Definition

    For this section, we use a slightly adapted version of Vapnik’s [29, p. 15] definition of “learning from examples”. A learning task consists of: (1) a generator of random vectors (the instances) \(\boldsymbol{x}\in \mathbb {R}^D\), drawn independently from a fixed but unknown probability distribution function \(p(\boldsymbol{x})\), (2) an oracle that returns an output value (the label) \(y \in \mathcal {Y} \), according to a conditional distribution function \(p(y|\boldsymbol{x})\), also fixed but unknown, and (3) a classifier f that aims to predict the oracle’s outputs.

In pool-based active learning, we have a dataset \(\mathcal {D}= \{(\boldsymbol{x}_1, y_1), ..., (\boldsymbol{x}_N, y_N)\}\), where all instances \(\boldsymbol{x}_i\) are known to the learner but only few (or no) labels \(y_i\), and \(\mathcal {D}\mathop {\mathrm {\overset{\text {i.i.d.}}{\sim }}}\limits p(\boldsymbol{x},y) = p(y|\boldsymbol{x}) \cdot p(\boldsymbol{x})\). Specifically, the learner has access to:

1. A small or empty set of initially labeled instances \(\mathcal {L}_0 \subseteq \mathcal {D}\).

2. A set of initially unlabeled instances \( \mathcal {U} _0 = \{\boldsymbol{x}: (\boldsymbol{x},y) \in \mathcal {D}\setminus \mathcal {L}_0 \}\).

3. An oracle o that returns the label \(y = o(\boldsymbol{x})\) for every \((\boldsymbol{x}, y) \in \mathcal {D}\).

In each iteration \(i \ge 1\), a selection strategy selects one instance \(\tilde{\boldsymbol{x}}\in \mathcal {U} _{i-1}\) from the candidate pool with the goal of improving the performance of the classifier. The selected instance \(\tilde{\boldsymbol{x}}\) is labeled by the oracle with \(\tilde{y} = o(\tilde{\boldsymbol{x}})\), added to the set of labeled instances, and removed from the candidate pool.

    $$\begin{aligned} \mathcal {L}_{i}&= \mathcal {L}_{i-1} \cup \{(\tilde{\boldsymbol{x}}, \tilde{y})\} \end{aligned}$$
    (1)
    $$\begin{aligned} \mathcal {U} _{i}&= \mathcal {U} _{i-1} \setminus \{\tilde{\boldsymbol{x}}\} \end{aligned}$$
    (2)

    After each iteration, the classifier is updated on the current labeled set which we denote by \(f^{\mathcal {L}_i}\). Note that \( \mathcal {U} _i\) only contains instances, whereas \(\mathcal {D}\) and \(\mathcal {L}_i\) consist of instance-label pairs. For readability purposes, we write \( \mathcal {U} \) and \(\mathcal {L}\) without the indices if possible.
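To make this cycle concrete, the following minimal Python sketch mirrors Eqs. 1 and 2. It is an illustration only: the names `select` and `fit_classifier` stand for an arbitrary selection strategy and classifier training routine and are assumptions on our part, not our actual implementation.

```python
import numpy as np

def active_learning_cycle(X, oracle, select, fit_classifier, n_queries):
    """X: all instances in D; oracle(x) returns the true label y = o(x)."""
    labeled = []                      # L_i as a list of (x, y) pairs
    unlabeled = list(range(len(X)))   # indices of the pool U_i
    clf = fit_classifier(labeled)     # classifier trained on L_0 (possibly empty)
    for _ in range(n_queries):
        # the strategy picks one candidate index from the pool U_{i-1}
        j = select(X, unlabeled, labeled, clf)
        x_tilde = X[j]
        y_tilde = oracle(x_tilde)             # query the oracle
        labeled.append((x_tilde, y_tilde))    # Eq. 1: L_i = L_{i-1} + {(x~, y~)}
        unlabeled.remove(j)                   # Eq. 2: U_i = U_{i-1} \ {x~}
        clf = fit_classifier(labeled)         # update f^{L_i}
    return labeled, unlabeled, clf
```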

    In transductive active learning, the goal is to determine the correct labels for all instances in \(\mathcal {D}\). As we assume that the oracle provided the true labels for instances in \(\mathcal {L}\), we only need the classifier to predict the labels for instances in \( \mathcal {U} \). To simplify the notation, we define a meta-classifier \(g_{f}^{\mathcal {L}}\) that returns the known labels for instances in the labeled set and uses the classifier \(f^{\mathcal {L}}\) to predict the unknown labels. This is necessary as we cannot be sure that \(f^{\mathcal {L}}(\boldsymbol{x}) = y\) for all \((\boldsymbol{x}, y) \in \mathcal {L}\).

    $$\begin{aligned} g_{f}^{\mathcal {L}}(\boldsymbol{x}) = {\left\{ \begin{array}{ll} y &{} \text {if } (\boldsymbol{x},y) \in \mathcal {L}\\ f^{\mathcal {L}}(\boldsymbol{x}) &{} \text {else} \end{array}\right. } \end{aligned}$$
    (3)

    We define the transductive risk as the sum of classification losses L over \(\mathcal {D}\). As stated above, it is sufficient to evaluate over \( \mathcal {U} \).

    $$\begin{aligned} R^\textrm{tr}_\mathcal {D}(f^\mathcal {L}) = \sum _{(\boldsymbol{x},y) \in \mathcal {D}} L(y, g_f^\mathcal {L}(\boldsymbol{x})) = \sum _{\boldsymbol{x}\in \mathcal {U} } L(o(\boldsymbol{x}), f^\mathcal {L}(\boldsymbol{x})) = R^\textrm{tr}_ \mathcal {U} (f^\mathcal {L}) \end{aligned}$$
    (4)

Throughout this article, we use the zero-one loss, which compares the true label y with the predicted label \(f^\mathcal {L}(\boldsymbol{x})\) and returns 0 if the prediction is correct and 1 otherwise.

    $$\begin{aligned} L(y, f^\mathcal {L}(\boldsymbol{x})) = {\left\{ \begin{array}{ll} 0 &{} y = f^\mathcal {L}(\boldsymbol{x}) \\ 1 &{} \text {otherwise} \end{array}\right. } \end{aligned}$$
    (5)
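As an illustration, the following Python sketch implements the meta-classifier g of Eq. 3 and the transductive risk of Eq. 4 under the zero-one loss of Eq. 5. The helper names are hypothetical, and `clf` is assumed to offer a scikit-learn style `predict` method.

```python
import numpy as np

def g_predict(x, labeled, clf):
    """Meta-classifier g (Eq. 3): return the known label if x is in L,
    otherwise the classifier's prediction f^L(x)."""
    for x_l, y_l in labeled:
        if np.array_equal(x, x_l):
            return y_l
    return clf.predict(np.asarray(x).reshape(1, -1))[0]

def transductive_risk(X_unlabeled, oracle, clf):
    """Transductive risk (Eq. 4) with zero-one loss (Eq. 5): since g is
    correct on L by construction, it suffices to count errors on U."""
    y_true = np.array([oracle(x) for x in X_unlabeled])
    y_pred = clf.predict(np.asarray(X_unlabeled))
    return int(np.sum(y_true != y_pred))
```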

    In inductive active learning, we aim to train a classifier for every (possibly unknown) instance \(\boldsymbol{x}\mathop {\mathrm {\overset{\text {i.i.d.}}{\sim }}}\limits p(\boldsymbol{x})\) with the goal of generalization. Consequently, we do not know the evaluation instances during training in the inductive setting. The distribution \(p(\boldsymbol{x},y)\) is usually approximated with a labeled validation set. As in [29], the (inductive) risk is defined as follows.

    $$\begin{aligned} R(f^\mathcal {L}) = \mathop {\mathrm {\mathbb {E}}}\limits _{p(\boldsymbol{x}, y)} \left[ L(y, f^\mathcal {L}(\boldsymbol{x}))\right] = \mathop {\mathrm {\mathbb {E}}}\limits _{p(\boldsymbol{x})} \left[ \mathop {\mathrm {\mathbb {E}}}\limits _{p(y|\boldsymbol{x})} \left[ L(y, f^\mathcal {L}(\boldsymbol{x}))\right] \right] \end{aligned}$$
    (6)

    The transductive active learning setting differs from the inductive one in two ways: (1) One knows the data used to evaluate the model beforehand, and one does not need to build a generalized model. (2) One can exclude data from being predicted by the classifier by asking for the label from the oracle.

    4 From Inductive to Transductive Active Learning

    We build our selection strategy for transductive active learning upon the probabilistic active learning framework [12] that estimates the expected risk reduction when a candidate instance is selected for label acquisition. In the first subsection, we summarize the existing method for the inductive scenario and derive the equations for the transductive case in the second subsection.

    4.1 The Probabilistic Active Learning Framework

To estimate the inductive risk, we need to estimate the unknown distributions \(p(\boldsymbol{x})\) and \(p(y|\boldsymbol{x})\) in Eq. 6. As suggested by [12, 21], we approximate \(p(\boldsymbol{x})\) using a Monte Carlo approach with an unlabeled set \( \mathcal {E} \mathop {\mathrm {\overset{\text {i.i.d.}}{\sim }}}\limits p(\boldsymbol{x})\). Here, we use \( \mathcal {E} = \{\boldsymbol{x}:(\boldsymbol{x},y) \in \mathcal {L}\} \cup \mathcal {U} \). We estimate \(p(y|\boldsymbol{x})\) with \(\mathbbm {p}^{\mathcal {L}}(y|\boldsymbol{x})\) using the data in \(\mathcal {L}\) [3, 12, 17]. The probability is based on a kernel frequency estimate \(\boldsymbol{k}^{\mathcal {L}}_{\boldsymbol{x}}\) that contains, for every class, the number of labeled samples near \(\boldsymbol{x}\), weighted by the similarity/kernel \(K(\cdot ,\cdot )\). Using a Bayesian approach that introduces a prior \(\boldsymbol{\epsilon }\in \mathbb {R}_{+}^{| \mathcal {Y} |}\), the probability \(\mathbbm {p}^{\mathcal {L}}(y|\boldsymbol{x})\) is given by the y-th element of the normalized vector \(\boldsymbol{k}^{\mathcal {L}}_{\boldsymbol{x}} + \boldsymbol{\epsilon }\).

    $$\begin{aligned} \mathbbm {p}^{\mathcal {L}}(y|\boldsymbol{x}) = \frac{(\boldsymbol{k}^{\mathcal {L}}_{\boldsymbol{x}} + \boldsymbol{\epsilon })_{y}}{|| \boldsymbol{k}^{\mathcal {L}}_{\boldsymbol{x}} + \boldsymbol{\epsilon } ||_1} \qquad \qquad k_{\boldsymbol{x},y}^{\mathcal {L}} = \sum _{\begin{array}{c} (\boldsymbol{x}', y') \in \mathcal {L}\\ y'=y \end{array}} K(\boldsymbol{x}, \boldsymbol{x}') \end{aligned}$$
    (7)
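A direct translation of Eq. 7 into Python could look as follows. The RBF kernel, the bandwidth `gamma`, and the scalar prior `eps` are illustrative choices; classes are assumed to be encoded as integers \(0, \dots, |\mathcal {Y}|-1\).

```python
import numpy as np

def rbf_kernel(x, x_prime, gamma=1.0):
    """Similarity K(x, x') used for the kernel frequency estimate."""
    return np.exp(-gamma * np.sum((np.asarray(x) - np.asarray(x_prime)) ** 2))

def kernel_frequency(x, labeled, n_classes, gamma=1.0):
    """k^L_x: per-class sum of kernel similarities to all labeled instances."""
    k = np.zeros(n_classes)
    for x_l, y_l in labeled:
        k[y_l] += rbf_kernel(x, x_l, gamma)
    return k

def class_probability(x, labeled, n_classes, eps=1e-3, gamma=1.0):
    """p^L(y|x) as the normalized, prior-smoothed kernel frequency (Eq. 7)."""
    k = kernel_frequency(x, labeled, n_classes, gamma)
    return (k + eps) / np.sum(k + eps)
```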

    The inductive risk of a classifier is estimated as follows.

    $$\begin{aligned} \hat{R}_{ \mathcal {E} , p^{\mathcal {L}}}(f^\mathcal {L}) = \frac{1}{| \mathcal {E} |} \sum _{\boldsymbol{x}\in \mathcal {E} } \sum _{y \in \mathcal {Y} } \mathbbm {p}^{\mathcal {L}}(y|\boldsymbol{x}) L(y, f^\mathcal {L}(\boldsymbol{x})) \approx R(f^\mathcal {L}) \end{aligned}$$
    (8)

For a given candidate \(\tilde{\boldsymbol{x}}\in \mathcal {U} \), we calculate the probabilistic gain (\({{\,\mathrm{{\text {xgain}}}\,}}\)) as the expected value, over all possible labeling outcomes \(\tilde{y}\in \mathcal {Y} \), of the estimated inductive risk reduction. Therefore, we compare the inductive risks (estimated on \( \mathcal {E} \) and \(p^{{\mathcal {L}}^{+}}\)) of the current classifier \(f^\mathcal {L}\) and the simulated classifier \(f^{\mathcal {L}^+}\) that includes the candidate with \(\mathcal {L}^+ = \mathcal {L}\cup \{(\tilde{\boldsymbol{x}}, \tilde{y})\}\). Since we want to maximize the gain, we consider the negative risk reduction.

    $$\begin{aligned}&{{\,\mathrm{{\text {xgain}}}\,}}(\tilde{\boldsymbol{x}},\mathcal {L}, \mathcal {E} ) = -\mathop {\mathrm {\mathbb {E}}}\limits _{p^{\mathcal {L}}(\tilde{y}|\tilde{\boldsymbol{x}})} \left[ \hat{R}_{ \mathcal {E} , p^{\mathcal {L}^+}}(f^{\mathcal {L}^+}) - \hat{R}_{ \mathcal {E} , p^{\mathcal {L}^+}}(f^\mathcal {L}) \right] \end{aligned}$$
    (9)
    $$\begin{aligned}&= - \sum _{\tilde{y}\in \mathcal {Y} } \mathbbm {p}^{\mathcal {L}}(\tilde{y}|\tilde{\boldsymbol{x}}) \left[ \frac{1}{| \mathcal {E} |} \sum _{\boldsymbol{x}\in \mathcal {E} } \sum _{y \in \mathcal {Y} } \mathbbm {p}^{\mathcal {L}^+}({y}|{\boldsymbol{x}}) \Big ( L\big (y, f^{\mathcal {L}^+}(\boldsymbol{x})\big ) - L\big (y, f^{\mathcal {L}}(\boldsymbol{x})\big ) \Big ) \right] \end{aligned}$$
    (10)
$$\begin{aligned}&= -\sum _{\tilde{y}\in \mathcal {Y} } \frac{(\boldsymbol{k}_{\tilde{\boldsymbol{x}}}^{\mathcal {L}} + \boldsymbol{\beta })_{\tilde{y}}}{|| \boldsymbol{k}_{\tilde{\boldsymbol{x}}}^{\mathcal {L}} + \boldsymbol{\beta } ||_1} \cdot \frac{1}{| \mathcal {E} |} \sum _{\boldsymbol{x}\in \mathcal {E} } \sum _{y \in \mathcal {Y}} \frac{(\boldsymbol{k}_{\boldsymbol{x}}^{\mathcal {L}^+} + \boldsymbol{\alpha })_{y}}{|| \boldsymbol{k}_{\boldsymbol{x}}^{\mathcal {L}^+} + \boldsymbol{\alpha } ||_1} \left( L(y, f^{\mathcal {L}^+}(\boldsymbol{x})) - L(y, f^{\mathcal {L}}(\boldsymbol{x}))\right) \end{aligned}$$
    (11)

    The vectors \(\boldsymbol{\alpha }\) and \(\boldsymbol{\beta }\) are the priors of the label distribution of the evaluation sample \(\boldsymbol{x}\) and the candidate \(\tilde{\boldsymbol{x}}\), respectively. They can be interpreted as the number of pseudo-labels added to each region of the dataset. High numbers lead to high regularization of the probabilities and vice versa. As proposed in [12], we set \(\boldsymbol{\alpha }= \boldsymbol{\beta }= (10^{-3}, \dots , 10^{-3})\).

    The selection strategy chooses the candidate instance \(\tilde{\boldsymbol{x}}^*\) that maximizes the probabilistic gain.

    $$\begin{aligned} \tilde{\boldsymbol{x}}^* = \mathop {\mathrm {arg\,max}}\limits _{\tilde{\boldsymbol{x}}\in \, \mathcal {U} } \{{{\,\mathrm{{\text {xgain}}}\,}}(\tilde{\boldsymbol{x}},\mathcal {L}, \mathcal {E} )\} \end{aligned}$$
    (12)
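The following condensed sketch shows how the probabilistic gain of Eqs. 9-12 can be computed with the (hypothetical) `class_probability` helper from above, using the Parzen-window-style decision rule \(f^\mathcal {L}(\boldsymbol{x}) = \mathop {\mathrm {arg\,max}}_y \mathbbm {p}^{\mathcal {L}}(y|\boldsymbol{x})\). It is a naive reference implementation, not the optimized one used in our experiments.

```python
import numpy as np

def predict(x, labeled, n_classes):
    """f^L(x): most probable class under the kernel frequency estimate."""
    return int(np.argmax(class_probability(x, labeled, n_classes)))

def xgain(x_cand, labeled, E, n_classes):
    """Probabilistic gain (Eqs. 9 and 10): negative expected risk reduction."""
    if len(E) == 0:
        return 0.0
    p_cand = class_probability(x_cand, labeled, n_classes)  # p^L(y~|x~)
    gain = 0.0
    for y_tilde in range(n_classes):
        labeled_plus = labeled + [(x_cand, y_tilde)]        # L+ = L + {(x~, y~)}
        delta = 0.0
        for x in E:
            p_plus = class_probability(x, labeled_plus, n_classes)
            # zero-one losses of the simulated and the current classifier
            loss_new = (np.arange(n_classes) != predict(x, labeled_plus, n_classes)).astype(float)
            loss_old = (np.arange(n_classes) != predict(x, labeled, n_classes)).astype(float)
            delta += np.sum(p_plus * (loss_new - loss_old))
        gain += p_cand[y_tilde] * delta / len(E)
    return -gain

def select_best(candidates, labeled, E, n_classes):
    """Eq. 12: index of the candidate with maximal probabilistic gain."""
    scores = [xgain(x, labeled, E, n_classes) for x in candidates]
    return int(np.argmax(scores))
```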

    4.2 Transductive Probabilistic Active Learning

The goal of transductive active learning is to determine the correct label for all instances in the dataset \(\mathcal {D}\). As we assume that the oracle is omniscient, we know that the labels in \(\mathcal {L}\) are already correct. To obtain the labels of the remaining instances in \( \mathcal {U} \), we can either ask the oracle (and be certain that the label is correct) or use the classifier's predictions \(f^\mathcal {L}(\boldsymbol{x})\). In the latter case, we run the risk of making mistakes.

    Due to these specific characteristics of the transductive scenario, we need to adapt the estimate in Eq. 7 such that the probability for the correct label y for labeled instances \(\boldsymbol{x}\) with \((\boldsymbol{x},y) \in \mathcal {L}\) is 1.

    $$\begin{aligned} \mathbbm {p}^{\mathcal {L}}_{\textrm{tr}}(y|\boldsymbol{x})&= {\left\{ \begin{array}{ll} 1 &{} (\boldsymbol{x}, y) \in \mathcal {L}\\ 0 &{} (\boldsymbol{x}, y') \in \mathcal {L}\wedge y \not = y' \\ \mathbbm {p}^{\mathcal {L}}(y|\boldsymbol{x}) &{} \text {otherwise} \end{array}\right. } \end{aligned}$$
    (13)
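A sketch of Eq. 13 on top of the (hypothetical) `class_probability` helper from above: labeled instances receive probability one on their known label and zero elsewhere.

```python
import numpy as np

def class_probability_tr(x, labeled, n_classes, eps=1e-3, gamma=1.0):
    """p^L_tr(y|x) from Eq. 13."""
    for x_l, y_l in labeled:
        if np.array_equal(x, x_l):
            p = np.zeros(n_classes)
            p[y_l] = 1.0          # the oracle's label is assumed correct
            return p
    return class_probability(x, labeled, n_classes, eps, gamma)
```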

    To calculate the probabilistic gain in the transductive setting, we use the same estimation idea as before, but with the transductive risk. The first step follows the simplification in Eq. 4.

    $$\begin{aligned} \hat{R}^\textrm{tr}_{\mathcal {D}, p^{\mathcal {L}}_{\textrm{tr}}}(f^{\mathcal {L}}) = \hat{R}^\textrm{tr}_{ \mathcal {U} , p^{\mathcal {L}}_{\textrm{tr}}}(f^{\mathcal {L}})&= \sum _{\boldsymbol{x}\in \mathcal {U} } \sum _{y \in \mathcal {Y} } \mathbbm {p}^{\mathcal {L}}_{\textrm{tr}}(y|\boldsymbol{x}) \cdot L(y, g_f^{\mathcal {L}}(\boldsymbol{x})) \approx R^\textrm{tr}_{ \mathcal {U} }(f^{\mathcal {L}}) \end{aligned}$$
    (14)

    This estimate allows us to define the estimated risk reduction in the transductive setting as follows:

    $$\begin{aligned}&\varDelta \hat{R}^\textrm{tr}_{\mathcal {D}, p^{\mathcal {L}^+}_{\textrm{tr}}}(f^{\mathcal {L}^+}, f^{\mathcal {L}}) = \hat{R}^\textrm{tr}_{ \mathcal {U} , p^{\mathcal {L}^+}_{\textrm{tr}}}(f^{\mathcal {L}^+}) - \hat{R}^\textrm{tr}_{ \mathcal {U} , p^{\mathcal {L}^+}_{\textrm{tr}}}(f^{\mathcal {L}}) \end{aligned}$$
    (15)
$$\begin{aligned}&= \sum _{\boldsymbol{x}\in \mathcal {U} } \sum _{y \in \mathcal {Y} } \mathbbm {p}_{\textrm{tr}}^{\mathcal {L}^+}(y|\boldsymbol{x}) \left( {L(y, g_f^{\mathcal {L}^+}(\boldsymbol{x})) - L(y, g_f^{\mathcal {L}}(\boldsymbol{x}))}\right) \end{aligned}$$
    (16)
$$\begin{aligned}&= \sum _{\boldsymbol{x}\in \mathcal {U} \setminus \{\tilde{\boldsymbol{x}}\}} \sum _{y \in \mathcal {Y} } \mathbbm {p}_{\textrm{tr}}^{\mathcal {L}^+}(y|\boldsymbol{x}) \left( L\big (y, f^{\mathcal {L}^+}(\boldsymbol{x})\big ) - L\big (y, f^{\mathcal {L}}(\boldsymbol{x})\big )\right) \nonumber \\&\quad - \sum _{y \in \mathcal {Y} } \mathbbm {p}_{\textrm{tr}}^{\mathcal {L}^+}(y|\tilde{\boldsymbol{x}}) \left( L\big (y, \tilde{y}\big ) - L\big (y, f^{\mathcal {L}}(\tilde{\boldsymbol{x}})\big )\right) \end{aligned}$$
    (17)
    $$\begin{aligned}&= \sum _{\boldsymbol{x}\in \mathcal {U} \setminus \{\tilde{\boldsymbol{x}}\}} \sum _{y \in \mathcal {Y} } \mathbbm {P}_{\textrm{tr}}^{\mathcal {L}^+}(y|\boldsymbol{x}) \left( L\big (y, f^{\mathcal {L}^+}(\boldsymbol{x})\big ) - L\big (y, f^{\mathcal {L}}(\boldsymbol{x})\big )\right) - L\big (\tilde{y}, f^{\mathcal {L}}(\tilde{\boldsymbol{x}})\big ) \;. \end{aligned}$$
    (18)

In Eq. 17, we separate \(\tilde{\boldsymbol{x}}\) from \( \mathcal {U} \) as the candidate serves two purposes. In the first part of the equation, we estimate the inductive risk reduction on the remaining unlabeled instances resulting from the improvement of the model with the additional label. In the second part, we assume that the label \(\tilde{y}\) is correct. Therefore, we only need to consider the case \(y = \tilde{y}\) as \(\mathbbm {p}_{\textrm{tr}}^{\mathcal {L}^+}(\tilde{y}|\tilde{\boldsymbol{x}}) = 1\) and \(\mathbbm {p}_{\textrm{tr}}^{\mathcal {L}^+}(y|\tilde{\boldsymbol{x}}) = 0\) for \(y \not = \tilde{y}\). Hence, that term simplifies to \(L(\tilde{y}, f^{\mathcal {L}}(\tilde{\boldsymbol{x}}))\).

    Analogous to Eq. 9, the transductive probabilistic gain is calculated as follows:

$$\begin{aligned}&{{\,\mathrm{{\text {xgain}}}\,}}^\textrm{tr}(\tilde{\boldsymbol{x}},\mathcal {L}, \mathcal {D}) = -\mathop {\mathrm {\mathbb {E}}}\limits _{p^{\mathcal {L}}_{\textrm{tr}}(\tilde{y}|\tilde{\boldsymbol{x}})} \left[ \varDelta \hat{R}^\textrm{tr}_{\mathcal {D}, p^{\mathcal {L}^+}_{\textrm{tr}}}(f^{\mathcal {L}^+}, f^{\mathcal {L}})\right] \end{aligned}$$
    (19)
$$\begin{aligned} =&-\sum _{\tilde{y}\in \mathcal {Y} } \frac{(\boldsymbol{k}_{\tilde{\boldsymbol{x}}}^{\mathcal {L}} + \boldsymbol{\beta })_{\tilde{y}}}{|| \boldsymbol{k}_{\tilde{\boldsymbol{x}}}^{\mathcal {L}} + \boldsymbol{\beta } ||_1} \cdot \sum _{\boldsymbol{x}\in \mathcal {U} \setminus \{\tilde{\boldsymbol{x}}\}} \sum _{y \in \mathcal {Y} } \frac{(\boldsymbol{k}_{\boldsymbol{x}}^{\mathcal {L}^+} + \boldsymbol{\alpha })_{y}}{|| \boldsymbol{k}_{\boldsymbol{x}}^{\mathcal {L}^+} + \boldsymbol{\alpha } ||_1} \left( L(y, f^{\mathcal {L}^+}(\boldsymbol{x})) - L(y, f^{\mathcal {L}}(\boldsymbol{x}))\right) \nonumber \\&+ \sum _{\tilde{y}\in \mathcal {Y} } \frac{(\boldsymbol{k}_{\tilde{\boldsymbol{x}}}^{\mathcal {L}} + \boldsymbol{\beta })_{\tilde{y}}}{|| \boldsymbol{k}_{\tilde{\boldsymbol{x}}}^{\mathcal {L}} + \boldsymbol{\beta } ||_1} \cdot L\big (\tilde{y}, f^{\mathcal {L}}(\tilde{\boldsymbol{x}})\big ) \end{aligned}$$
    (20)

    The first part is equal to the inductive probabilistic gain evaluated on \( \mathcal {U} \setminus \{\tilde{\boldsymbol{x}}\}\) multiplied by the number of instances in that set. This factor is necessary as the transductive risk is defined as the sum over all losses whereas the inductive risk uses the average loss. We call the second part of the equation the candidate gain (\({{\,\mathrm{{\text {cgain}}}\,}}\)) as it results from acquiring the correct label from the candidate instance. In summary, we can write the transductive probabilistic gain as the sum of the inductive and the candidate gain:

    $$\begin{aligned} {{\,\mathrm{{\text {xgain}}}\,}}^\textrm{tr}(\tilde{\boldsymbol{x}},\mathcal {L}, \mathcal {U} ) = | \mathcal {U} \setminus \{\tilde{\boldsymbol{x}}\}| \cdot {{\,\mathrm{{\text {xgain}}}\,}}(\tilde{\boldsymbol{x}},\mathcal {L}, \mathcal {U} \setminus \{\tilde{\boldsymbol{x}}\}) + {{\,\mathrm{{\text {cgain}}}\,}}(\tilde{\boldsymbol{x}},\mathcal {L}, \{\tilde{\boldsymbol{x}}\}) \;.\end{aligned}$$
    (21)
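Using the decomposition of Eq. 21, the transductive probabilistic gain can be sketched as follows, reusing the (hypothetical) `xgain` and `class_probability` helpers from above; the list-based set handling is again purely illustrative.

```python
import numpy as np

def cgain(x_cand, labeled, n_classes):
    """Candidate gain (second term of Eq. 20): expected zero-one loss the
    current classifier would incur on x~ if we did not query its label."""
    p = class_probability(x_cand, labeled, n_classes)   # p^L(y~|x~)
    y_pred = int(np.argmax(p))                          # f^L(x~)
    return float(np.sum(p * (np.arange(n_classes) != y_pred)))

def xgain_tr(x_cand, labeled, unlabeled, n_classes):
    """Transductive probabilistic gain (Eq. 21): inductive gain on U \\ {x~},
    rescaled from an average to a sum, plus the candidate gain."""
    E = [x for x in unlabeled if not np.array_equal(x, x_cand)]
    return len(E) * xgain(x_cand, labeled, E, n_classes) \
        + cgain(x_cand, labeled, n_classes)
```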
Fig. 1. Utility plots for the inductive and the candidate gain on a synthetic 2-dimensional dataset with 7 labeled instances.

    4.3 Illustrative Example

Figure 1 shows the inductive and the candidate gain for a synthetic 2-dimensional dataset with two classes. The 7 already labeled instances are marked with gray circles. The classifier's decision boundary is shown as a black line, and the dashed lines mark its confidence. The utilities are calculated for every unlabeled instance and are shown as green surfaces (the color refers to the utility of the nearest instance). We see that the candidate gain (right plot) focuses on difficult instances in regions of high Bayes error (near the decision boundary). Hence, it does not explore the data space but aims to ask the oracle in order to prevent the classifier from making wrong predictions. In contrast, the inductive gain (left plot) aims at improving the performance of the classifier. Therefore, it explores regions that are not yet covered with labels (upper left and lower right) and exploits the labels that are already available by refining the decision boundary. Moreover, we observe that regions of higher density (lower right) are preferred over regions of lower density (upper left) as labels have more impact on the classifier's performance there.

5 A Transductive Stopping Criterion

To define a stopping criterion for transductive active learning, we introduce a performance metric based on an economic rationale. Therefore, we consider the two most relevant kinds of costs involved in an active learning scenario: (1) the annotation cost \(c_{AN}\in {\mathbb {R}^{\ge {0}}}\) describes the cost of acquiring one label from an oracle, and (2) the misclassification cost \(c_{ER}\in {\mathbb {R}^{\ge {0}}}\) describes the cost induced by one wrong prediction of the classifier. Intuitively, the annotation cost grows with the number of acquired labels, whereas the misclassification cost usually decreases as more labels become available.

    We define the aggregated cost as the sum of annotation and misclassification costs. Consequently, the aggregated cost can be written as follows for the i-th iteration of the active learning cycle.

    $$\begin{aligned} \textrm{aggcost}(f, \mathcal {L}_i, \mathcal {U} _i, c_{AN}, c_{ER}) = \underbrace{|\mathcal {L}_i| \cdot c_{AN}}_{\begin{array}{c} \text {Annotation} \\ \text {Cost} \end{array}} + \; \underbrace{R^{\textrm{tr}}_{ \mathcal {U} _i}(f^{\mathcal {L}_i})\cdot c_{ER}}_{\begin{array}{c} \text {Misclassification} \\ \text {Cost} \end{array}} \end{aligned}$$
    (22)

Hence, we assume the annotation cost to be a linear function with fixed cost \(c_{AN}\) per annotated instance. This could easily be generalized to an arbitrary cost function describing the cost of acquiring the labeled set \(\mathcal {L}_i\), but that is beyond the scope of this article. We determine the misclassification cost as the product of the number of wrongly classified instances \(R^{\textrm{tr}}_{ \mathcal {U} _i}(f^{\mathcal {L}_i})\) and the cost of one error \(c_{ER}\).

The optimal solution from an economic perspective is to achieve the minimum aggregated cost (\(\textrm{mac}\)), as shown in Eq. 23. Calculating the mac is equivalent to finding the optimal stopping point for the given costs.

    $$\begin{aligned} \textrm{mac}(f, c_{AN},c_{ER})= \min _{i}\big (\textrm{aggcost}(f, \mathcal {L}_i, \mathcal {U} _i, c_{AN}, c_{ER})\big ) \end{aligned}$$
    (23)
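Both quantities can be computed directly from a recorded learning curve, as in the following sketch; `risks[i]` is assumed to hold the transductive risk \(R^{\textrm{tr}}_{ \mathcal {U} _i}(f^{\mathcal {L}_i})\) after iteration i, and `n_initial` denotes \(|\mathcal {L}_0|\).

```python
import numpy as np

def aggregated_cost(n_labeled, risk, c_an, c_er):
    """Eq. 22: annotation cost plus misclassification cost."""
    return n_labeled * c_an + risk * c_er

def minimum_aggregated_cost(risks, n_initial, c_an, c_er):
    """Eq. 23: cost and index of the optimal stopping point in hindsight."""
    costs = [aggregated_cost(n_initial + i, r, c_an, c_er)
             for i, r in enumerate(risks)]
    return min(costs), int(np.argmin(costs))
```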

In this article, we assume a selection strategy that iteratively selects one instance. In each iteration of the active learning cycle, we have to decide whether to acquire the label of another instance or to stop querying new labels. Consequently, we stop the acquisition as soon as the annotation cost \(c_{AN}\) exceeds the estimated reduction in misclassification cost, based on the transductive probabilistic gain:

    $$\begin{aligned} \text {Stop when } \varDelta c_{ER}< c_{AN}\quad \text { with } \quad \varDelta c_{ER}= {{\,\mathrm{{\text {xgain}}}\,}}^\textrm{tr}(\tilde{\boldsymbol{x}}^* ,\mathcal {L}_i, \mathcal {U} _i) \cdot c_{ER}\,. \end{aligned}$$
    (24)
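Operationally, the stopping rule plugs the transductive probabilistic gain of the chosen candidate into Eq. 24, as in this sketch built on the (hypothetical) `xgain_tr` helper from Sect. 4.2:

```python
def should_stop(x_best, labeled, unlabeled, n_classes, c_an, c_er):
    """Eq. 24: stop once the expected reduction in misclassification cost
    no longer pays for one more annotation."""
    delta_c_er = xgain_tr(x_best, labeled, unlabeled, n_classes) * c_er
    return delta_c_er < c_an
```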

    6 Experimental Evaluation

    This section presents our experimental evaluation and starts by describing the experimental setup including the used datasets, competitors, and visualizations. Our evaluation approach is based on three hypotheses as motivated in the introduction. For each contribution, we formulate one hypothesis, present the key findings, and provide a detailed discussion with plots and/or tables.

    6.1 Setup, Datasets, and Competitors

All experiments have been implemented in Python using scikit-learn and scikit-activeml. We conduct experiments with the following selection strategies: random sampling (rand), least confidence uncertainty sampling (lc) [14], epistemic uncertainty sampling (epis) [17], query by committee (qbc) [15] with the Kullback-Leibler divergence as disagreement measure and bootstrapping to generate a committee of 10 classifiers, Monte Carlo expected error reduction (mc) [21] including the extension of Chapelle with \(\epsilon =10^{-3}\) (chap) [3], and value of information (voi) [9]. To show the benefits of the new transductive probabilistic active learning (xpal_tr), we also compare it to the inductive (standard) variant (xpal) [12]. The expected error based strategies mc, chap (with [6]), voi, and xpal_tr implement a cost-based stopping criterion. Whereas voi already evaluates only on the unlabeled instances, we use the unlabeled set as the evaluation set for mc and chap to ensure comparability in the transductive setting.

We use a Parzen window classifier [18] with an RBF kernel as the classifier (similar to [3, 12, 17]). The main advantages of this classifier are its low number of parameters, its deterministic character, its probabilistic nature, and the fact that it is generic enough for all compared methods to use it. Using the same classifier for all comparisons is important, as doing otherwise could induce additional biases. The bandwidth parameter of the kernel is set by the mean criterion [5].

We use 10 datasets from OpenML [28]. For simplicity, we remove all samples containing missing values and standardize all features independently to zero mean and unit standard deviation. We repeatedly (25 times) split each dataset randomly into two subsets. The first one, containing \(67 \%\) of the samples, is used for the active learning cycle and builds the initially unlabeled set \(\mathcal {U}_0\) according to Sect. 3. This set is also used for evaluating the transductive setting. The remaining samples (\(33 \%\)) build the test set for the inductive setting.
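A sketch of the described preprocessing and splitting with standard scikit-learn utilities; the random seeds are illustrative, and fitting the standardization on the pool only (rather than on all data) is an assumption on our part.

```python
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

def make_split(X, y, seed):
    """One repetition: 67% pool for the active learning cycle (transductive
    evaluation), 33% test set for the inductive evaluation."""
    X_pool, X_test, y_pool, y_test = train_test_split(
        X, y, train_size=0.67, random_state=seed)
    scaler = StandardScaler().fit(X_pool)   # zero mean, unit variance per feature
    return scaler.transform(X_pool), y_pool, scaler.transform(X_test), y_test

# X, y: features and labels of one OpenML dataset (missing values removed)
# splits = [make_split(X, y, seed) for seed in range(25)]   # 25 repetitions
```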

    6.2 Visualization Techniques

To visualize the results, we provide learning curves (e.g., Fig. 2) showing the transductive (resp. inductive) risk. For each dataset and selection strategy, we average the risks after every iteration over the 25 repetitions. The goal is to reach a low risk quickly.

We summarize these results in ranking tables (e.g., Fig. 3). There, we show the rank of each strategy for every dataset with respect to the area under the performance curve. We calculate the rank for each of the 25 repetitions independently and average these ranks into the final score. Depending on the evaluation goal, we define a baseline strategy that is compared to all other competitors using a paired Wilcoxon signed-rank test. We identify whether the evaluation score of a competitor is significantly higher (arrow up), significantly lower (arrow down), or not significantly different (no arrow) from the baseline strategy (p-value 0.05). These results are summarized as win/tie/loss statistics.
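The ranking procedure can be reproduced with SciPy as sketched below; the array layout `auc[s, r]` (strategy s, repetition r) and the label strings are illustrative.

```python
import numpy as np
from scipy.stats import rankdata, wilcoxon

def ranking_table(auc, baseline_idx, alpha=0.05):
    """auc[s, r]: area under the risk curve of strategy s in repetition r."""
    ranks = rankdata(auc, axis=0)        # rank the strategies per repetition
    mean_ranks = ranks.mean(axis=1)      # averaged ranks = final score
    marks = []                           # win/tie/loss from the baseline's view
    for s in range(auc.shape[0]):
        if s == baseline_idx:
            marks.append("baseline")
            continue
        p_value = wilcoxon(auc[s], auc[baseline_idx]).pvalue
        if p_value >= alpha:
            marks.append("tie")          # not significantly different
        else:                            # lower area under the risk curve is better
            marks.append("win" if auc[s].mean() > auc[baseline_idx].mean() else "loss")
    return mean_ranks, marks
```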

Moreover, we evaluate the transductive scenario by plotting the aggregated cost (e.g., Fig. 4), i.e., the sum of annotation and misclassification costs, for different cost ratios. Depending on the application, this ratio may differ, and the practitioner can find a suitable algorithm. In Fig. 4, we show the minimum aggregated cost, for which we identify the optimal stopping point for every selection strategy. Hence, we can assess the quality of selection strategies without the bias of a stopping criterion. In Fig. 5 and Fig. 6 (dashed lines), the aggregated cost is determined at the stopping point proposed by a stopping criterion. The black lines in the aggregated cost plots show the naive baselines, determined as the minimum of the cost of classifying all instances as one class without acquiring any label and the cost of acquiring all labels.

Due to the large variety of plots, we only show the most interesting results. All plots can be found in the supplemental material on GitHub.

    6.3 Results

    Hypothesis A: It is beneficial to develop specific selection strategies for transductive active learning.

Key Findings: When comparing inductive and transductive probabilistic active learning, we show that xpal (inductive) wins when evaluated on the inductive risk, and xpal_tr wins for the transductive risk. Hence, adapting the selection strategy is beneficial, and solving the inductive case (considering generalization capabilities) is not sufficient to solve transductive active learning.

Detailed Discussion: In Fig. 2, we selected three exemplary datasets to show the inductive and the transductive risk for all selection strategies. We see that the transductive risk ends at zero as there are no errors once all labels have been acquired. In contrast, the inductive risk converges to the Bayes error rate. In Fig. 3, we show the ranking statistics based on the area under the inductive/transductive risk curve as described in the previous subsection. Please note that epis is only applicable to 2-class problems. The results show the superiority of xpal in the inductive case (rank 2.56 vs. rank 2.95) and of xpal_tr in the transductive case (rank 1.87 vs. 2.14). The reason is that xpal_tr specifically incorporates the acquisition of difficult instances into the target function through the candidate gain, as discussed in Subsect. 4.3.

Fig. 2. Learning curves of selection strategies with respect to the inductive (upper) and the transductive (lower) risk.

Fig. 3. Ranking statistics with respect to the area under the transductive (left) and inductive (right) risk.

    Hypothesis B: Our selection strategy xpal_tr performs best for the transductive risk and the minimum aggregated cost.

Key Findings: We show that transductive probabilistic active learning outperforms the other competitors in the transductive scenario on average when evaluated on the transductive risk and the minimum aggregated cost, i.e., the sum of the annotation and misclassification costs at the optimal stopping point.

Detailed Discussion: To evaluate this hypothesis, we consider the figures from Hypothesis A for the transductive risk and Fig. 4 for the minimum aggregated cost. The results show: (1) For the transductive risk, xpal_tr is defeated significantly in only three cases (twice by xpal and once by epis). Whereas epis performs mediocrely on cpu (rank 5.6), the ranks of xpal_tr all lie between 1.1 and 3.0. Hence, xpal_tr seems to be fairly robust. (2) For the minimum aggregated cost, the ranking statistics show that the hardest competitors are xpal (4 wins, 4 ties, 2 losses), epis (3 wins, 2 losses), and lc (7 wins, 3 ties). All other competitors are defeated significantly on all 10 datasets. Hereby, epis is a special case as it seems to be quite competitive. Still, it is important to note that it only runs on half of the datasets as it is only applicable to 2-class problems.

Fig. 4. Minimum aggregated cost curves (left) and ranking statistics with respect to the area under the mac curve (right).

Hypothesis C: Our new stopping criterion performs best compared to existing methods.

Key Findings: The selection strategy xpal_tr with the new stopping criterion outperforms the existing selection strategies that implement a stopping criterion (mc, chap, voi). To evaluate the stopping criteria independently of the selection strategy, we also tested their performance in combination with random sampling to ensure comparability, and we show the superiority of our method.

Detailed Discussion: To evaluate the stopping criteria, we show the aggregated cost at the chosen stopping point with respect to the given cost ratios (left) and the ranking statistics (right): In Fig. 5, we evaluate the proposed combinations of a selection strategy and a stopping criterion. Figure 6 shows the results based on random selection. We use random selection for this comparison as it induces the smallest bias on the selection. In this scenario, we cannot assume that the best candidate is always selected. Hence, we average the estimated misclassification cost reduction instead of using the one of the selected candidate to decide about stopping. Our method xpal_tr significantly outperforms all competitors on all datasets in both cases with only one exception (1 tie).

Fig. 5. Aggregated cost curves for selection strategies that implement a stopping criterion (left) and their ranks based on the area under these curves (right).

Fig. 6. Aggregated cost curves for different stopping criteria using rand as the selection strategy (left) and their ranks based on the area under these curves (right).

    7 Conclusion and Outlook

In this article, we introduced and formalized the transductive active learning scenario. We showed that this scenario is not just a special case of the inductive one and that it requires new methods for instance selection. To address this problem, we proposed a novel transductive selection strategy based on the probabilistic active learning framework and experimentally showed that it performs better than the inductive version in the transductive setting. We introduced and motivated a target function for stopping criteria for transductive active learning that considers the misclassification and the annotation costs. Based on this target function, we introduced the minimum aggregated cost that evaluates stopping criteria based on how well they perform for different cost ratios. We used our strategy to derive a novel cost-based stopping criterion. The empirical evaluation showed that it outperforms existing criteria.

In the future, we aim to investigate how the prior influences the proposed methods (here set to \(10^{-3}\) following [3, 12]). In this article, we only considered fixed annotation and misclassification costs and omniscient oracles. However, it is often more realistic that instances have different annotation costs (e.g., depending on the annotation time or quality) or different misclassification costs (e.g., depending on the instance's importance). Moreover, considering the computational cost of the selection might be beneficial. Finally, we want to analyze how our stopping criterion can also be used with other active learning strategies such as uncertainty sampling.