
HAMIATCM: high-availability membership inference attack against text classification models under little knowledge


Abstract

Membership inference attacks have opened up a new and rapidly growing line of research on stealing user privacy from text classification models; their core problems are constructing a shadow model and optimizing the member distribution when few members are available. Simple text augmentation techniques are likely to disrupt textual semantics, which weakens the correlation between labels and texts and reduces the precision of member classification. Shadow models trained exclusively with cross-entropy loss show little differentiation in embeddings among classes, deviating from the distribution of the target model; this in turn distorts the embeddings of members and reduces the F1 score. We propose a competitive and High-Availability Membership Inference Attack against Text Classification Models (HAMIATCM). At the data level, we select highly significant words and apply text augmentation techniques such as replacement or deletion to expand the attacker's knowledge while preserving vulnerable members, thereby enriching the distribution of sensitive members. At the model level, we construct a contrastive loss and an adaptive boundary loss to amplify the distribution differences among classes and dynamically optimize member boundaries, enhancing the text representation capability of the shadow model and the classification performance of the attack classifier. Experimental results demonstrate that HAMIATCM achieves a new state of the art, significantly reduces the false positive rate, and strengthens the ability to fit the output distribution of the target model with less knowledge of members.



Data availability

The datasets generated and/or analyzed during the current study are available from the corresponding author on reasonable request. The Medical Text Classification dataset is available from Kaggle; it does not contain sensitive personal information and is used only for theoretical research.

Additionally, all data sources are cited in the references.

Notes

  1. https://www.kaggle.com/code/chaitanyakck/medical-text-classification/input.

References

  1. Hu H, Salcic Z, Sun L et al (2022) Membership inference attacks on machine learning: a survey. ACM Comput Surv (CSUR) 54(11s):1–37

  2. Vakili T (2023) Attacking and defending the privacy of clinical language models[D]. Department of Computer and Systems Sciences, Stockholm University

  3. Vakili T, Dalianis H (2021) Are clinical BERT models privacy preserving? The difficulty of extracting patient-condition associations[C]. Proceedings of the AAAI 2021 Fall Symposium on Human Partnership with Medical AI: Design, Operationalization, and Ethics

  4. Zhang M, Ren Z, Wang Z et al (2021) Membership inference attacks against recommender systems[C]. Proceedings of the 2021 ACM SIGSAC Conference on Computer and Communications Security, 864–879

  5. Zhang G, Liu B, Zhu T et al (2022) Label-only membership inference attacks and defenses in semantic segmentation models. IEEE Trans Dependable Secur Comput 20(2):1435–1449

  6. Shejwalkar V, Inan HA, Houmansadr A et al (2021) Membership inference attacks against nlp classification models[C]. NeurIPS 2021 Workshop Privacy in Machine Learning

  7. Wang Y, Xu N, Huang S et al (2022) Analyzing and defending against membership inference attacks in natural language processing classification[C]. 2022 IEEE International Conference on Big Data (Big Data). IEEE, 5823–5832

  8. Song C, Raghunathan A (2020) Information leakage in embedding models[C]. Proceedings of the 2020 ACM SIGSAC conference on computer and communications security, 377–390

  9. Mattern J, Mireshghallah F, Jin Z et al (2023) Membership inference attacks against language models via neighbourhood comparison[C]. In: Findings of the Association for Computational Linguistics: ACL 2023, pp 11330–11343

  10. Hisamoto S, Post M, Duh K (2020) Membership inference attacks on sequence-to-sequence models: is my data in your machine translation system?[J]. Trans Association Comput Linguistics 8:49–63

  11. Carlini N, Tramer F, Wallace E et al (2021) Extracting training data from large language models[C]. 30th USENIX Security Symposium (USENIX Security 21), 2633–2650

  12. Kandpal N, Wallace E, Raffel C (2022) Deduplicating training data mitigates privacy risks in language models[C]. International Conference on Machine Learning. PMLR, 10697–10707

  13. Song C, Shmatikov V (2019) Auditing data provenance in text-generation models[C]. Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 196–206

  14. Chen D, Yu N, Zhang Y et al (2020) Gan-leaks: A taxonomy of membership inference attacks against generative models[C]. Proceedings of the 2020 ACM SIGSAC conference on computer and communications security, 343–362

  15. Yuan X, Zhang L (2022) Membership inference attacks and defenses in neural network pruning[C]. 31st USENIX Security Symposium, 4561–4578

  16. Shokri R, Stronati M, Song C et al (2017) Membership inference attacks against machine learning models[C]. 2017 IEEE symposium on security and privacy (SP). IEEE, 3–18

  17. Salem A, Zhang Y, Humbert M et al (2019) ML-Leaks: Model and data independent membership inference attacks and defenses on machine learning models[C]. Network and Distributed Systems Security (NDSS) Symposium

  18. Mahloujifar S, Inan HA, Chase M et al (2021) Membership inference on word embedding and beyond. arXiv preprint arXiv:2106.11384

  19. Oh MG, Park LH, Kim J et al (2023) Membership inference attacks with token-level deduplication on Korean Language Models. IEEE Access 11:10207–10217

  20. Chen S, Wang W, Zhong Y et al (2024) HP-MIA: a novel membership inference attack scheme for high membership prediction precision. Computers Secur 136:103571

  21. Zhang M, Yu N, Wen R et al (2024) Generated distributions are all you need for membership inference attacks against generative models[C]. Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 4839–4849

  22. Duan M, Suri A, Mireshghallah N et al (2024) Do membership inference attacks work on large language models? arXiv preprint arXiv:2402.07841

  23. Wang X, Wu L, Guan Z (2023) GradDiff: Gradient-based membership inference attacks against federated distillation with differential comparison. Inf Sci: 120068

  24. Zhu C, Zhang J, Cheng X et al (2022) MIA-Leak: Exploring membership inference attacks in federated learning systems[C]. International Conference on Blockchain Technology and Emerging Applications. Springer Nature Switzerland, Cham, 140–154

  25. Liu Z, Zhang X, Chen C et al (2022) Membership inference attacks against robust graph neural network[C]. International Symposium on Cyberspace Safety and Security. Springer International Publishing, Cham, 259–273

  26. Oh MG, Park LH, Kim J et al (2022) On membership inference attacks to generative language models across language domains[C]. International Conference on Information Security Applications. Springer Nature Switzerland, Cham, 143–155

  27. Oord A, Li Y, Vinyals O (2018) Representation learning with contrastive predictive coding. arXiv preprint arXiv:1807.03748

  28. Gao T, Yao X, Chen D (2021) SimCSE: Simple contrastive learning of sentence embeddings[C]. Conference on Empirical Methods in Natural Language Processing, EMNLP 2021. Association for Computational Linguistics (ACL), 6894–6910

  29. Chen T, Kornblith S, Norouzi M et al (2020) A simple framework for contrastive learning of visual representations[C]. International conference on machine learning. PMLR, 1597–1607

  30. Liu H, Jia J, Qu W et al (2021) EncoderMI: Membership inference against pre-trained encoders in contrastive learning[C]. Proceedings of the 2021 ACM SIGSAC Conference on Computer and Communications Security, 2081–2095

  31. Li G, Rezaei S, Liu X (2022) User-level membership inference attack against metric embedding learning[C]. ICLR 2022 Workshop on PAIR^2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data

  32. Sui H, Sun X, Zhang J et al (2023) Multi-level membership inference attacks in federated learning based on active GAN[J]. Neural Comput Appl: 1–15

  33. Jin Y, Lou W, Gao Y (2023) Membership inference attacks against compression models. Computing: 1–24

  34. Bai Y, Chen T, Fan M (2021) A survey on membership inference attacks against machine learning. Management 6:14

  35. Li X, Thickstun J, Gulrajani I et al (2022) Diffusion-lm improves controllable text generation. Adv Neural Inf Process Syst 35:4328–4343

  36. Devlin J, Chang MW, Lee K, Toutanova K (2019) BERT: Pre-training of deep bidirectional transformers for language understanding[C]. Proceedings of NAACL-HLT, 4171–4186

  37. He X, Lyu L, Sun L et al (2021) Model extraction and adversarial transferability, your BERT is vulnerable![C]. Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2006–2012

  38. Sablayrolles A, Douze M, Schmid C et al (2019) White-box vs black-box: Bayes optimal strategies for membership inference[C]. International Conference on Machine Learning. PMLR, 5558–5567

Acknowledgements

The work was supported by 242 National Information Security Projects (no.2020A065).

Author information

Contributions

1. We introduce a competitive attack method, HAMIATCM, designed to optimize member distributions and particularly well suited to general text classification models when knowledge of members is limited. The approach broadens the distribution difference between members and non-members based on the shadow model.

2. A conditional data augmentation technique guided by labels is designed. It combines label correlation with importance scores to preserve crucial and susceptible words associated with member distributions.

3. A robust and powerful contrastive learning objective is constructed to extract the distribution differences in embeddings among multiple classes from the shadow model. The joint loss function focuses on the relationship between augmented data and labels, strengthening the representation of multiple classes and the adaptability and generalization of the shadow model.

Corresponding author

Correspondence to Limin Pan.

Ethics declarations

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Informed consent

Written informed consent was obtained for each participant according to federal and institutional guidelines. 

Research involving human participants and/or animals

All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional research committee.

Conflicts of interest

All authors disclosed no relevant relationships.

No potential conflict of interest was reported by the authors.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Below is the link to the electronic supplementary material.

ESM 1

(RAR 2.67 MB)

Appendices

Appendix 1: Details of HAMIATCM and target models

In this section, we perform two steps on the target model: splitting data and training the target model.

Prior to fine-tuning the target text classification model \({C}_{t}\), data preprocessing is performed as a preliminary step. Next, leveraging the conditional data augmentation technique outlined in Section 3.2, we synthesize comparable shadow training samples based on label correlation. Then, using the BERT text encoder [36], denoted \({E}_{s}\) (defined in Section 3.3.1), we extract the semantic representations of the original and augmented samples. Finally, to enhance the performance of the shadow model, we introduce an objective function that combines a contrastive loss over samples, an adaptive boundary loss, and the classification cross-entropy loss (explained in Section 3.3.2). Upon completion of the full training process, we obtain a shadow model with an equivalent representation, denoted \(E_s^{\prime}\).
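The shadow-model objective itself is defined in Section 3.3.2 of the main text and is not reproduced in this appendix. As an illustration only, the sketch below shows one way a contrastive term over original and augmented samples can be combined with the classification cross-entropy, reusing the `tem` (temperature) and `lam` (weighting) hyperparameters listed in Appendix 2; the function names and the weighting scheme are assumptions, and the adaptive boundary loss is omitted.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(emb, emb_aug, tem=0.1):
    """InfoNCE-style loss: (x_i, aug(x_i)) are positive pairs, all other
    samples in the batch act as negatives."""
    emb = F.normalize(emb, dim=-1)
    emb_aug = F.normalize(emb_aug, dim=-1)
    logits = emb @ emb_aug.t() / tem                       # (B, B) similarity matrix
    targets = torch.arange(emb.size(0), device=emb.device)  # diagonal = positives
    return F.cross_entropy(logits, targets)

def joint_shadow_loss(logits, labels, emb, emb_aug, tem=0.1, lam=0.5):
    """Weighted combination of classification cross-entropy and the contrastive
    term; the adaptive boundary loss of the paper is omitted for brevity."""
    ce = F.cross_entropy(logits, labels)
    cl = contrastive_loss(emb, emb_aug, tem)
    return lam * ce + (1.0 - lam) * cl
```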

Based on the shadow model, we train the attack model and apply \({A}^{{\prime }}\) to classify the test samples in the evaluation phase. We divide the samples into four parts \(S=\{{X}_{tm},{X}_{sm},{X}_{a},{X}_{test}\}\): \({X}_{tm}=\{{x}_{tm1},{x}_{tm2},\dots ,{x}_{tmN}\}\) is used to train the target model \({C}_{t}\); \({X}_{sm}=\{{x}_{sm1},{x}_{sm2},\dots ,{x}_{smN}\}\) is used to train the shadow model \({E}_{s}\), is disjoint from \({X}_{tm}\), follows the same distribution, and satisfies \(\left|{X}_{tm}\right|=\left|{X}_{sm}\right|\); \({X}_{a}=\{{x}_{a1},{x}_{a2},\dots ,{x}_{an}\}\) is used to train the binary attack classifier \(A\); and \({X}_{test}=\{{x}_{1},{x}_{2},\dots ,{x}_{m}\}\) is used to test the performance of \({A}^{{\prime }}\).

For the above process, \({X}_{tm}\) and \({X}_{sm}\) have the corresponding labels \({L}_{m}=\{{L}_{m1},{L}_{m2},\dots ,{L}_{mN}\}\), \({L}_{mi}\in \{1,2,\dots ,C\}\), where \(C\) is the number of classes. \({X}_{a}\) and \({X}_{test}\) have the corresponding labels \(y_m\in \{1,0\}\).
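To make the construction of \({X}_{a}\) concrete, the following is a minimal sketch of collecting softmax output vectors as attack features and labeling shadow-training samples as members (\(y_m=1\)) and held-out shadow samples as non-members (\(y_m=0\)). It assumes a Hugging Face-style sequence classification model and a DataLoader that yields tokenized batches; the helper name and loader variables are illustrative, not the authors' implementation.

```python
import numpy as np
import torch
import torch.nn.functional as F

@torch.no_grad()
def attack_features(model, loader, device="cpu"):
    """Collect softmax output vectors from a (shadow or target) model."""
    model.eval()
    feats = []
    for batch in loader:
        logits = model(input_ids=batch["input_ids"].to(device),
                       attention_mask=batch["attention_mask"].to(device)).logits
        feats.append(F.softmax(logits, dim=-1).cpu().numpy())
    return np.concatenate(feats)

# Hypothetical usage: shadow-training samples are members (label 1),
# held-out shadow samples are non-members (label 0); together they form X_a.
# member_feats    = attack_features(shadow_model, shadow_train_loader)
# nonmember_feats = attack_features(shadow_model, shadow_heldout_loader)
# X_a = np.concatenate([member_feats, nonmember_feats])
# y_m = np.concatenate([np.ones(len(member_feats)), np.zeros(len(nonmember_feats))])
```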

In this paper, it is necessary to train a well-generalized target model in order to evaluate the performance of HAMIATCM. Early stopping was used throughout the training process, and the target models were qualitatively compared on training and testing samples to ensure that confounding factors such as model overfitting do not influence the attack performance. The detailed training process of the target model is shown in Appendix Fig. 12.

Fig. 12 The training process of the target model
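The paper does not state the exact early-stopping criterion. The sketch below is an assumed patience-based rule that additionally monitors the train-test accuracy gap, so that an overfitted target model does not confound the attack evaluation; `train_fn` and `eval_fn` are hypothetical callbacks for one training epoch and one evaluation pass.

```python
def train_with_early_stop(model, train_fn, eval_fn, max_epochs=4,
                          patience=2, max_gap=0.10):
    """Stop when test accuracy no longer improves, or when the train-test
    accuracy gap (a proxy for overfitting) exceeds max_gap."""
    best_acc, wait = 0.0, 0
    for epoch in range(max_epochs):
        train_acc = train_fn(model)         # one epoch of training, returns train accuracy
        test_acc = eval_fn(model)           # evaluation on the held-out split
        if train_acc - test_acc > max_gap:  # target would be overfitted, stop
            break
        if test_acc > best_acc:
            best_acc, wait = test_acc, 0
        else:
            wait += 1
            if wait >= patience:
                break
    return model
```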

1.1 Details of target models

To start with, the input is a sequence of tokens \({X}_{tm}=\{{x}_{tm1},{x}_{tm2},\dots ,{x}_{tmN}\}\) with corresponding class labels \({L}_{tm}=\{{L}_{tm1},{L}_{tm2},\dots ,{L}_{tmN}\}\), \({L}_{tmi}\in \{1,2,\dots ,C\}\), where \(C\) is the total number of classes. To identify the relationship between the input text and the corresponding label, a target text classification model is established.

From the attacker's viewpoint, the target model can be characterized as maximizing the log-likelihood of the correct label (i.e., \(\log \mathbb{P}\left({L}_{tmi}\mid {x}_{tmi}\right)\), \(i\in \{1,2,\dots ,N\}\)).

The target model performs two steps: text preprocessing and word embedding. To obtain the text embedding of a training sample, we feed the training sample to BERT [36]. The hidden vector of the first \(\left[CLS\right]\) token serves as the embedding of the entire training sample. Then, to accomplish the downstream classification task, the hidden text representation is passed to fully connected layers. After computing the classification cross-entropy loss, the parameters of the target model \({C}_{t}\) are gradually updated. The details are shown in (A.1):

$$\mathbf{v}_i = C_t\left(x_{tmi}\right), \quad i \in \{1, 2, \dots, N\}.$$
(A.1)

\(\mathbf{v}_i\) is the \(i\)th hidden training embedding obtained after encoding. Then, to map it into the classification space and boost the network's capacity for nonlinear fitting, we pass \(\mathbf{v}_i\) through fully connected layers. Next, applying the \(softmax\) function, we generate a label prediction \(\mathbf{p}\) as formulated in (A.2):

$$\mathbf{p} = softmax\left(\mathbf{v}_i\right),$$
(A.2)
$$\mathcal{L}_{ce} = -\sum_{i=1}^{N}\left[{L}_{tmi}\log \mathbf{p} + \left(1-{L}_{tmi}\right)\log\left(1-\mathbf{p}\right)\right].$$
(A.3)

Second, following previous work, we use a binary cross-entropy loss function, where \({L}_{tmi}\) is the ground truth and \(\mathbf{p}\) is the predicted probability of the ground truth. \(\mathcal{L}_{ce}\) is the classification cross-entropy detailed in formula (A.3).

Third, the target model \({C}_{t}\) is trained with the classification cross-entropy loss in the training phase, and we can then run the attack against the trained model \({C}_{t}^{{\prime }}\). Similar to prior work (Shokri et al., 2017; Carlini et al., 2021) [11, 16], our method assumes access only to the predictions of the target model: it receives a sequence of tokens and returns a class prediction with its corresponding probability.
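As a reference for this black-box setting, the following is a minimal sketch of a BERT-based target classifier built on the Hugging Face `transformers` API, exposing only a prediction interface that returns a class and its probability. The three fully connected layers follow the description in Appendix 2, but the hidden sizes and checkpoint name are illustrative assumptions rather than the authors' exact configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

class TargetClassifier(nn.Module):
    """BERT encoder + fully connected head; the [CLS] vector is the text embedding."""
    def __init__(self, num_classes, encoder_name="bert-base-uncased", hidden=768):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(encoder_name)
        self.head = nn.Sequential(            # three fully connected layers (see Appendix 2)
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, num_classes))

    def forward(self, input_ids, attention_mask):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        v = out.last_hidden_state[:, 0]        # [CLS] embedding v_i, Eq. (A.1)
        return self.head(v)                    # logits; softmax gives p, Eq. (A.2)

@torch.no_grad()
def query(model, tokenizer, text):
    """Black-box interface assumed by the attack: predicted class and its probability."""
    enc = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)
    p = F.softmax(model(enc["input_ids"], enc["attention_mask"]), dim=-1)[0]
    return int(p.argmax()), float(p.max())
```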

Appendix 2: Experimental settings

In this section, we show the details of the architecture and hyperparameters of the attack model we proposed in this paper.

The parameters used for training a target model are shown in Appendix Tables 12 and 13. The max_seq_length is 128, train_batch_size is 64, eval_batch_size is 64, learning_rate is 5e-5, and num_train_epoch is 4; tem is the temperature parameter of contrastive learning, a value between 0 and 1, and lam is likewise a value between 0 and 1.

Table 12 The parameters of the BERT model
Table 13 The parameters of the SimCSE model
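For convenience, the hyperparameters listed above can be collected into a single configuration dictionary. The values of `tem` and `lam` below are placeholders within (0, 1); the exact settings are given in Tables 12 and 13.

```python
train_config = {
    "max_seq_length": 128,
    "train_batch_size": 64,
    "eval_batch_size": 64,
    "learning_rate": 5e-5,
    "num_train_epoch": 4,
    "tem": 0.1,   # contrastive-learning temperature, a value in (0, 1); see Tables 12-13
    "lam": 0.5,   # loss-weighting coefficient, a value in (0, 1); see Tables 12-13
}
```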

First, we split the dataset into disjoint parts and fine-tune BERT or SimCSE on three datasets (AG NEWS, BLOG, TP) as our target classification models. Each dataset is divided into four equal parts, referred to as the training set of the target model \({X}_{tm}\), the testing set of the target model \({X}_{tnm}\), the training set of the shadow model \({X}_{sm}\), and the training set of the attack model \({X}_{snm}\), respectively. \({X}_{a}\) is used to train the attack classifier. A sketch of this split is given below.
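The following is a minimal sketch of the four-way split described above, using scikit-learn; the stratification and random seed are assumptions, as the paper does not specify them.

```python
from sklearn.model_selection import train_test_split

def four_way_split(texts, labels, seed=42):
    """Split a dataset into four equal, disjoint parts:
    target train (X_tm), target test (X_tnm), shadow train (X_sm), attack train (X_snm)."""
    x_half1, x_half2, y_half1, y_half2 = train_test_split(
        texts, labels, test_size=0.5, random_state=seed, stratify=labels)
    x_tm, x_tnm, y_tm, y_tnm = train_test_split(
        x_half1, y_half1, test_size=0.5, random_state=seed, stratify=y_half1)
    x_sm, x_snm, y_sm, y_snm = train_test_split(
        x_half2, y_half2, test_size=0.5, random_state=seed, stratify=y_half2)
    return (x_tm, y_tm), (x_tnm, y_tnm), (x_sm, y_sm), (x_snm, y_snm)
```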

Second, we fine-tune a target downstream classifier with three fully connected layers. The details of training the target model are shown in Appendix Fig. 13.

Fig. 13 The model performance (accuracy) of model training

Overall, the two kinds of target models are trained on three datasets and are well generalized. In the BERT experiments, the AG NEWS target model reaches a train accuracy of 83.01% and a test accuracy of 80.16%; we likewise select a BERT model as the shadow model, which has a train accuracy of 97.11% and a test accuracy of 90.54%. The BLOG target model reaches a train accuracy of 89.16% and a test accuracy of 86.02%; the BERT shadow model on BLOG has a train accuracy of 91.44% and a test accuracy of 87.71%. The TP target model reaches a train accuracy of 82.40% and a test accuracy of 80.28%; the BERT shadow model fine-tuned on TP has a train accuracy of 84.38% and a test accuracy of 80.41%.

In the SimCSE experiments, the AG NEWS target model reaches a train accuracy of 92.72% and a test accuracy of 87.25%; the fine-tuned SimCSE shadow model has a train accuracy of 97.71% and a test accuracy of 96.75%. The BLOG target model reaches a train accuracy of 92.97% and a test accuracy of 86.36%; the SimCSE shadow model on BLOG has a train accuracy of 89.45% and a test accuracy of 87.82%. The TP target model reaches a train accuracy of 87.70% and a test accuracy of 83.95%; the SimCSE shadow model on TP has a train accuracy of 89.06% and a test accuracy of 87.50%.

The details are listed in Appendix Tables 14 and 15. Additionally, the differences in text distribution across the three datasets are shown in Appendix Fig. 14.

Table 14 The training accuracy and test accuracy of the BERT models
Table 15 The training accuracy and test accuracy of the SimCSE models
Fig. 14 The entropy distribution of BLOG members and non-members (left column). The accuracy with iteration (right column)
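The member/non-member entropy distributions shown in Fig. 14 can be computed from the softmax outputs of the target model. A short sketch follows; the helper that collects softmax vectors is assumed (e.g., the hypothetical `attack_features` sketch in Appendix 1).

```python
import numpy as np

def prediction_entropy(probs):
    """Shannon entropy of each softmax output vector; members of the training set
    typically have lower entropy (more confident predictions) than non-members."""
    probs = np.clip(probs, 1e-12, 1.0)
    return -(probs * np.log(probs)).sum(axis=-1)

# Hypothetical usage:
# member_entropy    = prediction_entropy(attack_features(target_model, member_loader))
# nonmember_entropy = prediction_entropy(attack_features(target_model, nonmember_loader))
```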

Appendix 3: The supplementary experiments of medical text classification

In this section, we implemented experiments on publicly available Medical Text Classification (MTC) datasets. This choice was made to assess the privacy implications without compromising sensitive personal information. The experiment is confined to research purposes only and does not involve issues such as medical ethics.

We select four prevalent target models designed for medical datasets as our targets, including BERT-based models, Longformer-based models, clinical-based models, and mimic-simcse models. These models are specifically tailored for processing medical text, which often includes sensitive information. The objective is to verify the potential leakage of patient information in distinct medical and clinical text classification scenarios. It is worth noting that part of the prior research on medical text classification relied on experiments involving MIMIC-III [3]; however, this dataset is not currently publicly available. In addition, for the two medical-text membership inference attacks considered in this supplementary investigation, neither the data nor the code has been made open source. This is also one of the difficulties and challenges encountered when applying MIAs in the domain of medical texts.

In light of this, this section discusses the following experiments:

  1. Conducting experiments on de-identified MTC and comparing with comparative methods.

  2. Demonstrating privacy leakage through instantiation.

  3. Implementing attacks based on multiple target models to verify the generalizability and robustness of the method.

The results of comparison experiments are listed in Appendix Table 16.

Table 16 The results of comparison experiments

HAMIATCM demonstrates superior performance compared to the three other methods on medical texts in Table 16. Of particular note is the improvement in the AR, driven by the enhanced feature representation of co-occurring members achieved through conditional data augmentation.

As illustrated in Appendix Fig. 15 and Table 17, the target models are derived from general medical text classification models. The first category is clinical data trained on the BERT model, the second involves medical data used to fine-tune the SimCSE model, the third is mimic data trained on the DistilBERT model, and the last is clinical data trained on the Longformer model.

Fig. 15 The ROC curve of MTC (left); the curve of training attack accuracy (right)

Table 17 The results of different target models
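Attacking the four kinds of target models only requires swapping the underlying checkpoint. The sketch below uses publicly available checkpoints from the Hugging Face hub as stand-ins; these names are illustrative placeholders, not necessarily the models fine-tuned by the authors.

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Placeholder checkpoint names; the paper fine-tunes its own clinical/medical models.
target_checkpoints = {
    "clinical-bert":       "emilyalsentzer/Bio_ClinicalBERT",
    "medical-simcse":      "princeton-nlp/sup-simcse-bert-base-uncased",
    "mimic-distilbert":    "distilbert-base-uncased",
    "clinical-longformer": "yikuan8/Clinical-Longformer",
}

def load_target(name, num_labels):
    """Load one of the four target-model families for attack evaluation."""
    tok = AutoTokenizer.from_pretrained(target_checkpoints[name])
    model = AutoModelForSequenceClassification.from_pretrained(
        target_checkpoints[name], num_labels=num_labels)
    return tok, model
```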

The experimental results reveal that the attack F1 for the mimic data & DistilBERT model is the highest, whereas the attack F1 for the medical data & SimCSE model is the lowest. Upon analysis, we find that the member data primarily originates from the mimic-distilbert model, with certain members also employed in training the clinical-bert-base model.

As illustrated by the visualizations in Appendix Figs. 16 and 17, the mimic-distilbert dataset displays extensive coverage, suggesting that our testing samples are highly likely to be included in the training of the mimic-distilbert and clinical-longformer models.

Fig. 16 The attack results under multiple target models

Fig. 17 The radar plot of attack results

The memorized contents of the target models are shown in Appendix Table 18. Upon analyzing the members classified by the four target models, occurrences of specific types of words are observed and presented. For example, the four models show an elevated membership likelihood for words such as "##stroke", "##Lung cancer", "##Bowel dysfunction", and others.

Table 18 The memorized contents of the target models

The number of co-occurring member words is presented in Appendix Fig. 18. This chart gives the frequency of particular words among members, illustrating the potential privacy compromise of the training samples of specific target models. It is essential to emphasize that no individual's privacy has been disclosed and that the information is employed solely for statistical analysis.

Fig. 18 The number of co-occurring member words
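A minimal sketch of how the co-occurring member-word counts in Fig. 18 can be tallied; the whitespace tokenisation and the member-prediction flags are simplifying assumptions, not the authors' exact counting procedure.

```python
from collections import Counter

def co_occurring_member_words(texts, member_flags, min_count=2):
    """Count how often each word appears across samples predicted as members."""
    counts = Counter()
    for text, is_member in zip(texts, member_flags):
        if is_member:
            counts.update(set(text.lower().split()))  # count each word once per member sample
    return {w: c for w, c in counts.items() if c >= min_count}
```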

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Cheng, Y., Luo, S., Pan, L. et al. HAMIATCM: high-availability membership inference attack against text classification models under little knowledge. Appl Intell (2024). https://doi.org/10.1007/s10489-024-05495-x
