
HAMIATCM: high-availability membership inference attack against text classification models under little knowledge


Abstract

Membership inference attacks have opened up a new and rapidly growing line of research on stealing user privacy from text classification models; their core problems are constructing a shadow model and optimizing the member distribution when few members are available. Simple text augmentation techniques are likely to disrupt textual semantics, which weakens the correlation between labels and texts and reduces the precision of member classification. Shadow models trained exclusively with cross-entropy loss show little differentiation in embeddings among classes, deviating from the distribution of the target model; this in turn distorts the embeddings of members and reduces the F1 score. We propose a competitive and High-Availability Membership Inference Attack against Text Classification Models (HAMIATCM). At the data level, we select highly significant words and apply text augmentation techniques such as replacement or deletion to expand the attacker's knowledge while preserving vulnerable members, thereby enriching the distribution of sensitive members. At the model level, we construct a contrastive loss and an adaptive boundary loss to amplify the distribution differences among classes and dynamically optimize member boundaries, enhancing the text representation capability of the shadow model and the classification performance of the attack classifier. Experimental results demonstrate that HAMIATCM achieves a new state of the art, significantly reduces the false positive rate, and strengthens the ability to fit the output distribution of the target model with less knowledge of members.



Data availability

The datasets generated and/or analyzed during the current study are available from the corresponding author on reasonable request. The Medical Text Classification dataset is available from Kaggle; it does not contain sensitive personal information and is used only for theoretical research.

Additionally, all data sources are cited in the references.

Notes

  1. https://www.kaggle.com/code/chaitanyakck/medical-text-classification/input.

References

  1. Hu H, Salcic Z, Sun L et al (2022) Membership inference attacks on machine learning: a survey. ACM Comput Surv (CSUR) 54(11s):1–37

  2. Vakili T (2023) Attacking and defending the privacy of clinical language models[D]. Department of Computer and Systems Sciences, Stockholm University

  3. Vakili T, Dalianis H (2021) Are clinical BERT models privacy preserving? The difficulty of extracting patient-condition associations[C]. Proceedings of the AAAI 2021 Fall Symposium on Human Partnership with Medical AI: Design, Operationalization, and Ethics

  4. Zhang M, Ren Z, Wang Z et al (2021) Membership inference attacks against recommender systems[C]. Proceedings of the 2021 ACM SIGSAC Conference on Computer and Communications Security, 864–879

  5. Zhang G, Liu B, Zhu T et al (2022) Label-only membership inference attacks and defenses in semantic segmentation models. IEEE Trans Dependable Secur Comput 20(2):1435–1449

  6. Shejwalkar V, Inan HA, Houmansadr A et al (2021) Membership inference attacks against nlp classification models[C]. NeurIPS 2021 Workshop Privacy in Machine Learning

  7. Wang Y, Xu N, Huang S et al (2022) Analyzing and defending against membership inference attacks in natural language processing classification[C]. 2022 IEEE International Conference on Big Data (Big Data). IEEE, 5823–5832

  8. Song C, Raghunathan A (2020) Information leakage in embedding models[C]. Proceedings of the 2020 ACM SIGSAC conference on computer and communications security, 377–390

  9. Mattern J, Mireshghallah F, Jin Z et al (2023) Membership inference attacks against language models via neighbourhood comparison[C]. In: Findings of the Association for Computational Linguistics: ACL 2023, pp 11330–11343

  10. Hisamoto S, Post M, Duh K (2020) Membership inference attacks on sequence-to-sequence models: is my data in your machine translation system?[J]. Trans Association Comput Linguistics 8:49–63

  11. Carlini N, Tramer F, Wallace E et al (2021) Extracting training data from large language models[C]. 30th USENIX Security Symposium (USENIX Security 21), 2633–2650

  12. Kandpal N, Wallace E, Raffel C (2022) Deduplicating training data mitigates privacy risks in language models[C]. International Conference on Machine Learning. PMLR, 10697–10707

  13. Song C, Shmatikov V (2019) Auditing data provenance in text-generation models[C]. Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 196–206

  14. Chen D, Yu N, Zhang Y et al (2020) Gan-leaks: A taxonomy of membership inference attacks against generative models[C]. Proceedings of the 2020 ACM SIGSAC conference on computer and communications security, 343–362

  15. Yuan X, Zhang L (2022) Membership inference attacks and defenses in neural network pruning[C]. 31st USENIX Security Symposium, 4561–4578

  16. Shokri R, Stronati M, Song C et al (2017) Membership inference attacks against machine learning models[C]. 2017 IEEE symposium on security and privacy (SP). IEEE, 3–18

  17. Salem A, Zhang Y, Humbert M et al (2019) ML-Leaks: Model and data independent membership inference attacks and defenses on machine learning models[C]. Network and Distributed Systems Security (NDSS) Symposium

  18. Mahloujifar S, Inan HA, Chase M et al (2021) Membership inference on word embedding and beyond. arXiv preprint arXiv:2106.11384

  19. Oh MG, Park LH, Kim J et al (2023) Membership inference attacks with token-level deduplication on Korean Language Models. IEEE Access 11:10207–10217

  20. Chen S, Wang W, Zhong Y et al (2024) HP-MIA: a novel membership inference attack scheme for high membership prediction precision. Computers Secur 136:103571

  21. Zhang M, Yu N, Wen R et al (2024) Generated distributions are all you need for membership inference attacks against generative models[C]. Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 4839–4849

  22. Duan M, Suri A, Mireshghallah N et al (2024) Do membership inference attacks work on large language models? arXiv preprint arXiv:2402.07841

  23. Wang X, Wu L, Guan Z (2023) GradDiff: Gradient-based membership inference attacks against federated distillation with differential comparison. Inf Sci: 120068

  24. Zhu C, Zhang J, Cheng X et al (2022) MIA-Leak: Exploring membership inference attacks in federated learning systems[C]. International Conference on Blockchain Technology and Emerging Applications. Springer Nature Switzerland, Cham, 140–154

  25. Liu Z, Zhang X, Chen C et al (2022) Membership inference attacks against robust graph neural network[C]. International Symposium on Cyberspace Safety and Security. Springer International Publishing, Cham, 259–273

  26. Oh MG, Park LH, Kim J et al (2022) On membership inference attacks to generative language models across language domains[C]. International Conference on Information Security Applications. Springer Nature Switzerland, Cham, 143–155

  27. Oord A, Li Y, Vinyals O (2018) Representation learning with contrastive predictive coding. arXiv preprint arXiv:1807.03748

  28. Gao T, Yao X, Chen D (2021) SimCSE: Simple contrastive learning of sentence embeddings[C]. Conference on Empirical Methods in Natural Language Processing, EMNLP 2021. Association for Computational Linguistics (ACL), 6894–6910

  29. Chen T, Kornblith S, Norouzi M et al (2020) A simple framework for contrastive learning of visual representations[C]. International conference on machine learning. PMLR, 1597–1607

  30. Liu H, Jia J, Qu W et al (2021) EncoderMI: Membership inference against pre-trained encoders in contrastive learning[C]. Proceedings of the 2021 ACM SIGSAC Conference on Computer and Communications Security, 2081–2095

  31. Li G, Rezaei S, Liu X (2022) User-level membership inference attack against metric embedding learning[C]. ICLR 2022 Workshop on PAIR^2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data

  32. Sui H, Sun X, Zhang J et al (2023) Multi-level membership inference attacks in federated learning based on active GAN[J]. Neural Comput Appl: 1–15

  33. Jin Y, Lou W, Gao Y (2023) Membership inference attacks against compression models. Computing: 1–24

  34. Bai Y, Chen T, Fan M (2021) A survey on membership inference attacks against machine learning. Management 6:14

  35. Li X, Thickstun J, Gulrajani I et al (2022) Diffusion-lm improves controllable text generation. Adv Neural Inf Process Syst 35:4328–4343

  36. Devlin J, Chang MW, Lee K, Toutanova K (2019) BERT: Pre-training of deep bidirectional transformers for language understanding[C]. Proceedings of NAACL-HLT, 4171–4186

  37. He X, Lyu L, Sun L et al (2021) Model extraction and adversarial transferability, your BERT is vulnerable![C]. Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2006–2012

  38. Sablayrolles A, Douze M, Schmid C et al (2019) White-box vs black-box: Bayes optimal strategies for membership inference[C]. International Conference on Machine Learning. PMLR, 5558–5567

Acknowledgements

The work was supported by 242 National Information Security Projects (no.2020A065).

Author information

Contributions

1. We introduce a competitive attack method, HAMIATCM, designed to optimize member distributions and particularly well suited to general text classification models when knowledge of members is limited. The approach broadens the distribution difference between members and non-members based on the shadow model.

2. A conditional data augmentation technique guided by labels is designed. It combines label correlation with importance scores to preserve crucial and susceptible words associated with member distributions.

3. A robust and powerful contrastive learning objective is constructed to extract the distribution differences in embeddings among multiple classes from the shadow model. The joint loss function focuses on the relationship between augmented data and labels, strengthening the representation of multiple classes and the adaptability and generalization of the shadow model.

Corresponding author

Correspondence to Limin Pan.

Ethics declarations

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Informed consent

Written informed consent was obtained for each participant according to federal and institutional guidelines. 

Research involving human participants and/or animals

All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional research committee.

Conflicts of interest

All authors disclosed no relevant relationships.

No potential conflict of interest was reported by the authors.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Below is the link to the electronic supplementary material.

ESM 1

(RAR 2.67 MB)

Appendices

Appendix 1: Details of HAMIATCM and target models

In this section, we perform two steps on the target model: splitting data and training the target model.

Prior to fine-tuning the target text classification model \({C}_{t}\), data preprocessing is performed as a preliminary step. Next, leveraging the conditional data augmentation technique outlined in Section 3.2, we synthesize comparable shadow training samples based on label correlation. Then, using the BERT text encoder [36], denoted \({E}_{s}\) (defined in Section 3.3.1), we extract the semantic representations of the original and augmented samples. Finally, to enhance the performance of the shadow model, we introduce an objective function that combines a contrastive loss over samples, an adaptive boundary loss, and the classification cross-entropy loss (explained in Section 3.3.2). Upon completion of the full training process, we obtain a shadow model with an equivalent representation, denoted \(E_s^{\prime}\).
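The shadow-model objective itself is defined in Section 3.3.2 of the main text and is not reproduced in this appendix. As an illustration only, the sketch below shows one way a contrastive term over original and augmented samples can be combined with the classification cross-entropy, reusing the `tem` (temperature) and `lam` (weighting) hyperparameters listed in Appendix 2; the function names and the weighting scheme are assumptions, and the adaptive boundary loss is omitted.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(emb, emb_aug, tem=0.1):
    """InfoNCE-style loss: (x_i, aug(x_i)) are positive pairs, all other
    samples in the batch act as negatives."""
    emb = F.normalize(emb, dim=-1)
    emb_aug = F.normalize(emb_aug, dim=-1)
    logits = emb @ emb_aug.t() / tem                       # (B, B) similarity matrix
    targets = torch.arange(emb.size(0), device=emb.device)  # diagonal = positives
    return F.cross_entropy(logits, targets)

def joint_shadow_loss(logits, labels, emb, emb_aug, tem=0.1, lam=0.5):
    """Weighted combination of classification cross-entropy and the contrastive
    term; the adaptive boundary loss of the paper is omitted for brevity."""
    ce = F.cross_entropy(logits, labels)
    cl = contrastive_loss(emb, emb_aug, tem)
    return lam * ce + (1.0 - lam) * cl
```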

Based on the shadow model, we train the attack model and apply \({A}^{{\prime }}\) to classify the test samples in the evaluation phase. We divide the samples into four parts \(S=\{{X}_{tm},{X}_{sm},{X}_{a},{X}_{test}\}\): \({X}_{tm}=\{{x}_{tm1},{x}_{tm2},\dots ,{x}_{tmN}\}\) is used to train the target model \({C}_{t}\); \({X}_{sm}=\{{x}_{sm1},{x}_{sm2},\dots ,{x}_{smN}\}\) is used to train the shadow model \({E}_{s}\), is disjoint from \({X}_{tm}\), follows the same distribution, and satisfies \(\left|{X}_{tm}\right|=\left|{X}_{sm}\right|\); \({X}_{a}=\{{x}_{a1},{x}_{a2},\dots ,{x}_{an}\}\) is used to train the binary attack classifier \(A\); and \({X}_{test}=\{{x}_{1},{x}_{2},\dots ,{x}_{m}\}\) is used to test the performance of \({A}^{{\prime }}\).

For the above process, \({X}_{tm}\) and \({X}_{sm}\) have the corresponding labels \({L}_{m}=\{{L}_{m1},{L}_{m2},\dots ,{L}_{mN}\}\), \({L}_{mi}\in \{1,2,\dots ,C\}\), where \(C\) is the number of classes. \({X}_{a}\) and \({X}_{test}\) have the corresponding labels \(y_m\in \{1,0\}\).
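To make the construction of \({X}_{a}\) concrete, the following is a minimal sketch of collecting softmax output vectors as attack features and labeling shadow-training samples as members (\(y_m=1\)) and held-out shadow samples as non-members (\(y_m=0\)). It assumes a Hugging Face-style sequence classification model and a DataLoader that yields tokenized batches; the helper name and loader variables are illustrative, not the authors' implementation.

```python
import numpy as np
import torch
import torch.nn.functional as F

@torch.no_grad()
def attack_features(model, loader, device="cpu"):
    """Collect softmax output vectors from a (shadow or target) model."""
    model.eval()
    feats = []
    for batch in loader:
        logits = model(input_ids=batch["input_ids"].to(device),
                       attention_mask=batch["attention_mask"].to(device)).logits
        feats.append(F.softmax(logits, dim=-1).cpu().numpy())
    return np.concatenate(feats)

# Hypothetical usage: shadow-training samples are members (label 1),
# held-out shadow samples are non-members (label 0); together they form X_a.
# member_feats    = attack_features(shadow_model, shadow_train_loader)
# nonmember_feats = attack_features(shadow_model, shadow_heldout_loader)
# X_a = np.concatenate([member_feats, nonmember_feats])
# y_m = np.concatenate([np.ones(len(member_feats)), np.zeros(len(nonmember_feats))])
```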

In this paper, it is necessary to train a well-generalized target model in order to evaluate the performance of HAMIATCM. Early stopping was used throughout the training process, and the target models were qualitatively compared on training and testing samples to ensure that confounding factors such as model overfitting do not influence the attack performance. The detailed training process of the target model is shown in Appendix Fig. 12.

Fig. 12 The training process of the target model
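The paper does not state the exact early-stopping criterion. The sketch below is an assumed patience-based rule that additionally monitors the train-test accuracy gap, so that an overfitted target model does not confound the attack evaluation; `train_fn` and `eval_fn` are hypothetical callbacks for one training epoch and one evaluation pass.

```python
def train_with_early_stop(model, train_fn, eval_fn, max_epochs=4,
                          patience=2, max_gap=0.10):
    """Stop when test accuracy no longer improves, or when the train-test
    accuracy gap (a proxy for overfitting) exceeds max_gap."""
    best_acc, wait = 0.0, 0
    for epoch in range(max_epochs):
        train_acc = train_fn(model)         # one epoch of training, returns train accuracy
        test_acc = eval_fn(model)           # evaluation on the held-out split
        if train_acc - test_acc > max_gap:  # target would be overfitted, stop
            break
        if test_acc > best_acc:
            best_acc, wait = test_acc, 0
        else:
            wait += 1
            if wait >= patience:
                break
    return model
```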

1.1 Details of target models

To start with, the input is a sequence of tokens \({X}_{tm}=\{{x}_{tm1},{x}_{tm2},\dots ,{x}_{tmN}\}\) with corresponding class labels \({L}_{tm}=\{{L}_{tm1},{L}_{tm2},\dots ,{L}_{tmN}\}\), \({L}_{tmi}\in \{1,2,\dots ,C\}\), where \(C\) is the total number of classes. To identify the relationship between the input text and the corresponding label, a target text classification model is established.

From the attacker's viewpoint, the target model can be characterized as maximizing the log-likelihood of the correct label (i.e., \(\log \mathbb{P}\left({L}_{tmi}\mid {x}_{tmi}\right)\), \(i\in \{1,2,\dots ,N\}\)).

The target model performs two steps: text preprocessing and word embedding. To obtain the text embedding of a training sample, we feed the training sample to BERT [36]. The hidden vector of the first \(\left[CLS\right]\) token serves as the embedding of the entire training sample. Then, to accomplish the downstream classification task, the hidden text representation is passed to fully connected layers. After computing the classification cross-entropy loss, the parameters of the target model \({C}_{t}\) are gradually updated. The details are shown in (A.1):

$$\mathbf{v}_i = C_t\left(x_{tmi}\right), \quad i \in \{1, 2, \dots, N\}.$$
(A.1)

\(\mathbf{v}_i\) is the \(i\)th hidden training embedding obtained after encoding. Then, to map it into the classification space and boost the network's capacity for nonlinear fitting, we pass \(\mathbf{v}_i\) through fully connected layers. Next, applying the \(softmax\) function, we generate a label prediction \(\mathbf{p}\) as formulated in (A.2):

$$\mathbf{p} = softmax\left(\mathbf{v}_i\right),$$
(A.2)
$$\mathcal{L}_{ce} = -\sum_{i=1}^{N}\left[{L}_{tmi}\log \mathbf{p} + \left(1-{L}_{tmi}\right)\log\left(1-\mathbf{p}\right)\right].$$
(A.3)

Second, following previous work, we use a binary cross-entropy loss function, where \({L}_{tmi}\) is the ground truth and \(\mathbf{p}\) is the predicted probability of the ground truth. \(\mathcal{L}_{ce}\) is the classification cross-entropy detailed in formula (A.3).

Third, the target model \({C}_{t}\) is trained with the classification cross-entropy loss in the training phase, and we can then run the attack against the trained model \({C}_{t}^{{\prime }}\). Similar to prior work (Shokri et al., 2017; Carlini et al., 2021) [11, 16], our method assumes access only to the predictions of the target model: it receives a sequence of tokens and returns a class prediction with its corresponding probability.
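As a reference for this black-box setting, the following is a minimal sketch of a BERT-based target classifier built on the Hugging Face `transformers` API, exposing only a prediction interface that returns a class and its probability. The three fully connected layers follow the description in Appendix 2, but the hidden sizes and checkpoint name are illustrative assumptions rather than the authors' exact configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

class TargetClassifier(nn.Module):
    """BERT encoder + fully connected head; the [CLS] vector is the text embedding."""
    def __init__(self, num_classes, encoder_name="bert-base-uncased", hidden=768):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(encoder_name)
        self.head = nn.Sequential(            # three fully connected layers (see Appendix 2)
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, num_classes))

    def forward(self, input_ids, attention_mask):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        v = out.last_hidden_state[:, 0]        # [CLS] embedding v_i, Eq. (A.1)
        return self.head(v)                    # logits; softmax gives p, Eq. (A.2)

@torch.no_grad()
def query(model, tokenizer, text):
    """Black-box interface assumed by the attack: predicted class and its probability."""
    enc = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)
    p = F.softmax(model(enc["input_ids"], enc["attention_mask"]), dim=-1)[0]
    return int(p.argmax()), float(p.max())
```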

Appendix 2: Experimental settings

In this section, we show the details of the architecture and hyperparameters of the attack model we proposed in this paper.

The parameters used for training a target model are shown in Appendix Tables 12 and 13. The max_seq_length is 128, train_batch_size is 64, eval_batch_size is 64, learning_rate is 5e-5, and num_train_epoch is 4; tem is the temperature parameter of contrastive learning, a value between 0 and 1, and lam is likewise a value between 0 and 1.

Table 12 The parameters of the BERT model
Table 13 The parameters of the SimCSE model
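For convenience, the hyperparameters listed above can be collected into a single configuration dictionary. The values of `tem` and `lam` below are placeholders within (0, 1); the exact settings are given in Tables 12 and 13.

```python
train_config = {
    "max_seq_length": 128,
    "train_batch_size": 64,
    "eval_batch_size": 64,
    "learning_rate": 5e-5,
    "num_train_epoch": 4,
    "tem": 0.1,   # contrastive-learning temperature, a value in (0, 1); see Tables 12-13
    "lam": 0.5,   # loss-weighting coefficient, a value in (0, 1); see Tables 12-13
}
```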

First, we split the dataset into disjoint parts and fine-tune BERT or SimCSE on three datasets (AG NEWS, BLOG, TP) as our target classification models. Each dataset is divided into four equal parts, referred to as the training set of the target model \({X}_{tm}\), the testing set of the target model \({X}_{tnm}\), the training set of the shadow model \({X}_{sm}\), and the training set of the attack model \({X}_{snm}\), respectively. \({X}_{a}\) is used to train the attack classifier. A sketch of this split is given below.
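The following is a minimal sketch of the four-way split described above, using scikit-learn; the stratification and random seed are assumptions, as the paper does not specify them.

```python
from sklearn.model_selection import train_test_split

def four_way_split(texts, labels, seed=42):
    """Split a dataset into four equal, disjoint parts:
    target train (X_tm), target test (X_tnm), shadow train (X_sm), attack train (X_snm)."""
    x_half1, x_half2, y_half1, y_half2 = train_test_split(
        texts, labels, test_size=0.5, random_state=seed, stratify=labels)
    x_tm, x_tnm, y_tm, y_tnm = train_test_split(
        x_half1, y_half1, test_size=0.5, random_state=seed, stratify=y_half1)
    x_sm, x_snm, y_sm, y_snm = train_test_split(
        x_half2, y_half2, test_size=0.5, random_state=seed, stratify=y_half2)
    return (x_tm, y_tm), (x_tnm, y_tnm), (x_sm, y_sm), (x_snm, y_snm)
```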

Second, we fine-tune a target downstream classifier with three fully connected layers. The details of training the target model are shown in Appendix Fig. 13.

Fig. 13 The model performance (accuracy) of model training

Overall, the two kinds of target models are trained on three datasets and are well generalized. In the BERT experiments, the AG NEWS target model reaches a train accuracy of 83.01% and a test accuracy of 80.16%; we likewise select a BERT model as the shadow model, which has a train accuracy of 97.11% and a test accuracy of 90.54%. The BLOG target model reaches a train accuracy of 89.16% and a test accuracy of 86.02%; the BERT shadow model on BLOG has a train accuracy of 91.44% and a test accuracy of 87.71%. The TP target model reaches a train accuracy of 82.40% and a test accuracy of 80.28%; the BERT shadow model fine-tuned on TP has a train accuracy of 84.38% and a test accuracy of 80.41%.

In the SimCSE experiments, the AG NEWS target model reaches a train accuracy of 92.72% and a test accuracy of 87.25%; the fine-tuned SimCSE shadow model has a train accuracy of 97.71% and a test accuracy of 96.75%. The BLOG target model reaches a train accuracy of 92.97% and a test accuracy of 86.36%; the SimCSE shadow model on BLOG has a train accuracy of 89.45% and a test accuracy of 87.82%. The TP target model reaches a train accuracy of 87.70% and a test accuracy of 83.95%; the SimCSE shadow model on TP has a train accuracy of 89.06% and a test accuracy of 87.50%.

The details are listed in Appendix Tables 14 and 15. Additionally, the differences in text distribution across the three datasets are shown in Appendix Fig. 14.

Table 14 The training accuracy and test accuracy of the BERT models
Table 15 The training accuracy and test accuracy of the SimCSE models
Fig. 14 The entropy distribution of BLOG members and non-members (left column). The accuracy with iteration (right column)
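The member/non-member entropy distributions shown in Fig. 14 can be computed from the softmax outputs of the target model. A short sketch follows; the helper that collects softmax vectors is assumed (e.g., the hypothetical `attack_features` sketch in Appendix 1).

```python
import numpy as np

def prediction_entropy(probs):
    """Shannon entropy of each softmax output vector; members of the training set
    typically have lower entropy (more confident predictions) than non-members."""
    probs = np.clip(probs, 1e-12, 1.0)
    return -(probs * np.log(probs)).sum(axis=-1)

# Hypothetical usage:
# member_entropy    = prediction_entropy(attack_features(target_model, member_loader))
# nonmember_entropy = prediction_entropy(attack_features(target_model, nonmember_loader))
```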

Appendix 3: The supplementary experiments of medical text classification

In this section, we implemented experiments on publicly available Medical Text Classification (MTC) datasets. This choice was made to assess the privacy implications without compromising sensitive personal information. The experiment is confined to research purposes only and does not involve issues such as medical ethics.

We select four prevalent target models designed for medical datasets as our targets, including BERT-based models, Longformer-based models, clinical-based models, and mimic-simcse models. These models are specifically tailored for processing medical text, which often includes sensitive information. The objective is to verify the potential leakage of patient information in distinct medical and clinical text classification scenarios. It is worth noting that part of the prior research on medical text classification relied on experiments involving MIMIC-III [3]; however, this dataset is not currently publicly available. In addition, for the two medical-text membership inference attacks considered in this supplementary investigation, neither the data nor the code has been made open source. This is also one of the difficulties and challenges encountered when applying MIAs in the domain of medical texts.

In light of this, this section discusses the following experiments:

  1. Conducting experiments on de-identified MTC and comparing with comparative methods.

  2. Demonstrating privacy leakage through instantiation.

  3. Implementing attacks based on multiple target models to verify the generalizability and robustness of the method.

The results of comparison experiments are listed in Appendix Table 16.

Table 16 The results of comparison experiments

HAMIATCM demonstrates superior performance compared to the three other methods on medical texts in Table 16. Of particular note is the improvement in the AR, driven by the enhanced feature representation of co-occurring members achieved through conditional data augmentation.

As illustrated in Appendix Fig. 15 and Table 17, the target models are derived from general medical text classification models. The first category is clinical data trained on the BERT model, the second involves medical data used to fine-tune the SimCSE model, the third is mimic data trained on the DistilBERT model, and the last is clinical data trained on the Longformer model.

Fig. 15 The ROC curve of MTC (left); the curve of training attack accuracy (right)

Table 17 The results of different target models
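Attacking the four kinds of target models only requires swapping the underlying checkpoint. The sketch below uses publicly available checkpoints from the Hugging Face hub as stand-ins; these names are illustrative placeholders, not necessarily the models fine-tuned by the authors.

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Placeholder checkpoint names; the paper fine-tunes its own clinical/medical models.
target_checkpoints = {
    "clinical-bert":       "emilyalsentzer/Bio_ClinicalBERT",
    "medical-simcse":      "princeton-nlp/sup-simcse-bert-base-uncased",
    "mimic-distilbert":    "distilbert-base-uncased",
    "clinical-longformer": "yikuan8/Clinical-Longformer",
}

def load_target(name, num_labels):
    """Load one of the four target-model families for attack evaluation."""
    tok = AutoTokenizer.from_pretrained(target_checkpoints[name])
    model = AutoModelForSequenceClassification.from_pretrained(
        target_checkpoints[name], num_labels=num_labels)
    return tok, model
```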

The experimental results reveal that the attack F1 for the mimic data & DistilBERT model is the highest, whereas the attack F1 for the medical data & SimCSE model is the lowest. Upon analysis, we find that the member data primarily originates from the mimic-distilbert model, with certain members also employed in training the clinical-bert-base model.

As illustrated by the visualizations in Appendix Figs. 16 and 17, the mimic-distilbert dataset displays extensive coverage, suggesting that our testing samples are highly likely to be included in the training of the mimic-distilbert and clinical-longformer models.

Fig. 16 The attack results under multiple target models

Fig. 17 The radar plot of attack results

The memorized contents of the target models are shown in Appendix Table 18. Upon analyzing the members classified by the four target models, occurrences of specific types of words are observed and presented. For example, the four models show an elevated membership likelihood for words such as "##stroke", "##Lung cancer", "##Bowel dysfunction", and others.

Table 18 The memorized contents of the target models

The number of co-occurring member words is presented in Appendix Fig. 18. This chart gives the frequency of particular words among members, illustrating the potential privacy compromise of the training samples of specific target models. It is essential to emphasize that no individual's privacy has been disclosed and that the information is employed solely for statistical analysis.

Fig. 18 The number of co-occurring member words
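A minimal sketch of how the co-occurring member-word counts in Fig. 18 can be tallied; the whitespace tokenisation and the member-prediction flags are simplifying assumptions, not the authors' exact counting procedure.

```python
from collections import Counter

def co_occurring_member_words(texts, member_flags, min_count=2):
    """Count how often each word appears across samples predicted as members."""
    counts = Counter()
    for text, is_member in zip(texts, member_flags):
        if is_member:
            counts.update(set(text.lower().split()))  # count each word once per member sample
    return {w: c for w, c in counts.items() if c >= min_count}
```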

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Cheng, Y., Luo, S., Pan, L. et al. HAMIATCM: high-availability membership inference attack against text classification models under little knowledge. Appl Intell (2024). https://doi.org/10.1007/s10489-024-05495-x
