Abstract
Federated learning is a machine learning approach in which multiple edge devices, each holding local data samples, send locally trained models to a central server, which aggregates them using a specific aggregation rule. The distributed nature of federated learning exposes these devices to poisoning attacks, especially during the training phase. This paper presents a systematic study of the effect of data poisoning attacks against SVM classifiers in a federated setting (F-SVM). In particular, we implement two widely recognized data poisoning attacks against SVMs, the Label-Flipping and Optimal-Poisoning attacks, and evaluate their impact on global F-SVM accuracy using the MNIST, FashionMNIST, CIFAR-10, and IJCNN1 datasets. Our results reveal significant reductions in accuracy, highlighting the susceptibility of F-SVMs to such attacks: when 30% of the edge devices are compromised, accuracy drops by 15%, and when the fraction of compromised devices rises to 35%, accuracy drops by 32%. We also evaluate the impact of varying the ratio of poisoned points and of distributing the datasets across edge devices in a non-independently-and-identically-distributed (non-IID) fashion. In addition, we investigate preliminary defense mechanisms against poisoning attacks on F-SVMs, assessing the efficacy of three popular unsupervised outlier detection methods: the K-Nearest Neighbor algorithm, Histogram-based outlier detection, and Copula-based outlier detection. All our source code is written in Python and is open source.
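To make the attack setting concrete, the following minimal Python sketch simulates a label-flipping attack on a federated linear SVM. It is not the authors' implementation: the toy dataset, the five-device split, and the plain parameter-averaging aggregation are assumptions chosen only for illustration.

```python
# Illustrative sketch (not the paper's exact setup): a label-flipping
# attack against linear SVMs trained on simulated edge devices, with the
# local models combined by simple parameter averaging.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=600, n_features=10, random_state=0)
X_train, y_train, X_test, y_test = X[:500], y[:500], X[500:], y[500:]
shards = np.array_split(np.arange(500), 5)  # 5 simulated edge devices

def local_model(idx, poisoned):
    yi = 1 - y_train[idx] if poisoned else y_train[idx]  # flip every label
    clf = LinearSVC().fit(X_train[idx], yi)
    return clf.coef_.ravel(), clf.intercept_[0]

def global_accuracy(n_compromised):
    # Aggregation step: average the local SVM weights and intercepts.
    models = [local_model(s, i < n_compromised) for i, s in enumerate(shards)]
    w = np.mean([m[0] for m in models], axis=0)
    b = np.mean([m[1] for m in models])
    return ((X_test @ w + b > 0).astype(int) == y_test).mean()

print("clean global accuracy:   ", global_accuracy(0))
print("poisoned global accuracy:", global_accuracy(3))  # 3 of 5 devices flipped
```

When a majority of devices flips its labels, the averaged decision boundary inverts and global accuracy collapses; with fewer compromised devices the averaged weights are pulled toward zero, a qualitative analogue of the accuracy drops reported in the abstract.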
Data availability
This study exclusively utilized datasets that are publicly available. All datasets analyzed during this research are accessible from their respective public domain sources. Full references to these datasets are provided within the article.
References
Anisetti M, Ardagna CA, Balestrucci A, et al. On the robustness of ensemble-based machine learning against data poisoning. Preprint arXiv:2209.14013; 2022.
Anisetti M, Ardagna CA, Bena N, et al. Rethinking certification for trustworthy machine-learning-based applications. IEEE Internet Comput. 2023;27(6):22–8. https://doi.org/10.1109/MIC.2023.3322327.
Bagdasaryan E, Veit A, Hua Y, et al. How to backdoor federated learning. In: International conference on artificial intelligence and statistics, PMLR; 2020. p. 2938–48.
Barreno M, Nelson B, Joseph AD, et al. The security of machine learning. Mach Learn. 2010;81(2):121–48.
Bhagoji AN, Chakraborty S, Mittal P, et al. Analyzing federated learning through an adversarial lens. In: International conference on machine learning, PMLR; 2019. p. 634–43.
Biggio B, Roli F. Wild patterns: ten years after the rise of adversarial machine learning. Pattern Recogn. 2018;84:317–31.
Biggio B, Nelson B, Laskov P. Support vector machines under adversarial label noise. In: Asian conference on machine learning, PMLR; 2011. p. 97–112.
Biggio B, Nelson B, Laskov P. Poisoning attacks against support vector machines. In: Proceedings of the 29th international conference on international conference on machine learning. Omnipress, Madison, WI, USA, ICML’12; 2012. p. 1467–74.
Blanchard P, El Mhamdi EM, Guerraoui R, et al. Machine learning with adversaries: Byzantine tolerant gradient descent. Adv Neural Inf Process Syst. 2017;30.
Bovenzi G, Foggia A, Santella S, et al. Data poisoning attacks against autoencoder-based anomaly detection models: a robustness analysis. In: ICC 2022-IEEE international conference on communications; 2022. p. 5427–32. https://doi.org/10.1109/ICC45855.2022.9838942.
Cao X, Fang M, Liu J, et al. FLTrust: Byzantine-robust federated learning via trust bootstrapping. In: 28th annual network and distributed system security symposium, NDSS 2021, virtually, February 21–25, 2021. The Internet Society. https://www.ndss-symposium.org/ndss-paper/fltrust-byzantine-robust-federated-learning-via-trust-bootstrapping/; 2021.
Chen M, Yang Z, Saad W, et al. A joint learning and communications framework for federated learning over wireless networks. IEEE Trans Wirel Commun. 2020;20(1):269–83.
Dalvi N, Domingos P, Sanghai S, et al. Adversarial classification. In: Proceedings of the 10th ACM SIGKDD international conference on knowledge discovery and data mining; 2004. p. 99–108.
Demontis A, Melis M, Pintor M, et al. Why do adversarial attacks transfer? Explaining transferability of evasion and poisoning attacks. In: 28th {USENIX} security symposium ({USENIX} security 19); 2019. p. 321–38.
Ding H, Yang F, Huang J. Defending SVMs against poisoning attacks: the hardness and DBSCAN approach. In: de Campos C, Maathuis MH, editors. Proceedings of the thirty-seventh conference on uncertainty in artificial intelligence, proceedings of machine learning research, PMLR, vol. 161; 2021. p. 268–78. https://proceedings.mlr.press/v161/ding21b.html.
Doku R, Rawat DB. Mitigating data poisoning attacks on a federated learning-edge computing network. In: 2021 IEEE 18th annual consumer communications and networking conference (CCNC). IEEE; 2021. p. 1–6.
Fang M, Cao X, Jia J, et al. Local model poisoning attacks to {Byzantine-Robust} federated learning. In: 29th USENIX security symposium (USENIX Security 20); 2020. p. 1605–22.
Faqeh R, Fetzer C, Hermanns H, et al. Towards dynamic dependable systems through evidence-based continuous certification. In: Margaria T, Steffen B, et al., editors. Leveraging applications of formal methods, verification and validation: engineering principles. Cham: Springer; 2020. p. 416–39.
Gehr T, Mirman M, Drachsler-Cohen D, et al. AI2: safety and robustness certification of neural networks with abstract interpretation. In: 2018 IEEE symposium on security and privacy (SP). IEEE; 2018. p. 3–18.
Goldstein M, Dengel A. Histogram-based outlier score (HBOS): a fast unsupervised anomaly detection algorithm. In: KI-2012: poster and demo track; 2012. p. 59–63.
Hsu RH, Wang YC, Fan CI, et al. A privacy-preserving federated learning system for android malware detection based on edge computing. In: 2020 15th Asia joint conference on information security (AsiaJCIS). IEEE; 2020. p. 128–36.
Huang C, Huang J, Liu X. Cross-silo federated learning: challenges and opportunities. Preprint arXiv:2206.12949 [cs.LG]; 2022.
Huang Y, Chu L, Zhou Z, et al. Personalized cross-silo federated learning on non-iid data. In: Proceedings of the AAAI conference on artificial intelligence, vol. 35(9); 2021. p. 7865–73. https://doi.org/10.1609/aaai.v35i9.16960. https://ojs.aaai.org/index.php/AAAI/article/view/16960.
Mouri IJ, Ridowan M, Adnan MA. Towards poisoning of federated support vector machines with data poisoning attacks. In: Proceedings of the 13th international conference on cloud computing and services science-CLOSER, INSTICC. SciTePress; 2023. p. 24–33.
Jagielski M, Oprea A, Biggio B, et al. Manipulating machine learning: Poisoning attacks and countermeasures for regression learning. In: 2018 IEEE symposium on security and privacy (SP). IEEE; 2018. p. 19–35.
Kabir T, Adnan MA. A scalable algorithm for multi-class support vector machine on geo-distributed datasets. In: 2019 IEEE international conference on big data (big data). IEEE; 2019. p. 637–42.
Karimireddy SP, Jaggi M, Kale S, et al. Breaking the centralized barrier for cross-device federated learning. Adv Neural Inf Process Syst. 2021;34:28663–76.
Konečný J, McMahan HB, Ramage D, et al. Federated optimization: distributed machine learning for on-device intelligence. CoRR; 2016.
Krizhevsky A, Hinton G, et al. Learning multiple layers of features from tiny images; 2009.
Laishram R, Phoha VV. Curie: a method for protecting SVM classifier from poisoning attack. Preprint arXiv:1606.01584; 2016.
LeCun Y. The MNIST database of handwritten digits. http://yann.lecun.com/exdb/mnist/; 1998.
Li Z, Zhao Y, Botta N, et al. Copod: copula-based outlier detection. In: 2020 IEEE international conference on data mining (ICDM). IEEE; 2020. p. 1118–23.
Manoharan P, Walia R, Iwendi C, et al. SVM-based generative adversarial networks for federated learning and edge computing attack model and outpoising. Expert Syst. 2022. https://doi.org/10.1111/exsy.13072.
McMahan B, Moore E, Ramage D, et al. Communication-efficient learning of deep networks from decentralized data. In: Singh A, Zhu J, editors. Proceedings of the 20th international conference on artificial intelligence and statistics, proceedings of machine learning research, PMLR, vol. 54; 2017. p. 1273–82. https://proceedings.mlr.press/v54/mcmahan17a.html.
Mei S, Zhu X. Using machine teaching to identify optimal training-set attacks on machine learners. In: Twenty-ninth AAAI conference on artificial intelligence; 2015.
Melis M, Demontis A, Pintor M, et al. SECML: a Python library for secure and explainable machine learning. Preprint arXiv:1912.10013; 2019.
Muñoz-González L, Biggio B, Demontis A, et al. Towards poisoning of deep learning algorithms with back-gradient optimization. In: Proceedings of the 10th ACM workshop on artificial intelligence and security; 2017. p. 27–38.
Nair DG, Aswartha Narayana CV, Jaideep Reddy K, et al. Exploring SVM for federated machine learning applications. In: Rout RR, Ghosh SK, Jana PK, et al., editors. Advances in distributed computing and machine learning. Singapore: Springer; 2022. p. 295–305.
Navia-Vázquez A, Díaz-Morales R, Fernández-Díaz M. Budget distributed support vector machine for non-id federated learning scenarios. ACM Trans Intel Syst Technol (TIST). 2022;13(6):1–25.
Paudice A, Muñoz-González L, Gyorgy A, et al. Detection of adversarial training examples in poisoning attacks through anomaly detection. Preprint arXiv:1802.03041; 2018.
Paudice A, Muñoz-González L, Lupu EC. Label sanitization against label flipping poisoning attacks. In: Alzate C, Monreale A, Assem H, et al., editors. ECML PKDD 2018 workshops. Cham: Springer; 2019. p. 5–15.
Pedregosa F, Varoquaux G, Gramfort A, et al. Scikit-learn: machine learning in Python. J Mach Learn Res. 2011;12:2825–30.
Peri N, Gupta N, Huang WR, et al. Deep k-nn defense against clean-label data poisoning attacks. In: Bartoli A, Fusiello A, editors., et al., Computer vision-ECCV 2020 workshops. Cham: Springer; 2020. p. 55–70.
Pitropakis N, Panaousis E, Giannetsos T, et al. A taxonomy and survey of attacks against machine learning. Comput Sci Rev. 2019;34: 100199. https://doi.org/10.1016/j.cosrev.2019.100199. www.sciencedirect.com/science/article/pii/S1574013718303289.
Prokhorov D. Ijcnn 2001 neural network competition. Slide Present IJCNN. 2001;1(97):38.
Radford BJ, Apolonio LM, Trias AJ, et al. Network traffic anomaly detection using recurrent neural networks. CoRR arXiv:1803.10769; 2018.
Ramaswamy S, Rastogi R, Shim K. Efficient algorithms for mining outliers from large data sets. In: Proceedings of the 2000 ACM SIGMOD international conference on management of data; 2000. p. 427–38.
Rehman MHu, Dirir AM, Salah K, et al. TrustFed: A framework for fair and trustworthy cross-device federated learning in IIoT. IEEE Trans Ind Inform 2021;17(12):8485–94. https://doi.org/10.1109/TII.2021.3075706.
Shejwalkar V, Houmansadr A, Kairouz P, et al. Back to the drawing board: a critical evaluation of poisoning attacks on production federated learning. In: IEEE symposium on security and privacy; 2022.
Steinhardt J, Koh PW, Liang P. Certified defenses for data poisoning attacks. In: Proceedings of the 31st international conference on neural information processing systems; 2017. p. 3520–32.
Sun G, Cong Y, Dong J, et al. Data poisoning attacks on federated machine learning. IEEE Internet Things J. 2021;2021:1.
Tolpegin V, Truex S, Gursoy ME, et al. Data poisoning attacks against federated learning systems. In: European symposium on research in computer security. London: Springer; 2020. p. 480–501.
Wang S, Chen M, Saad W, et al. Federated learning for energy-efficient task computing in wireless networks. In: ICC 2020-2020 IEEE international conference on communications (ICC). IEEE; 2020. p. 1–6.
Xiao H, Biggio B, Brown G, et al. Is feature selection secure against training data poisoning? In: International conference on machine learning, PMLR; 2015. p. 1689–98.
Xiao H, Rasul K, Vollgraf R. Fashion-MNIST: a novel image dataset for benchmarking machine learning algorithms. Preprint arXiv:1708.07747; 2017.
Yin D, Chen Y, Kannan R, et al. Byzantine-robust distributed learning: Towards optimal statistical rates. In: International conference on machine learning, PMLR; 2018. p. 5650–9.
Zhang R, Zhu Q. A game-theoretic defense against data poisoning attacks in distributed support vector machines. In: 2017 IEEE 56th annual conference on decision and control (CDC). IEEE; 2017. p. 4582–7.
Zhao Y, Nasrullah Z, Li Z. PyOD: a Python toolbox for scalable outlier detection. J Mach Learn Res. 2019;20(96):1–7. http://jmlr.org/papers/v20/19-011.html.
Zhou Y, Kantarcioglu M, Thuraisingham B, et al. Adversarial support vector machine learning. In: Proceedings of the 18th ACM SIGKDD international conference on knowledge discovery and data mining; 2012. p. 1059–67.
Zhu Y, Cui L, Ding Z, et al. Black box attack and network intrusion detection using machine learning for malicious traffic. Comput Secur. 2022;123: 102922.
Acknowledgements
This research work is carried out under the RISE (Research and Innovation Centre for Science and Engineering) Internal Research Grant from Bangladesh University of Engineering and Technology (BUET).
Funding
This study has been carried out in the context of the project titled “Securing Federated Learning from Poisoning Attacks” with Application ID: 2022-01-019, awarded under the RISE (Research and Innovation Centre for Science and Engineering) Internal Research Grant (Call-ID: 2022-01).
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Conflict of Interest
On behalf of all the authors, the corresponding author states that there is no conflict of interest.
Code Availability
Our codes are open source and available in this URL: https://github.com/pkse-searcher/fsvm-pois-attack-defense/.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
This article is part of the topical collection “Recent Trends on Cloud Computing and Services Science” guest edited by Claus Pahl and Maarten van Steen.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Mouri, I.J., Ridowan, M. & Adnan, M.A. Data Poisoning Attacks and Mitigation Strategies on Federated Support Vector Machines. SN COMPUT. SCI. 5, 241 (2024). https://doi.org/10.1007/s42979-023-02556-9