Rule-Based Runtime Mitigation Against Poison Attacks on Neural Networks

  • Conference paper
  • Runtime Verification (RV 2022)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13498)

Abstract

Poisoning or backdoor attacks are well-known attacks on image-classification neural networks, whereby an attacker inserts a trigger into a subset of the training data so that the network learns to misclassify any input containing the trigger to a specific target label. We propose a set of runtime mitigation techniques, embodied in the tool AntidoteRT, which uses rules defined over neuron patterns to detect and correct network behavior on poisoned inputs. The neuron patterns characterizing correct and incorrect classifications are mined by running the network on a clean set of samples with known ground-truth labels and, optionally, on a set of poisoned samples. AntidoteRT offers two methods for runtime correction: (i) pattern-based correction, which uses the patterns as oracles to estimate the ideal label, and (ii) input-based correction, which repairs the input image by localizing the trigger and resetting it to a neutral color. We demonstrate that our techniques outperform existing defenses such as NeuralCleanse and STRIP on the popular MNIST, CIFAR-10, and GTSRB benchmarks, against both the well-known BadNets attack and the more complex DFST attack.
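
To illustrate how neuron patterns can serve as runtime detectors, the sketch below mines on/off activation patterns from a hidden layer on a reference set and matches them against a new input at runtime. It is a minimal reconstruction of the idea under stated assumptions, not AntidoteRT's actual implementation; the function names (binarize, mine_pattern, matches) and the support threshold are illustrative, and real usage would read activations from the deployed network rather than random stand-in data.

    # Minimal sketch of pattern-based detection, assuming access to the
    # activation values of one hidden layer of the network.
    import numpy as np

    def binarize(acts, threshold=0.0):
        # A neuron is "on" if its activation exceeds the threshold
        # (0.0 matches the usual ReLU on/off convention).
        return (np.asarray(acts) > threshold).astype(np.uint8)

    def mine_pattern(act_matrix, support=0.9):
        # Keep neurons whose on/off state agrees on at least `support` of
        # the samples; return their indices and expected on/off values.
        bits = binarize(act_matrix)          # shape: (n_samples, n_neurons)
        on_rate = bits.mean(axis=0)
        stable = (on_rate >= support) | (on_rate <= 1.0 - support)
        idx = np.flatnonzero(stable)
        vals = (on_rate[idx] >= 0.5).astype(np.uint8)
        return idx, vals

    def matches(pattern, acts):
        # Runtime check: does this input's activation vector satisfy
        # the mined pattern?
        idx, vals = pattern
        return bool(np.array_equal(binarize(acts)[idx], vals))

    # Offline, one would mine a pattern per (predicted label,
    # correct/misclassified) group; online, an input matching a
    # "misclassified" pattern for its predicted label is flagged, and the
    # label is re-estimated from the best-matching "correct" pattern.
    rng = np.random.default_rng(0)
    clean_acts = rng.random((100, 64))       # stand-in for real activations
    pattern = mine_pattern(clean_acts, support=0.6)
    print(matches(pattern, clean_acts[0]))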

Change history

  • 23 September 2022

    In an older version of this paper, there was an error in Figure 3: panels (e) and (f) were incorrect. This has been corrected.

Notes

  1. Code/data is available at https://github.com/muhammadusman93/AntidoteRT.

References

  1. Chattopadhay, A., Sarkar, A., Howlader, P., Balasubramanian, V.N.: Grad-CAM++: generalized gradient-based visual explanations for deep convolutional networks. In: WACV, pp. 839–847. IEEE (2018)
  2. Chen, B., et al.: Detecting backdoor attacks on deep neural networks by activation clustering. In: SafeAI@AAAI (2019)
  3. Chen, H., Fu, C., Zhao, J., Koushanfar, F.: DeepInspect: a black-box trojan detection and mitigation framework for deep neural networks. In: IJCAI, pp. 4658–4664 (2019)
  4. Cheng, S., Liu, Y., Ma, S., Zhang, X.: Deep feature space trojan attack of neural networks by controlled detoxification. In: AAAI, vol. 35, pp. 1148–1156 (2021)
  5. Gao, Y., Xu, C., Wang, D., Chen, S., Ranasinghe, D.C., Nepal, S.: STRIP: a defense against trojan attacks on deep neural networks. In: Proceedings of the 35th Annual Computer Security Applications Conference, pp. 113–125 (2019)
  6. Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, Cambridge (2016)
  7. Gopinath, D., Converse, H., Pasareanu, C., Taly, A.: Property inference for deep neural networks. In: International Conference on Automated Software Engineering (ASE), pp. 797–809. IEEE (2019)
  8. Gu, T., Liu, K., Dolan-Gavitt, B., Garg, S.: BadNets: evaluating backdooring attacks on deep neural networks. IEEE Access 7, 47230–47244 (2019)
  9. Houben, S., Stallkamp, J., Salmen, J., Schlipsing, M., Igel, C.: Detection of traffic signs in real-world images: the German traffic sign detection benchmark. In: International Joint Conference on Neural Networks, no. 1288 (2013)
  10. Huang, X., et al.: A survey of safety and trustworthiness of deep neural networks: verification, testing, adversarial attack and defence, and interpretability. Comput. Sci. Rev. 37, 100270 (2020)
  11. Katz, G., et al.: The Marabou framework for verification and analysis of deep neural networks. In: Dillig, I., Tasiran, S. (eds.) CAV 2019. LNCS, vol. 11561, pp. 443–452. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-25540-4_26
  12. Kolouri, S., Saha, A., Pirsiavash, H., Hoffmann, H.: Universal litmus patterns: revealing backdoor attacks in CNNs. In: CVPR, pp. 301–310 (2020)
  13. Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009)
  14. LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proc. IEEE 86(11), 2278–2324 (1998)
  15. Li, Y., Lyu, X., Koren, N., Lyu, L., Li, B., Ma, X.: Neural attention distillation: erasing backdoor triggers from deep neural networks. In: International Conference on Learning Representations (2020)
  16. Li, Y., Zhai, T., Wu, B., Jiang, Y., Li, Z., Xia, S.: Rethinking the trigger of backdoor attack. arXiv preprint arXiv:2004.04692 (2020)
  17. Liu, K., Dolan-Gavitt, B., Garg, S.: Fine-pruning: defending against backdooring attacks on deep neural networks. In: Bailey, M., Holz, T., Stamatogiannakis, M., Ioannidis, S. (eds.) RAID 2018. LNCS, vol. 11050, pp. 273–294. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-00470-5_13
  18. Liu, X., Li, F., Wen, B., Li, Q.: Removing backdoor-based watermarks in neural networks with limited data. In: 2020 25th International Conference on Pattern Recognition (ICPR), pp. 10149–10156. IEEE (2021)
  19. Liu, Y., et al.: Trojaning attack on neural networks. In: 25th Annual Network and Distributed System Security Symposium, NDSS. The Internet Society (2018)
  20. Liu, Y., Ma, X., Bailey, J., Lu, F.: Reflection backdoor: a natural backdoor attack on deep neural networks. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12355, pp. 182–199. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58607-2_11
  21. Liu, Y., Xie, Y., Srivastava, A.: Neural trojans. In: International Conference on Computer Design (ICCD), pp. 45–48. IEEE (2017)
  22. Selvaraju, R.R., Cogswell, M., Das, A., Vedantam, R., Parikh, D., Batra, D.: Grad-CAM: visual explanations from deep networks via gradient-based localization. In: ICCV, pp. 618–626 (2017)
  23. Steinhardt, J., Koh, P.W., Liang, P.: Certified defenses for data poisoning attacks. In: Proceedings of the 31st International Conference on Neural Information Processing Systems, pp. 3520–3532 (2017)
  24. Tran, B., Li, J., Madry, A.: Spectral signatures in backdoor attacks. In: Advances in Neural Information Processing Systems, no. 31 (2018)
  25. Turner, A., Tsipras, D., Madry, A.: Clean-label backdoor attacks (2018)
  26. Udeshi, S., Peng, S., Woo, G., Loh, L., Rawshan, L., Chattopadhyay, S.: Model agnostic defence against backdoor attacks in machine learning. arXiv preprint arXiv:1908.02203 (2019)
  27. Wang, B., et al.: Neural cleanse: identifying and mitigating backdoor attacks in neural networks. In: S&P, pp. 707–723. IEEE (2019)
  28. Wang, R., Zhang, G., Liu, S., Chen, P.-Y., Xiong, J., Wang, M.: Practical detection of trojan neural networks: data-limited and data-free cases. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12368, pp. 222–238. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58592-1_14
  29. Xu, X., Wang, Q., Li, H., Borisov, N., Gunter, C.A., Li, B.: Detecting AI trojans using meta neural analysis. In: S&P, pp. 103–120. IEEE (2021)
  30. Yao, Y., Li, H., Zheng, H., Zhao, B.Y.: Latent backdoor attacks on deep neural networks. In: Proceedings of the 2019 ACM SIGSAC Conference on Computer and Communications Security, pp. 2041–2055 (2019)

Author information

Corresponding author

Correspondence to Divya Gopinath.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Usman, M., Gopinath, D., Sun, Y., Păsăreanu, C.S. (2022). Rule-Based Runtime Mitigation Against Poison Attacks on Neural Networks. In: Dang, T., Stolz, V. (eds) Runtime Verification. RV 2022. Lecture Notes in Computer Science, vol 13498. Springer, Cham. https://doi.org/10.1007/978-3-031-17196-3_4

  • DOI: https://doi.org/10.1007/978-3-031-17196-3_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-17195-6

  • Online ISBN: 978-3-031-17196-3

  • eBook Packages: Computer Science, Computer Science (R0)
