ARE-CAM: An Interpretable Approach to Quantitatively Evaluating the Adversarial Robustness of Deep Models Based on CAM

  • Conference paper
MultiMedia Modeling (MMM 2024)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14554)


Abstract

Evaluating the adversarial robustness of deep models is critical for training more robust models. However, few evaluation methods are both interpretable and quantifiable. Interpretable methods cannot quantify adversarial robustness, which makes their results subjective, while quantifiable methods are often unexplainable, making it difficult for evaluators to trust and trace the results. To address this issue, an adversarial robustness evaluation approach based on class activation mapping (ARE-CAM) is proposed. The approach uses CAM to generate heatmaps that visualize the regions the model attends to. By comparing an original example with its adversarial counterpart in terms of visual and statistical characteristics, the change in the model's behavior after an attack can be observed, which enhances the interpretability of the evaluation. In addition, four metrics are proposed to quantify adversarial robustness: the average coverage coincidence rate (ACCR), average high activation coincidence rate (AHCR), average heat area difference (AHAD), and average heat difference (AHD). Comprehensive experiments are conducted on 14 deep models and different datasets to verify ARE-CAM's effectiveness. To the best of our knowledge, ARE-CAM is the first approach to evaluating adversarial robustness that is both quantifiable and interpretable.
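The abstract does not give the metric definitions, but the general idea of comparing a CAM heatmap before and after an attack can be sketched as follows. This is a minimal illustration under our own assumptions (IoU-style overlap for the coincidence rates, threshold values, and normalized heatmaps); the exact formulations in the paper may differ.

```python
import numpy as np

def cam_metrics(cam_orig, cam_adv, cover_thresh=0.2, high_thresh=0.8):
    """Compare one CAM heatmap before and after attack (illustrative only).

    cam_orig, cam_adv: 2-D arrays normalized to [0, 1].
    Returns one example's contribution to each ARE-CAM-style metric;
    thresholds and overlap definitions are this sketch's assumptions.
    """
    cover_o, cover_a = cam_orig >= cover_thresh, cam_adv >= cover_thresh
    high_o, high_a = cam_orig >= high_thresh, cam_adv >= high_thresh

    # Coverage coincidence: overlap of the activated regions (IoU-style).
    ccr = (cover_o & cover_a).sum() / max((cover_o | cover_a).sum(), 1)
    # High-activation coincidence: overlap of the most salient regions.
    hcr = (high_o & high_a).sum() / max((high_o | high_a).sum(), 1)
    # Heat area difference: change in the fraction of activated pixels.
    had = abs(cover_o.mean() - cover_a.mean())
    # Heat difference: mean per-pixel change in activation intensity.
    hd = np.abs(cam_orig - cam_adv).mean()
    return ccr, hcr, had, hd
```

Averaging each quantity over a set of evaluated examples would then yield the "average" metrics (ACCR, AHCR, AHAD, AHD) described in the abstract.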


Notes

  1. http://yann.lecun.com/exdb/mnist/.

  2. http://www.cs.toronto.edu/~kriz/cifar.html.


Acknowledgment

This work was supported in part by the National Natural Science Foundation of China under Grant No. 71901212 and No. 72071206, in part by Key Projects of the National Natural Science Foundation of China under Grant No. 72231011.

Author information

Correspondence to Jianbin Sun.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Li, Z., Sun, J., Qin, Y., Ju, L., Yang, K. (2024). ARE-CAM: An Interpretable Approach to Quantitatively Evaluating the Adversarial Robustness of Deep Models Based on CAM. In: Rudinac, S., et al. MultiMedia Modeling. MMM 2024. Lecture Notes in Computer Science, vol 14554. Springer, Cham. https://doi.org/10.1007/978-3-031-53305-1_21

  • DOI: https://doi.org/10.1007/978-3-031-53305-1_21

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-53304-4

  • Online ISBN: 978-3-031-53305-1

  • eBook Packages: Computer Science (R0)
