Abstract
Evaluating the adversarial robustness of deep models is critical for training more robust models. However, few evaluation methods are both interpretable and quantifiable. Interpretable methods cannot quantify adversarial robustness, which makes their results subjective; quantifiable methods, on the other hand, are often unexplainable, making it difficult for evaluators to trust and trace the results. To address this issue, an adversarial robustness evaluation approach based on class activation mapping (ARE-CAM) is proposed. This approach uses CAM to generate heatmaps that visualize the regions the model attends to. By comparing the original example and the adversarial example in terms of both visual and statistical characteristics, the changes in the model's attention after an attack can be observed, which enhances the interpretability of the evaluation. In addition, four metrics are proposed to quantify adversarial robustness: the average coverage coincidence rate (ACCR), average high-activation coincidence rate (AHCR), average heat area difference (AHAD), and average heat difference (AHD). Comprehensive experiments on 14 deep models and several datasets verify ARE-CAM's efficiency. To the best of our knowledge, ARE-CAM is the first evaluation approach for adversarial robustness that is both quantifiable and interpretable.
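The abstract names four CAM-based statistics but does not reproduce their formulas here. The sketch below shows one plausible reading of them, assuming IoU-style coincidence rates over thresholded heatmaps and mean absolute differences for the heat statistics; the thresholds (`cover_t`, `high_t`) and the exact definitions are illustrative assumptions, not the paper's verified formulas.

```python
import numpy as np

def coverage(heatmap, thresh):
    """Binary mask of regions whose activation exceeds a threshold."""
    return heatmap >= thresh

def coincidence_rate(mask_a, mask_b):
    """Intersection-over-union of two binary attention masks."""
    inter = np.logical_and(mask_a, mask_b).sum()
    union = np.logical_or(mask_a, mask_b).sum()
    return inter / union if union else 1.0

def are_cam_metrics(clean_maps, adv_maps, cover_t=0.2, high_t=0.8):
    """Average the four per-example statistics over pairs of normalized
    CAM heatmaps (clean example vs. its adversarial counterpart).

    Assumed definitions: ACCR/AHCR are IoU of coverage / high-activation
    masks; AHAD is the absolute difference in heated-area fraction;
    AHD is the mean absolute pixel-wise heat difference.
    """
    ccr, hcr, had, hd = [], [], [], []
    for c, a in zip(clean_maps, adv_maps):
        ccr.append(coincidence_rate(coverage(c, cover_t), coverage(a, cover_t)))
        hcr.append(coincidence_rate(coverage(c, high_t), coverage(a, high_t)))
        had.append(abs(coverage(c, cover_t).mean() - coverage(a, cover_t).mean()))
        hd.append(np.abs(c - a).mean())
    return {"ACCR": float(np.mean(ccr)), "AHCR": float(np.mean(hcr)),
            "AHAD": float(np.mean(had)), "AHD": float(np.mean(hd))}
```

Under these assumed definitions, a model whose attention is unchanged by the attack yields ACCR = AHCR = 1 and AHAD = AHD = 0, while a large attention shift drives the coincidence rates toward 0.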
Acknowledgment
This work was supported in part by the National Natural Science Foundation of China under Grants No. 71901212 and No. 72071206, and in part by Key Projects of the National Natural Science Foundation of China under Grant No. 72231011.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Li, Z., Sun, J., Qin, Y., Ju, L., Yang, K. (2024). ARE-CAM: An Interpretable Approach to Quantitatively Evaluating the Adversarial Robustness of Deep Models Based on CAM. In: Rudinac, S., et al. MultiMedia Modeling. MMM 2024. Lecture Notes in Computer Science, vol 14554. Springer, Cham. https://doi.org/10.1007/978-3-031-53305-1_21
DOI: https://doi.org/10.1007/978-3-031-53305-1_21
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-53304-4
Online ISBN: 978-3-031-53305-1
eBook Packages: Computer Science (R0)