Abstract
The demand for explainable AI continues to rise alongside advances in deep learning. Existing methods for explaining convolutional neural networks often struggle to accurately pinpoint the image features that justify a network's prediction: class activation maps (e.g., CAM) have low resolution, perturbation-based techniques yield overly smooth visualizations, and gradient-based approaches produce numerous isolated peaky spots. In response, our work merges information from earlier and later layers of the network to create high-resolution class activation maps that remain competitive with previous art on insertion-deletion faithfulness metrics while significantly surpassing it in the precision of class-specific feature localization.
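To make the layer-fusion idea concrete, here is a minimal sketch in PyTorch. It illustrates the general upsample-and-fuse principle only, not the authors' Poly-CAM implementation: the layer indices, the channel-averaging proxy for class-specific weighting, and the multiplicative fusion rule are all simplifying assumptions.

```python
# Minimal sketch of multi-layer CAM fusion (hypothetical; not the paper's exact
# Poly-CAM recursion). A coarse, semantically reliable map from a deep layer is
# progressively upsampled and modulated by normalized activations from earlier,
# higher-resolution layers.
import torch
import torch.nn.functional as F
import torchvision

model = torchvision.models.vgg16(weights=None).eval()  # load pretrained weights in practice

# Capture feature maps at several depths (ends of VGG16 conv blocks; illustrative).
features = {}
def hook(name):
    def fn(module, inp, out):
        features[name] = out.detach()
    return fn

for idx in (8, 15, 22, 29):  # resolutions 112, 56, 28, 14 for a 224x224 input
    model.features[idx].register_forward_hook(hook(f"layer{idx}"))

x = torch.randn(1, 3, 224, 224)  # stand-in for a preprocessed image
with torch.no_grad():
    model(x)

def channel_mean_map(act):
    """Collapse channels and normalize to [0, 1] (crude class-agnostic proxy)."""
    m = act.mean(dim=1, keepdim=True)
    m = m - m.amin(dim=(2, 3), keepdim=True)
    return m / (m.amax(dim=(2, 3), keepdim=True) + 1e-8)

# Fuse from deepest (coarse, semantic) to shallowest (fine): upsample, then modulate.
names = sorted(features, key=lambda n: int(n[5:]), reverse=True)
saliency = channel_mean_map(features[names[0]])
for name in names[1:]:
    fine = channel_mean_map(features[name])
    saliency = F.interpolate(saliency, size=fine.shape[2:],
                             mode="bilinear", align_corners=False)
    saliency = saliency * fine  # earlier layers sharpen the coarse map

saliency = F.interpolate(saliency, size=(224, 224),
                         mode="bilinear", align_corners=False)
```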
Data availability
This paper uses the ImageNet dataset (https://image-net.org/) and the Stone Tiles dataset from Euresys (https://downloads.euresys.com/PackageFiles/OPENEVISION/24.02.0.27377WIN/976684208/Deep_Learning_Additional_Resources_24.02.0.27377.zip).
Notes
That is, 35 input perturbations (with \(\sigma =2\) Gaussian noise) for SS-CAM, 50 input perturbations (with \(\sigma =1\) Gaussian noise) for SmoothGrad, 10 interpolation steps for IS-CAM, a threshold of 0.95 for FD-CAM, and 50 interpolation steps for IntegratedGradient. For Layer-CAM, the layers corresponding to a change in resolution were used, and the recommended scaling was applied to the first two layers. For Zoom-CAM, all layers/blocks were fused for VGG16 and ResNet50.
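For reference, the SmoothGrad setting quoted above (50 noisy copies with \(\sigma =1\) Gaussian noise) can be reproduced with a few lines of PyTorch. This is a sketch assuming a standard classifier; `model`, the input tensor, and the final channel reduction are placeholders, and in practice \(\sigma\) is often scaled to the input range.

```python
# SmoothGrad sketch with the settings quoted above: average the input
# gradients over n_samples noisy copies of the input.
import torch

def smoothgrad(model, x, target_class, n_samples=50, sigma=1.0):
    model.eval()
    grads = torch.zeros_like(x)
    for _ in range(n_samples):
        # Perturb the input with Gaussian noise and track gradients on the copy.
        noisy = (x + sigma * torch.randn_like(x)).requires_grad_(True)
        score = model(noisy)[0, target_class]  # assumes a (1, num_classes) output
        score.backward()
        grads += noisy.grad
    # Average, then collapse channels to a single-channel saliency map.
    return (grads / n_samples).abs().max(dim=1)[0]
```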
Downloaded from the Euresys website: https://www.euresys.com
References
Adadi, A., Berrada, M.: Peeking inside the black-box: a survey on explainable artificial intelligence (XAI). IEEE Access 6, 52138–52160 (2018)
Wang, W., Han, C., Zhou, T., Liu, D.: Visual recognition with deep nearest centroids (2022). arXiv preprint arXiv:2209.07383
Song, X., Wu, N., Song, S., Zhang, Y., Stojanovic, V.: Bipartite synchronization for cooperative-competitive neural networks with reaction-diffusion terms via dual event-triggered mechanism. Neurocomputing 550, 126498 (2023)
Song, X., Sun, P., Song, S., Stojanovic, V.: Quantized neural adaptive finite-time preassigned performance control for interconnected nonlinear systems. Neural Comput. Appl. 35(21), 15429–15446 (2023)
Song, X., Peng, Z., Song, S., Stojanovic, V.: Anti-disturbance state estimation for PDT-switched RDNNs utilizing time-sampling and space-splitting measurements. Commun. Nonlinear Sci. Numer. Simul. 132, 107945 (2024)
Samek, W., Montavon, G., Lapuschkin, S., Anders, C.J., Müller, K.-R.: Explaining deep neural networks and beyond: a review of methods and applications. Proc. IEEE 109(3), 247–278 (2021)
Lapuschkin, S., Wäldchen, S., Binder, A., Montavon, G., Samek, W., Müller, K.-R.: Unmasking Clever Hans predictors and assessing what machines really learn. Nat. Commun. 10(1), 1–8 (2019)
Zeiler, M.D., Fergus, R.: Visualizing and understanding convolutional networks. In: European Conference on Computer Vision. Springer (2014)
Sundararajan, M., Taly, A., Yan, Q.: Axiomatic attribution for deep networks. In: International Conference on Machine Learning. PMLR (2017)
Smilkov, D., Thorat, N., Kim, B., Viégas, F., Wattenberg, M.: SmoothGrad: removing noise by adding noise (2017). arXiv preprint arXiv:1706.03825
Zhou, B., Khosla, A., Lapedriza, A., Oliva, A., Torralba, A.: Learning deep features for discriminative localization. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2016)
Selvaraju, R.R., Cogswell, M., Das, A., Vedantam, R., Parikh, D., Batra, D.: Grad-cam: visual explanations from deep networks via gradient-based localization. In: Proceedings of the IEEE International Conference on Computer Vision (2017)
Chattopadhay, A., Sarkar, A., Howlader, P., Balasubramanian, V.N.: Grad-cam++: Generalized gradient-based visual explanations for deep convolutional networks. In: 2018 IEEE Winter Conference on Applications of Computer Vision (WACV). IEEE (2018)
Wang, H., Wang, Z., Du, M., Yang, F., Zhang, Z., Ding, S., Mardziel, P., Hu, X.: Score-CAM: score-weighted visual explanations for convolutional neural networks. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Workshop on Fair, Data Efficient and Trusted Computer Vision (2020)
Shi, X., Khademi, S., Li, Y., van Gemert, J.: Zoom-CAM: generating fine-grained pixel annotations from image labels. In: 2020 25th International Conference on Pattern Recognition (ICPR), pp. 10289–10296, IEEE (2021)
Jiang, P.-T., Zhang, C.-B., Hou, Q., Cheng, M.-M., Wei, Y.: LayerCAM: exploring hierarchical class activation maps for localization. IEEE Trans. Image Process. 30, 5875–5888 (2021)
Englebert, A., Cornu, O., De Vleeschouwer, C.: Backward recursive class activation map refinement for high resolution saliency map. In: 2022 26th International Conference on Pattern Recognition (ICPR). IEEE (2022)
Stassin, S., Englebert, A., Albert, J., Nanfack, G., Versbraegen, N., Frénay, B., Peiffer, G., Doh, M., Riche, N., De Vleeschouwer, C.: An experimental investigation into the evaluation of explainability methods for computer vision. Communications in Computer and Information Science (2023)
Petsiuk, V., Das, A., Saenko, K.: RISE: Randomized Input Sampling for Explanation of Black-box Models. In: British Machine Vision Conference (BMVC) (2018). http://bmvc2018.org/contents/papers/1064.pdf
Wang, H., Naidu, R., Michael, J., Kundu, S.S.: SS-CAM: smoothed Score-CAM for sharper visual feature localization (2020). arXiv preprint arXiv:2006.14255
Bahdanau, D., Cho, K., Bengio, Y.: Neural machine translation by jointly learning to align and translate (2014). arXiv preprint arXiv:1409.0473
Yamauchi, T., Ishikawa, M.: Spatial sensitive GRAD-CAM: visual explanations for object detection by incorporating spatial sensitivity. In: 2022 IEEE International Conference on Image Processing (ICIP), pp. 256–260, IEEE (2022)
Naidu, R., Ghosh, A., Maurya, Y., Kundu, S.S., et al.: IS-CAM: integrated score-CAM for axiomatic-based explanations (2020). arXiv preprint arXiv:2010.03023
Ibrahim, R., Shafiq, M.O.: Augmented score-CAM: high resolution visual interpretations for deep neural networks. Knowl. Based Syst. 252, 109287 (2022)
Desai, S., Ramaswamy, H.G.: Ablation-CAM: visual explanations for deep convolutional network via gradient-free localization. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pp. 983–991 (2020)
Li, H., Li, Z., Ma, R., Wu, T.: FD-CAM: improving faithfulness and discriminability of visual explanation for CNNs (2022). arXiv preprint
Huang, G., Liu, Z., Van Der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4700–4708 (2017)
Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., Huang, Z., Karpathy, A., Khosla, A., Bernstein, M., Berg, A.C., Fei-Fei, L.: ImageNet large scale visual recognition challenge. Int. J. Comput. Vis. (IJCV) 115(3), 211–252 (2015). https://doi.org/10.1007/s11263-015-0816-y
Dahl, G.E., Sainath, T.N., Hinton, G.E.: Improving deep neural networks for LVCSR using rectified linear units and dropout. In: 2013 IEEE International Conference on Acoustics, Speech and Signal Processing. IEEE (2013)
Ronneberger, O., Fischer, P., Brox, T.: U-net: convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention. Springer (2015)
Lin, T.-Y., Dollár, P., Girshick, R., He, K., Hariharan, B., Belongie, S.: Feature pyramid networks for object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2117–2125 (2017)
Collin, A.-S., De Vleeschouwer, C.: Improved anomaly detection by training an autoencoder with skip connections on images corrupted with stain-shaped noise. In: 2020 25th International Conference on Pattern Recognition (ICPR), pp. 7915–7922, IEEE (2021)
Liu, D., Cui, Y., Yan, L., Mousas, C., Yang, B., Chen, Y.: Densernet: weakly supervised visual localization using multi-scale feature aggregation. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 35, pp. 6101–6109 (2021)
Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition (2014). arXiv preprint arXiv:1409.1556
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2016)
Omeiza, D., Speakman, S., Cintas, C., Weldermariam, K.: Smooth Grad-CAM++: an enhanced inference level visualization technique for deep convolutional neural network models (2019). arXiv preprint arXiv:1908.01224
Shrikumar, A., Greenside, P., Shcherbina, A., Kundaje, A.: Not just a black box: learning important features through propagating activation differences (2016). arXiv preprint arXiv:1605.01713
Kokhlikyan, N., Miglani, V., Martin, M., Wang, E., Alsallakh, B., Reynolds, J., Melnikov, A., Kliushkina, N., Araya, C., Yan, S., Reblitz-Richardson, O.: Captum: a unified and generic model interpretability library for PyTorch (2020)
Fernandez, F.-G.: TorchCAM: class activation explorer. GitHub repository (2020)
Poursabzi-Sangdeh, F., Goldstein, D.G., Hofman, J.M., Wortman Vaughan, J.W., Wallach, H.: Manipulating and measuring model interpretability. In: Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems (2021)
Adebayo, J., Gilmer, J., Muelly, M., Goodfellow, I., Hardt, M., Kim, B.: Sanity checks for saliency maps (2018). arXiv preprint arXiv:1810.03292
Ghorbani, A., Abid, A., Zou, J.: Interpretation of neural networks is fragile. In: Proceedings of the AAAI Conference on Artificial Intelligence (2019)
Yeh, C.-K., Hsieh, C.-Y., Suggala, A., Inouye, D.I., Ravikumar, P.K.: On the (in)fidelity and sensitivity of explanations. Adv. Neural Inf. Process. Syst. 32, 10967–10978 (2019)
Cheng, Z., Liang, J., Choi, H., Tao, G., Cao, Z., Liu, D., Zhang, X.: Physical attack on monocular depth estimation with optimal adversarial patches. In: European Conference on Computer Vision, pp. 514–532, Springer (2022)
Cheng, Z., Choi, H., Feng, S., Liang, J.C., Tao, G., Liu, D., Zuzak, M., Zhang, X.: Fusion is not enough: single modal attack on fusion models for 3d object detection. In: The 12th International Conference on Learning Representations (2023)
Acknowledgements
This work was performed at UCLouvain in Belgium. It was funded by the FNRS, including FRIA funding to Alexandre Englebert. Computational resources were provided by the supercomputing facilities of the Université catholique de Louvain (CISM/UCL) and the Consortium des Équipements de Calcul Intensif en Fédération Wallonie-Bruxelles (CÉCI), funded by the Fonds de la Recherche Scientifique de Belgique (F.R.S.-FNRS) under convention 2.5020.11 and by the Walloon Region. We also thank the authors of Zoom-CAM for their kind help in using their method.
Funding
Fonds De La Recherche Scientifique–FNRS.
Author information
Contributions
A.E. wrote the code and ran the experiments. A.E., O.C., and C.D. analysed the results. A.E. and C.D. wrote the manuscript. All authors reviewed the manuscript.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix
1.1 Visual comparison with previous work
Figure 10 displays a visual comparison of all the compared methods.
Fig. 10: Visual comparison of methods. The compared methods are the four Poly-CAM variants proposed in this paper (PCAM\(^+\), PCAM\(^-\), PCAM\(^\pm \), and \(\emptyset \)PCAM), Zoom-CAM [15], Layer-CAM [16], Grad-CAM [12], Grad-CAM++ [13], Smooth Grad-CAM++ [37], Score-CAM [14], SS-CAM [20], IS-CAM [23], Input X Gradient [38], IntegratedGradient [9], SmoothGrad [10], Occlusion [8], RISE [19]
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Englebert, A., Cornu, O. & De Vleeschouwer, C. Poly-CAM: high resolution class activation map for convolutional neural networks. Machine Vision and Applications 35, 89 (2024). https://doi.org/10.1007/s00138-024-01567-7