Poly-cam: high resolution class activation map for convolutional neural networks

  • Research
  • Published in Machine Vision and Applications

Abstract

The demand for explainable AI continues to rise alongside advances in deep learning. Existing explanation methods for convolutional neural networks often fail to accurately localize the image features that justify a network's prediction: class activation maps (e.g., CAM) are low resolution, perturbation-based techniques produce overly smooth visualizations, and gradient-based approaches yield numerous isolated peaky spots. In response, our work merges information from earlier and later layers of the network to produce high-resolution class activation maps that remain competitive with previous art on insertion-deletion faithfulness metrics while significantly surpassing it in the precision of class-specific feature localization.
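
The method itself is only summarized above; as a rough illustration of the layer-fusion idea, the sketch below upsamples a coarse class activation map and fuses it multiplicatively with earlier, higher-resolution activations. The torchvision VGG16 backbone, the chosen layer indices, the channel-mean weighting, and the multiplicative fusion are illustrative assumptions, not the authors' exact Poly-CAM recurrence.

```python
# Illustrative sketch only: progressive fusion of a coarse activation map with
# earlier, higher-resolution feature maps. Layer indices, channel-mean
# weighting and multiplicative fusion are assumptions, not the exact Poly-CAM
# update rule described in the paper.
import torch
import torch.nn.functional as F
import torchvision.models as models

model = models.vgg16(weights=None).eval()  # pretrained weights in practice

layer_ids = [15, 22, 29]          # VGG16 ReLU layers at 56x56, 28x28, 14x14
activations = {}

def save_activation(idx):
    def hook(_module, _inputs, output):
        activations[idx] = output.detach()
    return hook

handles = [model.features[i].register_forward_hook(save_activation(i))
           for i in layer_ids]

x = torch.randn(1, 3, 224, 224)   # stand-in for a preprocessed image
_ = model(x)                      # forward pass fills the activation cache;
                                  # a full method would also derive
                                  # class-specific channel weights here

# Coarse map from the deepest hooked layer (channel mean as a simple proxy).
cam = activations[layer_ids[-1]].mean(dim=1, keepdim=True)

# Walk back toward the input, upsampling and fusing with earlier activations.
for idx in reversed(layer_ids[:-1]):
    feat = activations[idx].mean(dim=1, keepdim=True)
    feat = (feat - feat.min()) / (feat.max() - feat.min() + 1e-8)
    cam = F.interpolate(cam, size=feat.shape[-2:], mode="bilinear",
                        align_corners=False)
    cam = cam * feat              # element-wise fusion (assumption)

cam = F.interpolate(cam, size=x.shape[-2:], mode="bilinear",
                    align_corners=False)
cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)  # high-res map

for h in handles:
    h.remove()
```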

Data availability

This paper uses the ImageNet dataset (https://image-net.org/) and the Stone Tiles dataset from Euresys (https://downloads.euresys.com/PackageFiles/OPENEVISION/24.02.0.27377WIN/976684208/Deep_Learning_Additional_Resources_24.02.0.27377.zip).

Notes

  1. That is, 35 input perturbations (with \(\sigma =2\) Gaussian noise) for SS-CAM, 50 input perturbations (with \(\sigma =1\) Gaussian noise) for SmoothGrad, 10 interpolation steps for IS-CAM, a threshold of 0.95 for FD-CAM, and 50 steps for IntegratedGradient. For Layer-CAM, the layers corresponding to a change in resolution were used, and the recommended scaling was applied to the first two layers. For Zoom-CAM, all the layers/blocks were fused for VGG16 and ResNet50. (A Captum sketch of the SmoothGrad and IntegratedGradient settings is given after these notes.)

  2. Downloaded from the Euresys website: https://www.euresys.com
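
For concreteness, the SmoothGrad and IntegratedGradient settings quoted in note 1 could be expressed with the Captum library [38] roughly as below; the ResNet50 backbone, the random input, and the target index are placeholders, and this is a sketch of plausible calls rather than the authors' evaluation script.

```python
# Sketch of the note-1 baseline settings using Captum [38]; model, input and
# target are placeholders, not the authors' evaluation pipeline.
import torch
import torchvision.models as models
from captum.attr import IntegratedGradients, NoiseTunnel, Saliency

model = models.resnet50(weights=None).eval()  # pretrained weights in practice
x = torch.randn(1, 3, 224, 224)               # stand-in for a preprocessed image
target = int(model(x).argmax(dim=1))

# IntegratedGradient with 50 interpolation steps.
ig_attr = IntegratedGradients(model).attribute(x, target=target, n_steps=50)

# SmoothGrad: 50 noisy copies of the input with sigma = 1 Gaussian noise.
sg_attr = NoiseTunnel(Saliency(model)).attribute(
    x, target=target, nt_type="smoothgrad", nt_samples=50, stdevs=1.0)
```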

References

  1. Adadi, A., Berrada, M.: Peeking inside the black-box: a survey on explainable artificial intelligence (XAI). IEEE Access 6, 52138–52160 (2018)

  2. Wang, W., Han, C., Zhou, T., Liu, D.: Visual recognition with deep nearest centroids (2022). arXiv preprint arXiv:2209.07383

  3. Song, X., Wu, N., Song, S., Zhang, Y., Stojanovic, V.: Bipartite synchronization for cooperative-competitive neural networks with reaction-diffusion terms via dual event-triggered mechanism. Neurocomputing 550, 126498 (2023)

  4. Song, X., Sun, P., Song, S., Stojanovic, V.: Quantized neural adaptive finite-time preassigned performance control for interconnected nonlinear systems. Neural Comput. Appl. 35(21), 15429–15446 (2023)

  5. Song, X., Peng, Z., Song, S., Stojanovic, V.: Anti-disturbance state estimation for pdt-switched Rdnns utilizing time-sampling and space-splitting measurements. Commun. Nonlinear Sci. Numer. Simul. 132, 107945 (2024)

  6. Samek, W., Montavon, G., Lapuschkin, S., Anders, C.J., Müller, K.-R.: Explaining deep neural networks and beyond: a review of methods and applications. Proc. IEEE 109(3), 247–278 (2021)

  7. Lapuschkin, S., Wäldchen, S., Binder, A., Montavon, G., Samek, W., Müller, K.-R.: Unmasking Clever Hans predictors and assessing what machines really learn. Nat. Commun. 10(1), 1–8 (2019)

  8. Zeiler, M.D., Fergus, R.: Visualizing and understanding convolutional networks. In: European Conference on Computer Vision. Springer (2014)

  9. Sundararajan, M., Taly, A., Yan, Q.: Axiomatic attribution for deep networks. In: International Conference on Machine Learning. PMLR (2017)

  10. Smilkov, D., Thorat, N., Kim, B., Viégas, F., Wattenberg, M.: SmoothGrad: removing noise by adding noise (2017). arXiv preprint arXiv:1706.03825

  11. Zhou, B., Khosla, A., Lapedriza, A., Oliva, A., Torralba, A.: Learning deep features for discriminative localization. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2016)

  12. Selvaraju, R.R., Cogswell, M., Das, A., Vedantam, R., Parikh, D., Batra, D.: Grad-cam: visual explanations from deep networks via gradient-based localization. In: Proceedings of the IEEE International Conference on Computer Vision (2017)

  13. Chattopadhay, A., Sarkar, A., Howlader, P., Balasubramanian, V.N.: Grad-cam++: Generalized gradient-based visual explanations for deep convolutional networks. In: 2018 IEEE Winter Conference on Applications of Computer Vision (WACV). IEEE (2018)

  14. Wang, H., Wang, Z., Du, M., Yang, F., Zhang, Z., Ding, S., Mardziel, P., Hu, X.: Score-CAM: score-weighted visual explanations for convolutional neural networks. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Workshop on Fair, Data Efficient and Trusted Computer Vision (2020)

  15. Shi, X., Khademi, S., Li, Y., Gemert, J.: Zoom-cam: generating fine-grained pixel annotations from image labels. In: 2020 25th International Conference on Pattern Recognition (ICPR), pp. 10289–10296, IEEE (2021)

  16. Jiang, P.-T., Zhang, C.-B., Hou, Q., Cheng, M.-M., Wei, Y.: LayerCAM: exploring hierarchical class activation maps for localization. IEEE Trans. Image Process. 30, 5875–5888 (2021)

  17. Englebert, A., Cornu, O., Vleeschouwer, C.: Backward recursive class activation map refinement for high resolution saliency map. In: 2022 26th International Conference on Pattern Recognition (ICPR). IEEE (2022)

  18. Stassin, S., Englebert, A., Albert, J., Nanfack, G., Versbraegen, N., Frénay, B., Peiffer, G., Doh, M., Riche, N., De Vleeschouwer, C.: An experimental investigation into the evaluation of explainability methods for computer vision. Communications in Computer and Information Science (2023)

  19. Petsiuk, V., Das, A., Saenko, K.: RISE: Randomized Input Sampling for Explanation of Black-box Models. In: British Machine Vision Conference (BMVC) (2018). http://bmvc2018.org/contents/papers/1064.pdf

  20. Wang, H., Naidu, R., Michael, J., Kundu, S.S.: SS-CAM: smoothed Score-CAM for sharper visual feature localization (2020). arXiv preprint arXiv:2006.14255

  21. Bahdanau, D., Cho, K., Bengio, Y.: Neural machine translation by jointly learning to align and translate (2014). arXiv preprint arXiv:1409.0473

  22. Yamauchi, T., Ishikawa, M.: Spatial sensitive GRAD-CAM: visual explanations for object detection by incorporating spatial sensitivity. In: 2022 IEEE International Conference on Image Processing (ICIP), pp. 256–260, IEEE (2022)

  23. Naidu, R., Ghosh, A., Maurya, Y., Kundu, S.S., et al.: IS-CAM: integrated score-CAM for axiomatic-based explanations (2020). arXiv preprint arXiv:2010.03023

  24. Ibrahim, R., Shafiq, M.O.: Augmented score-CAM: high resolution visual interpretations for deep neural networks. Knowl. Based Syst. 252, 109287 (2022)

  25. Ramaswamy, H.G.: Ablation-CAM: visual explanations for deep convolutional network via gradient-free localization. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pp. 983–991 (2020)

  26. Li, H., Li, Z., Ma, R., Wu, T.: FD-CAM: improving faithfulness and discriminability of visual explanation for CNNs (2022). arXiv preprint

  27. Huang, G., Liu, Z., Van Der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4700–4708 (2017)

  28. Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., Huang, Z., Karpathy, A., Khosla, A., Bernstein, M., Berg, A.C., Fei-Fei, L.: ImageNet large scale visual recognition challenge. Int. J. Comput. Vis. (IJCV) 115(3), 211–252 (2015). https://doi.org/10.1007/s11263-015-0816-y

  29. Dahl, G.E., Sainath, T.N., Hinton, G.E.: Improving deep neural networks for LVCSR using rectified linear units and dropout. In: 2013 IEEE International Conference on Acoustics, Speech and Signal Processing. IEEE (2013)

  30. Ronneberger, O., Fischer, P., Brox, T.: U-net: convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention. Springer (2015)

  31. Lin, T.-Y., Dollár, P., Girshick, R., He, K., Hariharan, B., Belongie, S.: Feature pyramid networks for object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2117–2125 (2017)

  32. Collin, A.-S., De Vleeschouwer, C.: Improved anomaly detection by training an autoencoder with skip connections on images corrupted with stain-shaped noise. In: 2020 25th International Conference on Pattern Recognition (ICPR), pp. 7915–7922, IEEE (2021)

  33. Liu, D., Cui, Y., Yan, L., Mousas, C., Yang, B., Chen, Y.: Densernet: weakly supervised visual localization using multi-scale feature aggregation. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 35, pp. 6101–6109 (2021)

  34. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition (2014). arXiv preprint arXiv:1409.1556

  35. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2016)

  36. Omeiza, D., Speakman, S., Cintas, C., Weldermariam, K.: Smooth Grad-CAM++: an enhanced inference level visualization technique for deep convolutional neural network models (2019). arXiv preprint arXiv:1908.01224

  37. Shrikumar, A., Greenside, P., Shcherbina, A., Kundaje, A.: Not just a black box: learning important features through propagating activation differences (2016). arXiv preprint arXiv:1605.01713

  38. Kokhlikyan, N., Miglani, V., Martin, M., Wang, E., Alsallakh, B., Reynolds, J., Melnikov, A., Kliushkina, N., Araya, C., Yan, S., Reblitz-Richardson, O.: Captum: a unified and generic model interpretability library for PyTorch (2020)

  39. Fernandez, F.-G.: TorchCAM: Class Activation Explorer. GitHub, San Francisco (2020)

  40. Poursabzi-Sangdeh, F., Goldstein, D.G., Hofman, J.M., Wortman Vaughan, J.W., Wallach, H.: Manipulating and measuring model interpretability. In: Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems (2021)

  41. Adebayo, J., Gilmer, J., Muelly, M., Goodfellow, I., Hardt, M., Kim, B.: Sanity checks for saliency maps (2018). arXiv preprint arXiv:1810.03292

  42. Ghorbani, A., Abid, A., Zou, J.: Interpretation of neural networks is fragile. In: Proceedings of the AAAI Conference on Artificial Intelligence (2019)

  43. Yeh, C.-K., Hsieh, C.-Y., Suggala, A., Inouye, D.I., Ravikumar, P.K.: On the (in) fidelity and sensitivity of explanations. Adv. Neural Inf. Process. Syst. 32, 10967–10978 (2019)

  44. Cheng, Z., Liang, J., Choi, H., Tao, G., Cao, Z., Liu, D., Zhang, X.: Physical attack on monocular depth estimation with optimal adversarial patches. In: European Conference on Computer Vision, pp. 514–532, Springer (2022)

  45. Cheng, Z., Choi, H., Feng, S., Liang, J.C., Tao, G., Liu, D., Zuzak, M., Zhang, X.: Fusion is not enough: single modal attack on fusion models for 3d object detection. In: The 12th International Conference on Learning Representations (2023)

Acknowledgements

This work was performed at UCLouvain in Belgium. It was funded by the FNRS, including FRIA funding to Alexandre Englebert. Computational resources have been provided by the supercomputing facilities of the Université catholique de Louvain (CISM/UCL) and the Consortium des Équipements de Calcul Intensif en Fédération Wallonie Bruxelles (CÉCI), funded by the Fonds de la Recherche Scientifique de Belgique (F.R.S.-FNRS) under convention 2.5020.11 and by the Walloon Region. We also want to thank the authors of Zoom-CAM for their kind help in using their method.

Funding

Fonds De La Recherche Scientifique–FNRS.

Author information

Contributions

A.E. wrote the code and ran the experiments. A.E., O.C. and C.D. analysed the results. A.E. and C.D. wrote the manuscript. All the authors reviewed the manuscript.

Corresponding author

Correspondence to Alexandre Englebert.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

1.1 Visual comparison with previous work

Figure 10 displays a visual comparison of all the compared methods.

Fig. 10

Visual comparison of methods. The compared methods are the Poly-CAM variants proposed in this paper (PCAM\(^+\), PCAM\(^-\), PCAM\(^\pm \), and \(\emptyset \)PCAM), Zoom-CAM [15], Layer-CAM [16], Grad-CAM [12], Grad-CAM++ [13], Smooth Grad-CAM++ [36], Score-CAM [14], SS-CAM [20], IS-CAM [23], Input X Gradient [37], IntegratedGradient [9], SmoothGrad [10], Occlusion [8], RISE [19]

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Englebert, A., Cornu, O. & Vleeschouwer, C.D. Poly-cam: high resolution class activation map for convolutional neural networks. Machine Vision and Applications 35, 89 (2024). https://doi.org/10.1007/s00138-024-01567-7
