Abstract
The demand for explainable AI continues to rise alongside advances in deep learning. Existing methods for explaining convolutional neural networks often struggle to accurately pinpoint the image features that justify a network's prediction: class activation maps (e.g., CAM) have low resolution, perturbation-based techniques yield overly smooth visualizations, and gradient-based approaches produce numerous isolated peaky spots. In response, our work merges information from earlier and later layers of the network to create high-resolution class activation maps that remain competitive with previous art on insertion-deletion faithfulness metrics while significantly surpassing it in the precision of class-specific feature localization.
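To make the layer-fusion idea concrete, here is a minimal sketch in PyTorch. It illustrates the general upsample-and-fuse principle only, not the authors' Poly-CAM implementation: the layer indices, the channel-averaging proxy for class-specific weighting, and the multiplicative fusion rule are all simplifying assumptions.

```python
# Minimal sketch of multi-layer CAM fusion (hypothetical; not the paper's exact
# Poly-CAM recursion). A coarse, semantically reliable map from a deep layer is
# progressively upsampled and modulated by normalized activations from earlier,
# higher-resolution layers.
import torch
import torch.nn.functional as F
import torchvision

model = torchvision.models.vgg16(weights=None).eval()  # load pretrained weights in practice

# Capture feature maps at several depths (ends of VGG16 conv blocks; illustrative).
features = {}
def hook(name):
    def fn(module, inp, out):
        features[name] = out.detach()
    return fn

for idx in (8, 15, 22, 29):  # resolutions 112, 56, 28, 14 for a 224x224 input
    model.features[idx].register_forward_hook(hook(f"layer{idx}"))

x = torch.randn(1, 3, 224, 224)  # stand-in for a preprocessed image
with torch.no_grad():
    model(x)

def channel_mean_map(act):
    """Collapse channels and normalize to [0, 1] (crude class-agnostic proxy)."""
    m = act.mean(dim=1, keepdim=True)
    m = m - m.amin(dim=(2, 3), keepdim=True)
    return m / (m.amax(dim=(2, 3), keepdim=True) + 1e-8)

# Fuse from deepest (coarse, semantic) to shallowest (fine): upsample, then modulate.
names = sorted(features, key=lambda n: int(n[5:]), reverse=True)
saliency = channel_mean_map(features[names[0]])
for name in names[1:]:
    fine = channel_mean_map(features[name])
    saliency = F.interpolate(saliency, size=fine.shape[2:],
                             mode="bilinear", align_corners=False)
    saliency = saliency * fine  # earlier layers sharpen the coarse map

saliency = F.interpolate(saliency, size=(224, 224),
                         mode="bilinear", align_corners=False)
```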
Data availability
This paper uses the ImageNet dataset (https://image-net.org/) and the Stone Tiles dataset from Euresys (https://downloads.euresys.com/PackageFiles/OPENEVISION/24.02.0.27377WIN/976684208/Deep_Learning_Additional_Resources_24.02.0.27377.zip).
Notes
That is, 35 input perturbations (with \(\sigma =2\) Gaussian noise) for SS-CAM, 50 input perturbations (with \(\sigma =1\) Gaussian noise) for SmoothGrad, 10 interpolation steps for IS-CAM, a threshold of 0.95 for FD-CAM, and 50 interpolation steps for IntegratedGradient. For Layer-CAM, the layers corresponding to a change in resolution were used, and the recommended scaling was applied to the first two layers. For Zoom-CAM, all layers/blocks were fused for VGG16 and ResNet50.
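For reference, the SmoothGrad setting quoted above (50 noisy copies with \(\sigma =1\) Gaussian noise) can be reproduced with a few lines of PyTorch. This is a sketch assuming a standard classifier; `model`, the input tensor, and the final channel reduction are placeholders, and in practice \(\sigma\) is often scaled to the input range.

```python
# SmoothGrad sketch with the settings quoted above: average the input
# gradients over n_samples noisy copies of the input.
import torch

def smoothgrad(model, x, target_class, n_samples=50, sigma=1.0):
    model.eval()
    grads = torch.zeros_like(x)
    for _ in range(n_samples):
        # Perturb the input with Gaussian noise and track gradients on the copy.
        noisy = (x + sigma * torch.randn_like(x)).requires_grad_(True)
        score = model(noisy)[0, target_class]  # assumes a (1, num_classes) output
        score.backward()
        grads += noisy.grad
    # Average, then collapse channels to a single-channel saliency map.
    return (grads / n_samples).abs().max(dim=1)[0]
```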
Downloaded from the Euresys website: https://www.euresys.com
References
Adadi, A., Berrada, M.: Peeking inside the black-box: a survey on explainable artificial intelligence (XAI). IEEE Access 6, 52138–52160 (2018)
Wang, W., Han, C., Zhou, T., Liu, D.: Visual recognition with deep nearest centroids (2022). arXiv preprint arXiv:2209.07383
Song, X., Wu, N., Song, S., Zhang, Y., Stojanovic, V.: Bipartite synchronization for cooperative-competitive neural networks with reaction-diffusion terms via dual event-triggered mechanism. Neurocomputing 550, 126498 (2023)
Song, X., Sun, P., Song, S., Stojanovic, V.: Quantized neural adaptive finite-time preassigned performance control for interconnected nonlinear systems. Neural Comput. Appl. 35(21), 15429–15446 (2023)
Song, X., Peng, Z., Song, S., Stojanovic, V.: Anti-disturbance state estimation for PDT-switched RDNNs utilizing time-sampling and space-splitting measurements. Commun. Nonlinear Sci. Numer. Simul. 132, 107945 (2024)
Samek, W., Montavon, G., Lapuschkin, S., Anders, C.J., Müller, K.-R.: Explaining deep neural networks and beyond: a review of methods and applications. Proc. IEEE 109(3), 247–278 (2021)
Lapuschkin, S., Wäldchen, S., Binder, A., Montavon, G., Samek, W., Müller, K.-R.: Unmasking Clever Hans predictors and assessing what machines really learn. Nat. Commun. 10(1), 1–8 (2019)
Zeiler, M.D., Fergus, R.: Visualizing and understanding convolutional networks. In: European Conference on Computer Vision. Springer (2014)
Sundararajan, M., Taly, A., Yan, Q.: Axiomatic attribution for deep networks. In: International Conference on Machine Learning. PMLR (2017)
Smilkov, D., Thorat, N., Kim, B., Viégas, F., Wattenberg, M.: SmoothGrad: removing noise by adding noise (2017). arXiv preprint arXiv:1706.03825
Zhou, B., Khosla, A., Lapedriza, A., Oliva, A., Torralba, A.: Learning deep features for discriminative localization. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2016)
Selvaraju, R.R., Cogswell, M., Das, A., Vedantam, R., Parikh, D., Batra, D.: Grad-cam: visual explanations from deep networks via gradient-based localization. In: Proceedings of the IEEE International Conference on Computer Vision (2017)
Chattopadhay, A., Sarkar, A., Howlader, P., Balasubramanian, V.N.: Grad-cam++: Generalized gradient-based visual explanations for deep convolutional networks. In: 2018 IEEE Winter Conference on Applications of Computer Vision (WACV). IEEE (2018)
Wang, H., Wang, Z., Du, M., Yang, F., Zhang, Z., Ding, S., Mardziel, P., Hu, X.: Score-CAM: score-weighted visual explanations for convolutional neural networks. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Workshop on Fair, Data Efficient and Trusted Computer Vision (2020)
Shi, X., Khademi, S., Li, Y., van Gemert, J.: Zoom-CAM: generating fine-grained pixel annotations from image labels. In: 2020 25th International Conference on Pattern Recognition (ICPR), pp. 10289–10296, IEEE (2021)
Jiang, P.-T., Zhang, C.-B., Hou, Q., Cheng, M.-M., Wei, Y.: LayerCAM: exploring hierarchical class activation maps for localization. IEEE Trans. Image Process. 30, 5875–5888 (2021)
Englebert, A., Cornu, O., De Vleeschouwer, C.: Backward recursive class activation map refinement for high resolution saliency map. In: 2022 26th International Conference on Pattern Recognition (ICPR). IEEE (2022)
Stassin, S., Englebert, A., Albert, J., Nanfack, G., Versbraegen, N., Frénay, B., Peiffer, G., Doh, M., Riche, N., De Vleeschouwer, C.: An experimental investigation into the evaluation of explainability methods for computer vision. Communications in Computer and Information Science (2023)
Petsiuk, V., Das, A., Saenko, K.: RISE: Randomized Input Sampling for Explanation of Black-box Models. In: British Machine Vision Conference (BMVC) (2018). http://bmvc2018.org/contents/papers/1064.pdf
Wang, H., Naidu, R., Michael, J., Kundu, S.S.: SS-CAM: smoothed Score-CAM for sharper visual feature localization (2020). arXiv preprint arXiv:2006.14255
Bahdanau, D., Cho, K., Bengio, Y.: Neural machine translation by jointly learning to align and translate (2014). arXiv preprint arXiv:1409.0473
Yamauchi, T., Ishikawa, M.: Spatial sensitive GRAD-CAM: visual explanations for object detection by incorporating spatial sensitivity. In: 2022 IEEE International Conference on Image Processing (ICIP), pp. 256–260, IEEE (2022)
Naidu, R., Ghosh, A., Maurya, Y., Kundu, S.S., et al.: IS-CAM: integrated score-CAM for axiomatic-based explanations (2020). arXiv preprint arXiv:2010.03023
Ibrahim, R., Shafiq, M.O.: Augmented score-CAM: high resolution visual interpretations for deep neural networks. Knowl. Based Syst. 252, 109287 (2022)
Desai, S., Ramaswamy, H.G.: Ablation-CAM: visual explanations for deep convolutional network via gradient-free localization. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pp. 983–991 (2020)
Li, H., Li, Z., Ma, R., Wu, T.: FD-CAM: improving faithfulness and discriminability of visual explanation for CNNs (2022). arXiv preprint
Huang, G., Liu, Z., Van Der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4700–4708 (2017)
Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., Huang, Z., Karpathy, A., Khosla, A., Bernstein, M., Berg, A.C., Fei-Fei, L.: ImageNet large scale visual recognition challenge. Int. J. Comput. Vis. (IJCV) 115(3), 211–252 (2015). https://doi.org/10.1007/s11263-015-0816-y
Dahl, G.E., Sainath, T.N., Hinton, G.E.: Improving deep neural networks for LVCSR using rectified linear units and dropout. In: 2013 IEEE International Conference on Acoustics, Speech and Signal Processing. IEEE (2013)
Ronneberger, O., Fischer, P., Brox, T.: U-net: convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention. Springer (2015)
Lin, T.-Y., Dollár, P., Girshick, R., He, K., Hariharan, B., Belongie, S.: Feature pyramid networks for object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2117–2125 (2017)
Collin, A.-S., De Vleeschouwer, C.: Improved anomaly detection by training an autoencoder with skip connections on images corrupted with stain-shaped noise. In: 2020 25th International Conference on Pattern Recognition (ICPR), pp. 7915–7922, IEEE (2021)
Liu, D., Cui, Y., Yan, L., Mousas, C., Yang, B., Chen, Y.: Densernet: weakly supervised visual localization using multi-scale feature aggregation. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 35, pp. 6101–6109 (2021)
Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition (2014). arXiv preprint arXiv:1409.1556
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2016)
Omeiza, D., Speakman, S., Cintas, C., Weldermariam, K.: Smooth Grad-CAM++: an enhanced inference level visualization technique for deep convolutional neural network models (2019). arXiv preprint arXiv:1908.01224
Shrikumar, A., Greenside, P., Shcherbina, A., Kundaje, A.: Not just a black box: learning important features through propagating activation differences (2016). arXiv preprint arXiv:1605.01713
Kokhlikyan, N., Miglani, V., Martin, M., Wang, E., Alsallakh, B., Reynolds, J., Melnikov, A., Kliushkina, N., Araya, C., Yan, S., Reblitz-Richardson, O.: Captum: a unified and generic model interpretability library for PyTorch (2020)
Fernandez, F.-G.: TorchCAM: class activation explorer. GitHub repository (2020)
Poursabzi-Sangdeh, F., Goldstein, D.G., Hofman, J.M., Wortman Vaughan, J.W., Wallach, H.: Manipulating and measuring model interpretability. In: Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems (2021)
Adebayo, J., Gilmer, J., Muelly, M., Goodfellow, I., Hardt, M., Kim, B.: Sanity checks for saliency maps (2018). arXiv preprint arXiv:1810.03292
Ghorbani, A., Abid, A., Zou, J.: Interpretation of neural networks is fragile. In: Proceedings of the AAAI Conference on Artificial Intelligence (2019)
Yeh, C.-K., Hsieh, C.-Y., Suggala, A., Inouye, D.I., Ravikumar, P.K.: On the (in)fidelity and sensitivity of explanations. Adv. Neural Inf. Process. Syst. 32, 10967–10978 (2019)
Cheng, Z., Liang, J., Choi, H., Tao, G., Cao, Z., Liu, D., Zhang, X.: Physical attack on monocular depth estimation with optimal adversarial patches. In: European Conference on Computer Vision, pp. 514–532, Springer (2022)
Cheng, Z., Choi, H., Feng, S., Liang, J.C., Tao, G., Liu, D., Zuzak, M., Zhang, X.: Fusion is not enough: single modal attack on fusion models for 3d object detection. In: The 12th International Conference on Learning Representations (2023)
Acknowledgements
This work was performed at UCLouvain in Belgium. It was funded by the FNRS, including FRIA funding to Alexandre Englebert. Computational resources were provided by the supercomputing facilities of the Université catholique de Louvain (CISM/UCL) and the Consortium des Équipements de Calcul Intensif en Fédération Wallonie-Bruxelles (CÉCI), funded by the Fonds de la Recherche Scientifique de Belgique (F.R.S.-FNRS) under convention 2.5020.11 and by the Walloon Region. We also thank the authors of Zoom-CAM for their kind help in using their method.
Funding
Fonds De La Recherche Scientifique–FNRS.
Author information
Contributions
A.E. wrote the code and ran the experiments. A.E., O.C., and C.D. analysed the results. A.E. and C.D. wrote the manuscript. All authors reviewed the manuscript.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix
1.1 Visual comparison with previous work
Figure 10 displays a visual comparison of all the compared methods.
Fig. 10: Visual comparison of methods. The compared methods are the four Poly-CAM variants proposed in this paper (PCAM\(^+\), PCAM\(^-\), PCAM\(^\pm \), and \(\emptyset \)PCAM), Zoom-CAM [15], Layer-CAM [16], Grad-CAM [12], Grad-CAM++ [13], Smooth Grad-CAM++ [37], Score-CAM [14], SS-CAM [20], IS-CAM [23], Input X Gradient [38], IntegratedGradient [9], SmoothGrad [10], Occlusion [8], RISE [19]
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Englebert, A., Cornu, O. & De Vleeschouwer, C. Poly-CAM: high resolution class activation map for convolutional neural networks. Machine Vision and Applications 35, 89 (2024). https://doi.org/10.1007/s00138-024-01567-7