
MFCINet: multi-level feature and context information fusion network for RGB-D salient object detection

The Journal of Supercomputing

Abstract

Recently, RGB-D salient object detection (SOD) has attracted widespread research interest. Existing methods tend to treat features at different levels equally, which leads to inadequate interaction among cross-level features. Furthermore, many methods rely on stacking convolution layers or using dilated convolutions to enlarge the receptive field when extracting high-level semantic features. However, these approaches may not capture context information effectively, resulting in the loss of semantic information. In this paper, we propose a novel multi-level feature and context information fusion network (MFCINet) for RGB-D SOD, which mainly comprises a detail enhancement fusion module (DEFM), a semantic enhancement fusion module (SEFM), and a multi-scale receptive field enhancement module (MREM). Concretely, the DEFM and SEFM introduce a combination of dual attention mechanisms to better fuse the rich details in low-level features and the rich semantic information in high-level features, respectively. Subsequently, the MREM captures rich contextual semantic information through parallel convolution kernels and skip connections, and its outputs are fed into a subsequent densely connected pyramid decoder for SOD. Experimental results on five common datasets show that our model outperforms 17 state-of-the-art (SOTA) methods.
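The module internals appear only in the paywalled full text, so nothing below is taken from the paper itself. As a minimal, hypothetical PyTorch sketch of the two ideas the abstract names (a dual channel-plus-spatial attention mechanism for fusing cross-modal features, and parallel multi-scale convolution kernels with a skip connection for context), it should not be read as the authors' actual DEFM, SEFM, or MREM; the class names `DualAttentionFusion` and `MultiScaleContext` are invented here for illustration.

```python
# Hypothetical sketch only; NOT the published DEFM/SEFM/MREM implementations.
import torch
import torch.nn as nn


class ChannelAttention(nn.Module):
    """Squeeze-and-excitation style channel attention."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        return x * self.fc(self.pool(x))  # reweight channels


class SpatialAttention(nn.Module):
    """Spatial attention computed from per-pixel channel statistics."""
    def __init__(self, kernel_size: int = 7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):
        avg = torch.mean(x, dim=1, keepdim=True)
        mx, _ = torch.max(x, dim=1, keepdim=True)
        attn = torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))
        return x * attn  # reweight spatial locations


class DualAttentionFusion(nn.Module):
    """Fuse RGB and depth features, then apply channel and spatial attention
    in sequence: one plausible reading of the abstract's 'dual attention'."""
    def __init__(self, channels: int):
        super().__init__()
        self.reduce = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )
        self.ca = ChannelAttention(channels)
        self.sa = SpatialAttention()

    def forward(self, rgb_feat, depth_feat):
        fused = self.reduce(torch.cat([rgb_feat, depth_feat], dim=1))
        return self.sa(self.ca(fused))


class MultiScaleContext(nn.Module):
    """Parallel convolution kernels of several sizes plus a skip connection,
    loosely in the spirit of the abstract's MREM. `channels` must be
    divisible by 4 so the branch outputs concatenate back to `channels`."""
    def __init__(self, channels: int):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(channels, channels // 4, k, padding=k // 2)
            for k in (1, 3, 5, 7)
        )
        self.merge = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x):
        multi = torch.cat([branch(x) for branch in self.branches], dim=1)
        return torch.relu(self.merge(multi) + x)  # skip connection
```

For example, feeding dummy 64-channel RGB and depth feature maps of shape `(2, 64, 56, 56)` through `MultiScaleContext(64)(DualAttentionFusion(64)(rgb, depth))` preserves the shape, which is the behavior a per-level fusion block feeding a pyramid decoder would need.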


Availability of data and materials

The data that support the findings of this study are available from the corresponding author upon reasonable request.

Funding

This work was supported by the National Natural Science Foundation of China (62102003), the Natural Science Foundation of Anhui Province (2108085QF258), the Anhui Postdoctoral Science Foundation (2022B623), the University Synergy Innovation Program of Anhui Province (GXXT-2021-006, GXXT-2022-038), the Central Guiding Local Technology Development Special Funds (202107d06020001), the Institute of Energy, Hefei Comprehensive National Science Center (21KZS217), and the University-Level General Projects of Anhui University of Science and Technology (xjyb2020-04).

Author information

Contributions

C. Xia and D. Chen developed the overall research idea and wrote the main manuscript text; X. Gao collected the data and designed the experiments; X. Fang and B. Ge performed the data analysis; Y. Zhang and Kuan-Ching Li edited the final version for clarity and accuracy; Ke Yang interpreted the data and provided critical insights into the findings.

Corresponding author

Correspondence to Difeng Chen.

Ethics declarations

Conflict of interest

The authors declare that there is no conflict of interest regarding the publication of this paper.

Ethical approval

Not applicable.

Consent to participate

Not applicable.

Consent for publication

Not applicable.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Xia, C., Chen, D., Gao, X. et al. MFCINet: multi-level feature and context information fusion network for RGB-D salient object detection. J Supercomput 80, 2487–2513 (2024). https://doi.org/10.1007/s11227-023-05561-0
