
A robot grasping detection network based on flexible selection of multi-modal feature fusion structure


Abstract

In unstructured scenes, objects vary widely in shape, pose, and other properties, which places higher demands on a robot's planar grasp detection ability. Most previous methods predict gripper configurations from single-modal data or from simply fused multi-modal data. Single-modal data cannot comprehensively describe the diversity of objects, and simple fusion schemes can ignore the dependencies between modalities. Motivated by these considerations, we propose a Multi-modal Dynamic Cooperative Fusion Network (MDCNet), in which a Multilevel Semantic Guided Fusion Module (MSG) uses enhanced semantic guidance vectors to suppress the undesired influences produced by different fusion structures. In addition, we design a general Enhanced Feature Pyramid Network structure (EFPN) that learns the dependencies between fine-grained and coarse-grained features and improves the robustness of the encoder in unstructured scenes. The proposed method achieves an accuracy of 98.9% on the Jacquard dataset and 99.6% on the Cornell dataset. In over 2000 robotic grasp trials, it achieves a grasp success rate of 98.8% in single-object scenes and 93.5% in cluttered scenes. The proposed method outperforms previous grasp detection methods in both speed and accuracy, and offers strong real-time performance.
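
The abstract describes the MSG module as using enhanced semantic guidance vectors to steer multi-modal fusion. As a rough illustration of that general idea, and not the authors' implementation, the following PyTorch sketch fuses RGB and depth feature maps through a learned channel-wise guidance gate; the class name, layer choices, and shapes are all hypothetical.

    import torch
    import torch.nn as nn

    class SemanticGuidedFusion(nn.Module):
        """Gates and merges RGB and depth feature maps (illustrative only)."""

        def __init__(self, channels: int):
            super().__init__()
            # Pool the concatenated modalities into a guidance vector, then
            # turn it into one sigmoid gate per channel of each modality.
            self.guidance = nn.Sequential(
                nn.AdaptiveAvgPool2d(1),
                nn.Conv2d(2 * channels, 2 * channels, kernel_size=1),
                nn.Sigmoid(),
            )
            # Project the gated, concatenated features back to `channels`.
            self.merge = nn.Conv2d(2 * channels, channels, kernel_size=1)

        def forward(self, rgb_feat, depth_feat):
            stacked = torch.cat([rgb_feat, depth_feat], dim=1)  # (B, 2C, H, W)
            gate = self.guidance(stacked)                       # (B, 2C, 1, 1)
            return self.merge(stacked * gate)                   # (B, C, H, W)

    # Example: fuse 64-channel feature maps from two modality-specific encoders.
    fusion = SemanticGuidedFusion(channels=64)
    fused = fusion(torch.randn(1, 64, 56, 56), torch.randn(1, 64, 56, 56))

The channel-wise gate lets the network down-weight unreliable channels from either modality before merging, which is one simple way to realize the "suppress undesired influence factors" behavior the abstract attributes to MSG.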


Data Availability

The data used in this paper are all from public datasets.
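
For context on the accuracy figures reported in the abstract: results on the Cornell and Jacquard datasets are conventionally scored with the rectangle metric, under which a predicted grasp counts as correct if its orientation lies within 30 degrees of a ground-truth rectangle and the two rectangles' Jaccard index (IoU) exceeds 0.25. Below is a minimal sketch of that check, assuming this standard criterion (the paper's exact evaluation protocol is not reproduced here) and using shapely polygons to approximate rotated-rectangle overlap.

    from shapely.geometry import Polygon

    def grasp_correct(pred_poly: Polygon, pred_angle: float,
                      gt_poly: Polygon, gt_angle: float) -> bool:
        """Rectangle metric: angle within 30 deg (mod 180) and IoU > 0.25."""
        # Grasp rectangles are symmetric under 180-degree rotation.
        d = abs(pred_angle - gt_angle) % 180.0
        angle_ok = min(d, 180.0 - d) <= 30.0
        union = pred_poly.union(gt_poly).area
        iou = pred_poly.intersection(gt_poly).area / union if union > 0 else 0.0
        return angle_ok and iou > 0.25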


Acknowledgements

This work was supported in part by the Jiangsu Agricultural Science and Technology Independent Innovation Fund (CN) under Grant CX(22)3045, in part by the Teaching Reform Research Project of Yangzhou University under Grant xkjs2022018, and in part by the Medical Innovation Transformation Special Fund of Yangzhou University-New Medical Interdisciplinary Innovation Team under Grant AHYZUCXTD202106.

Author information


Corresponding author

Correspondence to Zhibo Guo.

Ethics declarations

We declare that we have no commercial or associative interests that represent a conflict of interest in connection with the submitted work.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Wang, Y., Guo, Z., Chen, Y. et al. A robot grasping detection network based on flexible selection of multi-modal feature fusion structure. Appl Intell 54, 5044–5061 (2024). https://doi.org/10.1007/s10489-024-05427-9

