Abstract
In unstructured scenarios, objects typically exhibit unique shapes, poses, and other uncertainties, which place higher demands on a robot's planar grasp detection ability. Most previous methods use single-modal data, or simply fused multi-modal data, to predict gripper configurations. Single-modal data cannot comprehensively describe the diversity of objects, and simple fusion may ignore the dependencies between modalities. Based on these considerations, we propose a Multi-modal Dynamic Cooperative Fusion Network (MDCNet), in which a Multilevel Semantic Guided Fusion Module (MSG) is designed; its enhanced semantic guidance vectors suppress the undesired influence factors produced by different fusion structures. In addition, we design a general Enhanced Feature Pyramid Nets Structure (EFPN) to learn the dependencies between fine-grained and coarse-grained features and to improve the robustness of the encoder in unstructured scenarios. The proposed method achieves an accuracy of 98.9% on the Jacquard dataset and 99.6% on the Cornell dataset. In over 2000 robotic grasp trials, our structure achieves a grasp success rate of 98.8% in single-object scenarios and 93.5% in cluttered scenarios. The proposed method is superior to previous grasp detection methods in both speed and accuracy, and has strong real-time performance.
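The Cornell and Jacquard accuracy figures quoted above are conventionally computed with the standard rectangle metric: a predicted grasp counts as correct if its orientation is within 30° of some ground-truth rectangle and their rotated-rectangle IoU exceeds 0.25. The following is a minimal plain-Python sketch of that criterion; the `(cx, cy, w, h, theta)` parameterization and all function names are illustrative assumptions, not the authors' code.

```python
import math

def rect_corners(cx, cy, w, h, theta):
    """Corners (CCW) of a grasp rectangle centered at (cx, cy),
    with width w, height h, rotated by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return [(cx + dx * c - dy * s, cy + dx * s + dy * c)
            for dx, dy in ((-w/2, -h/2), (w/2, -h/2), (w/2, h/2), (-w/2, h/2))]

def poly_area(poly):
    """Absolute area of a simple polygon (shoelace formula)."""
    a = 0.0
    for i in range(len(poly)):
        (x1, y1), (x2, y2) = poly[i], poly[(i + 1) % len(poly)]
        a += x1 * y2 - x2 * y1
    return abs(a) / 2.0

def clip(subject, a, b):
    """Sutherland-Hodgman step: keep the part of `subject` on the
    left of the directed edge a->b (interior side for CCW clippers)."""
    out = []
    for i in range(len(subject)):
        p, q = subject[i], subject[(i + 1) % len(subject)]
        sp = (b[0]-a[0])*(p[1]-a[1]) - (b[1]-a[1])*(p[0]-a[0])
        sq = (b[0]-a[0])*(q[1]-a[1]) - (b[1]-a[1])*(q[0]-a[0])
        if sp >= 0:
            out.append(p)
        if (sp >= 0) != (sq >= 0):  # edge crosses the clip line
            t = sp / (sp - sq)
            out.append((p[0] + t*(q[0]-p[0]), p[1] + t*(q[1]-p[1])))
    return out

def iou(r1, r2):
    """IoU of two convex (rotated-rectangle) polygons."""
    inter = r1
    for i in range(len(r2)):
        if not inter:
            return 0.0
        inter = clip(inter, r2[i], r2[(i + 1) % len(r2)])
    ai = poly_area(inter) if inter else 0.0
    union = poly_area(r1) + poly_area(r2) - ai
    return ai / union if union > 0 else 0.0

def grasp_correct(pred, gt, iou_thresh=0.25, angle_thresh=math.radians(30)):
    """Rectangle metric: angle within 30 deg of ground truth
    and rotated-rectangle IoU above 0.25."""
    d = abs(pred[4] - gt[4]) % math.pi        # grasp angle is periodic in pi
    angle_ok = min(d, math.pi - d) <= angle_thresh
    return angle_ok and iou(rect_corners(*pred), rect_corners(*gt)) >= iou_thresh
```

In practice a prediction is matched against every annotated rectangle for the object and counted correct if any match succeeds; the 0.25 IoU threshold is deliberately loose because many distinct rectangles describe equally valid grasps.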
Data Availability
The data used in this paper are all from public datasets.
References
Hu Y, Wu X, Geng P, Li Z (2019) Evolution strategies learning with variable impedance control for grasping under uncertainty. IEEE Trans Industr Electron 66(10):7788–7799
Li G, Li N, Chang F, Liu C (2022) Adaptive graph convolutional network with adversarial learning for skeleton-based action prediction. IEEE Transactions on Cognitive and Developmental Systems 14(3):1258–1269
Buzzatto J, Chapman J, Shahmohammadi M, Sanches F, Nejati M, Matsunaga S, Haraguchi R, Mariyama T, MacDonald B, Liarokapis M (2022) On robotic manipulation of flexible flat cables: Employing a multi-modal gripper with dexterous tips, active nails, and a reconfigurable suction cup module. 2022 IEEE/RSJ International conference on intelligent robots and systems (IROS), pp 1602–1608
LeCun Y, Bengio Y, Hinton G (2015) Deep learning. Nature 521(7553):436–444
Krizhevsky A, Sutskever I, Hinton GE (2017) Imagenet classification with deep convolutional neural networks. Commun ACM 60(6):84–90
Russakovsky O, Deng J, Su H, Krause J, Satheesh S, Ma S, Huang Z, Karpathy A, Khosla A, Bernstein M et al (2015) Imagenet large scale visual recognition challenge. Int J Comput Vision 115(3):211–252
Taigman Y, Yang M, Ranzato M, Wolf L (2014) Deepface: Closing the gap to human-level performance in face verification. Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1701–1708
Sutskever I, Vinyals O, Le QV (2014) Sequence to sequence learning with neural networks. Adv Neural Inform Process Syst 27
Socher R, Huang E, Pennin J, Manning CD, Ng A (2011) Dynamic pooling and unfolding recursive autoencoders for paraphrase detection. Adv Neural Inform Process Syst 24
Johns E, Leutenegger S, Davison AJ (2016) Pairwise decomposition of image sequences for active multi-view recognition. Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3813–3822
Corsaro M, Tellex S, Konidaris G (2021) Learning to detect multi-modal grasps for dexterous grasping in dense clutter. 2021 IEEE/RSJ International conference on intelligent robots and systems (IROS), pp 4647–4653
Prabhu T, Manivannan P, Roy D et al (2021) A robust tactile sensor matrix for intelligent grasping of objects using robotic grippers. 2021 International symposium of asian control association on intelligent robotics and industrial automation (IRIA), pp 400–405
He K, Gkioxari G, Dollár P, Girshick R (2017) Mask r-cnn. Proceedings of the IEEE international conference on computer vision, pp 2961–2969
Redmon J, Divvala S, Girshick R, Farhadi A (2016) You only look once: Unified, real-time object detection. Proceedings of the IEEE conference on computer vision and pattern recognition, pp 779–788
Liu W, Anguelov D, Erhan D, Szegedy C, Reed S, Fu C-Y, Berg AC (2016) Ssd: Single shot multibox detector. European conference on computer vision, pp 21–37
Wang C, Xu D, Zhu Y, Martín-Martín R, Lu C, Fei-Fei L, Savarese S (2019) Densefusion: 6d object pose estimation by iterative dense fusion. Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 3343–3352
Zhang J, Liu H, Yang K, Hu X, Liu R, Stiefelhagen R (2023) Cmx: Cross-modal fusion for rgb-x semantic segmentation with transformers. IEEE Transactions on intelligent transportation systems, pp 1–16
Yan R, Yang K, Wang K (2021) Nlfnet: non-local fusion towards generalized multimodal semantic segmentation across rgb-depth, polarization, and thermal images. 2021 IEEE International conference on robotics and biomimetics (ROBIO), pp 1129–1135
Su Y, Yuan Y, Jiang Z (2021) Deep feature selection-and-fusion for rgb-d semantic segmentation. 2021 IEEE International conference on multimedia and expo (ICME), pp 1–6
Redmon J, Angelova A (2015) Real-time grasp detection using convolutional neural networks. 2015 IEEE International conference on robotics and automation (ICRA), pp 1316–1322
Jiang Y, Moseson S, Saxena A (2011) Efficient grasping from rgbd images: Learning using a new rectangle representation. 2011 IEEE International conference on robotics and automation, pp 3304–3311
Nishi T, Shinya Kawasaki KI (2023) M3r-cnn: on effective multi-modal fusion of rgb and depth cues for instance segmentation in bin-picking. Adv Robot 37(18):1143–1157
Wang F, Jiang M, Qian C, Yang S, Li C, Zhang H, Wang X, Tang X (2017) Residual attention network for image classification. Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3156–3164
Hu J, Shen L, Sun G (2018) Squeeze-and-excitation networks. Proceedings of the IEEE conference on computer vision and pattern recognition, pp 7132–7141
Woo S, Park J, Lee J-Y, Kweon IS (2018) Cbam: Convolutional block attention module. Proceedings of the European conference on computer vision (ECCV), pp 3–19
Fu J, Liu J, Tian H, Li Y, Bao Y, Fang Z, Lu H (2019) Dual attention network for scene segmentation. Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 3146–3154
Luo J, Lu J, Yue G (2021) Seatbelt detection in road surveillance images based on improved dense residual network with two-level attention mechanism. J Electron Imaging 30(3):033036
Chu F-J, Xu R, Vela PA (2018) Real-world multiobject, multigrasp detection. IEEE Robotics and Automation Letters 3(4):3355–3362
Karaoguz H, Jensfelt P (2019) Object detection approach for robot grasp detection. 2019 International conference on robotics and automation (ICRA), pp 4953–4959
Xu Y, Wang L, Yang A, Chen L (2019) Graspcnn: Real-time grasp detection using a new oriented diameter circle representation. IEEE Access 7:159322–159331
Dong Z, Tian H, Bao X, Yan Y, Chen F (2022) Graspvdn: scene-oriented grasp estimation by learning vector representations of grasps. Complex & Intelligent Systems 8(4):2911–2922
Chen L, Huang P, Li Y, Meng Z (2020) Detecting graspable rectangles of objects in robotic grasping. Int J Control Autom Syst 18(5):1343–1352
Yang L, Cui G, Chen S, Zhu X (2021) Research on robot classifiable grasp detection method based on convolutional neural network. International conference on intelligent robotics and applications, pp 705–715
He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. Proceedings of the IEEE conference on computer vision and pattern recognition, pp 770–778
Ren S, He K, Girshick R, Sun J (2015) Faster r-cnn: Towards real-time object detection with region proposal networks. Adv Neural Inform Process Syst 28
Girshick R (2015) Fast r-cnn. Proceedings of the IEEE international conference on computer vision, pp 1440–1448
Lin T-Y, Dollár P, Girshick R, He K, Hariharan B, Belongie S (2017) Feature pyramid networks for object detection. Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2117–2125
Sandler M, Howard A, Zhu M, Zhmoginov A, Chen L-C (2018) Mobilenetv2: Inverted residuals and linear bottlenecks. Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4510–4520
Lenz I, Lee H, Saxena A (2015) Deep learning for detecting robotic grasps. Int J Robot Res 34(4–5):705–724
Zhang H, Zhou X, Lan X, Li J, Tian Z, Zheng N (2019) A real-time robotic grasping approach with oriented anchor box. IEEE Transactions on Systems, Man, and Cybernetics: Systems 51(5):3014–3025
Yu S, Zhai D-H, Xia Y (2023) Skgnet: Robotic grasp detection with selective kernel convolution. IEEE Trans Autom Sci Eng 20(4):2241–2252
Asif U, Bennamoun M, Sohel FA (2017) Rgb-d object recognition and grasp detection using hierarchical cascaded forests. IEEE Trans Rob 33(3):547–564
Morrison D, Corke P, Leitner J (2020) Learning robust, real-time, reactive robotic grasping. Int J Robot Res 39(2–3):183–201
Zhihong C, Hebin Z, Yanbo W, Binyan L, Yu L (2017) A vision-based robotic grasping system using deep learning for garbage sorting. 2017 36th Chinese control conference (CCC), pp 11223–11226
Wang J, Yin H, Zhang S, Gui P, Xu K (2019) Accurate rapid grasping of small industrial parts from charging tray in clutter scenes. Sensors and Materials 31(6):2089–2101
Kwiatkowski J, Cockburn D, Duchaine V (2017) Grasp stability assessment through the fusion of proprioception and tactile signals using convolutional neural networks. 2017 IEEE/RSJ International conference on intelligent robots and systems (IROS), pp 286–292
Hou Q, Zhou D, Feng J (2021) Coordinate attention for efficient mobile network design. Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 13713–13722
Zheng Z, Wang P, Liu W, Li J, Ye R, Ren D (2020) Distance-iou loss: Faster and better learning for bounding box regression. Proceedings of the AAAI conference on artificial intelligence 34(07):12993–13000
Depierre A, Dellandréa E, Chen L (2018) Jacquard: A large scale dataset for robotic grasp detection. 2018 IEEE/RSJ International conference on intelligent robots and systems (IROS), pp 3511–3516
Kumra S, Kanan C (2017) Robotic grasp detection using deep convolutional neural networks. 2017 IEEE/RSJ International conference on intelligent robots and systems (IROS), pp 769–776
Song Y, Gao L, Li X, Shen W (2020) A novel robotic grasp detection method based on region proposal networks. Robotics and Computer-Integrated Manufacturing 65:101963
Ainetter S, Fraundorfer F (2021) End-to-end trainable deep neural network for robotic grasp detection and semantic segmentation from rgb. 2021 IEEE International conference on robotics and automation (ICRA), pp 13452–13458
Kumra S, Joshi S, Sahin F (2020) Antipodal robotic gras** using generative residual convolutional neural network. 2020 IEEE/RSJ International conference on intelligent robots and systems (IROS), pp 9626–9633
Wang S, Zhou Z, Kan Z (2022) When transformer meets robotic grasping: Exploits context for efficient grasp detection. IEEE Robotics and Automation Letters 7(3):8170–8177
Yu S, Zhai D-H, Xia Y, Wu H, Liao J (2022) Se-resunet: A novel robotic grasp detection method. IEEE Robotics and Automation Letters 7(2):5238–5245
Tian H, Song K, Li S, Ma S, Yan Y (2022) Lightweight pixel-wise generative robot gras** detection based on rgb-d dense fusion. IEEE Trans Instrum Meas 71:1–12
Wu Y, Fu Y, Wang S (2024) Information-theoretic exploration for adaptive robotic gras** in clutter based on real-time pixel-level grasp detection. IEEE Trans Industr Electron 71(3):2683–2693
Acknowledgements
This work was supported in part by the Jiangsu Agricultural Science and Technology Independent Innovation Fund (CN) under Grant CX(22)3045, in part by the Teaching Reform Research Project of Yangzhou University under Grant xkjs2022018, and in part by the Medical Innovation Transformation Special Fund of Yangzhou University-New Medical Interdisciplinary Innovation Team under Grant AHYZUCXTD202106.
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
We declare that we do not have any commercial or associative interest that represents a conflict of interest in connection with the work submitted.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Wang, Y., Guo, Z., Chen, Y. et al. A robot gras** detection network based on flexible selection of multi-modal feature fusion structure. Appl Intell 54, 5044–5061 (2024). https://doi.org/10.1007/s10489-024-05427-9