Abstract
In unstructured scenarios, objects typically exhibit unique shapes, poses, and other uncertainties, which place higher demands on a robot's planar grasp detection ability. Most previous methods use single-modal data, or simply fused multi-modal data, to predict gripper configurations. Single-modal data cannot comprehensively describe the diversity of objects, and simple fusion may ignore the dependencies between modalities. Based on these considerations, we propose a Multi-modal Dynamic Cooperative Fusion Network (MDCNet), in which a Multilevel Semantic Guided Fusion Module (MSG) is designed; its enhanced semantic guidance vectors suppress the undesired influence factors produced by different fusion structures. In addition, we design a general Enhanced Feature Pyramid Nets Structure (EFPN) to learn the dependencies between fine-grained and coarse-grained features and to improve the robustness of the encoder in unstructured scenarios. The proposed method achieves an accuracy of 98.9% on the Jacquard dataset and 99.6% on the Cornell dataset. In over 2000 robotic grasp trials, our structure achieves a grasp success rate of 98.8% in single-object scenarios and 93.5% in cluttered scenarios. The proposed method is superior to previous grasp detection methods in both speed and accuracy, and has strong real-time performance.
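The Cornell and Jacquard accuracy figures quoted above are conventionally computed with the standard rectangle metric: a predicted grasp counts as correct if its orientation is within 30° of some ground-truth rectangle and their rotated-rectangle IoU exceeds 0.25. The following is a minimal plain-Python sketch of that criterion; the `(cx, cy, w, h, theta)` parameterization and all function names are illustrative assumptions, not the authors' code.

```python
import math

def rect_corners(cx, cy, w, h, theta):
    """Corners (CCW) of a grasp rectangle centered at (cx, cy),
    with width w, height h, rotated by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return [(cx + dx * c - dy * s, cy + dx * s + dy * c)
            for dx, dy in ((-w/2, -h/2), (w/2, -h/2), (w/2, h/2), (-w/2, h/2))]

def poly_area(poly):
    """Absolute area of a simple polygon (shoelace formula)."""
    a = 0.0
    for i in range(len(poly)):
        (x1, y1), (x2, y2) = poly[i], poly[(i + 1) % len(poly)]
        a += x1 * y2 - x2 * y1
    return abs(a) / 2.0

def clip(subject, a, b):
    """Sutherland-Hodgman step: keep the part of `subject` on the
    left of the directed edge a->b (interior side for CCW clippers)."""
    out = []
    for i in range(len(subject)):
        p, q = subject[i], subject[(i + 1) % len(subject)]
        sp = (b[0]-a[0])*(p[1]-a[1]) - (b[1]-a[1])*(p[0]-a[0])
        sq = (b[0]-a[0])*(q[1]-a[1]) - (b[1]-a[1])*(q[0]-a[0])
        if sp >= 0:
            out.append(p)
        if (sp >= 0) != (sq >= 0):  # edge crosses the clip line
            t = sp / (sp - sq)
            out.append((p[0] + t*(q[0]-p[0]), p[1] + t*(q[1]-p[1])))
    return out

def iou(r1, r2):
    """IoU of two convex (rotated-rectangle) polygons."""
    inter = r1
    for i in range(len(r2)):
        if not inter:
            return 0.0
        inter = clip(inter, r2[i], r2[(i + 1) % len(r2)])
    ai = poly_area(inter) if inter else 0.0
    union = poly_area(r1) + poly_area(r2) - ai
    return ai / union if union > 0 else 0.0

def grasp_correct(pred, gt, iou_thresh=0.25, angle_thresh=math.radians(30)):
    """Rectangle metric: angle within 30 deg of ground truth
    and rotated-rectangle IoU above 0.25."""
    d = abs(pred[4] - gt[4]) % math.pi        # grasp angle is periodic in pi
    angle_ok = min(d, math.pi - d) <= angle_thresh
    return angle_ok and iou(rect_corners(*pred), rect_corners(*gt)) >= iou_thresh
```

In practice a prediction is matched against every annotated rectangle for the object and counted correct if any match succeeds; the 0.25 IoU threshold is deliberately loose because many distinct rectangles describe equally valid grasps.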
Data Availability
The data used in this paper are all from public datasets.
References
Hu Y, Wu X, Geng P, Li Z (2019) Evolution strategies learning with variable impedance control for grasping under uncertainty. IEEE Trans Industr Electron 66(10):7788–7799
Li G, Li N, Chang F, Liu C (2022) Adaptive graph convolutional network with adversarial learning for skeleton-based action prediction. IEEE Transactions on Cognitive and Developmental Systems 14(3):1258–1269
Buzzatto J, Chapman J, Shahmohammadi M, Sanches F, Nejati M, Matsunaga S, Haraguchi R, Mariyama T, MacDonald B, Liarokapis M (2022) On robotic manipulation of flexible flat cables: Employing a multi-modal gripper with dexterous tips, active nails, and a reconfigurable suction cup module. 2022 IEEE/RSJ International conference on intelligent robots and systems (IROS), pp 1602–1608
LeCun Y, Bengio Y, Hinton G (2015) Deep learning. Nature 521(7553):436–444
Krizhevsky A, Sutskever I, Hinton GE (2017) Imagenet classification with deep convolutional neural networks. Commun ACM 60(6):84–90
Russakovsky O, Deng J, Su H, Krause J, Satheesh S, Ma S, Huang Z, Karpathy A, Khosla A, Bernstein M et al (2015) Imagenet large scale visual recognition challenge. Int J Comput Vision 115(3):211–252
Taigman Y, Yang M, Ranzato M, Wolf L (2014) Deepface: Closing the gap to human-level performance in face verification. Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1701–1708
Sutskever I, Vinyals O, Le QV (2014) Sequence to sequence learning with neural networks. Adv Neural Inform Process Syst 27
Socher R, Huang E, Pennin J, Manning CD, Ng A (2011) Dynamic pooling and unfolding recursive autoencoders for paraphrase detection. Adv Neural Inform Process Syst 24
Johns E, Leutenegger S, Davison AJ (2016) Pairwise decomposition of image sequences for active multi-view recognition. Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3813–3822
Corsaro M, Tellex S, Konidaris G (2021) Learning to detect multi-modal grasps for dexterous grasping in dense clutter. 2021 IEEE/RSJ International conference on intelligent robots and systems (IROS), pp 4647–4653
Prabhu T, Manivannan P, Roy D et al (2021) A robust tactile sensor matrix for intelligent grasping of objects using robotic grippers. 2021 International symposium of asian control association on intelligent robotics and industrial automation (IRIA), pp 400–405
He K, Gkioxari G, Dollár P, Girshick R (2017) Mask r-cnn. Proceedings of the IEEE international conference on computer vision, pp 2961–2969
Redmon J, Divvala S, Girshick R, Farhadi A (2016) You only look once: Unified, real-time object detection. Proceedings of the IEEE conference on computer vision and pattern recognition, pp 779–788
Liu W, Anguelov D, Erhan D, Szegedy C, Reed S, Fu C-Y, Berg AC (2016) Ssd: Single shot multibox detector. European conference on computer vision, pp 21–37
Wang C, Xu D, Zhu Y, Martín-Martín R, Lu C, Fei-Fei L, Savarese S (2019) Densefusion: 6d object pose estimation by iterative dense fusion. Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 3343–3352
Zhang J, Liu H, Yang K, Hu X, Liu R, Stiefelhagen R (2023) Cmx: Cross-modal fusion for rgb-x semantic segmentation with transformers. IEEE Transactions on intelligent transportation systems, pp 1–16
Yan R, Yang K, Wang K (2021) Nlfnet: non-local fusion towards generalized multimodal semantic segmentation across rgb-depth, polarization, and thermal images. 2021 IEEE International conference on robotics and biomimetics (ROBIO), pp 1129–1135
Su Y, Yuan Y, Jiang Z (2021) Deep feature selection-and-fusion for rgb-d semantic segmentation. 2021 IEEE International conference on multimedia and expo (ICME), pp 1–6
Redmon J, Angelova A (2015) Real-time grasp detection using convolutional neural networks. 2015 IEEE International conference on robotics and automation (ICRA), pp 1316–1322
Jiang Y, Moseson S, Saxena A (2011) Efficient grasping from rgbd images: Learning using a new rectangle representation. 2011 IEEE International conference on robotics and automation, pp 3304–3311
Nishi T, Shinya Kawasaki KI (2023) M3r-cnn: on effective multi-modal fusion of rgb and depth cues for instance segmentation in bin-picking. Adv Robot 37(18):1143–1157
Wang F, Jiang M, Qian C, Yang S, Li C, Zhang H, Wang X, Tang X (2017) Residual attention network for image classification. Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3156–3164
Hu J, Shen L, Sun G (2018) Squeeze-and-excitation networks. Proceedings of the IEEE conference on computer vision and pattern recognition, pp 7132–7141
Woo S, Park J, Lee J-Y, Kweon IS (2018) Cbam: Convolutional block attention module. Proceedings of the European conference on computer vision (ECCV), pp 3–19
Fu J, Liu J, Tian H, Li Y, Bao Y, Fang Z, Lu H (2019) Dual attention network for scene segmentation. Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 3146–3154
Luo J, Lu J, Yue G (2021) Seatbelt detection in road surveillance images based on improved dense residual network with two-level attention mechanism. J Electron Imaging 30(3):033036
Chu F-J, Xu R, Vela PA (2018) Real-world multiobject, multigrasp detection. IEEE Robotics and Automation Letters 3(4):3355–3362
Karaoguz H, Jensfelt P (2019) Object detection approach for robot grasp detection. 2019 International conference on robotics and automation (ICRA), pp 4953–4959
Xu Y, Wang L, Yang A, Chen L (2019) Graspcnn: Real-time grasp detection using a new oriented diameter circle representation. IEEE Access 7:159322–159331
Dong Z, Tian H, Bao X, Yan Y, Chen F (2022) Graspvdn: scene-oriented grasp estimation by learning vector representations of grasps. Complex & Intelligent Systems 8(4):2911–2922
Chen L, Huang P, Li Y, Meng Z (2020) Detecting graspable rectangles of objects in robotic grasping. Int J Control Autom Syst 18(5):1343–1352
Yang L, Cui G, Chen S, Zhu X (2021) Research on robot classifiable grasp detection method based on convolutional neural network. International conference on intelligent robotics and applications, pp 705–715
He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. Proceedings of the IEEE conference on computer vision and pattern recognition, pp 770–778
Ren S, He K, Girshick R, Sun J (2015) Faster r-cnn: Towards real-time object detection with region proposal networks. Adv Neural Inform Process Syst 28
Girshick R (2015) Fast r-cnn. Proceedings of the IEEE international conference on computer vision, pp 1440–1448
Lin T-Y, Dollár P, Girshick R, He K, Hariharan B, Belongie S (2017) Feature pyramid networks for object detection. Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2117–2125
Sandler M, Howard A, Zhu M, Zhmoginov A, Chen L-C (2018) Mobilenetv2: Inverted residuals and linear bottlenecks. Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4510–4520
Lenz I, Lee H, Saxena A (2015) Deep learning for detecting robotic grasps. Int J Robot Res 34(4–5):705–724
Zhang H, Zhou X, Lan X, Li J, Tian Z, Zheng N (2019) A real-time robotic grasping approach with oriented anchor box. IEEE Transactions on Systems, Man, and Cybernetics: Systems 51(5):3014–3025
Yu S, Zhai D-H, Xia Y (2023) Skgnet: Robotic grasp detection with selective kernel convolution. IEEE Trans Autom Sci Eng 20(4):2241–2252
Asif U, Bennamoun M, Sohel FA (2017) Rgb-d object recognition and grasp detection using hierarchical cascaded forests. IEEE Trans Rob 33(3):547–564
Morrison D, Corke P, Leitner J (2020) Learning robust, real-time, reactive robotic grasping. Int J Robot Res 39(2–3):183–201
Zhihong C, Hebin Z, Yanbo W, Binyan L, Yu L (2017) A vision-based robotic grasping system using deep learning for garbage sorting. 2017 36th Chinese control conference (CCC), pp 11223–11226
Wang J, Yin H, Zhang S, Gui P, Xu K (2019) Accurate rapid grasping of small industrial parts from charging tray in clutter scenes. Sensors and Materials 31(6):2089–2101
Kwiatkowski J, Cockburn D, Duchaine V (2017) Grasp stability assessment through the fusion of proprioception and tactile signals using convolutional neural networks. 2017 IEEE/RSJ International conference on intelligent robots and systems (IROS), pp 286–292
Hou Q, Zhou D, Feng J (2021) Coordinate attention for efficient mobile network design. Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 13713–13722
Zheng Z, Wang P, Liu W, Li J, Ye R, Ren D (2020) Distance-iou loss: Faster and better learning for bounding box regression. Proceedings of the AAAI conference on artificial intelligence 34(07):12993–13000
Depierre A, Dellandréa E, Chen L (2018) Jacquard: A large scale dataset for robotic grasp detection. 2018 IEEE/RSJ International conference on intelligent robots and systems (IROS), pp 3511–3516
Kumra S, Kanan C (2017) Robotic grasp detection using deep convolutional neural networks. 2017 IEEE/RSJ International conference on intelligent robots and systems (IROS), pp 769–776
Song Y, Gao L, Li X, Shen W (2020) A novel robotic grasp detection method based on region proposal networks. Robotics and Computer-Integrated Manufacturing 65:101963
Ainetter S, Fraundorfer F (2021) End-to-end trainable deep neural network for robotic grasp detection and semantic segmentation from rgb. 2021 IEEE International conference on robotics and automation (ICRA), pp 13452–13458
Kumra S, Joshi S, Sahin F (2020) Antipodal robotic gras** using generative residual convolutional neural network. 2020 IEEE/RSJ International conference on intelligent robots and systems (IROS), pp 9626–9633
Wang S, Zhou Z, Kan Z (2022) When transformer meets robotic grasping: Exploits context for efficient grasp detection. IEEE Robotics and Automation Letters 7(3):8170–8177
Yu S, Zhai D-H, Xia Y, Wu H, Liao J (2022) Se-resunet: A novel robotic grasp detection method. IEEE Robotics and Automation Letters 7(2):5238–5245
Tian H, Song K, Li S, Ma S, Yan Y (2022) Lightweight pixel-wise generative robot gras** detection based on rgb-d dense fusion. IEEE Trans Instrum Meas 71:1–12
Wu Y, Fu Y, Wang S (2024) Information-theoretic exploration for adaptive robotic gras** in clutter based on real-time pixel-level grasp detection. IEEE Trans Industr Electron 71(3):2683–2693
Acknowledgements
This work was supported in part by the Jiangsu Agricultural Science and Technology Independent Innovation Fund (CN) under Grant CX(22)3045, in part by the Teaching Reform Research Project of Yangzhou University under Grant xkjs2022018, and in part by the Medical Innovation Transformation Special Fund of Yangzhou University-New Medical Interdisciplinary Innovation Team under Grant AHYZUCXTD202106.
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
We declare that we do not have any commercial or associative interest that represents a conflict of interest in connection with the work submitted.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Wang, Y., Guo, Z., Chen, Y. et al. A robot gras** detection network based on flexible selection of multi-modal feature fusion structure. Appl Intell 54, 5044–5061 (2024). https://doi.org/10.1007/s10489-024-05427-9