Semantic RGB-D Perception for Cognitive Service Robots

Chapter in: RGB-D Image Analysis and Processing

Part of the book series: Advances in Computer Vision and Pattern Recognition ((ACVPR))

Abstract

Cognitive robots need to understand their surroundings not only in terms of geometry; they must also categorize surfaces, detect objects, and estimate object poses. RGB-D sensors are ideally suited to many of these problems, which is why we developed efficient RGB-D methods to address them. In this chapter, we outline the continuous development and use of RGB-D methods across three applications: our cognitive service robot Cosero, which competed with great success in the international RoboCup@Home competitions; an industrial kitting application; and cluttered bin picking for warehouse automation. We learn semantic segmentation using convolutional neural networks and random forests, and aggregate the predicted surface categories in 3D via RGB-D SLAM. We use deep learning methods to categorize surfaces, recognize objects, and estimate their poses. Efficient RGB-D registration methods form the basis for manipulating known objects; extended to non-rigid registration, they allow manipulation skills to be transferred to novel objects.
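The rigid registration step underlying such RGB-D manipulation pipelines can be illustrated in its simplest correspondence-based form: given matched 3D point pairs, the optimal rotation and translation follow from an SVD of the cross-covariance matrix (the Kabsch method). This is a generic sketch of that building block, not the authors' multi-resolution surfel registration; the function name is illustrative.

```python
import numpy as np

def kabsch_rigid_transform(P, Q):
    """Estimate (R, t) minimizing sum ||R @ P[i] + t - Q[i]||^2
    for corresponding point sets P, Q of shape (N, 3)."""
    cP, cQ = P.mean(axis=0), Q.mean(axis=0)
    H = (P - cP).T @ (Q - cQ)               # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))  # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = cQ - R @ cP
    return R, t
```

In a full registration system, the correspondences themselves are unknown and are re-estimated iteratively (as in ICP-style methods); the closed-form step above is what each iteration solves.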




Acknowledgements

The authors thank the numerous people involved in development and operation of the mentioned robotic systems: Nikita Araslanov, Ishrat Badami, David Droeschel, Germán Martín García, Kathrin Gräve, Dirk Holz, Jochen Kläß, Christian Lenz, Manus McElhone, Anton Milan, Aura Munoz, Matthias Nieuwenhuisen, Arul Selvam Periyasamy, Michael Schreiber, Sebastian Schüller, David Schwarz, Ricarda Steffens, Jörg Stückler, and Angeliki Topalidou-Kyniazopoulou.

Author information


Corresponding author

Correspondence to Max Schwarz.


Copyright information

© 2019 Springer Nature Switzerland AG

About this chapter


Cite this chapter

Schwarz, M., Behnke, S. (2019). Semantic RGB-D Perception for Cognitive Service Robots. In: Rosin, P., Lai, YK., Shao, L., Liu, Y. (eds) RGB-D Image Analysis and Processing. Advances in Computer Vision and Pattern Recognition. Springer, Cham. https://doi.org/10.1007/978-3-030-28603-3_13


  • DOI: https://doi.org/10.1007/978-3-030-28603-3_13

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-28602-6

  • Online ISBN: 978-3-030-28603-3

  • eBook Packages: Computer Science, Computer Science (R0)
