Semantic RGB-D Perception for Cognitive Service Robots

Chapter in: RGB-D Image Analysis and Processing

Part of the book series: Advances in Computer Vision and Pattern Recognition ((ACVPR))

Abstract

Cognitive robots need to understand their surroundings not only in terms of geometry; they must also categorize surfaces, detect objects, and estimate object poses. RGB-D sensors are ideally suited to many of these problems, which is why we developed efficient RGB-D methods to address them. In this chapter, we outline the continuous development and use of RGB-D methods across three applications: our cognitive service robot Cosero, which competed with great success in the international RoboCup@Home competitions; an industrial kitting application; and cluttered bin picking for warehouse automation. We learn semantic segmentation using convolutional neural networks and random forests, and aggregate the predicted surface categories in 3D via RGB-D SLAM. We use deep learning methods to categorize surfaces, recognize objects, and estimate their poses. Efficient RGB-D registration methods form the basis for manipulating known objects; extended to non-rigid registration, they allow manipulation skills to be transferred to novel objects.
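The rigid registration step underlying such RGB-D manipulation pipelines can be illustrated in its simplest correspondence-based form: given matched 3D point pairs, the optimal rotation and translation follow from an SVD of the cross-covariance matrix (the Kabsch method). This is a generic sketch of that building block, not the authors' multi-resolution surfel registration; the function name is illustrative.

```python
import numpy as np

def kabsch_rigid_transform(P, Q):
    """Estimate (R, t) minimizing sum ||R @ P[i] + t - Q[i]||^2
    for corresponding point sets P, Q of shape (N, 3)."""
    cP, cQ = P.mean(axis=0), Q.mean(axis=0)
    H = (P - cP).T @ (Q - cQ)               # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))  # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = cQ - R @ cP
    return R, t
```

In a full registration system, the correspondences themselves are unknown and are re-estimated iteratively (as in ICP-style methods); the closed-form step above is what each iteration solves.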




Acknowledgements

The authors thank the numerous people involved in development and operation of the mentioned robotic systems: Nikita Araslanov, Ishrat Badami, David Droeschel, Germán Martín García, Kathrin Gräve, Dirk Holz, Jochen Kläß, Christian Lenz, Manus McElhone, Anton Milan, Aura Munoz, Matthias Nieuwenhuisen, Arul Selvam Periyasamy, Michael Schreiber, Sebastian Schüller, David Schwarz, Ricarda Steffens, Jörg Stückler, and Angeliki Topalidou-Kyniazopoulou.

Author information


Corresponding author

Correspondence to Max Schwarz.


Copyright information

© 2019 Springer Nature Switzerland AG

About this chapter


Cite this chapter

Schwarz, M., Behnke, S. (2019). Semantic RGB-D Perception for Cognitive Service Robots. In: Rosin, P., Lai, YK., Shao, L., Liu, Y. (eds) RGB-D Image Analysis and Processing. Advances in Computer Vision and Pattern Recognition. Springer, Cham. https://doi.org/10.1007/978-3-030-28603-3_13


  • DOI: https://doi.org/10.1007/978-3-030-28603-3_13

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-28602-6

  • Online ISBN: 978-3-030-28603-3

  • eBook Packages: Computer Science, Computer Science (R0)
