Abstract

During social interactions, humans are capable of initiating and responding to rich and complex social actions despite incomplete world knowledge and physical, perceptual and computational constraints. This capability relies on action perception mechanisms that exploit regularities in observed goal-oriented behaviours to generate robust predictions and reduce the workload of sensing systems. We argue that three factors are fundamental to achieving this capability. First, human knowledge is frequently hierarchically structured, in both the perceptual and the execution domains. Second, human perception is an active process driven by current task requirements and context; this is particularly important when the perceptual input is complex (e.g. human motion) and the agent has to operate under embodiment constraints. Third, learning is at the heart of action perception mechanisms, underlying the agent’s ability to add new behaviours to its repertoire. Based on these factors, we review multiple instantiations of a hierarchically organised, biologically inspired framework for embodied action perception, demonstrating its flexibility in addressing the rich computational contexts of action perception and learning on robotic platforms.
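The abstract does not describe the framework's internals, so the following is only an illustrative sketch under stated assumptions, not the chapter's implementation. It combines two of the ideas named above: prediction-based action perception (each candidate behaviour model predicts the next observation, and gains or loses confidence according to its prediction error) and active perception (a limited sensing budget is spent only on the state features requested by the currently most confident hypotheses). All names (`BehaviourModel`, `recognise`, `budget`, `tolerance`, the `hand_x`/`grip` features) are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence

State = Dict[str, float]  # hypothetical observation: named features -> values


@dataclass
class BehaviourModel:
    """Hypothetical model of one behaviour: a forward predictor plus the
    state features it needs to observe in order to be checked."""
    name: str
    predict: Callable[[State], State]  # predicts the next observed state
    relevant: Sequence[str]            # features this hypothesis asks to sense
    confidence: float = 0.0


def recognise(models: List[BehaviourModel], observations: List[State],
              budget: int = 1, reward: float = 1.0, penalty: float = 1.0,
              tolerance: float = 0.1) -> str:
    """Return the name of the most confident behaviour after the sequence."""
    for prev, nxt in zip(observations, observations[1:]):
        # Active perception (assumption): only features requested by the
        # `budget` most confident models are sensed at this step.
        ranked = sorted(models, key=lambda m: m.confidence, reverse=True)
        attended = set()
        for m in ranked[:budget]:
            attended.update(m.relevant)
        for m in models:
            checked = [k for k in m.relevant if k in attended]
            if not checked:
                continue  # no evidence was gathered for this hypothesis
            predicted = m.predict(prev)
            error = sum(abs(predicted[k] - nxt[k]) for k in checked) / len(checked)
            # Confidence rises on accurate predictions and falls otherwise.
            m.confidence += reward if error < tolerance else -penalty
    return max(models, key=lambda m: m.confidence).name


# Usage: a hand moving steadily in +x with no change in grip.
reach = BehaviourModel("reach", lambda s: {**s, "hand_x": s["hand_x"] + 0.1}, ["hand_x"])
retract = BehaviourModel("retract", lambda s: {**s, "hand_x": s["hand_x"] - 0.1}, ["hand_x"])
grasp = BehaviourModel("grasp", lambda s: {**s, "grip": s["grip"] + 0.2}, ["grip"])
observed = [{"hand_x": x, "grip": 0.0} for x in (0.0, 0.1, 0.2, 0.3, 0.4)]
result = recognise([retract, reach, grasp], observed, budget=1)
print(result)  # reach
```

With `budget=1`, the sensing budget always goes to `hand_x`, so the `grasp` hypothesis is never checked and keeps its initial confidence: the sketch trades completeness for a reduced sensing workload, which is the kind of embodiment constraint the abstract refers to.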

Acknowledgments

This research has received funding from the European Union Seventh Framework Programme FP7/2007-2013, under grant agreement no. [270490]- [EFAA].

Author information

Correspondence to Dimitri Ognibene.

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Ognibene, D., Wu, Y., Lee, K., Demiris, Y. (2013). Hierarchies for Embodied Action Perception. In: Baldassarre, G., Mirolli, M. (eds) Computational and Robotic Models of the Hierarchical Organization of Behavior. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-39875-9_5

  • DOI: https://doi.org/10.1007/978-3-642-39875-9_5

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-39874-2

  • Online ISBN: 978-3-642-39875-9

  • eBook Packages: Computer Science, Computer Science (R0)
