Abstract
Humans are arguably innately prepared to comprehend others’ emotional expressions from subtle body movements. If robots or computers can be endowed with this capability, a number of robotic applications become possible. Automatically recognizing human bodily expression in unconstrained situations, however, is daunting given our incomplete understanding of the relationship between emotional expressions and body movements. The current research, a multidisciplinary effort among computer and information sciences, psychology, and statistics, proposes a scalable and reliable crowdsourcing approach for collecting in-the-wild perceived emotion data so that computers can learn to recognize the body language of humans. To accomplish this task, a large and growing annotated dataset, named the Body Language Dataset (BoLD), has been created with 9876 video clips of body movements and 13,239 human characters. Comprehensive statistical analysis of the dataset revealed many interesting insights. A system that models emotional expression based on bodily movement, named Automated Recognition of Bodily Expression of Emotion (ARBEE), has also been developed and evaluated. Our analysis shows the effectiveness of Laban Movement Analysis (LMA) features in characterizing arousal, and our experiments using LMA features further demonstrate the computability of bodily expression. We also report and compare results from several other baseline methods, originally developed for action recognition, based on two different modalities: body skeleton and raw image. The dataset and findings presented in this work will likely serve as a launchpad for future discoveries in body language understanding, enabling future robots to interact and collaborate more effectively with humans.
References
Abu-El-Haija, S., Kothari, N., Lee, J., Natsev, P., Toderici, G., Varadarajan, B., & Vijayanarasimhan, S. (2016). Youtube-8m: A large-scale video classification benchmark. arXiv preprint arXiv:1609.08675.
Aristidou, A., Charalambous, P., & Chrysanthou, Y. (2015). Emotion analysis and classification: understanding the performers’ emotions using the lma entities. Computer Graphics Forum, 34(6), 262–276.
Aristidou, A., Zeng, Q., Stavrakis, E., Yin, K., Cohen-Or, D., Chrysanthou, Y., & Chen, B. (2017). Emotion control of unstructured dance movements. In: Proceedings of the ACM SIGGRAPH/Eurographics Symposium on Computer Animation, article 9.
Aviezer, H., Trope, Y., & Todorov, A. (2012). Body cues, not facial expressions, discriminate between intense positive and negative emotions. Science, 338(6111), 1225–1229.
Bewley, A., Ge, Z., Ott, L., Ramos, F., & Upcroft, B. (2016). Simple online and realtime tracking. Proceedings of the IEEE International Conference on Image Processing, pp. 3464–3468. https://doi.org/10.1109/ICIP.2016.7533003.
Biel, J. I., & Gatica-Perez, D. (2013). The youtube lens: Crowdsourced personality impressions and audiovisual analysis of vlogs. IEEE Transactions on Multimedia, 15(1), 41–55.
Caba Heilbron, F., Escorcia, V., Ghanem, B., & Carlos Niebles, J. (2015). Activitynet: A large-scale video benchmark for human activity understanding. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 961–970.
Cao, Z., Simon, T., Wei, S.E., & Sheikh, Y. (2017). Realtime multi-person 2d pose estimation using part affinity fields. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7291–7299.
Carmichael, L., Roberts, S., & Wessell, N. (1937). A study of the judgment of manual expression as presented in still and motion pictures. The Journal of Social Psychology, 8(1), 115–142.
Carreira, J., & Zisserman, A. (2017). Quo vadis, action recognition? A new model and the kinetics dataset. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, IEEE, pp. 4724–4733.
Noroozi, F., Kaminska, D., Corneanu, C., Sapinski, T., Escalera, S., & Anbarjafari, G. (2018). Survey on emotional body gesture recognition. IEEE Transactions on Affective Computing. https://doi.org/10.1109/TAFFC.2018.2874986.
Dael, N., Mortillaro, M., & Scherer, K. R. (2012). Emotion expression in body action and posture. Emotion, 12(5), 1085.
Dalal, N., Triggs, B., & Schmid, C. (2006). Human detection using oriented histograms of flow and appearance. In: Proceedings of the European Conference on Computer Vision, Springer, pp. 428–441.
Datta, R., Joshi, D., Li, J., & Wang, J.Z. (2006). Studying aesthetics in photographic images using a computational approach. In: European conference on computer vision, Springer, pp. 288–301.
Dawid, A. P., & Skene, A. M. (1979). Maximum likelihood estimation of observer error-rates using the em algorithm. Applied Statistics, 28, 20–28.
De Gelder, B. (2006). Towards the neurobiology of emotional body language. Nature Reviews Neuroscience, 7(3), 242–249.
Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., & Fei-Fei, L. (2009). Imagenet: A large-scale hierarchical image database. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 248–255.
Douglas-Cowie, E., Cowie, R., Sneddon, I., Cox, C., Lowry, L., McRorie, M., Martin, J. C., Devillers, L., Abrilian, S., Batliner, A., et al. (2007). The humaine database: addressing the needs of the affective computing community. In: Proceedings of the International Conference on Affective Computing and Intelligent Interaction, pp. 488–500.
Ekman, P. (1992). Are there basic emotions? Psychological Review, 99(3), 550–553.
Ekman, P. (1993). Facial expression and emotion. American Psychologist, 48(4), 384.
Ekman, P., & Friesen, W. V. (1977). Facial Action Coding System: A technique for the measurement of facial movement. Palo Alto: Consulting Psychologists Press.
Ekman, P., & Friesen, W. V. (1986). A new pan-cultural facial expression of emotion. Motivation and Emotion, 10(2), 159–168.
Eleftheriadis, S., Rudovic, O., & Pantic, M. (2015). Discriminative shared gaussian processes for multiview and view-invariant facial expression recognition. IEEE Transactions on Image Processing, 24(1), 189–204.
Fabian Benitez-Quiroz, C., Srinivasan, R., & Martinez, A.M. (2016). Emotionet: An accurate, real-time algorithm for the automatic annotation of a million facial expressions in the wild. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5562–5570.
Gu, C., Sun, C., Ross, D.A., Vondrick, C., Pantofaru, C., Li, Y., Vijayanarasimhan, S., Toderici, G., Ricco, S., Sukthankar, R., et al. (2018). Ava: A video dataset of spatio-temporally localized atomic visual actions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6047–6056.
Gunes, H., & Piccardi, M. (2005). Affect recognition from face and body: early fusion vs. late fusion. Proceedings of the IEEE International Conference on Systems, Man and Cybernetics, 4, 3437–3443.
Gunes, H., & Piccardi, M. (2007). Bi-modal emotion recognition from expressive face and body gestures. Journal of Network and Computer Applications, 30(4), 1334–1345.
Gwet, K.L. (2014). Handbook of Inter-rater Reliability: The Definitive Guide to Measuring the Extent of Agreement Among Raters. Advanced Analytics, LLC.
He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778.
Iqbal, U., Milan, A., & Gall, J. (2017). Posetrack: Joint multi-person pose estimation and tracking. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2011–2020.
Kantorov, V., & Laptev, I. (2014). Efficient feature extraction, encoding and classification for action recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2593–2600.
Karg, M., Samadani, A. A., Gorbet, R., Kühnlenz, K., Hoey, J., & Kulić, D. (2013). Body movements for affective expression: A survey of automatic recognition and generation. IEEE Transactions on Affective Computing, 4(4), 341–359.
Kay, W., Carreira, J., Simonyan, K., Zhang, B., Hillier, C., Vijayanarasimhan, S., Viola, F., Green, T., Back, T., & Natsev, P., et al. (2017). The kinetics human action video dataset. arXiv preprint arXiv:1705.06950.
Kipf, T.N., & Welling, M. (2016). Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907.
Kleinsmith, A., & Bianchi-Berthouze, N. (2013). Affective body expression perception and recognition: A survey. IEEE Transactions on Affective Computing, 4(1), 15–33.
Kleinsmith, A., De Silva, P. R., & Bianchi-Berthouze, N. (2006). Cross-cultural differences in recognizing affect from body posture. Interacting with Computers, 18(6), 1371–1389.
Kleinsmith, A., Bianchi-Berthouze, N., & Steed, A. (2011). Automatic recognition of non-acted affective postures. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), 41(4), 1027–1038.
Kosti, R., Alvarez, J.M., Recasens, A., & Lapedriza, A. (2017). Emotion recognition in context. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1667–1675.
Krakovsky, M. (2018). Artificial (emotional) intelligence. Communications of the ACM, 61(4), 18–19.
Krizhevsky, A., Sutskever, I., & Hinton, G.E. (2012). Imagenet classification with deep convolutional neural networks. In: Advances in Neural Information Processing Systems, pp. 1097–1105.
Laban, R., & Ullmann, L. (1971). The Mastery of Movement. London: Macdonald & Evans.
Lu, X., Suryanarayan, P., Adams Jr, R.B., Li, J., Newman, M.G., & Wang, J.Z. (2012). On shape and the computability of emotions. In: Proceedings of the ACM International Conference on Multimedia, ACM, pp. 229–238.
Lu, X., Adams Jr, R.B., Li, J., Newman, M.G., Wang, J.Z. (2017). An investigation into three visual characteristics of complex scenes that evoke human emotion. In: Proceedings of the International Conference on Affective Computing and Intelligent Interaction, pp. 440–447.
Luvizon, D.C., Picard, D., & Tabia, H. (2018). 2d/3d pose estimation and action recognition using multitask deep learning. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5137–5146.
Martinez, J., Hossain, R., Romero, J., & Little, J.J. (2017). A simple yet effective baseline for 3d human pose estimation. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2640–2649.
Meeren, H. K., van Heijnsbergen, C. C., & de Gelder, B. (2005). Rapid perceptual integration of facial expression and emotional body language. Proceedings of the National Academy of Sciences of the United States of America, 102(45), 16518–16523.
Mehrabian, A. (1980). Basic dimensions for a general psychological theory: Implications for personality, social, environmental, and developmental studies. Cambridge: The MIT Press.
Mehrabian, A. (1996). Pleasure-arousal-dominance: A general framework for describing and measuring individual differences in temperament. Current Psychology, 14(4), 261–292.
Nicolaou, M. A., Gunes, H., & Pantic, M. (2011). Continuous prediction of spontaneous affect from multiple cues and modalities in valence-arousal space. IEEE Transactions on Affective Computing, 2(2), 92–105.
Perronnin, F., & Dance, C. (2007). Fisher kernels on visual vocabularies for image categorization. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–8.
Potapov, D., Douze, M., Harchaoui, Z., & Schmid, C. (2014). Category-specific video summarization. In: European Conference on Computer Vision, Springer, pp. 540–555.
Ruggero Ronchi, M., & Perona, P. (2017). Benchmarking and error diagnosis in multi-instance pose estimation. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 369–378.
Schindler, K., Van Gool, L., & de Gelder, B. (2008). Recognizing emotions expressed by body pose: A biologically inspired neural model. Neural Networks, 21(9), 1238–1246.
Shiffrar, M., Kaiser, M.D., & Chouchourelou, A. (2011). Seeing human movement as inherently social. The Science of Social Vision, pp. 248–264.
Simonyan, K., & Zisserman, A. (2014). Two-stream convolutional networks for action recognition in videos. In: Advances in Neural Information Processing Systems, pp. 568–576.
Soomro, K., Zamir, A.R., & Shah, M. (2012). Ucf101: A dataset of 101 human actions classes from videos in the wild. arXiv preprint arXiv:1212.0402v1.
Thomee, B., Shamma, D. A., Friedland, G., Elizalde, B., Ni, K., Poland, D., et al. (2016). Yfcc100m: The new data in multimedia research. Communications of the ACM, 59(2), 64–73.
Towns, J., Cockerill, T., Dahan, M., Foster, I., Gaither, K., Grimshaw, A., et al. (2014). Xsede: Accelerating scientific discovery. Computing in Science & Engineering, 16(5), 62–74.
Wakabayashi, A., Baron-Cohen, S., Wheelwright, S., Goldenfeld, N., Delaney, J., Fine, D., et al. (2006). Development of short forms of the empathy quotient (eq-short) and the systemizing quotient (sq-short). Personality and Individual Differences, 41(5), 929–940.
Wallbott, H. G. (1998). Bodily expression of emotion. European Journal of Social Psychology, 28(6), 879–896.
Wang, H., & Schmid, C. (2013). Action recognition with improved trajectories. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 3551–3558.
Wang, H., Kläser, A., Schmid, C., & Liu, C.L. (2011). Action recognition by dense trajectories. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3169–3176.
Wang, L., Xiong, Y., Wang, Z., Qiao, Y., Lin, D., Tang, X., & Van Gool, L. (2016). Temporal segment networks: Towards good practices for deep action recognition. In: European Conference on Computer Vision, Springer, pp. 20–36.
Xu, F., Zhang, J., & Wang, J. Z. (2017). Microexpression identification and categorization using a facial dynamics map. IEEE Transactions on Affective Computing, 8(2), 254–267.
Yan, S., Xiong, Y., & Lin, D. (2018). Spatial temporal graph convolutional networks for skeleton-based action recognition. In: Proceedings of the AAAI Conference on Artificial Intelligence.
Ye, J., Li, J., Newman, M. G., Adams, R. B., & Wang, J. Z. (2019). Probabilistic multigraph modeling for improving the quality of crowdsourced affective data. IEEE Transactions on Affective Computing, 10(1), 115–128.
Zach, C., Pock, T., & Bischof, H. (2007). A duality based approach for realtime TV-L1 optical flow. In: Proceedings of the Joint Pattern Recognition Symposium, Springer, pp. 214–223.
Acknowledgements
This material is based upon work supported in part by The Pennsylvania State University. This work used the Extreme Science and Engineering Discovery Environment (XSEDE), which is supported by National Science Foundation Grant No. ACI-1548562 (Towns et al. 2014). The work was also supported through a GPU gift from the NVIDIA Corporation. The authors are grateful to the thousands of Amazon Mechanical Turk independent contractors for their time and dedication in providing invaluable emotion ground truth labels for the video collection. Hanjoo Kim contributed in some of the discussions. Jeremy Yuya Ong supported the data collection and visualization effort. We thank Amazon.com, Inc. for supporting the expansion of this line of research.
Cite this article
Luo, Y., Ye, J., Adams, R.B. et al. ARBEE: Towards Automated Recognition of Bodily Expression of Emotion in the Wild. Int J Comput Vis 128, 1–25 (2020). https://doi.org/10.1007/s11263-019-01215-y