Abstract
Technological advancements have ushered in a new era of global educational development. Artificial intelligence (AI) has the potential to improve teaching effectiveness and foster educational innovation. Using student posture as a proxy, computer vision can estimate levels of student engagement. Whereas previous work has focused on refining posture classification models, this study addresses the end-to-end implementation of a real-time posture detection workflow, covering software, hardware, and network aspects. The proposed system uses surveillance cameras together with a model from the Open Visual Inference & Neural Network Optimization (OpenVINO) toolkit to detect student postures, and transmits the resulting data over the Message Queuing Telemetry Transport (MQTT) protocol, establishing a complete posture detection workflow within the classroom. To validate the system, video recordings from a real teaching environment (a fifth-grade class in the Chinese compulsory education system) were analyzed, yielding classification accuracies of 0.933 for standing, 0.772 for sitting, and 0.959 for hand-raising. With a per-frame processing time of 109 to 758 milliseconds, the system delivers posture data to educators in real time. The system can therefore monitor student postures in the classroom automatically, with the potential to enhance teaching quality in smart classrooms.
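As an illustration of the data-transmission step, the following minimal Python sketch shows how the posture labels detected in one frame might be summarized into a JSON message and handed to a publisher. It is not the authors' implementation: the function names, message fields, and topic string are hypothetical, and the `publish` argument is a stand-in for whatever MQTT client is used in a deployment (for example, the `publish(topic, payload)` method of a paho-mqtt client). Only the three posture classes and the per-frame timing figure come from the abstract.

```python
import json
from typing import Callable, Dict, List

# The three posture classes reported in the abstract.
POSTURES = ("standing", "sitting", "hand-raising")


def build_payload(frame_id: int, detections: List[str], elapsed_ms: float) -> str:
    """Summarize one frame's detections as a JSON string (hypothetical schema)."""
    counts: Dict[str, int] = {p: 0 for p in POSTURES}
    for label in detections:
        if label in counts:  # ignore any label outside the three classes
            counts[label] += 1
    message = {"frame": frame_id, "counts": counts, "elapsed_ms": elapsed_ms}
    return json.dumps(message, sort_keys=True)


def send_frame(publish: Callable[[str, str], None], topic: str,
               frame_id: int, detections: List[str], elapsed_ms: float) -> str:
    """Build the payload and pass it to the supplied publish callable."""
    payload = build_payload(frame_id, detections, elapsed_ms)
    publish(topic, payload)  # in practice, an MQTT client's publish method
    return payload


if __name__ == "__main__":
    sent = []  # an in-memory list stands in for the broker connection
    send_frame(lambda t, p: sent.append((t, p)),
               "classroom/posture",  # hypothetical topic name
               1, ["sitting", "sitting", "hand-raising"], 109.0)
    print(sent[0])
```

Passing the publisher in as a callable keeps the sketch runnable without a broker, and lets the same function be exercised in tests before a real MQTT client is attached.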
Acknowledgements
This work was financially supported by the 2022 Research Projects of the Centre for Future Education Research at the Southern University of Science and Technology (FE22Z004). The authors would like to express their gratitude to EditSprings (https://www.editsprings.cn) for the expert linguistic services provided.
Ethics declarations
Conflict of interest
No conflict of interest.
About this article
Cite this article
Huang, J., Zhou, D. A scalable real-time computer vision system for student posture detection in smart classrooms. Educ Inf Technol 29, 917–937 (2024). https://doi.org/10.1007/s10639-023-12365-5
DOI: https://doi.org/10.1007/s10639-023-12365-5