Abstract
The rise of massive open online courses (MOOCs) brings rich opportunities for understanding learners’ experiences through the analysis of learner-generated content such as course reviews. Traditionally, such unstructured textual data are analyzed qualitatively via manual coding, which cannot offer a timely understanding of learners’ experiences. To address this problem, this study explores the ability of deep neural networks (DNNs) to classify the semantic content of course review data automatically. Based on 102,184 reviews from 401 MOOCs collected from Class Central, the present study developed DNN-empowered models to automatically distinguish a group of semantic categories. Results showed that DNNs, especially recurrent convolutional neural networks (RCNNs), achieve acceptable performance in capturing and learning features of course review texts for understanding their semantic meanings. By dramatically lightening the coding workload and enhancing analysis efficiency, the RCNN classifier proposed in this study allows timely feedback about learners’ experiences, based on which course providers and designers can develop suitable interventions to improve MOOC instructional design.
Data availability
The data are available upon reasonable request from the corresponding author.
References
Adam, T. (2019). Digital neocolonialism and massive open online courses (MOOCs): Colonial pasts and neoliberal futures. Learning, Media and Technology, 44(3), 365–380. https://doi.org/10.1080/17439884.2019.1640740
Adnan, M., Habib, A., Ashraf, J., Mussadiq, S., Raza, A. A., Abid, M., ... & Khan, S. U. (2021). Predicting at-risk students at different percentages of course length for early intervention using machine learning models. IEEE Access, 9, 7519–7539. https://doi.org/10.1109/ACCESS.2021.3049446
Albelbisi, N., Yusop, F. D., & Salleh, U. K. M. (2018). Mapping the factors influencing success of massive open online courses (MOOC) in higher education. EURASIA Journal of Mathematics, Science and Technology Education, 14(7), 2995–3012. https://doi.org/10.29333/ejmste/91486
Anders, A. (2015). Theories and applications of massive online open courses (MOOCs): the case for hybrid design. International Review of Research in Open and Distributed Learning, 16(6), 39–61. https://doi.org/10.19173/irrodl.v16i6.2185
Appelbaum, S. H. (1997). Socio-technical systems theory: An intervention strategy for organizational development. Management Decision, 35(6), 452–463. https://doi.org/10.1108/00251749710173823
Blanchard, N., Brady, M., Olney, A. M., Glaus, M., Sun, X., Nystrand, M., ... & D’Mello, S. (2015). A study of automatic speech recognition in noisy classroom environments for automated dialog analysis. In International Conference on Artificial Intelligence in Education (pp. 23–33). Springer. https://doi.org/10.1007/978-3-319-19773-9_3
Bojanowski, P., Grave, E., Joulin, A., & Mikolov, T. (2017). Enriching word vectors with subword information. Transactions of the Association for Computational Linguistics, 5, 135–146. https://doi.org/10.1162/tacl_a_00051
Boyd, K., Eng, K. H., & Page, C. D. (2013). Area under the precision-recall curve: point estimates and confidence intervals. In Proceedings of the 2013th European Conference on Machine Learning and Knowledge Discovery in Databases-Volume Part III (pp. E1-E1). Springer Berlin Heidelberg. https://doi.org/10.1007/978-3-642-40994-3_29
Brahimi, T., & Sarirete, A. (2015). Learning outside the classroom through MOOCs. Computers in Human Behavior, 51, 604–609. https://doi.org/10.1016/j.chb.2015.03.013
Bralić, A., & Divjak, B. (2018). Integrating MOOCs in traditionally taught courses: Achieving learning outcomes with blended learning. International Journal of Educational Technology in Higher Education, 15(1), 1–16. https://doi.org/10.1186/s41239-017-0085-7
Capuano, N., Caballé, S., Conesa, J., & Greco, A. (2021). Attention-based hierarchical recurrent neural networks for MOOC forum posts analysis. Journal of Ambient Intelligence and Humanized Computing, 12(11), 9977–9989. https://doi.org/10.1007/s12652-020-02747-9
Chen, X., Cheng, G., Xie, H., Chen, G., & Zou, D. (2021). Understanding MOOC reviews: Text mining using structural topic model. Human-Centric Intelligent Systems, 1(3–4), 55–65. https://doi.org/10.2991/hcis.k.211118.001
Chen, X., Wang, F. L., Cheng, G., Chow, M.-K., & Xie, H. (2022). Understanding learners’ perception of MOOCs based on review data analysis using deep learning and sentiment analysis. Future Internet, 14(8), 218. https://doi.org/10.3390/fi14080218
Chen, X., Zou, D., Xie, H., & Cheng, G. (2020). What are MOOCs learners’ concerns? Text analysis of reviews for computer science courses. In International Conference on Database Systems for Advanced Applications (pp. 73–79). Springer. https://doi.org/10.1007/978-3-030-59413-8_6
Choi, K., Fazekas, G., Sandler, M., & Cho, K. (2017). Convolutional recurrent neural networks for music classification. In 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (pp. 2392–2396). IEEE. https://doi.org/10.48550/arXiv.1609.04243
Chou, H. L., & Chen, C. H. (2016). Beyond identifying privacy issues in e-learning settings–implications for instructional designers. Computers & Education, 103, 124–133. https://doi.org/10.1016/j.compedu.2016.10.002
Conneau, A., Kiela, D., Schwenk, H., Barrault, L., & Bordes, A. (2017). Supervised learning of universal sentence representations from natural language inference data. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing (pp. 670–680). ACL. https://doi.org/10.48550/arXiv.1705.02364
Creswell, J. W., & Creswell, J. D. (2017). Research design: qualitative, quantitative, and mixed methods approaches. Sage publications.
Cui, Y., Jin, W. Q., & Wise, A. F. (2017). Humans and machines together: improving characterization of large scale online discussions through dynamic interrelated post and thread categorization (DIPTiC). In Proceedings of the Fourth (2017) ACM Conference on Learning@ Scale (pp. 217–219). https://doi.org/10.1145/3051457.3053989
Davis, J., & Goadrich, M. (2006). The relationship between Precision-Recall and ROC curves. In Proceedings of the 23rd International Conference on Machine Learning (pp. 233–240). https://doi.org/10.1145/1143844.1143874
Deng, R., Benckendorff, P., & Gannaway, D. (2020). Linking learner factors, teaching context, and engagement patterns with MOOC learning outcomes. Journal of Computer Assisted Learning, 36(5), 688–708. https://doi.org/10.1111/jcal.12437
Doleck, T., Lemay, D. J., Basnet, R. B., & Bazelais, P. (2020). Predictive analytics in education: A comparison of deep learning frameworks. Education and Information Technologies, 25(3), 1951–1963. https://doi.org/10.1007/s10639-019-10068-4
Downes, S. (2022). Connectivism. Asian Journal of Distance Education, 17(1), 58–87. Retrieved from https://asianjde.com/ojs/index.php/AsianJDE/article/view/623. Accessed 1 Aug 2022.
Ezen-Can, A., Boyer, K. E., Kellogg, S., & Booth, S. (2015). Unsupervised modeling for understanding MOOC discussion forums: a learning analytics approach. In Proceedings of the fifth International Conference on Learning Analytics and Knowledge (pp. 146–150). https://doi.org/10.1145/2723576.2723589
Fiorella, L., & Mayer, R. E. (2016). Eight ways to promote generative learning. Educational Psychology Review, 28(4), 717–741. https://doi.org/10.1007/s10648-015-9348-9
Gameel, B. G. (2017). Learner satisfaction with massive open online courses. American Journal of Distance Education, 31(2), 98–111. https://doi.org/10.1080/08923647.2017.1300462
García-Peñalvo, F. J., Fidalgo-Blanco, Á., & Sein-Echaluce, M. L. (2018). An adaptive hybrid MOOC model: Disrupting the MOOC concept in higher education. Telematics and Informatics, 35(4), 1018–1030. https://doi.org/10.1016/j.tele.2017.09.012
Gomez-Arizaga, M. P., Bahar, A. K., Maker, C. J., Zimmerman, R., & Pease, R. (2016). How does science learning occur in the classroom? Students’ perceptions of science instruction during the implementation of the REAPS model. Eurasia Journal of Mathematics, Science and Technology Education, 12(3), 431–455. https://doi.org/10.12973/eurasia.2016.1209a
Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep learning. MIT press.
Hansch, A., Hillers, L., McConachie, K., Newman, C., Schildhauer, T., & Schmidt, J. P. (2015). Video and online learning: critical reflections and findings from the field. HIIG Discussion Paper Series No. 2015–02, Retrieved from: https://ssrn.com/abstract=2577882. Accessed 1 Aug 2022.
Hew, K. F. (2015). Towards a model of engaging online students: Lessons from MOOCs and four policy documents. International Journal of Information and Education Technology, 5(6), 425–431. https://doi.org/10.7763/IJIET.2015.V5.543
Hew, K. F. (2016). Promoting engagement in online courses: What strategies can we learn from three highly rated MOOCS. British Journal of Educational Technology, 47(2), 320–341. https://doi.org/10.1111/bjet.12235
Hew, K. F., Qiao, C., & Tang, Y. (2018). Understanding student engagement in large-scale open online courses: a machine learning facilitated analysis of student’s reflections in 18 highly rated MOOCs. International Review of Research in Open and Distributed Learning, 19(3), 69–93. https://doi.org/10.19173/irrodl.v19i3.3596
Hew, K. F., Hu, X., Qiao, C., & Tang, Y. (2020). What predicts student satisfaction with MOOCs: a gradient boosting trees supervised machine learning and sentiment analysis approach. Computers & Education, 145, 103724. https://doi.org/10.1016/j.compedu.2019.103724
Hone, K. S., & El Said, G. R. (2016). Exploring the factors affecting MOOC retention: A survey study. Computers & Education, 98, 157–168. https://doi.org/10.1016/j.compedu.2016.03.016
Huang, X., Chandra, A., DePaolo, C. A., & Simmons, L. L. (2016). Understanding transactional distance in web-based learning environments: An empirical study. British Journal of Educational Technology, 47(4), 734–747. https://doi.org/10.1111/bjet.12263
Hussain, M., Zhu, W., Zhang, W., & Abidi, S. M. R. (2018). Student engagement predictions in an e-learning system and their impact on student course assessment scores. Computational Intelligence and Neuroscience, 2018, 6347186. https://doi.org/10.1155/2018/6347186
Jha, A., & Mamidi, R. (2017). When does a compliment become sexist? Analysis and classification of ambivalent sexism using twitter data. In Proceedings of the Second Workshop on NLP and Computational Social Science (pp. 7–16). ACL. https://doi.org/10.18653/v1/W17-2902
Joulin, A., Grave, É., Bojanowski, P., & Mikolov, T. (2017). Bag of tricks for efficient text classification. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers (pp. 427–431). ACL. https://doi.org/10.48550/arXiv.1607.01759
Kaushik, V., & Walsh, C. A. (2019). Pragmatism as a research paradigm and its implications for social work research. Social Sciences, 8(9), 255. https://doi.org/10.3390/socsci8090255
Koul, A., Becchio, C., & Cavallo, A. (2018). Cross-validation approaches for replicability in psychology. Frontiers in Psychology, 9, 1117. https://doi.org/10.3389/fpsyg.2018.01117
Kurucay, M., & Inan, F. A. (2017). Examining the effects of learner-learner interactions on satisfaction and learning in an online undergraduate course. Computers & Education, 115, 20–37. https://doi.org/10.1016/j.compedu.2017.06.010
Lai, S., Xu, L., Liu, K., & Zhao, J. (2015). Recurrent convolutional neural networks for text classification. In Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence (pp. 2267–2273). AAAI Press. https://doi.org/10.1609/aaai.v29i1.9513
Lee, J. (2018). The effects of knowledge sharing on individual creativity in higher education institutions: Socio-technical view. Administrative Sciences, 8(2), 21. https://doi.org/10.3390/admsci8020021
Liang, M., & Hu, X. (2015). Recurrent convolutional neural network for object recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 3367–3375). https://doi.org/10.1109/CVPR.2015.7298958
Lin, Z., Feng, M., Santos, C. N. dos, Yu, M., Xiang, B., Zhou, B., & Bengio, Y. (2017a). A structured self-attentive sentence embedding. In Proceedings of the 5th International Conference on Learning Representations (pp. 1–15). https://doi.org/10.48550/arXiv.1703.03130
Lin, Z., Feng, M., Santos, C. N. D., Yu, M., Xiang, B., Zhou, B., & Bengio, Y. (2017b). A structured self-attentive sentence embedding. https://doi.org/10.48550/arXiv.1703.03130
Liu, S., Peng, X., Cheng, H. N. H., Liu, Z., Sun, J., & Yang, C. (2019). Unfolding sentimental and behavioral tendencies of learners’ concerned topics from course reviews in a MOOC. Journal of Educational Computing Research, 57(3), 670–696. https://doi.org/10.1177/0735633118757181
Liu, P., Qiu, X., & Huang, X. (2016). Recurrent neural network for text classification with multi-task learning. In Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence (pp. 2873–2879). ACM. https://doi.org/10.48550/arXiv.1605.05101
Mahmood, Z., Safder, I., Nawab, R. M. A., Bukhari, F., Nawaz, R., Alfakeeh, A. S., Aljohani, N. R., & Hassan, S.-U. (2020). Deep sentiments in Roman Urdu text using recurrent convolutional neural network model. Information Processing & Management, 57(4), 102233. https://doi.org/10.1016/j.ipm.2020.102233
Major, C. H., & Blackmon, S. J. (2016). Massive open online courses: Variations on a new instructional form. New Directions for Institutional Research, 2015(167), 11–25. https://doi.org/10.1002/ir.20151
Milligan, C., & Littlejohn, A. (2017). Why study on a MOOC? The motives of students and professionals. International Review of Research in Open and Distributed Learning, 18(2), 92–102. https://doi.org/10.19173/irrodl.v18i2.3033
Minaee, S., Kalchbrenner, N., Cambria, E., Nikzad, N., Chenaghlu, M., & Gao, J. (2020). Deep learning based text classification: a comprehensive review. ACM Computing Surveys (CSUR), 54(3), 1–40. https://doi.org/10.48550/arXiv.2004.03705
Moore, M. G. (1991). Distance education theory. Taylor & Francis.
Moore, M. G. (1993). Theory of transactional distance. Theoretical Principles of Distance Education, 1, 22–38.
Mubarak, A. A., Cao, H., & Ahmed, S. A. M. (2021). Predictive learning analytics using deep learning model in MOOCs’ courses videos. Education and Information Technologies, 26(1), 371–392. https://doi.org/10.1007/s10639-020-10273-6
Na, I. S., Tran, C., Nguyen, D., & Dinh, S. (2020). Facial UV map completion for pose-invariant face recognition: A novel adversarial approach based on coupled attention residual UNets. Human-Centric Computing and Information Sciences, 10(1), 1–17. https://doi.org/10.1186/s13673-020-00250-w
Nilashi, M., Abumalloh, R. A., Zibarzani, M., Samad, S., Zogaan, W. A., Ismail, M. Y., Mohd, S., & Akib, N. A. M. (2022). What factors influence students satisfaction in massive open online courses? Findings from user-generated content using educational data mining. Education and Information Technologies, 1–35. https://doi.org/10.1007/s10639-022-10997-7
Paul, D., Li, F., Teja, M. K., Yu, X., & Frost, R. (2017). Compass: spatio temporal sentiment analysis of US election what twitter says! In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 1585–1594). ACM. https://doi.org/10.1145/3097983.3098053
Peng, X., & Xu, Q. (2020). Investigating learners’ behaviors and discourse content in MOOC course reviews. Computers & Education, 143, 103673. https://doi.org/10.1016/j.compedu.2019.103673
Peng, X., Han, C., Ouyang, F., & Liu, Z. (2020a). Topic tracking model for analyzing student-generated posts in SPOC discussion forums. International Journal of Educational Technology in Higher Education, 17(1), 1–22. https://doi.org/10.1186/s41239-020-00211-4
Peng, Z., Yan, G., Zhongshan, Q., Huiyong, L., Mouying, L., & Shengnan, L. (2020b). CIM/G graphics automatic generation in substation primary wiring diagram based on image recognition. Journal of Physics: Conference Series, 1617(1), 12007. https://doi.org/10.1088/1742-6596/1617/1/012007
Qiao, C., Huang, B., Niu, G., Li, D., Dong, D., He, W., Yu, D., & Wu, H. (2018). A new method of region embedding for text classification. In International Conference on Learning Representations (pp. 1–12). Vancouver, Canada: HSE Publishing. Retrieved from https://openreview.net/pdf?id=BkSDMA36Z. Accessed 1 Aug 2022.
Rhoads, R. A., Camacho, M. S., Toven-Lindsey, B., & Lozano, J. B. (2015). The massive open online course movement, xMOOCs, and faculty labor. The Review of Higher Education, 38(3), 397–424. https://doi.org/10.1353/rhe.2015.0016
Rieber, L. P. (2017). Participation patterns in a massive open online course (MOOC) about statistics. British Journal of Educational Technology, 48(6), 1295–1304. https://doi.org/10.1111/bjet.12504
Rosenthal, S., Atanasova, P., Karadzhov, G., Zampieri, M., & Nakov, P. (2021). SOLID: a large-scale semi-supervised dataset for offensive language identification. In Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021 (pp. 915–928). ACL. https://doi.org/10.48550/arXiv.2004.14454
Rospocher, M. (2022). On exploiting transformers for detecting explicit song lyrics. Entertainment Computing, 43, 100508. https://doi.org/10.1016/j.entcom.2022.100508
Salehinejad, H., Sankar, S., Barfett, J., Colak, E., & Valaee, S. (2017). Recent advances in recurrent neural networks. https://doi.org/10.48550/arXiv.1801.01078
Schmitt, M., Steinheber, S., Schreiber, K., & Roth, B. (2018). Joint aspect and polarity classification for aspect-based sentiment analysis with end-to-end neural networks. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing (pp. 1109–1114). ACL. https://doi.org/10.48550/arXiv.1808.09238
Sha, L., Rakovic, M., Whitelock-Wainwright, A., Carroll, D., Yew, V. M., Gasevic, D., & Chen, G. (2021). Assessing algorithmic fairness in automatic classifiers of educational forum posts. In International Conference on Artificial Intelligence in Education (pp. 381–394). Springer. https://doi.org/10.1007/978-3-030-78292-4_31
Shearer, R. L., & Park, E. (2019). The theory of transactional distance. In I. Jung (Ed.), Open and distance education theory revisited. Springer. https://doi.org/10.1007/978-981-13-7740-2_4
Shukor, N. A., & Abdullah, Z. (2019). Using learning analytics to improve MOOC instructional design. International Journal of Emerging Technologies in Learning (IJET), 14(24), 6–17. https://doi.org/10.3991/ijet.v14i24.12185
Song, Y., Lei, S., Hao, T., Lan, Z., & Ding, Y. (2021). Automatic classification of semantic content of classroom dialogue. Journal of Educational Computing Research, 59(3), 496–521. https://doi.org/10.1177/07356331209685
Sun, Y., Ni, L., Zhao, Y., Shen, X., & Wang, N. (2019b). Understanding students’ engagement in MOOCs: An integration of self-determination theory and theory of relationship quality. British Journal of Educational Technology, 50(6), 3156–3174. https://doi.org/10.1111/bjet.12724
Sun, X., Guo, S., Gao, Y., Zhang, J., Xiao, X., & Feng, J. (2019a). Identification of urgent posts in MOOC discussion forums using an improved RCNN. In 2019 IEEE World Conference on Engineering Education (EDUNINE) (pp. 1–5). IEEE. https://doi.org/10.1109/EDUNINE.2019.8875845
Tang, D., Qin, B., & Liu, T. (2015). Document modeling with gated recurrent neural network for sentiment classification. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing (pp. 1422–1432). ACL. https://doi.org/10.18653/v1/D15-1167
Terras, M. M., & Ramsay, J. (2015). Massive open online courses (MOOCs): Insights and challenges from a psychological perspective. British Journal of Educational Technology, 46(3), 472–487. https://doi.org/10.1111/bjet.12274
Vygotsky, L. (1978). Interaction between learning and development. In M. Cole (Ed.), Mind in society: the development of higher psychological processes. Harvard University Press.
Winne, P. H. (2010). Improving measurement of self-regulated learning. Educational Psychologist, 45(4), 267–276. https://doi.org/10.1080/00461520.2010.517150
Wise, A., & Cui, Y. (2018). Envisioning a learning analytics for the learning sciences. In Proceedings of the 13th International Conference of the Learning Sciences (pp. 1799–1806). International Society of the Learning Sciences. https://doi.org/10.22318/cscl2018.1799
Wise, A. F., & Schwarz, B. B. (2017). Visions of CSCL: Eight provocations for the future of the field. International Journal of Computer-Supported Collaborative Learning, 12(4), 423–467. https://doi.org/10.1007/s11412-017-9267-5
Xiao, J. (2017). Learner-content interaction in distance education: The weakest link in interaction research. Distance Education, 38(1), 123–135. https://doi.org/10.1080/01587919.2017.1298982
Xiong, Y., Li, H., Kornhaber, M. L., Suen, H. K., Pursel, B., & Goins, D. D. (2015). Examining the relations among student motivation, engagement, and retention in a MOOC: a structural equation modeling approach. Global Education Review, 2(3), 23–33. Retrieved from: https://files.eric.ed.gov/fulltext/EJ1074099.pdf. Accessed 1 Aug 2022.
Xu, Y., & Lynch, C. F. (2018). What do you want? Applying deep learning models to detect question topics in MOOC forum posts. In Wood-stock’18: ACM Symposium on Neural Gaze Detection (pp. 1–6). https://doi.org/10.1145/1122445.1122456
Yan, Q. (2021). A video production method of microclass combined with MOOC. Scientific Programming, 2021, 9925165. https://doi.org/10.1155/2021/9925165
Yan, Y., Wang, Y., Gao, W.-C., Zhang, B.-W., Yang, C., & Yin, X.-C. (2018). LSTM: Multi-label ranking for document classification. Neural Processing Letters, 47(1), 117–138. https://doi.org/10.1007/s11063-017-9636-0
Yang, Z., Yang, D., Dyer, C., He, X., Smola, A., & Hovy, E. (2016). Hierarchical attention networks for document classification. In Proceedings of the 2016 conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (pp. 1480–1489). ACM. https://doi.org/10.18653/v1/N16-1174
Yousef, A. M. F., Chatti, M. A., Wosnitza, M., & Schroeder, U. (2015). A cluster analysis of MOOC stakeholder perspectives. International Journal of Educational Technology in Higher Education, 12(1), 74–90. https://doi.org/10.7238/rusc.v12i1.2253
Zhang, X., & LeCun, Y. (2017). Which encoding is the best for text classification in chinese, english, japanese and korean? https://doi.org/10.48550/arXiv.1708.02657
Zhang, Y., & Wallace, B. C. (2017). A sensitivity analysis of (and practitioners’ guide to) convolutional neural networks for sentence classification. In Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers) (pp. 253–263). https://doi.org/10.48550/arXiv.1510.03820
Zheng, S., Rosson, M. B., Shih, P. C., & Carroll, J. M. (2015). Understanding student motivation, behaviors and perceptions in MOOCs. In Proceedings of the 18th ACM Conference on Computer Supported Cooperative Work & Social Computing (pp. 1882–1895). https://doi.org/10.1145/2675133.2675217
Zhou, C., Sun, C., Liu, Z., & Lau, F. (2015). A C-LSTM neural network for text classification. https://doi.org/10.48550/arXiv.1511.08630
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher's note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix: The technical details about the DNNs used for training and testing classifiers for review topic classification
FastText
FastText is a fast algorithm for learning text representations for classification (Joulin et al., 2017). It builds on word2vec, which adopts a skip-gram model to identify relationships between words by analyzing word-usage patterns. FastText is also regarded as a sub-word model capable of integrating sub-word information into embedding learning (Schmitt et al., 2018). Its model architecture makes use of hierarchical softmax and feature hashing. Briefly, FastText “takes the sentence represented as n-gram features and embeds these features using an embedding layer”, and “the embeddings of the n-gram features are then averaged to form the final representation of the sentence and are projected onto the output layer” (Paul et al., 2017, p. 1589). Because training a linear classifier is computationally expensive when there are many categories, Joulin et al. (2017) improved efficiency by using a hierarchical softmax based on the Huffman coding tree. The hierarchical softmax is also “advantageous at test time when searching for the most likely class” (Joulin et al., 2017, p. 428). Each node is associated with a probability, namely the probability of the path from the root to that node; consequently, the probability of a node is always lower than that of its parent. The exploration of “the tree with a depth first search and tracking the maximum probability among the leaves allows [users] to discard any branch associated with a small probability” (Joulin et al., 2017, p. 428).
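The averaging-and-projection step quoted above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the FastText implementation: the function names, the CRC32 hash, the bucket count, and the random weights are all hypothetical, and a plain softmax stands in for the hierarchical softmax.

```python
import zlib
import numpy as np

def ngram_features(tokens, n_max=2):
    """Return all word n-grams up to length n_max (unigrams first, then bigrams, ...)."""
    feats = []
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            feats.append(" ".join(tokens[i:i + n]))
    return feats

def hash_ids(features, n_buckets):
    """Feature hashing: map each n-gram string to a bucket index (CRC32 is an assumption)."""
    return [zlib.crc32(f.encode("utf-8")) % n_buckets for f in features]

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def fasttext_forward(tokens, E, W, b, n_max=2):
    """Average the hashed n-gram embeddings, then project onto the output layer."""
    ids = hash_ids(ngram_features(tokens, n_max), E.shape[0])
    sentence = E[ids].mean(axis=0)      # final sentence representation
    return softmax(W @ sentence + b)    # plain softmax, not the hierarchical variant

# Illustrative sizes and random (untrained) parameters.
rng = np.random.default_rng(0)
n_buckets, dim, n_classes = 1000, 16, 4
E = rng.normal(scale=0.1, size=(n_buckets, dim))
W = rng.normal(scale=0.1, size=(n_classes, dim))
b = np.zeros(n_classes)
probs = fasttext_forward("the lectures were clear and well paced".split(), E, W, b)
```

With hashing, the embedding table has a fixed size regardless of how many distinct n-grams occur, at the cost of occasional collisions between unrelated n-grams.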
FastText uses a shallow neural model for text classification, similar to the continuous bag-of-words approach. However, rather than predicting a word from its neighbors, FastText predicts the target label from the words of the sample (Rosenthal et al., 2021). FastText adopts “a bag of \(n\)-grams as additional features to capture some partial information” (Minaee et al., 2020, p. 4) about the local word order and then maps them into a low-dimensional vector space. In this way, features can be shared across categories regardless of their lexical differences. With sub-word information, it can “not only learn similar embeddings for word forms sharing a common stem but also generate embeddings for unseen words in the test set by combining the learned character n-gram embeddings” (Schmitt et al., 2018, p. 1111). Because of these features, FastText is notably efficient in practice and can achieve performance comparable to approaches that model word order explicitly (Rospocher, 2022), which are computationally more expensive.
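How character n-grams let a model share information across word forms and embed unseen words can be illustrated as follows. The sketch follows the boundary-marker scheme described by Bojanowski et al. (2017); the function names, the hashing function, and the table size are assumptions made for illustration only.

```python
import zlib
import numpy as np

def char_ngrams(word, n_min=3, n_max=6):
    """Character n-grams of a word wrapped in '<' and '>' boundary markers."""
    marked = f"<{word}>"
    grams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(marked) - n + 1):
            grams.append(marked[i:i + n])
    return grams

def word_vector(word, table, n_min=3, n_max=6):
    """Embed a (possibly unseen) word as the mean of its hashed n-gram vectors."""
    ids = [zlib.crc32(g.encode("utf-8")) % table.shape[0]
           for g in char_ngrams(word, n_min, n_max)]
    return table[ids].mean(axis=0)

rng = np.random.default_rng(0)
table = rng.normal(scale=0.1, size=(500, 8))   # hypothetical n-gram embedding table

trigrams = char_ngrams("where", 3, 3)
# trigrams == ['<wh', 'whe', 'her', 'ere', 're>']
shared = set(char_ngrams("review", 3, 3)) & set(char_ngrams("reviews", 3, 3))
vec = word_vector("unseenword", table)          # works even for out-of-vocabulary words
```

Here "review" and "reviews" share five of their trigrams, which is why their learned vectors tend to end up close to each other.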
FastText also updates word vectors via back-propagation during training, which allows the model to fine-tune word representations for the target task (Bojanowski et al., 2017). Because of its efficient implementation and optimized learning rate schedule, FastText processes “input text at a speed of several orders of magnitude [faster than] that of ConvNets” (Zhang & LeCun, 2017, p. 6). Despite its simple architecture, FastText has been considered effective and efficient in diverse text classification tasks (Qiao et al., 2018) and is usually “at par with deep learning classifiers in terms of accuracy, and much faster for training and evaluation” (Jha & Mamidi, 2017, p. 12). It also performs well without extensive hyperparameter tuning (Rosenthal et al., 2021). More algorithmic details of FastText are described by Joulin et al. (2017).
TextCNN
CNNs apply a mathematical operation called convolution, a specialized kind of linear operation, over a feature matrix (Goodfellow et al., 2016). CNNs were originally proposed for computer vision and have gradually been found effective for NLP tasks such as semantic parsing, sentence modeling, and search query retrieval. TextCNN is adopted particularly for the extraction of sentence features.
In a TextCNN, a one-dimensional convolution operation is used to produce new features, \(c\), by applying a filter \(w\in {\mathbb{R}}^{h\times k}\) to a window of \(h\) words \({x}_{i:i+h-1}\), where \(c_i=f(w\otimes x_{i:i+h-1}+b)\), in which \(b\in {\mathbb{R}}\) denotes the bias term and \(f(\cdot )\) is a non-linear function. With padding, each filter produces a feature vector \({\varvec{c}}={[{c}_{1},{c}_{2},...,{c}_{m}]}^{\mathrm{T}}\). A max-over-time pooling operation is then applied to each feature map, taking the maximum value \(\widehat{c}=\max({\varvec{c}})\) as the feature corresponding to that filter. The CNN model captures various features by using multiple filters, and these features are passed to a fully connected layer to output logits, \(a\in {\mathbb{R}},\) for label prediction and loss computation.
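The convolution and max-over-time pooling steps can be made concrete with a small NumPy sketch. It assumes ReLU for the non-linearity \(f\) and uses no padding, so a sentence of \(n\) words yields \(n-h+1\) features per filter; the function names are illustrative.

```python
import numpy as np

def conv1d_feature_map(X, w, b):
    """X: (n, k) word embeddings; w: (h, k) filter; b: scalar bias.

    Returns c with c_i = relu(sum(w * X[i:i+h]) + b) for every window of h words.
    Requires n >= h because no padding is applied.
    """
    n = X.shape[0]
    h = w.shape[0]
    c = np.array([np.sum(w * X[i:i + h]) + b for i in range(n - h + 1)])
    return np.maximum(c, 0.0)

def textcnn_features(X, filters, biases):
    """Max-over-time pooling: keep one value (the maximum) per filter."""
    return np.array([conv1d_feature_map(X, w, b).max()
                     for w, b in zip(filters, biases)])

# Worked example: 4 "words" with 2-dimensional embeddings, one all-ones filter of height 2.
X = np.arange(8, dtype=float).reshape(4, 2)      # rows: [0,1], [2,3], [4,5], [6,7]
c = conv1d_feature_map(X, np.ones((2, 2)), 0.0)  # window sums: 6, 14, 22
pooled = textcnn_features(X, [np.ones((2, 2))], [0.0])  # max over time: 22
```

Because pooling keeps a single value per filter, the pooled vector has a fixed length equal to the number of filters, whatever the length of the review.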
TextRNN
A recurrent neural network (RNN) (Liu et al., 2016) deals with a variable-length input sequence through a recurrent hidden state whose activation at each time step depends on that of the previous time step. In the TextRNN used here, a Bi-LSTM is adopted following the same settings as previous studies (Zhou et al., 2015). In each LSTM cell, the transition functions are defined as in Eq. (8), where \(\sigma\) is the sigmoid function, \(\tanh\) is the hyperbolic tangent function, and \(\odot\) represents element-wise multiplication. More concretely, the forget gate \({\text{f}}_{t}\) controls to what extent old information is discarded, the input gate \({\text{i}}_{t}\) controls how much new information is added, and the output gate \({\text{o}}_{t}\) controls the output of the current cell. With the hidden dimension set to \(d\), so that \({{\varvec{h}}}_{t}\in {\mathbb{R}}^{d}\) at time \(t\), the whole semantic feature \(C\in {\mathbb{R}}^{d\times m}\) can be obtained. As in TextCNN, \(C\) is passed through fully connected layers and mapped to logits \(a\in {\mathbb{R}}\) for label prediction and loss computation.
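For reference, a common form of the LSTM transition functions is sketched below. It assumes the standard formulation and may differ in detail from Eq. (8); the weight matrices \(W_{*}\), biases \(b_{*}\), input \({x}_{t}\), cell state \({c}_{t}\), and candidate state \(\tilde{c}_{t}\) are notational assumptions.

\[
\begin{aligned}
{\text{i}}_{t} &= \sigma \left(W_{i}\,[{{\varvec{h}}}_{t-1};{x}_{t}]+b_{i}\right),\\
{\text{f}}_{t} &= \sigma \left(W_{f}\,[{{\varvec{h}}}_{t-1};{x}_{t}]+b_{f}\right),\\
{\text{o}}_{t} &= \sigma \left(W_{o}\,[{{\varvec{h}}}_{t-1};{x}_{t}]+b_{o}\right),\\
\tilde{c}_{t} &= \tanh \left(W_{c}\,[{{\varvec{h}}}_{t-1};{x}_{t}]+b_{c}\right),\\
{c}_{t} &= {\text{f}}_{t}\odot {c}_{t-1}+{\text{i}}_{t}\odot \tilde{c}_{t},\\
{{\varvec{h}}}_{t} &= {\text{o}}_{t}\odot \tanh ({c}_{t}).
\end{aligned}
\]

Under this reading, the forget gate scales the previous cell state, the input gate scales the candidate state, and the output gate scales what the cell exposes as \({{\varvec{h}}}_{t}\). Stacking the hidden states over all positions gives \(C=[{{\varvec{h}}}_{1},...,{{\varvec{h}}}_{m}]\); in a Bi-LSTM, a second pass reads the sequence in reverse and the two hidden states at each position are combined, for example by concatenation.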
TextCRNN
Convolutional Recurrent Neural Network (CRNN) is a hybrid model that combines CNNs and RNNs. In CRNNs, CNNs extract features, whereas RNNs are utilized as a temporal summarizer. The adoption of RNNs for feature aggregation allows “the networks to take the global structure into account while the remaining convolutional layers extract local features” (Choi et al., 2017, p. 2392). CRNN was initially developed by Tang et al. (2015) for text classification and is extensively used in other domains like image classification and music transcription.
Briefly, following Peng et al. (2020a, b), the CRNN architecture involves convolutional, recurrent, and transcription layers. The convolutional layers automatically extract a feature sequence from each input. The recurrent layers predict a label distribution for each frame of the feature sequence produced by the convolutional layers. The transcription layer transforms the per-frame predictions into a label sequence. Given that CRNN is composed of different network structures, a single loss function is used to train them jointly. Specifically, the convolutional component is built from the convolutional and max-pooling layers of a standard CNN model and extracts a sequential feature representation from the input data. The data are normalized before being fed into the network. A sequence of feature vectors is then extracted from the feature maps, with each feature vector generated column by column over the feature maps. A deep Bi-RNN is built on top of the convolutional layers to predict a label distribution \({y}_{t}\) for each frame \({x}_{t}\) in the feature sequence \(x={x}_{1},...,{x}_{T}\).
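The flow from convolutional features to per-frame label distributions can be sketched as follows. This is a simplified illustration, not the configuration used in the study: it assumes one convolutional layer with ReLU, a single-layer bidirectional tanh RNN in place of a deep Bi-RNN, and random untrained weights, and it omits max-pooling, the transcription layer, and the joint loss. All names and sizes are hypothetical.

```python
import numpy as np

def conv_feature_sequence(X, filters):
    """X: (n, k) input; filters: (F, h, k). Returns a (n-h+1, F) ReLU feature sequence."""
    F, h, _ = filters.shape
    T = X.shape[0] - h + 1
    seq = np.empty((T, F))
    for t in range(T):
        window = X[t:t + h]                                  # (h, k)
        seq[t] = np.tensordot(filters, window, axes=([1, 2], [0, 1]))
    return np.maximum(seq, 0.0)

def rnn_pass(seq, Wx, Wh, reverse=False):
    """Simple tanh RNN; hidden states are stored at their original positions."""
    T, d = seq.shape[0], Wh.shape[0]
    H = np.zeros((T, d))
    h = np.zeros(d)
    order = range(T - 1, -1, -1) if reverse else range(T)
    for t in order:
        h = np.tanh(Wx @ seq[t] + Wh @ h)
        H[t] = h
    return H

def softmax_rows(Z):
    E = np.exp(Z - Z.max(axis=1, keepdims=True))
    return E / E.sum(axis=1, keepdims=True)

def crnn_forward(X, filters, fwd, bwd, Wout):
    """Convolution extracts frames x_1..x_T; a Bi-RNN predicts a distribution y_t per frame."""
    seq = conv_feature_sequence(X, filters)
    H = np.concatenate([rnn_pass(seq, *fwd),
                        rnn_pass(seq, *bwd, reverse=True)], axis=1)   # (T, 2d)
    return softmax_rows(H @ Wout.T)                                    # (T, K)

rng = np.random.default_rng(0)
n, k, F, h, d, K = 7, 5, 3, 2, 4, 3
X = rng.normal(size=(n, k))
filters = rng.normal(size=(F, h, k))
fwd = (rng.normal(size=(d, F)), rng.normal(size=(d, d)))
bwd = (rng.normal(size=(d, F)), rng.normal(size=(d, d)))
Wout = rng.normal(size=(K, 2 * d))
Y = crnn_forward(X, filters, fwd, bwd, Wout)   # one label distribution per frame
```

For classifying a whole review into one category, the per-frame outputs would still need to be aggregated, for instance by using the final hidden states; that step is left out here.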
TextRCNN
TextRCNN, proposed by Lai et al. (2015), is a deep neural model for capturing text semantics. The input is a document \(D\) composed of a sequence of words \({w}_{1},{w}_{2},...,{w}_{n}\), and the output contains the category components. Fundamentally, the Recurrent Convolutional Neural Network (RCNN) is used to address the bias of RNNs, in which later words dominate earlier ones (Mahmood et al., 2020). RCNN takes advantage of the recurrent structure to capture contextual information and learns documents’ feature representations with the help of CNNs, alongside a max-pooling layer that facilitates the selection of important words (Yan et al., 2018). The recurrent structure, CNNs, and max-pooling layer depend on neighboring units whose activation in RCNN changes over time (Salehinejad et al.). Equations (9) and (10) represent the calculation of the left and right contexts of word \({w}_{i}\), denoted by \({c}_{l}({w}_{i})\) and \({c}_{r}({w}_{i})\), respectively. \(e({w}_{i-1})\) is the word embedding of word \({w}_{i-1}\). The matrix \({W}^{(l)}\) transforms the hidden layer into the subsequent hidden layer. The matrix \({W}^{(sl)}\) combines the current word’s semantics with the left context of the subsequent word. \(f\) represents a non-linear activation function. Equation (11) illustrates the word representation \({x}_{i}\), obtained by concatenating the left context vector \({c}_{l}({w}_{i})\), the word embedding \(e({w}_{i})\), and the right context vector \({c}_{r}({w}_{i})\). Each word representation \({x}_{i}\) is passed through a standard layer in which a linear transformation alongside the \(\mathrm{tanh}\) function is applied, leading to \(y\), a semantic vector used for finding the most valuable semantics in the text. Subsequently, a max-pooling layer is employed using Eq. (12) to extract features from the word representations. In Eq. (12), the max function takes the maximum from components of a word representation \({x}_{i}\). Finally, the output layer is calculated by Eq. (13), where \({y}_{i}^{(2)}\) is passed through a softmax function as in Eq. (14), which transforms the output into probabilities, and the text is classified into the most likely category. The training objective is to maximize the log-likelihood of the correct category. The network's weights are initialized using a uniform distribution.
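The forward pass described above can be sketched in plain Python as follows. This is an illustrative sketch that assumes the formulation in Lai et al. (2015), in which max-pooling is taken element-wise over word positions; it is not the study's implementation. The function name `rcnn_forward`, the parameter keys (`W_l`, `W_sl`, `W_r`, `W_sr`, `W2`, `b2`, `W4`, `b4`), and the use of zero vectors for the boundary contexts are assumptions made for the example.

```python
import math

def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def add(u, v):
    return [a + b for a, b in zip(u, v)]

def softmax(z):
    m = max(z)  # subtract the maximum for numerical stability
    e = [math.exp(a - m) for a in z]
    s = sum(e)
    return [a / s for a in e]

def rcnn_forward(E, p, f=math.tanh):
    """E: list of n word embeddings; p: dict of weights (hypothetical keys)."""
    n, dc = len(E), len(p["W_l"])
    # Left contexts, scanned left to right; the first one is assumed zero.
    c_l = [[0.0] * dc]
    for i in range(1, n):
        pre = add(matvec(p["W_l"], c_l[i - 1]), matvec(p["W_sl"], E[i - 1]))
        c_l.append([f(a) for a in pre])
    # Right contexts, scanned right to left; the last one is assumed zero.
    c_r = [[0.0] * dc for _ in range(n)]
    for i in range(n - 2, -1, -1):
        pre = add(matvec(p["W_r"], c_r[i + 1]), matvec(p["W_sr"], E[i + 1]))
        c_r[i] = [f(a) for a in pre]
    # Word representation x_i = [c_l; e; c_r], followed by a tanh layer.
    Y2 = []
    for i in range(n):
        x_i = c_l[i] + E[i] + c_r[i]
        Y2.append([math.tanh(a) for a in add(matvec(p["W2"], x_i), p["b2"])])
    # Element-wise max-pooling over word positions, then the output layer.
    y3 = [max(col) for col in zip(*Y2)]
    y4 = add(matvec(p["W4"], y3), p["b4"])
    return softmax(y4)
```

Because the pooling step reduces a document of any length to a fixed-size vector, the same output layer can be applied to reviews of different lengths. The predicted category is the index of the largest returned probability.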
TextHAN
The Hierarchical Attention Network (HAN) model was developed by Yang et al. (2016) for text classification. In HAN, a sentence’s representation is constructed by processing the sequence of its constituent words with a bidirectional GRU. The sentence representations then undergo sentence-level processing via another bidirectional GRU to construct the document representation. A HAN model contains four components: a word encoder, word attention, a sentence encoder, and sentence attention. First, sentence-level discourse segmentation is adopted to divide a sentence into clauses. Then, a Bi-LSTM is used to model all clauses in the sentence, and a word-level attention mechanism is adopted to capture essential words in each clause. Finally, another Bi-LSTM is used to model the attentive representation of each clause, and a clause-level attention mechanism is implemented to capture essential clauses in a sentence.
Algorithmically, HAN exploits the structure of the documents by encoding the text in two consecutive steps. First, a Bi-GRU (\(EN{C}_{w}\)) followed by a self-attention mechanism (\(EN{C}_{s}\)) turns the word embeddings (\({w}_{it}\)) of each section \({s}_{i}\) with \({T}_{i}\) words into a section embedding \({c}_{i}\). See Eqs. (15–17) for details, in which \({u}^{(s)}\) is a trainable vector. Then, \(EN{C}_{d}\), another Bi-GRU with self-attention, converts the section embeddings (\(S\) in total, one per section) into the final document representation \(d\). See Eqs. (18–20) for details, in which \({u}^{(d)}\) is a trainable vector. The final decoder \(DE{C}_{d}\) of HAN is the same as in Bi-GRU-ATT.
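The two-step encoding can be illustrated with a short sketch of attention pooling applied first over words and then over sections. The sketch assumes the additive attention form of Yang et al. (2016), which may differ in detail from Eqs. (15–20), and it deliberately omits the Bi-GRU encoders: hidden states are taken as given inputs, and section embeddings are pooled directly. The names `attention_pool`, `han_encode`, `word_params`, and `sec_params` are hypothetical.

```python
import math

def softmax(z):
    m = max(z)  # subtract the maximum for numerical stability
    e = [math.exp(a - m) for a in z]
    s = sum(e)
    return [a / s for a in e]

def attention_pool(H, W, b, u):
    """Pool a list of vectors H into one vector.

    Each vector h is projected with tanh(W h + b), scored against the
    trainable context vector u, and the softmax of the scores gives the
    weights of a weighted sum over H.
    """
    scores = []
    for h in H:
        proj = [math.tanh(sum(w * x for w, x in zip(row, h)) + bi)
                for row, bi in zip(W, b)]
        scores.append(sum(a * c for a, c in zip(proj, u)))
    alpha = softmax(scores)
    pooled = [sum(a * h[k] for a, h in zip(alpha, H)) for k in range(len(H[0]))]
    return pooled, alpha

def han_encode(sections, word_params, sec_params):
    """sections: list of sections, each a list of word-level hidden vectors.

    word_params and sec_params are (W, b, u) triples for the two levels.
    The Bi-GRU encoders are omitted in this sketch (assumption).
    """
    section_embs = [attention_pool(H, *word_params)[0] for H in sections]
    doc_emb, _ = attention_pool(section_embs, *sec_params)
    return doc_emb
```

With a zero context vector, every position receives the same score and the pooling reduces to a plain average; a trained context vector instead shifts weight toward the positions whose projections align with it.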
TextSANN
Lin et al. (2017a, b) developed a sentence embedding model composed of a Bi-LSTM and a self-attention mechanism. Here, the self-attention mechanism is used to “provide a set of summation weight vectors for the LSTM hidden states”, and “the summation weight vectors are dotted with the LSTM hidden states, [and] the resulting weighted LSTM hidden states are considered as an embedding for the sentence” (Lin et al., 2017a, b, p. 2). Algorithmically, TextSANN adopts an attention mechanism over Bi-LSTM’s hidden states to generate a representation \(u\) of an input sentence. In the attention mechanism, \(\{{h}_{1},...,{h}_{T}\}\) are Bi-LSTM’s output hidden vectors, which are fed to an affine transformation \((W,{b}_{w})\) to output a set of keys \(({\overline{h} }_{1},...,{\overline{h} }_{T})\). Here, “the \(\left\{{\alpha }_{i}\right\}\) represent[s] the score of similarity between the keys and a learned context query vector \({u}_{w}\), and these weights are used to produce the final representation \(u\), which [is] a weighted linear combination of the hidden vectors” (Conneau et al., 2017, p. 673).
About this article
Cite this article
Chen, X., Zou, D., Cheng, G. et al. Deep neural networks for the automatic understanding of the semantic content of online course reviews. Educ Inf Technol 29, 3953–3991 (2024). https://doi.org/10.1007/s10639-023-11980-6