Abstract
Predictive modelling in the education domain can be used to significantly improve teaching and learning experiences. Massive Open Online Courses (MOOCs) generate large volumes of data that can be exploited to predict and evaluate student performance based on various factors. This paper has two broad aims. The first is to develop and tune several Machine Learning (ML) models, including Linear Regression, Logistic Regression, Random Forests and K-Nearest Neighbours, among others, to perform classification tasks that predict student performance from MOOC data. The second is to evaluate the efficacy of these models and identify those best suited to the task. The data used to achieve these aims fall into three categories: (i) demographic information, (ii) academic background, and (iii) interaction with MOOC course materials. The research procedure comprises five phases: data exploration to analyse the dataset; feature engineering, which involves identifying the most important features and converting them into a format the ML models can process; model building; model evaluation by measuring accuracy; and a comparative evaluation of the different models. The findings of this study indicate that the data collected by MOOC platforms can be used to predict student performance with an accuracy of over 77%, and are therefore expected to have implications for how these platforms use data to improve the user experience. This information can be exploited to enhance educational theory or practice in the context of MOOCs, for instance by implementing different teaching methodologies or providing different types of resources based on predicted performance.
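The feature engineering, model building and evaluation phases described above can be illustrated with a small, self-contained sketch. It is not the authors' code: the toy records are invented, the field names only loosely echo the kind of demographic, academic and interaction data the study uses, and k-nearest neighbours (k = 3) stands in for just one of the several models the paper compares. Categorical fields are one-hot encoded, click counts are scaled by the training maximum, and the classifier's accuracy is compared against a majority-class baseline.

```python
from collections import Counter
import math

# Invented toy records (not from the study):
# (highest_education, age_band, total_vle_clicks, result)
TRAIN = [
    ("A Level", "0-35", 900, "Pass"),
    ("A Level", "35-55", 850, "Pass"),
    ("HE Qualification", "35-55", 1000, "Pass"),
    ("Lower Than A Level", "0-35", 100, "Fail"),
    ("Lower Than A Level", "35-55", 150, "Fail"),
    ("A Level", "0-35", 120, "Fail"),
    ("Lower Than A Level", "0-35", 300, "Fail"),
]
TEST = [
    ("A Level", "35-55", 800, "Pass"),
    ("Lower Than A Level", "0-35", 90, "Fail"),
    ("HE Qualification", "0-35", 950, "Pass"),
    ("A Level", "0-35", 200, "Pass"),
]


def fit_encoder(rows):
    """Learn category vocabularies and the click scale from training rows only."""
    edu_vocab = sorted({r[0] for r in rows})
    age_vocab = sorted({r[1] for r in rows})
    max_clicks = max(r[2] for r in rows)
    return edu_vocab, age_vocab, max_clicks


def encode(row, encoder):
    """One-hot encode the categorical columns and divide clicks by the training maximum."""
    edu_vocab, age_vocab, max_clicks = encoder
    edu = [1.0 if row[0] == v else 0.0 for v in edu_vocab]
    age = [1.0 if row[1] == v else 0.0 for v in age_vocab]
    return edu + age + [row[2] / max_clicks]


def knn_predict(train_x, train_y, x, k=3):
    """Return the majority label among the k training vectors closest to x (Euclidean)."""
    nearest = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], x))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


encoder = fit_encoder(TRAIN)
train_x = [encode(r, encoder) for r in TRAIN]
train_y = [r[3] for r in TRAIN]
test_x = [encode(r, encoder) for r in TEST]
test_y = [r[3] for r in TEST]

knn_pred = [knn_predict(train_x, train_y, x) for x in test_x]
majority = Counter(train_y).most_common(1)[0][0]  # baseline: always predict the commonest class
baseline_pred = [majority] * len(test_y)

knn_acc = accuracy(test_y, knn_pred)
baseline_acc = accuracy(test_y, baseline_pred)
print(f"k-NN accuracy: {knn_acc:.2f}, majority baseline: {baseline_acc:.2f}")
# prints "k-NN accuracy: 0.75, majority baseline: 0.25"
```

Fitting the encoder on the training rows only keeps test-set information out of the feature engineering step. On real data one would use far more records, tune k, and compare several model types on the same held-out split, as the comparative phase of the study does.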
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs10639-023-12398-w/MediaObjects/10639_2023_12398_Fig1_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs10639-023-12398-w/MediaObjects/10639_2023_12398_Fig2_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs10639-023-12398-w/MediaObjects/10639_2023_12398_Fig3_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs10639-023-12398-w/MediaObjects/10639_2023_12398_Fig4_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs10639-023-12398-w/MediaObjects/10639_2023_12398_Fig5_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs10639-023-12398-w/MediaObjects/10639_2023_12398_Fig6_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs10639-023-12398-w/MediaObjects/10639_2023_12398_Fig7_HTML.png)
Data availability
The datasets generated and/or analysed for this study are available in the Open University Learning Analytics repository, https://analyse.kmi.open.ac.uk/open_dataset (“Open University Learning Analytics Dataset”, n.d.).
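In the public dataset, interaction with course materials is recorded as click logs rather than one row per student, so it has to be aggregated per enrolment before it can serve as a model feature. The sketch below assumes the column layout of the dataset's `studentVle.csv` file (`code_module`, `code_presentation`, `id_student`, `id_site`, `date`, `sum_click`) and uses invented sample rows; the two features it derives, total clicks and number of distinct active days, are illustrative choices and not necessarily the features used in the study.

```python
import csv
import io
from collections import defaultdict

# Invented sample rows in the assumed studentVle.csv layout.
SAMPLE_VLE = """code_module,code_presentation,id_student,id_site,date,sum_click
AAA,2013J,11391,546614,-5,7
AAA,2013J,11391,546652,-5,3
AAA,2013J,11391,546614,10,4
AAA,2013J,28400,546614,0,12
BBB,2013B,11391,547000,2,5
"""


def interaction_features(lines):
    """Aggregate click logs into (total_clicks, active_days) per enrolment.

    An enrolment is keyed by (module, presentation, student id), because the
    same student can appear in several module presentations.
    """
    clicks = defaultdict(int)
    days = defaultdict(set)
    for row in csv.DictReader(lines):
        key = (row["code_module"], row["code_presentation"], int(row["id_student"]))
        clicks[key] += int(row["sum_click"])
        days[key].add(int(row["date"]))
    return {key: (clicks[key], len(days[key])) for key in clicks}


features = interaction_features(io.StringIO(SAMPLE_VLE))
for key, (total, active_days) in sorted(features.items()):
    print(key, total, active_days)
```

With the downloaded file, the same function could be called on `open("studentVle.csv", newline="")`, and the resulting dictionary joined to the demographic and academic fields on the same enrolment key.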
References
About the Open University. (n.d.). About the Open University. Retrieved 21 September 2023, from https://www.open.ac.uk/about/main/
Al Madhoun, W. (2020). Predictive modelling of student academic performance–the case of higher education in Middle East (Doctoral dissertation, University of East London). https://doi.org/10.15123/uel.88q0w
Bangash, M., Chaudhry, W., Rosales, L., Bilal, M., & Cui, L. (2022). A machine learning-based course enrollment recommender system.
Brownlee, J. (2020a, June 30). Why one-hot encode data in machine learning? https://machinelearningmastery.com/why-one-hot-encode-data-in-machine-learning/
Brownlee, J. (2020b, August 15). Linear discriminant analysis for machine learning. https://machinelearningmastery.com/linear-discriminant-analysis-for-machine-learning/
Ekowo, M., & Palmer, I. (2016). The promise and peril of predictive analytics in higher education: A landscape analysis. New America.
Frith, C. (1997). Motivation to learn. Educational Communications and Technology, 2–11.
Harrison, O. (2019, July 14). Machine learning basics with the K-nearest neighbors algorithm. Medium. https://towardsdatascience.com/machine-learning-basics-with-the-k-nearest-neighbors-algorithm-6a6e71d01761
How Linear regression algorithm works—ArcGIS Pro | Documentation. (n.d.). Retrieved 21 September 2023, from https://pro.arcgis.com/en/pro-app/latest/tool-reference/geoai/how-linear-regression-works.htm
Ippolito, P. P. (2019, October 11). Feature extraction techniques. Medium. Retrieved 21 September 2023, from https://towardsdatascience.com/feature-extraction-techniques-d619b56e31be
Jia, P., & Maloney, T. (2014). Using predictive modelling to identify students at risk of poor university outcomes. Higher Education, 70(1), 127–149. https://doi.org/10.1007/s10734-014-9829-7
Khor, E. T. (2022). A data mining approach using machine learning algorithms for early detection of low-performing students. International Journal of Information and Learning Technology, 39(2), 122–132. https://doi.org/10.1108/IJILT-09-2021-0144
Kizilcec, R. F., Piech, C., & Schneider, E. (2013, April). Deconstructing disengagement: Analyzing learner subpopulations in massive open online courses. In Proceedings of the third international conference on learning analytics and knowledge (pp. 170–179). https://doi.org/10.1145/2460296.2460330
Kurzweil, M., & Wu, D. D. (2015). Building a pathway to student success at Georgia State University.
Littlejohn, A., Hood, N., Milligan, C., & Mustain, P. (2016). Learning in MOOCs: Motivations and self-regulated learning in MOOCs. The Internet and Higher Education, 29, 40–48.
Madjarov, I., & Betari, A. (2008, December). Adaptive learning sequencing for course customization: A web service approach. In 2008 IEEE Asia-Pacific Services Computing Conference (pp. 530–535). https://doi.org/10.1109/APSCC.2008.297
Makombe, F., & Lall, M. (2020). A predictive model for the determination of academic performance in private higher education institutions. International Journal of Advanced Computer Science and Applications, 11(9). https://doi.org/10.14569/IJACSA.2020.0110949
Miguéis, V. L., Freitas, A., Garcia, P. J., & Silva, A. (2018). Early segmentation of students according to their academic performance: A predictive modelling approach. Decision Support Systems, 115, 36–51. https://doi.org/10.1016/j.dss.2018.09.001
Mondal, P. (2013, August 22). 7 Important factors that may affect the learning process. Your Article Library. https://www.yourarticlelibrary.com/learning/7-important-factors-that-may-affect-the-learning-process/6064
Open University Learning Analytics Dataset. Open Learning Analytics | OU Analyse | Knowledge Media Institute | The Open University. (n.d.). Retrieved 10 June 2023, from https://analyse.kmi.open.ac.uk/open_dataset
Raj, A. (2020, October 5). Unlocking the true power of support vector regression. Medium. https://towardsdatascience.com/unlocking-the-true-power-of-support-vector-regression-847fd123a4a0
Raj, A. (2021, January 5). The perfect recipe for classification using logistic regression. Medium. https://towardsdatascience.com/the-perfect-recipe-for-classification-using-logistic-regression-f8648e267592
Romero, C., Ventura, S., & García, E. (2008). Data mining in course management systems: Moodle case study and tutorial. Computers & Education, 51(1), 368–384.
Salem, R. O., Al-Mously, N., Nabil, N. M., Al-Zalabani, A. H., Al-Dhawi, A. F., & Al-Hamdan, N. (2013). Academic and socio-demographic factors influencing students’ performance in a new Saudi medical school. Medical Teacher, 35(sup1), S83–S89.
Singh Chauhan, N. (2022, February 9). Decision tree algorithm, explained. KDnuggets. https://www.kdnuggets.com/2020/01/decision-tree-algorithm-explained.html
Singh, H. (2014, August 7). What’s wrong with MOOCs, and why aren’t they changing the game in education? Wired. https://www.wired.com/insights/2014/08/whats-wrong-moocs-arent-changing-game-education/
Talari, S. (2022, November 1). Random forest vs decision tree: Key differences. KDnuggets. https://www.kdnuggets.com/random-forest-vs-decision-tree-key-differences.html
Wang, Z., Zhu, C., Ying, Z., Zhang, Y., Wang, B., **, X., & Yang, H. (2018, November). Design and implementation of early warning system based on educational big data. In 2018 5th International Conference on Systems and Informatics (ICSAI) (pp. 549–553). https://doi.org/10.1109/ICSAI.2018.8599357
Xu, J., Moon, K. H., & van der Schaar, M. (2017). A machine learning approach for tracking and predicting student performance in degree programs. IEEE Journal of Selected Topics in Signal Processing, 11(5), 742–753. https://doi.org/10.1109/jstsp.2017.2692560
Yiu, T. (2021, September 29). Understanding Random Forest. Medium. https://towardsdatascience.com/understanding-random-forest-58381e0602d2
Funding
None.
Author information
Contributions
Conceptualisation, K.E.T.; methodology, K.E.T. and A.A.; formal analysis, A.A. and K.E.T. Both authors prepared, edited, and approved the manuscript.
Ethics declarations
Competing interests
The authors declare that they have no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Ani, A., Khor, E.T. Development and evaluation of predictive models for predicting students performance in MOOCs. Educ Inf Technol (2023). https://doi.org/10.1007/s10639-023-12398-w
DOI: https://doi.org/10.1007/s10639-023-12398-w