Abstract
Improving student graduation rates is of utmost importance for higher education institutions, both public and private. The key contribution of this study is to apply the uplift modeling framework to optimize preparatory course assignments as an instrument to boost students’ chances of graduating. Specifically, we concentrate on two university programs, English and Math preparatory courses, to identify the students who would benefit most from these courses and graduate successfully. To achieve this objective, we analyze 10 years of incoming freshmen data with a wide range of feature variables from a major university in the US. We then build and test several uplift methodologies to determine students’ response to the treatments. The best-performing model allows us to segment students and target those who are most responsive to the treatment. Additionally, we identify the most significant variables and provide student profiles and attributes that distinguish those who would gain from preparatory courses from those who would not. The framework developed in this study can serve as a valuable tool for decision-making and policy support. It can improve not only student success but also the allocation of university resources, by identifying and advising the fraction of students for whom taking these preparatory courses would have a positive impact on graduation.
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs10479-023-05607-9/MediaObjects/10479_2023_5607_Fig1_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs10479-023-05607-9/MediaObjects/10479_2023_5607_Fig2_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs10479-023-05607-9/MediaObjects/10479_2023_5607_Fig3_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs10479-023-05607-9/MediaObjects/10479_2023_5607_Fig4_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs10479-023-05607-9/MediaObjects/10479_2023_5607_Fig5_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs10479-023-05607-9/MediaObjects/10479_2023_5607_Fig6_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs10479-023-05607-9/MediaObjects/10479_2023_5607_Fig7_HTML.png)
Notes
We note that a university-wide policy change now requires some students to take the course. However, the time frame of the data set used here does not include the policy change.
The gold standard would have been utilizing a fully randomized treatment assignment mechanism.
It is essential to acknowledge that the term "balance" in matching does not refer to the conventional idea of balance in machine learning. Typically, a "balanced" dataset has the same number of observations across all categories of the outcome variable Y, or an equal number of observations across all treatment groups. In matching, balance refers to a distinct concept: the treatment and control groups having "the same joint distribution of observed covariates" (Diamond & Sekhon, 2013).
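To make the matching sense of "balance" concrete, the sketch below computes the absolute standardized mean difference (SMD) of one covariate between treated and control groups, a commonly used balance diagnostic. It is an illustration only and not the procedure used in this study: the function name, the toy GPA values, and the group sizes are all assumptions, and the SMD checks only marginal means, whereas the definition quoted above concerns the full joint distribution.

```python
import numpy as np

def standardized_mean_difference(x_treated, x_control):
    """Absolute standardized mean difference (SMD) for a single covariate."""
    x_treated = np.asarray(x_treated, dtype=float)
    x_control = np.asarray(x_control, dtype=float)
    diff = abs(x_treated.mean() - x_control.mean())
    # Pooled standard deviation built from the two sample variances.
    pooled_sd = np.sqrt((x_treated.var(ddof=1) + x_control.var(ddof=1)) / 2.0)
    if pooled_sd == 0.0:
        # Both groups are constant: balanced only if the constants coincide.
        return 0.0 if diff == 0.0 else float("inf")
    return diff / pooled_sd

# Hypothetical toy data (not from the study): high-school GPA by group.
rng = np.random.default_rng(0)
gpa_treated = rng.normal(3.0, 0.4, size=500)
gpa_control_unmatched = rng.normal(3.4, 0.4, size=500)  # differs from treated
gpa_control_matched = rng.normal(3.0, 0.4, size=500)    # resembles treated

smd_before = standardized_mean_difference(gpa_treated, gpa_control_unmatched)
smd_after = standardized_mean_difference(gpa_treated, gpa_control_matched)
print(f"SMD before matching: {smd_before:.3f}")
print(f"SMD after matching:  {smd_after:.3f}")
```

Note that both control groups here have the same number of observations as the treated group, so the data are "balanced" in the machine-learning sense in both cases; only the covariate distributions differ, which is what the SMD picks up.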
References
Abadie, A. (2005). Semiparametric difference-in-differences estimators. The Review of Economic Studies, 72(1), 1–19. https://doi.org/10.1111/0034-6527.00321
Bermeo, C., Michell, K., & Kristjanpoller, W. (2023). Estimation of causality in economic growth and expansionary policies using uplift modeling. Neural Computing and Applications. https://doi.org/10.1007/s00521-023-08397-0
DeBerard, M. S., Spielmans, G. I., & Julka, D. L. (2004). Predictors of academic achievement and retention among college freshmen: A longitudinal study. College Student Journal, 38(1), 66–81.
Delaney, A. M. (2008). Designing retention research for assessment and enhanced competitive advantage. Tertiary Education and Management, 14, 57–66.
Delen, D., Topuz, K., & Eryarsoy, E. (2020). Development of a Bayesian Belief Network-based DSS for predicting and understanding freshmen student attrition. European Journal of Operational Research, 281(3), 575–587. https://doi.org/10.1016/j.ejor.2019.03.037
Devriendt, F., Guns, T., & Verbeke, W. (2020). Learning to rank for uplift modeling. http://arxiv.org/abs/2002.05897
Dharmawan, T., Ginardi, H., & Munif, A. (2018). Dropout Detection Using Non-Academic Data. In 2018 4th international conference on science and technology (ICST) (pp. 1–4). https://doi.org/10.1109/ICSTC.2018.8528619
Diamond, A., & Sekhon, J. S. (2013). Genetic matching for estimating causal effects: A general multivariate matching method for achieving balance in observational studies. Review of Economics and Statistics, 95(3), 932–945. https://doi.org/10.1162/REST_a_00318
Elbadrawy, A., Polyzou, A., Ren, Z., Sweeney, M., Karypis, G., & Rangwala, H. (2016). Predicting student performance using personalized analytics. Computer, 49(4), 61–69. https://doi.org/10.1109/MC.2016.119
Fischer, E. M. J. (2007). Settling into campus life: Differences by race/ethnicity in college involvement and outcomes. The Journal of Higher Education, 78(2), 125–161.
Gershenfeld, S., Ward Hood, D., & Zhan, M. (2016). The role of first-semester GPA in predicting graduation rates of underrepresented students. Journal of College Student Retention: Research, Theory & Practice, 17(4), 469–488.
Gross, S. M., & Tibshirani, R. (2016). Data Shared Lasso: A novel tool to discover uplift. Computational Statistics & Data Analysis, 101, 226–235.
Gubela, R. M., & Lessmann, S. (2021). Uplift modeling with value-driven evaluation metrics. Decision Support Systems, 150, 113648. https://doi.org/10.1016/j.dss.2021.113648
Gubela, R. M., Lessmann, S., & Jaroszewicz, S. (2020). Response transformation and profit decomposition for revenue uplift modeling. European Journal of Operational Research, 283(2), 647–661. https://doi.org/10.1016/j.ejor.2019.11.030
Gubela, R. M., Lessmann, S., Haupt, J., Baumann, A., Radmer, T., & Gebert, F. (2017). Revenue uplift modeling. Machine Learning for Marketing Decision Support.
Guelman, L., Guillén, M., & Pérez-Marín, A. M. (2012). Random forests for uplift modeling: An insurance customer retention case. Lecture notes in business information processing, 115 LNBIP (pp. 123–133). https://doi.org/10.1007/978-3-642-30433-0_13
Guelman, L., Guillén, M., & Pérez-Marín, A. M. (2015). A decision support framework to implement optimal personalized marketing interventions. Decision Support Systems, 72, 24–32. https://doi.org/10.1016/j.dss.2015.01.010
Jaskowski, M., & Jaroszewicz, S. (2012). Uplift modeling for clinical trial data. ICML Workshop on Clinical Data A, 46, 79–95.
Kane, K., Lo, V. S., & Zheng, J. (2014). Mining for the truly responsive customers and prospects using true-lift modeling: Comparison of new and existing methods. Journal of Marketing Analytics, 2(4), 218–238. https://doi.org/10.1057/jma.2014.18
Khan, Z., Gul, A., Perperoglou, A., Miftahuddin, M., Mahmoud, O., Adler, W., & Lausen, B. (2020). Ensemble of optimal trees, random forest and random projection ensemble classification. Advances in Data Analysis and Classification, 14(1), 97–116. https://doi.org/10.1007/s11634-019-00364-9
King, J. E. (1999). Helping students balance work, borrowing, and college. About Campus, 4(4), 17–22.
Kostopoulos, G., Kotsiantis, S., & Pintelas, P. (2015). Estimating student dropout in distance higher education using semi-supervised techniques. In Proceedings of the 19th Panhellenic conference on informatics (pp. 38–43).
Lai, Y.-T., Wang, K., Ling, D., Shi, H., & Zhang, J. (2006). Direct marketing when there are voluntary buyers. In Sixth international conference on data mining (ICDM’06) (pp. 922–927). https://doi.org/10.1109/ICDM.2006.54
Larose, S., Cyrenne, D., Garceau, O., Harvey, M., Guay, F., Godin, F., Tarabulsy, G. M., & Deschênes, C. (2011). Academic mentoring and dropout prevention for students in math, science and technology. Mentoring & Tutoring: Partnership in Learning, 19(4), 419–439.
Lo, V. S. Y. (2002). The true lift model. ACM SIGKDD Explorations Newsletter, 4(2), 78–86. https://doi.org/10.1145/772862.772872
Maldonado, S., Miranda, J., Olaya, D., Vásquez, J., & Verbeke, W. (2021). Redefining profit metrics for boosting student retention in higher education. Decision Support Systems, 143, 113493. https://doi.org/10.1016/j.dss.2021.113493
McGrath, M., & Braunstein, A. (1997). The prediction of freshmen attrition: An examination of the importance of certain demographic, academic, financial and social factors. College Student Journal.
Morgan, S. L., & Winship, C. (2014). Counterfactuals and causal inference. Cambridge University Press. https://doi.org/10.1017/CBO9781107587991
Musso, M. F., Hernández, C. F. R., & Cascallar, E. C. (2020). Predicting key educational outcomes in academic trajectories: A machine-learning approach. Higher Education, 80, 875–894.
Olaya, D., Vásquez, J., Maldonado, S., Miranda, J., & Verbeke, W. (2020). Uplift modeling for preventing student dropout in higher education. Decision Support Systems, 134, 113320. https://doi.org/10.1016/J.DSS.2020.113320
Oztekin, A. (2016). A hybrid data analytic approach to predict college graduation status and its determinative factors. Industrial Management & Data Systems, 116(8), 1678–1699. https://doi.org/10.1108/IMDS-09-2015-0363
Palacios, C. A., Reyes-Suárez, J. A., Bearzotti, L. A., Leiva, V., & Marchant, C. (2021). Knowledge discovery for higher education student retention based on data mining: Machine learning algorithms and case study in Chile. Entropy, 23(4), 485. https://doi.org/10.3390/e23040485
Radcliffe, N. J., & Surry, P. D. (2011). Real-world uplift modelling with significance-based uplift trees. White Paper TR-2011-1, Stochastic Solutions (pp. 1–33).
Rice, D. (2009). Product review: Faculty success through mentoring: A guide for mentors, mentees, and leaders. Adult Learning, 20(1–2), 42–43. https://doi.org/10.1177/104515950902000111
Rubin, D. B. (2005). Bayesian inference for causal effects. In The annals of statistics (pp. 1–16). JSTOR. https://doi.org/10.1016/S0169-7161(05)25001-0
Shimizu, A., Togashi, R., Lam, A., & Van Huynh, N. (2019). Uplift modeling for cost effective coupon marketing in c-to-c e-commerce. In 2019 IEEE 31st international conference on tools with artificial intelligence (ICTAI) (pp. 1744–1748).
Stuart, E. A., & Green, K. M. (2008). Using full matching to estimate causal effects in nonexperimental studies: Examining the relationship between adolescent marijuana use and adult outcomes. Developmental Psychology. https://doi.org/10.1037/0012-1649.44.2.395
Tampakas, V., Livieris, I. E., Pintelas, E., Karacapilidis, N., & Pintelas, P. (2019). Prediction of students’ graduation time using a two-level classification algorithm. In Technology and innovation in learning, teaching and education: First international conference, tech-ed (pp. 553–565). Springer. https://doi.org/10.1007/978-3-030-20954-4_42
Thomas, L. (2002). Student retention in higher education: The role of institutional habitus. Journal of Education Policy, 17(4), 423–442. https://doi.org/10.1080/02680930210140257
Thomas, L. (2012). Building student engagement and belonging in Higher Education at a time of change. Paul Hamlyn Foundation, 100(1–99).
Yizar Jr, J. H. (2010). Enrollment factors that predict persistence of at-risk (low income and first generation) students' journey towards completion of a baccalaureate degree at Idaho State University. Idaho State University.
Yorke, M. (2016). The development and initial use of a survey of student ‘belongingness’, engagement and self-confidence in UK higher education. Assessment & Evaluation in Higher Education, 41(1), 154–166.
Zepke, N., & Leach, L. (2010). Improving student engagement: Ten proposals for action. Active Learning in Higher Education, 11(3), 167–177.
Funding
The first author acknowledges financial support from Craig School of Business—CO RSCA and PRSCA 22/23.
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Conflict of interest
Both authors declare that they have no conflict of interest.
Ethical approval
This article does not contain any studies with human participants performed by any of the authors.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix A: Data set variables
Variable | Description | Type | % missing | Imputation |
---|---|---|---|---|
ABC | Admission basis code | Categorical | 0 | – |
ACAD_PLAN | Student initial academic plan | Categorical | 0 | – |
ACT | American College Test (ACT) score-English | Numeric | 73 | Median |
ACT_COMP | American College Test (ACT) score-composite | Numeric | 0 | – |
ACT_MATH | American College Test (ACT) score-math | Numeric | 0 | – |
ACT_READ | American College Test (ACT) score-reading | Numeric | 0 | – |
ACT_SCI | American College Test (ACT) score-science | Numeric | 0 | – |
CITIZ_M | Citizenship status | Categorical | 0 | – |
COLLEGE_CODE | College campus code | Categorical | 0 | – |
CRED_OBJ | Credential or Subject Matter waiver objective | Categorical | 85 | New category “unknown” |
CRED_STATUS | CCTC-approved education credential status | Categorical | 0 | – |
CSU_RACE_CAT | Student ethnicity | Categorical | 0 | – |
DEGR_OBJ | Student immediate degree objective code | Categorical | 0 | – |
DEP_FAM_SZ | Family size of a student who reports him/herself as dependent for financial aid purposes | Numeric | 0 | – |
DEP_INCOME_CODE | Family income level of a student who reports him/herself as dependent for financial aid purposes | Categorical | 0 | – |
DEPT_CODE | Highest degree held by the student | Categorical | 0 | – |
EAPE_STATUS | Student English remediation status | Categorical | 17 | New category “unknown” |
EAPM_STATUS | Student Mathematics remediation status | Categorical | 19 | New category “unknown” |
EDU_FATHER | Student father’s highest-level education-attained | Categorical | 0 | – |
EDU_MOTHER | Student mother’s highest-level education-attained | Categorical | 0 | – |
ELM_REC | Student Entry Level Mathematics (ELM) score | Numeric | 0 | – |
ELM_STATUS | Student Entry Level Mathematics (ELM) receive status | Categorical | 0 | – |
ENR_STATUS | Student current enrollment status | Categorical | 0 | – |
EPT_COMP | Student English Placement Test (EPT) score-composition | Numeric | 45 | Median |
EPT_ESSAY | Student English Placement Test (EPT) score-essay | Numeric | 45 | Median |
EPT_READ | Student English Placement Test (EPT) score-reading | Numeric | 45 | Median |
EPT_STATUS | Student English Placement Test (EPT) status | Categorical | 0 | – |
EPT_TOT | Student English Placement Test (EPT) score-total | Numeric | 45 | Median |
GE_COMP_STATUS | Student GE-Breadth English composition status | Categorical | 0 | – |
GE_CRIT_STATUS | Student GE-Breadth Critical Thinking course status | Categorical | 0 | – |
GE_MATH_STATUS | Student GE-Breadth Mathematics/Quantitative Reasoning course status | Categorical | 0 | – |
GE_ORAL_STATUS | Student GE-Breadth Oral Communications course status | Categorical | 0 | – |
HISP_ETH_CAT | Hispanic/Latino Ethnic category | Categorical | 46 | New category “unknown” |
HISP_STATUS | Hispanic/Latino Ethnic status | Categorical | 0 | – |
HS_GPA | Student High School GPA | Numeric | 0 | – |
HS_TRANS_STATUS | Student High School Transcript receive status | Categorical | 0 | – |
INDEP_INCOME_COD | Gross income level of student reported as independent applicant | Categorical | 0 | – |
INSTI_M | Student latest institution of origin type | Categorical | 0 | – |
MAJOR_CODE | Student major | Categorical | 0 | – |
MULT_RACE_CD | Student race | Categorical | 0 | – |
OPTSFIX_CD | Student major code | Categorical | 0 | – |
PREP_ENG | Number of semesters of college preparatory English | Numeric | 0 | – |
PREP_MATH | Number of semesters of college preparatory mathematics | Numeric | 0 | – |
PREP_SOC_SCI | Number of semesters of college preparatory social sciences | Numeric | 27 | Median |
RES_CODE | Residential type | Categorical | 0 | – |
RES_STATUS | Residential status | Categorical | 0 | – |
SAT_COMP | Scholastic Assessment Test (SAT) score-composition | Numeric | 11 | Median |
SAT_MATH | Scholastic Assessment Test (SAT) score-math | Numeric | 11 | Median |
SAT_SCORE | Scholastic Assessment Test (SAT) score-total | Numeric | 73 | Median |
SAT_VERB | Scholastic Assessment Test (SAT) score-reading | Numeric | 11 | Median |
SEX_M | Student gender | Categorical | 0 | – |
STD_LEV | Student current academic level | Numeric | 0 | – |
TR_GPA | Transfer GPA | Numeric | 70 | Median |
TR_UN | Transfer units earned | Numeric | 0 | – |
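
The two imputation rules listed in the table (median for numeric variables, a new category “unknown” for categorical variables) can be sketched with pandas as follows. This is a minimal illustration under stated assumptions, not the authors' code: the helper name `impute_like_appendix`, the toy rows, and the category labels `"exempt"` and `"required"` are hypothetical; only the column names come from the table.

```python
import numpy as np
import pandas as pd

def impute_like_appendix(df, numeric_cols, categorical_cols):
    """Median-impute numeric columns; fill categorical gaps with 'unknown'."""
    out = df.copy()  # leave the caller's frame untouched
    for col in numeric_cols:
        # Series.median() skips missing values by default.
        out[col] = out[col].fillna(out[col].median())
    for col in categorical_cols:
        # Missing entries become their own explicit category.
        out[col] = out[col].astype("object").fillna("unknown")
    return out

# Hypothetical toy rows (not study data); column names follow the table.
students = pd.DataFrame({
    "SAT_MATH": [520.0, np.nan, 610.0, 480.0],
    "TR_GPA": [np.nan, 3.1, np.nan, 2.7],
    "EAPE_STATUS": ["exempt", None, "required", None],
})

imputed = impute_like_appendix(
    students,
    numeric_cols=["SAT_MATH", "TR_GPA"],
    categorical_cols=["EAPE_STATUS"],
)
print(imputed)
```

The sketch computes each median on the full frame for brevity. In a modeling pipeline one would usually compute it on the training split only and reuse that value for the test split, to avoid leaking information across the split.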
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Tanai, Y., Ciftci, K. How to customize an early start preparatory course policy to improve student graduation success: an application of uplift modeling. Ann Oper Res (2023). https://doi.org/10.1007/s10479-023-05607-9
Received:
Accepted:
Published:
DOI: https://doi.org/10.1007/s10479-023-05607-9